Statistical Method Book For Lectures

This document provides a summary of key concepts in advanced level statistics. It covers topics such as representing and summarizing data through histograms, frequency polygons, means, and standard deviations. It also discusses regression, correlation, probability, and the normal distribution. The document is intended as a concise course to teach statistical techniques and analysis.

Uploaded by

Baiye Randolf

A CONCISE COURSE IN
ADVANCED LEVEL
STATISTICS

J CRAWSHAW BSc

J CHAMBERS MA

Text © J Crawshaw and J Chambers 1984, 1990, 1994, 2001
Original illustrations © Nelson Thornes Ltd 1994, 2001
Text © ICT Statistics Supplement, Douglas Butler, 2001

The rights of J Crawshaw and J Chambers to be identified as authors of this work have been asserted by them in accordance with the Copyright, Design and Patents Act 1988.

All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher or under licence from the Copyright Licensing Agency Limited of 90 Tottenham Court Road, London W1T 4LP.

Any person who commits any unauthorised act in relation to this publication may be liable to criminal prosecution and civil claims for damages.

First published in 1984 by:
Stanley Thornes (Publishers) Ltd

Second Edition 1990
Third Edition 1994
Fourth Edition 2001

Reprinted in 2002 by:
Nelson Thornes Ltd
Delta Place
27 Bath Road
CHELTENHAM
GL53 7TH
United Kingdom

04 05 / 10 9 8 7 6 5 4

A catalogue record of this book is available from the British Library

ISBN 0 7487 5475-X

Page make-up by Mathematical Composition Setters Ltd

Printed and bound by Graficas Estella

Contents

Preface

1 Representation and summary of data
  Discrete data
  Continuous data
  Stem and leaf diagrams (stemplots)
  Ways of grouping data
  Histograms
  Frequency polygons
  Frequency curves
  Circular diagrams or pie charts
  The mean
  Variability of data
  The standard deviation, s, and the variance s²
  Combining sets of data
  Scaling sets of data
  Using a method of coding to find the mean and standard deviation
  Cumulative frequency
  Cumulative percentage frequency diagrams
  Median, quartiles and percentiles
  Skewness
  The normal distribution
  Box and whisker diagrams (box plots)
  Summary

2 Regression and correlation
  Scatter diagrams
  Regression function
  Linear correlation and regression lines
  The product-moment correlation coefficient, r
  Spearman's coefficient of rank correlation, rₛ
  Summary

3 Probability
  Experimental probability
  Probability when outcomes are equally likely
  Subjective probabilities
  Probability notation and probability laws
  Illustrating two or more events using Venn diagrams
  Probability rule for combined events
  Exclusive (or mutually exclusive) events
  Exhaustive events
  Conditional probability
  Independent events
  Probability trees
  Bayes' Theorem
  Some useful methods
  Arrangements
  Permutations of r objects from n objects
  Combinations of r objects from n objects
  Summary

4 Probability distributions I - discrete variables
  Probability distributions
  Expectation of X, E(X)
  Expectation of any function of X, E(g(X))
  Variance, Var(X) or V(X)
  The cumulative distribution function, F(x)
  Two independent random variables
  Distribution of X₁ + X₂ + ... + Xₙ
  Comparing the distributions of X₁ + X₂ and 2X
  Summary

5 Special discrete probability distributions
  The uniform distribution
  The geometric distribution
  Expectation and variance of the geometric distribution
  The binomial distribution
  Expectation and variance of the binomial distribution
  The Poisson distribution
  Using the Poisson distribution as an approximation to the binomial distribution
  The sum of independent Poisson variables
  Summary

6 Probability distributions II - continuous variables
  Continuous random variables
  Probability density function (p.d.f.)
  Expectation of X, E(X)
  Expectation of any function of X
  Variance of X, Var(X)
  The mode
  Cumulative distribution function F(x)
  Obtaining the p.d.f., f(x), from the cumulative distribution function
  The continuous uniform (or rectangular) distribution
  Expectation and variance of the uniform distribution
  The cumulative distribution function, F(x), for a uniform distribution
  Summary

7 The normal distribution
  Finding probabilities
  The standard normal variable, Z
  Using standard normal tables
  Using standard normal tables for any normal variable, X
  Using the standard normal tables in reverse to find z when Φ(z) is known
  Using the tables in reverse for any normal variable, X
  Value of μ or σ or both
  The normal approximation to the binomial distribution
  Continuity corrections
  Deciding when to use a normal approximation and when to use a Poisson approximation for a binomial distribution
  The normal approximation to the Poisson distribution
  Summary

8 Linear combinations of normal variables
  The sum of independent normal variables
  The difference of independent normal variables
  Multiples of independent normal variables
  Summary

9 Sampling and estimation
  Sampling
  Surveys
  Sampling methods
  Simulating random samples from given distributions
  Sample statistics
  The distribution of the sample mean
  Central limit theorem
  The distribution of the sample proportion, p
  Unbiased estimates of population parameters
  Point estimates
  Interval estimates
  The t-distribution
  Confidence intervals for the population proportion, p
  Summary

10 Hypothesis tests: discrete distributions
  Hypothesis test for a binomial proportion, p (small sample size)
  Procedure for carrying out a hypothesis test
  One-tailed and two-tailed tests
  Summary of stages of a hypothesis (significance) test
  Type I and Type II errors
  Significance test for a Poisson mean λ
  Summary of stages of a significance test
  Summary of Type I and Type II errors
11 Hypothesis testing (z-tests and t-tests)
  Hypothesis testing
  One-tailed and two-tailed tests
  Critical z-values
  Summary of critical values and rejection criteria
  Stages in the hypothesis test
  Hypothesis test 1: testing μ (the mean of a population)
  Type I and Type II errors
  Hypothesis test 2: testing a binomial proportion p when n is large
  Hypothesis test 3: testing μ₁ - μ₂, the difference between means of two normal populations
  Summary

12 The χ² significance test
  The χ² significance test
  Performing a χ² goodness-of-fit test
  Summary of the procedure for performing a χ² goodness-of-fit test
  Test 1 - goodness-of-fit test for a uniform distribution
  Test 2 - goodness-of-fit test for a distribution in a given ratio
  Test 3 - goodness-of-fit test for a binomial distribution
  Test 4 - goodness-of-fit test for a Poisson distribution
  Test 5 - goodness-of-fit test for a normal distribution
  Summary of the number of degrees of freedom for a goodness-of-fit test
  The χ² significance test for independence
  Summary

13 Significance tests for correlation coefficients
  Significance tests for correlation coefficients
  Test for the product-moment correlation coefficient, r
  Spearman's coefficient of rank correlation, rₛ
  Summary

ICT statistics supplement

Appendix
  Cumulative binomial probabilities
  Cumulative Poisson probabilities
  The standard normal distribution function
  Critical values for the normal distribution
  Critical values for the t-distribution
  Critical values for the χ² distribution
  Critical values for correlation coefficients
  Random numbers

Answers

Preface

Introduction

This fully revised and updated edition of A Concise Course in Advanced Level Statistics is a comprehensive text for use primarily by students and teachers of Advanced Level Mathematics, both at AS and A2 level. It also provides a useful support for those studying statistics as part of science, social science and humanities courses.

Features

Points of theory are explained concisely and illustrated clearly by worked examples, many taken from Advanced Level papers.
Carefully graded exercises help you to consolidate ideas and gain experience in applying theory to different situations.
Frequent hints pinpoint common misunderstandings and reinforce ideas.
Key concepts and formulae are highlighted in colour to increase clarity.
Frequent summaries provide a quick reference.
Extensive miscellaneous exercises and end-of-chapter tests provide practice in tackling examination questions, providing essential examination preparation.
Answers to all exercises are provided.
An ICT supplement explores the use of ICT in the study of statistics.

Specifications

The text covers the main theory required in the specifications of all the examination boards for the statistics sections of AS and A2 Mathematics.

Examination Questions

We are grateful to the following Awarding Bodies for permission to reproduce questions from their past examinations:

Assessment and Qualifications Alliance (AQA), including Northern Examinations and Assessment Board (NEAB/JMB) and Associated Examining Board (AEB)
The Edexcel Foundation including University of London Examinations and Assessment Councils (L)
Mathematics in Education and Industry (MEI)
Oxford, Cambridge and RSA (OCR) including University of Cambridge Local Examinations Syndicate (C), Oxford & Cambridge Schools Examination Board (O & C) and Oxford Delegacy of Local Examinations (O)
Welsh Joint Education Committee (WJEC)

All answers and worked solutions provided for examination questions are the responsibility of the authors.

We hope that you will enjoy using this text and that it will enhance your understanding of statistics and give you confidence to succeed.

J Crawshaw & J Chambers
2001

Representation and summary of data

In this chapter you will learn about


discrete and continuous data

stem and leaf diagrams (stemplots)

histograms, frequency polygons and the shape of a distribution

pie charts

means and weighted means

standard deviation and variance

cumulative frequency

medians, quartiles and inter-percentile ranges

skewness, including Pearson's coefficient and quartile coefficient


the shape of the normal distribution
box-and-whisker diagrams (boxplots) and outliers

DISCRETE DATA
In a survey of 1 m quadrats in a field the number of snails in each of 30 quadrats was recorded
as follows:

1124023 1 4 2 3 5 2 2 3 2
231232 0 1 1 2 0 3 2 3 3
This is an example of discrete raw data.
Discrete data can take only exact values, for example
the number of cars passing a checkpoint in 30 minutes,
the shoe sizes of children in a class,
the number of tomatoes on each plant in a greenhouse.
The data are known as raw because they have not been ordered in any way.
Frequency distribution for discrete data

To illustrate the data more concisely, count the number of times each value occurs and summarise these in a table, known as a frequency distribution.

Number of snails    0    1    2    3    4    5
Frequency           3    5   11    8    2    1    Total 30

The frequency distribution can be represented diagrammatically by a vertical line graph or a bar chart. The height of the line or bar represents the frequency.

[Figures: Vertical line graph to show number of snails; Bar chart to show number of snails. Horizontal axis: number of snails (0 to 5); vertical axis: frequency (0 to 12).]

Notice that
in the vertical line graph the distinct lines reinforce the discrete nature of the variable,
in the bar chart the bars are all the same width and they are labelled in the middle of the bar on the horizontal axis.

The mode

The mode is the value that occurs most often.

The mode is the most popular value, deriving from the French 'à la mode' meaning fashionable. It is easy to see from the diagrams above that the mode is 2 snails per quadrat.

CONTINUOUS DATA

The following data were obtained in a survey of the heights of 20 children in a sports club. Each height was measured to the nearest centimetre.

133 136 120 138 133 131 127 141 127 143
130 131 125 144 128 134 135 137 133 129

This is an example of continuous raw data.

Continuous data cannot take exact values but can be given only within a certain range or measured to a certain degree of accuracy.

For example, the measurement 144 cm (given to the nearest cm) could have arisen from any value in the interval 143.5 cm ≤ h < 144.5 cm.

Other examples of continuous data are
the speed of a vehicle as it passes a checkpoint,
the mass of a cooking apple,
the time taken by a volunteer to perform a task.

Frequency distribution for continuous data

To form a frequency distribution of the heights of the 20 children, group the information into classes or intervals. Here are three different ways of writing the same set of intervals.

Height (cm)          Height (cm)    Height (to the nearest cm)
119.5 ≤ h < 124.5    119.5-124.5    120-124
124.5 ≤ h < 129.5    124.5-129.5    125-129
129.5 ≤ h < 134.5    129.5-134.5    130-134
134.5 ≤ h < 139.5    134.5-139.5    135-139
139.5 ≤ h < 144.5    139.5-144.5    140-144

The values 119.5, 124.5, 129.5, ... are called the class boundaries or the interval boundaries.
The upper class boundary (u.c.b.) of one interval is the lower class boundary (l.c.b.) of the next interval.

Width of an interval

The width of an interval is the difference between the boundaries.

Width of an interval = upper class boundary - lower class boundary

Often intervals with equal widths are chosen, as in the above illustrations in which each width is 5 cm.

To group the heights it helps to use a tally column, entering the numbers in the first row 133, 136, 120, ... etc. and then the second row. It is a good idea to cross off each number in the list as you enter it. The frequency distribution for the above data should read:

Height (cm)          Tally       Frequency
119.5 ≤ h < 124.5    |           1
124.5 ≤ h < 129.5    ||||| 	     5
129.5 ≤ h < 134.5    ||||| ||    7
134.5 ≤ h < 139.5    ||||        4
139.5 ≤ h < 144.5    |||         3
                                 Total 20
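The tallying of the 20 heights into classes can be checked with a short program. The following is a minimal sketch in Python, not part of the original text; the function name `group_frequencies` is an illustrative choice, and the class boundaries are the ones given above.

```python
from bisect import bisect_right

# Heights of the 20 children, measured to the nearest centimetre (from the text).
heights = [133, 136, 120, 138, 133, 131, 127, 141, 127, 143,
           130, 131, 125, 144, 128, 134, 135, 137, 133, 129]

# Class boundaries from the text: 119.5, 124.5, ..., 144.5.
boundaries = [119.5, 124.5, 129.5, 134.5, 139.5, 144.5]


def group_frequencies(values, boundaries):
    """Count the values in each interval lower <= h < upper (hypothetical helper)."""
    counts = [0] * (len(boundaries) - 1)
    for v in values:
        # bisect_right returns the index of the first boundary greater than v,
        # so subtracting 1 gives the interval whose lower boundary is <= v.
        i = bisect_right(boundaries, v) - 1
        if 0 <= i < len(counts):
            counts[i] += 1
    return counts


frequencies = group_frequencies(heights, boundaries)
for lower, upper, f in zip(boundaries, boundaries[1:], frequencies):
    print(f"{lower} <= h < {upper}: {f}")
print("Total", sum(frequencies))
```

The counts agree with the frequency column of the tally table, and the total confirms that no height has been missed.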

It is important to note that when the data are presented only in the form of a grouped frequency distribution, the original information has been lost. For example you would know that there was one item in the first interval, but you would not know what it was. You would know only that it was between 119.5 cm and 124.5 cm.

STEM AND LEAF DIAGRAMS (STEMPLOTS)

A very useful way of grouping data into classes while still retaining the original data is to draw a stem and leaf diagram, also known as a stemplot.

These are the marks of 20 students in an assignment:

84 17 38 45 47 53 76 54 75 22
66 65 55 54 51 44 39 19 54 72

Notice that the lowest mark is 17 and the highest mark is 84.
In stem and leaf diagrams, all the intervals must be of equal width, so it seems sensible to choose intervals 10-19, 20-29, 30-39, ..., 80-89 for this data.

Take the stem to represent the tens and the leaf to represent the units.

The first five entries 84, 17, 38, 45 and 47 are represented like this:

Stem (tens) | Leaf (units)
     1      | 7
     2      |
     3      | 8
     4      | 5 7
     5      |
     6      |
     7      |
     8      | 4

When all the numbers have been entered the diagram looks like this:

Stem | Leaf
  1  | 7 9
  2  | 2
  3  | 8 9
  4  | 5 7 4
  5  | 3 4 5 4 1 4
  6  | 6 5
  7  | 6 5 2
  8  | 4

The entries in each leaf are now arranged in numerical order and a key is given to explain the stem and leaf. The final diagram looks like this:

Stem and leaf diagram to show assignment marks

Stem | Leaf                Key: 1 | 7 means 17 marks
  1  | 7 9
  2  | 2
  3  | 8 9
  4  | 4 5 7
  5  | 1 3 4 4 4 5
  6  | 5 6
  7  | 2 5 6
  8  | 4

The stemplot gives a good idea at a glance of the shape of the distribution. It is easy to pick out the smallest and largest values and to see that the mode is 54. It is also obvious that the modal class is 50-59.

Example 1.1
The maximum temperature in °C, measured to the nearest degree, was recorded each day during June in Sutton with the following results:

19 23 19 19 20 12 19 22 22 16 18 16 19 20 17
13 14 12 15 17 16 17 19 22 22 20 19 19 20 20

Draw a stem and leaf diagram to illustrate the temperatures and write down the modal temperature.

Solution 1.1
The smallest value is 12 and the highest value is 23. Grouping the data into intervals 10-19, 20-29, ... would give you very little information.

Choose a sensible number of intervals; usually between 5 and 10. Since you must use intervals with equal width, you could use intervals of 2 °C and consider 12-13, 14-15, 16-17, 18-19, 20-21, 22-23.

First do a preliminary plot and then arrange the entries in each leaf in order.

Preliminary plot:

Stem | Leaf
  1  | 2 3 2
  1  | 4 5
  1  | 6 6 7 7 6 7
  1  | 9 9 9 9 8 9 9 9 9
  2  | 0 0 0 0 0
  2  | 3 2 2 2 2

Final diagram:

Stemplot to show maximum temperatures

Stem | Leaf                Key: 1 | 2 means 12 °C
  1  | 2 2 3
  1  | 4 5
  1  | 6 6 6 7 7 7
  1  | 8 9 9 9 9 9 9 9 9
  2  | 0 0 0 0 0
  2  | 2 2 2 2 3

The modal temperature is 19 °C.

NOTE: The stem does not necessarily represent the tens digit. For example, suppose you want to use intervals 12-14, 15-17, 18-20, 21-23. The interval 18-20 cannot be represented by a stem of 1, since the tens digit changes during the interval. For the stem you can use 12, 15, 18 and 21. The leaf is then given as the number that is added to the stem.

Stem | Leaf                          Key: 15 | 2 means 17 °C
 12  | 0 0 1 2                            18 | 0 means 18 °C
 15  | 0 1 1 1 2 2 2
 18  | 0 1 1 1 1 1 1 1 1 2 2 2 2 2
 21  | 1 1 1 1 2
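A stemplot in which the stem is the lower end of each interval and the leaf is the amount added to it, as in the NOTE to Solution 1.1, can be produced mechanically. The sketch below is an illustration in Python rather than a method from the text; the function name `stemplot` and its parameters `start` and `width` are assumptions made for this example.

```python
from statistics import mode

# Maximum June temperatures in degrees C from Example 1.1.
temperatures = [19, 23, 19, 19, 20, 12, 19, 22, 22, 16, 18, 16, 19, 20, 17,
                13, 14, 12, 15, 17, 16, 17, 19, 22, 22, 20, 19, 19, 20, 20]


def stemplot(values, start, width):
    """Return {stem: ordered leaves} for equal-width intervals (hypothetical helper).

    The stem is the lowest value of the interval and each leaf is the
    number that must be added to the stem to recover the original value.
    """
    rows = {}
    for v in sorted(values):
        stem = start + ((v - start) // width) * width
        rows.setdefault(stem, []).append(v - stem)
    return rows


plot = stemplot(temperatures, start=12, width=3)
for stem, leaves in plot.items():
    print(f"{stem:>3} | {' '.join(str(leaf) for leaf in leaves)}")
print("Key: 15 | 2 means 17 degrees C")
print("Modal temperature:", mode(temperatures))
```

Calling `stemplot(temperatures, start=12, width=2)` gives the six intervals 12-13, 14-15, ..., 22-23 instead, but with the stems 12, 14, ..., 22 rather than the tens digit used in the final diagram of Solution 1.1.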
NOTE: The key is essential in explaining how the stemplot has been formed.

In a stem and leaf diagram, or stemplot,
equal width intervals must be chosen,
a key is essential.

Example 1.2
The table gives the number of days on which rain fell in 36 consecutive intervals of 30 days.

21 19  6 12  8 18  9  8 11 17 15 13
16  9 17 18  9 24 17  7  8 17 17  8
 7 11 16 17  8  5 13 22 20 16 20 13

Draw stem and leaf diagrams with the following class intervals:
(a) 5-9, 10-14, 15-19, 20-24
(b) 4-6, 7-9, 10-12, 13-15, 16-18, 19-21, 22-24.

Solution 1.2
(a) Using intervals 5-9, 10-14, 15-19, 20-24 the completed stem and leaf diagram is:

Stem | Leaf                          Key: 1 | 6 means 16
  0  | 5 6 7 7 8 8 8 8 8 9 9 9
  1  | 1 1 2 3 3 3
  1  | 5 6 6 6 7 7 7 7 7 7 8 8 9
  2  | 0 0 1 2 4

NOTE: The stem and leaf diagram could have been written differently, as follows:

Stem | Leaf                          Key: 15 | 1 means 16
  5  | 0 1 2 2 3 3 3 3 3 4 4 4             5 | 3 means 8
 10  | 1 1 2 3 3 3
 15  | 0 1 1 1 2 2 2 2 2 2 3 3 4
 20  | 0 0 1 2 4

(b) Using intervals 4-6, 7-9, 10-12, ... the completed diagram, arranged in order is:

Stem | Leaf                          Key: 13 | 2 means 15
  4  | 1 2
  7  | 0 0 1 1 1 1 1 2 2 2
 10  | 1 1 2
 13  | 0 0 0 2
 16  | 0 0 0 1 1 1 1 1 1 2 2
 19  | 0 1 1 2
 22  | 0 2

Both diagrams show that the mode is 17 rainy days, but the seven intervals used in (b) show more clearly the two peaks, illustrating that the distribution is approximately bi-modal, with modal classes 7-9 and 16-18.

Example 1.3
Look at this stem and leaf diagram and for each of the three keys provided, give
(a) the value ringed,
(b) the width of the interval containing the ringed value.

Stem | Leaf
  0  | 7
  0  | 9
  1  | 0 1
  1  | 2 2
  1  | 4 4 4 5 5
  1  | 6 6 7 7 7
  1  | 8 8 8 8 9 9 (9)     (the final 9 in this row is the ringed value)
  2  | 0 0 1 1
  2  | 2 3
  2  | 4

(i) the widths of 30 metal components
    Key: 1 | 2 means 1.2 cm
(ii) the reaction times of 30 volunteers
    Key: 1 | 2 means 12 hundredths of a second
(iii) the attendance at 30 matches
    Key: 1 | 2 means 1200 people

Solution 1.3
(i) (a) 1 | 9 means 1.9 cm.
    (b) The interval is 1.8 cm-1.9 cm. Since width is a continuous variable, and assuming that widths have been measured to the nearest tenth of a centimetre, then 1.75 cm ≤ width < 1.95 cm and the class width is 2 mm.
(ii) (a) 1 | 9 means 19 hundredths of a second, i.e. 0.19 seconds.
    (b) The interval is 0.18 sec-0.19 sec, i.e. 0.175 ≤ time < 0.195, so the class width is 0.02 seconds.
(iii) (a) 1 | 9 means 1900 people.
    (b) The interval is 1800 people-1900 people. Assuming that the number has been given to the nearest hundred, then 1750 ≤ number < 1950, so the class width is 200 people.

Back-to-back stemplots

Stem and leaf diagrams can be used to compare two samples by showing the results together on a back-to-back stemplot.

Example 1.4
Use a stem and leaf diagram to compare the examination marks in French and English for a class of 20 pupils.

French   75 69 58 58 46 44 32 50 53 78
         81 61 61 45 31 44 53 66 47 57

English  52 58 68 77 38 85 43 44 56 65
         65 79 44 71 84 72 63 69 72 79
Solution 1.4

The first four entries for French (75, 69, 58, 58) and for English (52, 58, 68, 77) are entered into a back-to-back stemplot as follows:

Key (French): 9 | 6 means 69          Key (English): 5 | 2 means 52

 French |   | English
        | 3 |
        | 4 |
    8 8 | 5 | 2 8
      9 | 6 | 8
      5 | 7 | 7
        | 8 |

The completed diagram, before rearranging, is:

      French |   | English
         1 2 | 3 | 8
   7 4 5 4 6 | 4 | 3 4 4
 7 3 3 0 8 8 | 5 | 2 8 6
     6 1 1 9 | 6 | 8 5 5 3 9
         8 5 | 7 | 7 9 1 2 2 9
           1 | 8 | 5 4

The final diagram, arranged in order:

Key (French): 8 | 5 means 58          Key (English): 6 | 3 means 63

      French |   | English
         2 1 | 3 | 8
   7 6 5 4 4 | 4 | 3 4 4
 8 8 7 3 3 0 | 5 | 2 6 8
     9 6 1 1 | 6 | 3 5 5 8 9
         8 5 | 7 | 1 2 2 7 9 9
           1 | 8 | 4 5

From the diagram it is clear that the class had higher marks in English than in French and it appears that they performed better in English. This would, however, depend on the standards of marking used in the two examinations.

Exercise 1a

1. (a) Draw a stemplot to show the masses, correct to the nearest kilogram, of 30 men. Use intervals 50-54, 55-59, 60-64, ...
   (b) Write down the modal mass.

   74 52 67 68 71 76 86 81 73
   68 64 75 71 61 63 57 67 57
   59 72 79 64 70 74 77 79 65
   68 76 83

2. A teacher recorded the times taken by 20 boys to swim one length of the pool. The times are given to the nearest second. Using intervals 24-25, 26-27, ..., draw a stem and leaf diagram to illustrate the results.

   32 31 26 27 27 32 29 26 25 25
   29 31 32 26 30 24 32 27 26 31

3. A group of adults took part in an experiment which measured their reaction times. The results were given to the nearest hundredth of a second.

   0.14 0.17 0.21 0.20 0.20 0.22
   0.14 0.24 0.26 0.17 0.14 0.17
   0.21 0.20 0.22 0.14 0.24 0.26
   0.17 0.18 0.17 0.21 0.20 0.23
   0.17 0.23 0.21 0.23 0.24 0.23

   Use intervals 0.14-0.15, 0.16-0.17, 0.18-0.19, ... to draw a stemplot to illustrate the results. Comment on your diagram.

4. In a lesson on measurement, 30 pupils estimated the length of a line in centimetres and wrote down their value correct to the nearest mm. Using intervals 3.0-3.9, 4.0-4.9, ..., draw a stemplot.

   9.2 7.3 7.0 6.5 5.4 5.3 10.1 8.4
   8.8 7.1 7.6 7.9 6.7 9.6 5.5 7.4
   7.0 8.2 5.5 7.8 8.2 7.5 6.1 6.1
   3.9 6.8 7.6 8.1 8.0 10.0

5. The daily hours of sunshine in London during August were

   7.0 7.6 12.5 12.9 8.3 9.7 8.4 11.1
   7.5 7.5 9.8 10.4 11.6 11.3 7.3 7.8
   6.8 6.2 6.1 5.6 5.6 5.8 4.8 4.3
   0.0 0.6 0.8 1.6 0.2 2.4 2.6

   Illustrate these data on a stem and leaf diagram and comment.

6. A stemplot is given below but it does not have a key.

   Stem | Leaf
     5  | 9
     6  | 1 4
     6  | 7 8 9
     7  | 2 3 3 (?)     (the ringed leaf in this row is not legible)
     7  | 5 6 6 6 7 8
     8  | 0 3 4
     8  | 5

   State the value ringed and the width of the interval that it is in when the diagram illustrates
   (a) the times taken for a journey, where 6 | 8 represents 6.8 hours,
   (b) the masses, in g to three decimal places, of components, where 6 | 8 represents 0.068 g.

7. Draw back-to-back stemplots for the following data. What conclusions can you draw?

   (a) The pulse rates of 30 company directors were measured before and after taking exercise.
       Before: 110, 93, 81, 75, 73, 73, 48, 53, 69, 69, 66, 111, 105, 93, 90, 50, 57, 64, 90, 111, 91, 70, 70, 51, 79, 93, 105, 51, 66, 93.
       After: 117, 81, 77, 108, 130, 69, 77, 84, 84, 86, 95, 125, 96, 104, 104, 137, 143, 70, 80, 131, 145, 106, 130, 109, 137, 75, 104, 75, 97, 80.
       (Use class intervals 40-49, 50-59, 60-69, ...)

   (b) The ages of teachers in two schools:
       School A: 51, 45, 33, 37, 37, 27, 28, 54, 54, 61, 34, 31, 39, 23, 53, 59, 40, 46, 48, 48, 39, 33, 25, 31, 48, 40, 53, 51, 46, 45, 45, 48, 39, 29, 23, 37.
       School B: 59, 56, 40, 43, 46, 38, 29, 52, 54, 34, 23, 41, 42, 52, 50, 58, 60, 45, 45, 56, 59, 49, 44, 36, 38, 25, 56, 36, 42, 47, 50, 54, 59, 47, 58, 57.
       (Use class intervals 20-29, 30-39, 40-49, ...)

   (c) 20 boys and 20 girls took part in a reaction-timing experiment. Their results were measured to the nearest hundredth of a second.
       Girls: 0.22, 0.21, 0.18, 0.18, 0.16, 0.19, 0.25, 0.22, 0.17, 0.19, 0.16, 0.21, 0.24, 0.22, 0.19, 0.22, 0.25, 0.22, 0.17, 0.22.
       Boys: 0.14, 0.20, 0.22, 0.16, 0.19, 0.16, 0.15, 0.23, 0.23, 0.19, 0.16, 0.15, 0.09, 0.23, 0.11, 0.21, 0.22, 0.18, 0.18, 0.16.
       (Use class intervals 0.08-0.09, 0.10-0.11, 0.12-0.13, ...)

WAYS OF GROUPING DATA

The following frequency distributions show some of the ways that data can be grouped. The information is more concise than the raw data, but the disadvantage is that the original information has been lost.

(i) Frequency distribution to show the lengths, to the nearest millimetre, of 30 rods

Length (mm)    27-31    32-36    37-46    47-51
Frequency        4       11       12        3

The interval 27-31 means 26.5 mm ≤ length < 31.5 mm.

The class boundaries are 26.5, 31.5, 36.5, 46.5, 51.5

The class widths are 5, 5, 10, 5
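For data recorded to the nearest unit, as with the rod lengths in (i), the class boundaries lie half a unit outside the stated end points of each interval. The short Python sketch below illustrates that arithmetic; the function name `class_boundaries` and its `unit` parameter are assumptions for this example, and the intervals are assumed to follow one another without gaps other than the rounding unit.

```python
def class_boundaries(intervals, unit=1):
    """Boundaries for intervals whose end points are rounded to the nearest unit.

    Each lower end point is moved down by half a unit; the final upper
    end point is moved up by half a unit.
    """
    half = unit / 2
    lower = [lo - half for lo, _ in intervals]
    return lower + [intervals[-1][1] + half]


# Rod lengths to the nearest millimetre, from distribution (i).
rod_intervals = [(27, 31), (32, 36), (37, 46), (47, 51)]

boundaries = class_boundaries(rod_intervals)
widths = [upper - lower for lower, upper in zip(boundaries, boundaries[1:])]

print("Class boundaries:", boundaries)
print("Class widths:", widths)
```

Note that the width of the interval 27-31 is 5 mm, not 4 mm, because the width is measured between the boundaries 26.5 and 31.5.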
(ii) Frequency distribution to show the marks in a test of 100 students

Mark         30-39    40-49    50-59    60-69    70-79    80-99
Frequency     10       14       26       20       18       12

This distribution can be interpreted in two ways:

(a) As discrete data, the interval 30-39 represents 30 ≤ mark < 40.
    The class boundaries are 30, 40, 50, 60, 70, 80, 100
    The class widths are 10, 10, 10, 10, 10, 20
(b) As continuous data, assuming marks are to the nearest integer, 30-39 would represent 29.5 ≤ mark < 39.5.
    The class boundaries are 29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 99.5
    The class widths are 10, 10, 10, 10, 10, 20

(iii) Frequency distribution to show the lengths of 50 telephone calls

Length of call (min)    0-    3-    6-    9-    12-    18-
Frequency                9    12    15    10     4      0

The interval '3-' means 3 minutes ≤ time < 6 minutes, so any time including 3 minutes and up to (but not including) 6 minutes comes into this interval.

The class boundaries are 0, 3, 6, 9, 12, 18

The class widths are 3, 3, 3, 3, 6

(iv) Frequency distribution to show the masses of 40 packages brought to a particular counter at a post office

Mass (g)     -100    -250    -500    -800
Frequency      8      10      16       6

The interval '-250' means 100 g < mass ≤ 250 g, so any mass over 100 grams up to and including 250 grams comes into this interval.

The class boundaries are 0, 100, 250, 500, 800

The class widths are 100, 150, 250, 300

(v) Frequency distribution to show the speeds of 50 cars passing a checkpoint

Speed (km/h)    20-30    30-40    40-60    60-80    80-100
Frequency         2        7       20       16        5

The interval 30-40 means 30 km/h ≤ speed < 40 km/h.

The class boundaries are 20, 30, 40, 60, 80, 100

The class widths are 10, 10, 20, 20, 20

(vi) Frequency distribution to show ages (in completed years) of applicants for a teaching post

Age (years)    21-24    25-28    29-32    33-40    41-52
Frequency        4        2        2        1        1

Since the ages are given in completed years (not to the nearest year) then '21-24' means 21 ≤ age < 25. Someone who is 24 years and 11 months would come into this category. Sometimes this interval is written '21-' and the next is '25-', etc.

The class boundaries are 21, 25, 29, 33, 41, 53

The class widths are 4, 4, 4, 8, 12

HISTOGRAMS

Grouped data can be displayed in a histogram as in the following diagram.

[Figure: histogram]

This histogram represents the following table for the distribution of ages of passengers on a shuttle flight from Denver, Colorado to Salt Lake City, Utah.

Age, x years    0 ≤ x < 20    20 ≤ x < 40    40 ≤ x < 50    50 ≤ x < 70    70 ≤ x < 100
Frequency           4             44             36             28              6

Histograms resemble bar charts, but there are two important differences.

In a histogram
there are no gaps between the bars,
the area of each bar is proportional to the frequency that it represents.

Histograms often have bars of varying widths, so the height of the bar must be adjusted in accordance with the width of the bar.

The vertical axis is not labelled frequency but frequency density where

Frequency density = frequency / interval width

Consider the interval 20 ≤ x < 40 in the frequency table above.
Frequency = 44, interval width = 20, so frequency density = 44/20 = 2.2

The complete table looks like this:

Ages            Interval width    Frequency    Frequency density
0 ≤ x < 20           20               4              0.2
20 ≤ x < 40          20              44              2.2
40 ≤ x < 50          10              36              3.6
50 ≤ x < 70          20              28              1.4
70 ≤ x < 100         30               6              0.2

Modal class

The highest bar in the histogram represents the interval 40 ≤ x < 50. This is the modal class.

Notice that in the table this interval does not have the greatest frequency, but it does have the greatest frequency density.

The modal class is the interval with the greatest frequency density. It is represented by the highest bar in the histogram.

Example 1.5
The grouped frequency distribution records the masses, to the nearest gram, of 84 letters delivered by the postman.

Mass (g)             1-20    21-40    41-60    61-80    81-100
Number of letters     10      18       24       14       18

Draw a histogram to illustrate these data.

Solution 1.5
The data are continuous.
The class boundaries are 0.5, 20.5, 40.5, 60.5, 80.5, 100.5
The interval widths are 20, 20, 20, 20, 20

In this example all the intervals are of equal width and you could use the frequency for the height of the bar. It is, however, a good idea to use the frequency density for the height of the bar. The resulting histogram will then have a total area which represents the total frequency.

Mass (g)              Interval width    Frequency    Frequency density
0.5 ≤ x < 20.5             20              10              0.5
20.5 ≤ x < 40.5            20              18              0.9
40.5 ≤ x < 60.5            20              24              1.2
60.5 ≤ x < 80.5            20              14              0.7
80.5 ≤ x < 100.5           20              18              0.9

[Figure: Histogram to show the masses of letters. Horizontal axis: mass of letter (g), with class boundaries 0.5, 20.5, 40.5, 60.5, 80.5, 100.5; vertical axis: frequency density, 0.2 to 1.2. The frequencies 10, 18, 24, 14 and 18 are marked on the bars.]

The main purpose of histograms is to illustrate grouped continuous data, but they can also be used to illustrate grouped discrete data.

Example 1.6
These are the examination marks for a group of 120 first year statistics students.

Mark         0-9    10-19    20-29    30-49    50-79
Frequency     8      21       53       28       10

Represent the data in a histogram and comment on the shape of the distribution.
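The frequency densities in the table of passenger ages, and the choice of modal class, can be verified with a few lines of code. This is a minimal Python sketch rather than anything prescribed by the text; the name `frequency_densities` and the tuple layout (lower boundary, upper boundary, frequency) are assumptions made for the illustration.

```python
# (lower boundary, upper boundary, frequency) for the passenger ages in the text.
age_classes = [(0, 20, 4), (20, 40, 44), (40, 50, 36), (50, 70, 28), (70, 100, 6)]


def frequency_densities(classes):
    """Frequency density = frequency / interval width, for each class."""
    return [freq / (upper - lower) for lower, upper, freq in classes]


densities = frequency_densities(age_classes)

# The modal class is the one with the greatest frequency density,
# which need not be the one with the greatest frequency.
modal_index = max(range(len(densities)), key=densities.__getitem__)
modal_lower, modal_upper, _ = age_classes[modal_index]

for (lower, upper, freq), density in zip(age_classes, densities):
    print(f"{lower} <= x < {upper}: frequency {freq}, density {density:.1f}")
print(f"Modal class: {modal_lower} <= x < {modal_upper}")
```

Here the class 20 ≤ x < 40 has the largest frequency (44), yet the class 40 ≤ x < 50 is modal because its 36 passengers are spread over only 10 years.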
Solution 1.6

The data are discrete, so, to avoid gaps in the histogram, use class boundaries 9.5, 19.5, 29.5,
49.5. This leads to -0.5 and 79.5 as the remaining two boundaries, even though these marks
are outside the range of the discrete data.

The class boundaries are -0.5, 9.5, 19.5, 29.5, 49.5, 79.5
The interval widths are 10, 10, 10, 20, 30

Mark     Class width   Frequency   Frequency density
0-9      10            8           0.8
10-19    10            21          2.1
20-29    10            53          5.3
30-49    20            28          1.4
50-79    30            10          0.3

[Figure: Histogram to show examination marks. Horizontal axis: Marks, with class boundaries -0.5, 9.5, 19.5, 29.5, 49.5, 79.5. Vertical axis: Frequency density.]

The distribution has a long tail of values to the right. It is said to be positively skewed.

HINT: when drawing the histogram you will find it easier to mark out the horizontal axis
-0.5, 9.5, 19.5, ... using the lines of your squared paper. Then draw in the vertical frequency
density axis in a suitable position. Anywhere will do for this; it does not have to go through
(0, 0), but could be to the left of -0.5, for example.

Finding the frequencies from a histogram

To find the frequency in each interval, use

frequency = interval width × frequency density

Example 1.7
A Passengers' Association conducted a survey on the punctuality of trains using a particular
station. The histogram illustrates the results.

(a) Construct the frequency distribution.
(b) How many trains were there in the survey?

[Figure: Histogram to show lateness of trains. Horizontal axis: Number of minutes late (t), from 0 to 80. Vertical axis: Frequency density.]

Solution 1.7
(a) To find the frequency in each interval, use frequency = interval width × frequency density

Number of minutes late (t)   Frequency
0 ≤ t < 5                    5 × 6.4 = 32
5 ≤ t < 10                   5 × 8.8 = 44
10 ≤ t < 20                  10 × 2.8 = 28
20 ≤ t < 30                  10 × 1.2 = 12
30 ≤ t < 50                  20 × 0.6 = 12
50 ≤ t < 80                  30 × 0.2 = 6

(b) Number of trains = 32 + 44 + 28 + 12 + 12 + 6 = 134
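
The rule frequency = interval width × frequency density used in Solution 1.7 can be checked with a short Python sketch. The function name and the list layout are illustrative choices, not part of the text, and the rounding step assumes that each frequency is a whole number of trains.

```python
# Each class is (lower boundary, upper boundary, frequency density),
# using the values quoted in Solution 1.7.
classes = [
    (0, 5, 6.4),
    (5, 10, 8.8),
    (10, 20, 2.8),
    (20, 30, 1.2),
    (30, 50, 0.6),
    (50, 80, 0.2),
]


def frequencies_from_densities(classes):
    """Return frequency = interval width x frequency density for each class."""
    # round() removes tiny floating-point errors; it assumes whole-number frequencies.
    return [round((upper - lower) * density) for lower, upper, density in classes]


frequencies = frequencies_from_densities(classes)
total_trains = sum(frequencies)

print(frequencies)
print(total_trains)
```

The list reproduces the frequency column of the table and the sum gives the total number of trains in part (b).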

Example 1.8

The number of letters delivered to the houses in Distribution Street is illustrated in the
histogram. Given that 13 houses received three or four letters, how many houses are there in
the street? Explain the scale on the vertical axis.

[Figure: Histogram. Horizontal axis: Number of letters delivered. The vertical axis is not marked.]

Solution 1.8
The scale on the frequency density axis has not been marked but since you are given that there
are 13 houses in the interval 3-4 it is easy to see that the area of four small squares represents one
house.

The frequencies can be deduced directly from this, for example, the interval 7-10 contains
two houses.
Total frequency = 5 + 13 + 10 + 2 = 30
There are 30 houses in the street.
To work out the scale on the frequency density axis, note that the interval 3-4 has frequency
13 and is of width 2, therefore frequency density = 13 ÷ 2 = 6.5.
Since the bar is 13 squares high, each square on the vertical axis represents a frequency
density of 0.5.

Although it is easier to use frequency density for the vertical scale in the histogram, other
scales can be used, provided that area is proportional to frequency. This is illustrated in the
following example.

Example 1.9
A teacher recorded the time, to the nearest minute, spent reading during a particular day by
each child in a group. The times were summarised in a grouped frequency distribution and
represented by a histogram. The first class in the grouped frequency distribution was 10-19
and its associated frequency was eight children. On the histogram the height of the rectangle
representing the class was 2.4 cm and the width was 2 cm. The total area under the histogram
was 53.4 cm².
Find the number of children in the group. (L)

Solution 1.9

Rectangle representing the 10-19 interval: width 2 cm, height 2.4 cm, frequency 8 children.

Area of rectangle = 2 × 2.4 = 4.8 cm²

Area ∝ frequency
Area = k × frequency
4.8 = k × 8
k = 0.6

Total area = k × total frequency
53.4 = 0.6 × total frequency
Total frequency = 53.4 ÷ 0.6 = 89

There were 89 children in the group.

FREQUENCY POLYGONS

A grouped frequency distribution can be displayed as a frequency polygon.

To construct a frequency polygon, for each interval plot frequency density against the
mid-interval value, where

mid-interval value = ½ (lower class boundary + upper class boundary)

Then join the points with straight lines.

Example 1.10
Draw a frequency polygon to illustrate this frequency distribution which gives the times taken
by 31 competitors to complete a cross-country run.

Time t (min)   25 ≤ t < 30   30 ≤ t < 35   35 ≤ t < 40   40 ≤ t < 50   50 ≤ t < 65
Frequency      4             12            8             4             3

Solution 1.10

Time          Mid-interval value   Interval width   Frequency   Frequency density
25 ≤ t < 30   27.5                 5                4           4/5 = 0.8
30 ≤ t < 35   32.5                 5                12          12/5 = 2.4
35 ≤ t < 40   37.5                 5                8           8/5 = 1.6
40 ≤ t < 50   45                   10               4           4/10 = 0.4
50 ≤ t < 65   57.5                 15               3           3/15 = 0.2
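
The points plotted for the frequency polygon in Solution 1.10 can be generated as follows. This is a minimal sketch: the function name polygon_points and the tuple layout are assumptions made for illustration, and the class boundaries are taken directly from the table in Example 1.10.

```python
# Each class is (lower boundary, upper boundary, frequency) for Example 1.10.
classes = [
    (25, 30, 4),
    (30, 35, 12),
    (35, 40, 8),
    (40, 50, 4),
    (50, 65, 3),
]


def polygon_points(classes):
    """Return (mid-interval value, frequency density) for each class."""
    points = []
    for lower, upper, frequency in classes:
        mid_interval = (lower + upper) / 2          # half of (lower + upper boundary)
        density = frequency / (upper - lower)       # frequency / interval width
        points.append((mid_interval, round(density, 2)))
    return points


points = polygon_points(classes)
for mid_interval, density in points:
    print(mid_interval, density)
```

Joining these points with straight lines gives the frequency polygon.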
[Figure: Frequency polygon to show times taken to complete a cross-country run. Horizontal axis: Time (min), from 25 to 65. Vertical axis: Frequency density.]

Note that this distribution is skewed with a tail at the right hand end, i.e. it is positively
skewed.
You could of course construct the histogram first and then join the mid-points of the tops of
the rectangles to give the frequency polygon.

Comparative frequency polygons

Frequency polygons are very useful when comparing sets of data.

Example 1.11

Draw frequency polygons to compare the age distribution of the teachers in two sixth form
colleges:

Age         20-   25-   30-   35-   40-   45-   50-   55-   60-   65-
College A   4     6     11    14    9     5     5     3     0     0
College B   0     2     4     7     11    12    11    8     5     0

Solution 1.11
Work out the mid-interval value for each interval, for example in the interval '20-' the lower
boundary is 20 and the upper boundary is 25, so mid-interval value = ½ (20 + 25) = 22.5

The width of each interval is 5, so work out the frequency densities for each college by
dividing the frequencies by 5.

Mid-interval value   Frequency density, College A   Frequency density, College B
22.5                 0.8                            0
27.5                 1.2                            0.4
32.5                 2.2                            0.8
37.5                 2.8                            1.4
42.5                 1.8                            2.2
47.5                 1                              2.4
52.5                 1                              2.2
57.5                 0.6                            1.6
62.5                 0                              1
67.5                 0                              0

[Figure: Frequency polygons for College A (solid line) and College B (broken line). Horizontal axis: Age (years). Vertical axis: Frequency density.]

The bulk of the distribution for College A is further to the left than College B. This indicates
that College A has a much younger staff than College B.

Notice that in this example, since all the intervals are of equal width, frequency could have
been used on the vertical axis.

FREQUENCY CURVES
When the number of intervals is large the frequency polygon
consists of a large number of line segments. The frequency
polygon approaches a smooth curve, known as a frequency
curve.

The shape of a distribution

If distributions represented by a vertical line graph or a histogram are illustrated using a
frequency curve, it is easier to see the general 'shape' of the distribution. For example:

(a) Positive skew

[Figure: positively skewed frequency curve.]

In a positively skewed distribution, there is a long tail at the positive end of the
distribution.

(b) Negative skew

[Figure: negatively skewed frequency curve.]

In a negatively skewed distribution, there is a long tail at the negative end of the
distribution.

(c) Reverse J-shape

[Figure: reverse J-shaped frequency curve.]

In a J-shaped (reverse) distribution an initial 'bulge' is followed by a long tail.

(d) Uniform or rectangular

[Figure: uniform distribution.]

In a uniform or rectangular distribution the data are evenly spread throughout the range.

(e) The normal distribution

[Figure: bell-shaped frequency curve.]

This symmetrical, bell-shaped distribution is known as a normal distribution.

An approximately normal distribution occurs when measuring quantities such as heights,
masses, examination marks.

A positively skewed distribution could occur when considering, for example,
• the number of children in a family,
• the age at which women marry,
• the distribution of wages in a firm.

A negatively skewed distribution could occur when considering, for example,
• reaction times for an experiment,
• daily maximum temperatures for a month in the summer.

1. A researcher timed how long it took for each of 38 volunteers to perform a simple task. The
results are shown in the table.

Time (seconds)   5-   10-   20-   25-   40-   45-
Frequency        2    12    7     15    2     0

Draw a histogram to illustrate the data.

2. In a survey the masses of 50 apples were noted and recorded in the following table. Each value
was given to the nearest gram.

86    101   114   118   87    92    93    116
105   102   97    93    101   111   96    117
100   106   118   101   107   96    101   102
104   92    99    107   98    105   113   100
103   108   92    109   95    100   103   110
113   99    106   116   101   105   86    88
108   92

(a) Construct a frequency distribution, using equal class intervals of width 5 g and taking
the first interval as 85-89.
(b) Draw a histogram to illustrate the data and write down the modal class.
(c) Draw a stemplot to illustrate the data and write down the mode.

3. On a particular day the length of stay of each car at a city car park was recorded:

Length of stay (min)   Frequency
t < 25                 62
25 ≤ t < 60            70
60 ≤ t < 80            88
80 ≤ t < 150           280
150 ≤ t < 300          30

Represent the data by a histogram and state the modal class.

4. Draw a histogram to show the masses, measured to the nearest kilogram, of 200 girls.

Mass (kg)    41-50   51-55   56-60   61-70   71-75
Frequency    21      62      55      50      12
5. This histogram represents the speeds of cars passing a 30 miles per hour sign. Write out the
frequency distribution.

[Figure: histogram of car speeds. Horizontal axis: Speed (mph), marked at 0, 20, 24, 30, 38, 48, 60.]

6. In a competition to grow the tallest hollyhock, the heights recorded by 50 primary school
children were as follows. Heights were measured to the nearest centimetre.

Height (cm)   Frequency
177-186       12
187-191       8
192-196       8
197-201       9
202-206       7
207-216       6

Draw a histogram and superimpose a frequency polygon.

7. The table shows the duration, in minutes, of 64 telephone calls made from a High Street call
box in a day.

Length of call (min)   Frequency
0-                     3
1½-                    7
3-                     22
6-                     20
12-                    6
15-                    6
21-                    0

Draw a frequency polygon to illustrate the data.

8. These are the number of times the letter 'e' appears in each sentence in an article called 'My
Kind of Day'. Make a grouped frequency distribution and draw a histogram.

15   12   8   12   3    10   14   17   5   3    8   11
7    16   5   13   12   11   6    7    4   17   8   1

9. The table shows the ages, in completed years, of women who gave birth to a child at Anytown
Maternity Hospital during a particular year. Without drawing a histogram first, draw a
frequency polygon to illustrate the information. Describe the distribution.

Age (years)   Number of births
16-           70
20-           470
25-           535
30-           280
35-           118
45-           0

10. The patients at a chest clinic were asked to keep a record of the number of cigarettes they smoked
each day.

Number of cigarettes smoked per day   Frequency
0-9                                   5
10-14                                 8
15-19                                 32
20-29                                 41
30-39                                 16
40 and over                           2

Draw a histogram to represent this data.

11. The marks awarded to 136 students in an examination are summarised in the table. Draw a
histogram to illustrate the data.

Marks   Frequency
10-29   22
30-39   18
40-49   22
50-59   24
60-64   14
65-69   12
70-84   24

12. [Figure: frequency polygon. Horizontal axis: Length (cm), from 0 to 30. Vertical scale from 0 to 3.]

Complete the frequency distribution represented by the frequency polygon above.

Length (cm)    Frequency
0 ≤ x < 4      2
4 ≤ x < 8
8 ≤ x < 12
12 ≤ x < 16
16 ≤ x < 18
18 ≤ x < 20
20 ≤ x < 30

13. Lucy and Jack play a computer game every day and keep a record of their scores. Lucy's scores
are shown in the table. Draw a frequency polygon to represent her scores.

Lucy's scores   50-99   100-149   150-199   200-249   250-299
Frequency       6       14        10        6         4

Jack's scores are as follows:

Jack's scores   50-99   100-149   150-199   200-249   250-299
Frequency       2       6         10        16        6

Draw a frequency polygon for Jack's scores on the same set of axes as Lucy's and use it to
compare the two sets of scores.

14. Students were investigating the effects of a growth hormone placed on the growing tip of a
maize seedling. The hormone was used in two different concentrations and distilled water was
used as a control on a third set of seedlings. After three weeks the heights of the plants were
measured to the nearest centimetre. They are shown in the table. Draw frequency polygons to
represent the data and compare the results.

Control
Height (cm)   Frequency
45            0
46            7
47            11
48            12
49            14
50            14
51            18
52            12
53            8
54            3
55            1
56            0

20% solution
Height (cm)   Frequency
50            0
51            1
52            0
53            2
54            5
55            9
56            17
57            25
58            20
59            12
60            9
61            0

40% solution
Height (cm)   Frequency
54            0
55            2
56            2
57            2
58            7
59            10
60            11
61            18
62            18
63            16
64            9
65            5
66            0

15. In one month, a student recorded the length, to the nearest minute, of each of the lectures she
attended. The table below shows her data and the calculations she made before drawing a
histogram to illustrate these data.

Length of lecture (minutes)   50-53   54-55   56-59   60-67
Number of lectures            a       b       30      c
Frequency density             5       13      7.5     1.5

Calculate
(a) the value of a, of b and of c,
(b) the total number of lectures attended during the month. (C Additional)

CIRCULAR DIAGRAMS OR PIE CHARTS
Pie charts are so called because they look like an apple pie! The areas of the slices or sectors of
the pie are in proportion to the quantities being represented.

Example 1.12
The pie chart, which is not drawn to scale, shows the
distribution of various types of land and water in a certain
county. Calculate
(a) the area of woodland,
(b) the angle of the urban sector,
(c) the total area of the county. (C)

[Figure: pie chart, not drawn to scale. The Farmland sector is labelled 1200 km².]

Solution 1.12
(a) 160° represents 1200 km², so 88° represents (1200/160) × 88 = 660 km²
Area of woodland = 660 km²
(b) 1200 km² is represented by 160°, so 30 km² is represented by (160/1200) × 30 = 4°
Angle for the urban sector = 4°
(c) 160° represents 1200 km², so 360° represents (1200/160) × 360 = 2700 km²
Total area of county = 2700 km²

Comparison pie charts

Pie charts of different sizes are useful when comparing two or more populations. The area of
each pie will be in proportion to the different population sizes, so if the pies are drawn with
radii r₁ and r₂ and represent total population sizes F₁ and F₂, then

πr₁² : πr₂² = F₁ : F₂

Dividing by π          r₁² : r₂² = F₁ : F₂
Taking square roots    r₁ : r₂ = √F₁ : √F₂

Radii should be chosen so that r₁ : r₂ = √F₁ : √F₂

Example 1.13
The table shows, in millions of pounds, the sales of a company in two successive years.

Year     Africa   America   Asia   Europe
First    5.5      6.7       13.2   19.6
Second   5.8      15.2      9.2    29.8

Draw two pie charts which allow the total annual sales to be compared.

Solution 1.13
First calculate the total sales for each year and the angles in the pie charts.
Total sales (in millions of pounds):
First year     F₁ = 5.5 + 6.7 + 13.2 + 19.6 = 45
Second year    F₂ = 5.8 + 15.2 + 9.2 + 29.8 = 60

Angles: each angle is (sales ÷ total sales) × 360°, for example Asia in the first year is (13.2/45) × 360° = 105.6°.

              Africa   America   Asia     Europe    Total
First year    44°      53.6°     105.6°   156.8°    360°
Second year   34.8°    91.2°     55.2°    178.8°    360°

Work out the ratio of the radii using

r₁² : r₂² = F₁ : F₂ = 45 : 60 = 3 : 4
r₁ : r₂ = √3 : √4 = 1.73... : 2

So you could take r₁ = 1.7 cm, r₂ = 2 cm, or multiples of these e.g. r₁ = 3.4 cm, r₂ = 4 cm.

[Figure: two pie charts, 'Sales in first year' and 'Sales in second year', each with sectors Africa, America, Asia and Europe.]

Example 1.14
On a particular Wednesday the sales of sugar from a supermarket consisted of 250 large
packets, 210 medium packets and 225 small packets. The mass of sugar in a large packet is
1½ times that in a medium packet and 2½ times that in a small packet. Calculate the angles
needed to draw a pie chart representing the total masses of sugar sold in large, medium and
small packets.

The radius of the corresponding pie chart for the following Saturday's sales of sugar was
double that for the Wednesday's sales. On the Saturday 900 large packets and 900 medium
packets were sold. Calculate the number of small packets sold on the Saturday. (C)
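
As a numerical check on the comparison pie chart method applied to the sales data of Example 1.13, the sketch below computes the sector angles for each year and the ratio of the radii. The function names sector_angles and radius_ratio are illustrative assumptions, and angles are rounded to one decimal place.

```python
import math

# Sales in millions of pounds, from the table in Example 1.13.
first_year = {"Africa": 5.5, "America": 6.7, "Asia": 13.2, "Europe": 19.6}
second_year = {"Africa": 5.8, "America": 15.2, "Asia": 9.2, "Europe": 29.8}


def sector_angles(data):
    """Angle of each sector in degrees: (value / total) x 360."""
    total = sum(data.values())
    return {name: round(value / total * 360, 1) for name, value in data.items()}


def radius_ratio(total_1, total_2):
    """Return r2 / r1 so that the pie areas are in the ratio total_1 : total_2."""
    return math.sqrt(total_2 / total_1)


angles_first = sector_angles(first_year)
angles_second = sector_angles(second_year)
ratio = radius_ratio(sum(first_year.values()), sum(second_year.values()))

print(angles_first)
print(angles_second)
print(round(ratio, 3))
```

The ratio r₂ / r₁ is √(60/45) = 2/√3, which agrees with r₁ : r₂ = 1.73... : 2.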
Solution 1.14
Let the mass of a small packet be x.
Then the mass of a large packet is 2½x.
Also, you are given that
mass of a large packet = 1½ × mass of medium packet
so 2½x = 1½ × mass of medium packet
so mass of a medium packet = 2½x ÷ 1½ = (5/3)x.

Mass of 225 small packets = 225x
Mass of 210 medium packets = 210 × (5/3)x = 350x
Mass of 250 large packets = 250 × (5/2)x = 625x
so total mass = 1200x

Angle representing mass of small packets = (225/1200) × 360° = 67.5°
Angle representing mass of medium packets = (350/1200) × 360° = 105°
Angle representing mass of large packets = (625/1200) × 360° = 187.5°

Let F_W denote total number of packets sold on Wednesday.
Let F_S denote total number of packets sold on Saturday.
Then F_W = 250 + 210 + 225 = 685.
Also r_S : r_W = 2 : 1
F_S : F_W = r_S² : r_W² = 4 : 1
F_S = 4 × 685 = 2740
Number of small packets sold on Saturday = 2740 - (900 + 900)
= 940
940 small packets were sold on Saturday.

Exercise 1c Pie charts

1. There are 28 pupils in Peter's class. He carried out a survey of how the pupils in his class
travelled to school. His results are shown in the table below.

Method of travel   Number of pupils
Bus                12
Car                2
Bicycle            5
Walking            9

The data are to be illustrated by a pie chart.
(a) Calculate, to the nearest degree, the sector angles of the pie chart.
(b) Draw the pie chart using a circle of radius 5 cm, labelling each sector with the method
of travel it represents.
There are 34 pupils in Shumilla's class. For these pupils she carried out the same kind of survey
and drew a pie chart to show her results.
(c) Calculate, giving your answer to three significant figures, the radius of a comparable
pie chart which could be used to represent the results of Shumilla's survey.

2. The following data summarise the expenditure by a county council during a particular year.

Service                       Expenditure (£m)
Education                     160.2
Highways & Public Transport   35.7
Police                        28.9
Social Services               27.9
Other                         24.5

These data are to be represented by a pie chart of radius 5 cm. Calculate, to the nearest degree,
the angle corresponding to each of the five classifications. (Do not draw the pie chart.)
The following year the county council spent £305.2 m.
Find the radius of a comparable pie chart which could be used to represent this second set of
data. (L)

3. Five companies form a group. The sales of each company during the year ending 5 April, 1988,
are shown in the table below.

Company             A    B     C    D    E
Sales (in £1000s)   55   130   20   35   60

Draw a pie chart of radius 5 cm to illustrate this information.
For the year ending 5 April, 1989, the total sales of the group increased by 20%, and this growth
was maintained for the year ending 5 April, 1990.
If pie charts were drawn to compare the total sales for each of these years with the total sales
for the year ending 5 April, 1988, what would be the radius of each of these pie charts?
If the sales of company E for the year ending 5th April, 1990, were again £60 000, what
would be the angle of the sector representing them? (C)

4. A charity obtains its income from various sources. The table below shows these sources
and the corresponding amounts of income for 1993.

Source        Income (£)
Advertising   30 000
Donations     x
Fees          9 000
Investments   3 000
Sponsorship   10 000

A pie chart was drawn to illustrate the data. Given that the angle of the sector representing
Donations was 204°, calculate
(a) the total income for 1993,
(b) the value of x,
(c) the angle of each of the remaining sectors.
A second pie chart was drawn to compare the corresponding 1996 data with that of 1993. In
1996 the income from Sponsorship had increased to £28 800 and this was represented by a sector
of angle 60° in the pie chart for 1996. Given that the radius of the 1996 pie chart was 9 cm,
calculate the radius of the 1993 pie chart. (C)

5. A golf club has four categories of membership: men, women, juniors and social members. The
pie chart shown, which is not drawn to scale, illustrates the distribution of membership in
1995. Given that there were 147 men and 35 social members, calculate
(a) the number of junior members,
(b) the angle of the sector representing the social members,
(c) the number of women.

[Figure: pie chart, not drawn to scale, with sectors including Women, Juniors and Social.]

The corresponding pie chart for 2000 indicated that the number of men had increased by 49
although the angle of the corresponding sector remained the same. Calculate the total number of
members in 2000.
Given that the radius of the 1995 pie chart was 26 cm, calculate the radius of the 2000 pie chart.
(C Additional)

6. During a particular fortnight a family spends £52.27 on meat, £23.10 on fruit and vegetables,
£19.72 on drink, £12.41 on toiletries, £102.68 on groceries and £9.82 on miscellaneous items.
These data are to be represented by a pie chart of radius 5 cm.
(a) Calculate, to the nearest degree, the angle corresponding to each of the above
classifications. (Do not draw the pie chart.)
The following fortnight the family spends 20% more in total.
(b) Find the radius of a comparable pie chart to represent the data on this occasion. (L)

7. Pie charts A, B and C are drawn to compare, over a given period, the total value of the sales of
certain items in each of three branches of a multiple store. The radii of the charts are 20 cm,
30 cm and 40 cm, respectively.
(a) If the total sales value represented by chart B is £4500, calculate the total sales value
represented by each of charts A and C.
(b) The angle of the sector representing a particular item in chart A is 72°. Calculate
the sales value of this item.
(c) The sales value represented by a sector in chart C is £600. Calculate the angle of the
sector.
(d) One item occupies one quarter of chart A, and the sales value for this item is one half
of that for the same item on chart B. Calculate the angle of the sector for this
item on chart B. (C Additional)
8. On a certain day, 125 people, each buying one newspaper, were asked which newspaper they
had bought. The results of the survey are shown in the table below.

Newspaper          Number bought
The Times          10
The Telegraph      25
The Express        40
Some other paper   50

Calculate the angles of the sectors of a pie chart of radius 5 cm which would illustrate these data.
The following day a similar survey was carried out and the radius of the pie chart necessary to
compare the new set of data with the previous set was 6 cm. Calculate the number of people in
the second survey. (C Additional)

9. A householder keeps an annual account of four items of expenditure. The figures for the year
1991 are shown in the table below.

Item         Expenditure (£)
Taxes        x
Travel       1000
Light/Heat   y
Telephone    300

A pie chart was drawn to illustrate these data. Given that the angles of the sectors representing
Taxes and Travel were 124° and 80° respectively, calculate
(a) the total expenditure for the year,
(b) the value of x and of y,
(c) the angle of each of the remaining sectors.
In 1992, the total expenditure on the same items was £8000. Given that the radius of the pie chart
for 1991 was 6 cm, calculate the radius of the pie chart for 1992 in order that the two sets of data
may be compared. (C Additional)

THE MEAN
A typical or average value is useful when interpreting data. One such average is the mean.

Consider the five numbers

0.9, 1.4, 2.8, 3.1, 5.6.

The mean is (0.9 + 1.4 + 2.8 + 3.1 + 5.6) ÷ 5 = 13.8 ÷ 5 = 2.76

Example 1.15

To obtain Grade A, Ben must achieve an average of at least 70 in five tests. If his average
mark for the first four tests is 68, what is the lowest mark he can get in his fifth test and still
obtain Grade A?

Solution 1.15
For the first four tests, (x₁ + x₂ + x₃ + x₄) ÷ 4 = 68
so x₁ + x₂ + x₃ + x₄ = 4 × 68 = 272

For five tests, Ben wants his mean mark to be at least 70.

(x₁ + x₂ + x₃ + x₄ + x₅) ÷ 5 ≥ 70
(272 + x₅) ÷ 5 ≥ 70
272 + x₅ ≥ 350
x₅ ≥ 350 - 272
x₅ ≥ 78
To obtain Grade A, Ben must get at least 78 marks in his fifth test.

A shorthand way of writing x₁ + x₂ + x₃ + x₄ is Σxᵢ, summed from i = 1 to i = 4.

The symbol Σ (the Greek capital letter 'sigma') is used to denote 'the sum of'. So for
x₁ + x₂ + x₃ + ... + xₙ you could write Σxᵢ, summed from i = 1 to i = n.

The mean is often denoted by x̄, so

x̄ = (x₁ + x₂ + ... + xₙ) ÷ n = (Σxᵢ) ÷ n, summed from i = 1 to i = n.

This is rather cumbersome, so usually the subscript i is omitted, giving x̄ = Σx ÷ n.

Example 1.16

The members of an orchestra were asked how many instruments each could play. Here are
their results.

2   5   2   4   1   1   1   2   1   3
3   2   1   2   1   1   2   4   3   2
1   2   3   1   4   2   3   1   1   2

Find the mean number of instruments played.

Solution 1.16

n = 30, Σx = 2 + 5 + 2 + ... + 1 + 2 = 63

x̄ = Σx ÷ n = 63 ÷ 30 = 2.1

The mean number of instruments played is 2.1.

In the above example, the data could have been arranged in a frequency distribution:

Number of instruments, x   1    2    3   4   5
Frequency, f               11   10   5   3   1
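
The frequency distribution for Example 1.16 can be built from the raw results with a short Python sketch, which also confirms that weighting each value by its frequency gives the same mean as adding up all 30 results. The variable names are illustrative, and collections.Counter is simply one convenient way to tally the values.

```python
from collections import Counter

# Number of instruments played by each of the 30 members (Example 1.16).
instruments = [
    2, 5, 2, 4, 1, 1, 1, 2, 1, 3,
    3, 2, 1, 2, 1, 1, 2, 4, 3, 2,
    1, 2, 3, 1, 4, 2, 3, 1, 1, 2,
]

# Mean from the raw data: sum of x divided by n.
mean_raw = sum(instruments) / len(instruments)

# Frequency distribution: how many members gave each value of x.
frequency = Counter(instruments)

# Mean from the frequency distribution: sum of f*x divided by sum of f.
sum_f = sum(frequency.values())
sum_fx = sum(x * f for x, f in frequency.items())
mean_from_table = sum_fx / sum_f

print(sorted(frequency.items()))
print(sum_f, sum_fx)
print(mean_raw, mean_from_table)
```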
The total number of instruments played can be calculated in an organised way as follows:

x    f          f × x
1    11         11
2    10         20
3    5          15
4    3          12
5    1          5
     Σf = 30    Σfx = 63

Here Σf is the total number of people and Σfx is the total number of instruments played.

x̄ = total number of instruments ÷ total number of people = Σfx ÷ Σf = 63 ÷ 30 = 2.1

The mean number of instruments played is 2.1

Note that Σfx is sometimes written Σxf and remember that x and f are multiplied.

In general, for data in an ungrouped frequency distribution

x̄ = Σfx ÷ Σf

When the data have been grouped into intervals, the actual values of the readings are not
known. You can only make an estimate of the mean. To do this, take the mid-interval value as
representative of the interval.

Remember that mid-interval value = ½ (lower class boundary + upper class boundary)

Example 1.17
The speeds, to the nearest mile per hour, of 120 vehicles passing a check point were recorded
and are grouped in the table below.

Speed (m.p.h.)       21-25   26-30   31-35   36-45   46-60
Number of vehicles   22      48      25      16      9

Estimate the mean of this distribution. (C Additional)

Solution 1.17
Work out the mid-interval value for the first interval 21-25, using lower class boundary = 20.5,
upper class boundary = 25.5.

So mid-interval value = ½ (20.5 + 25.5) = 23.

You then assume that all the values in the interval 21-25 are in fact 23.

Find the other mid-interval values and form a table:

Speed (m.p.h.)   Mid-interval value, x   f           fx
21-25            23                      22          506
26-30            28                      48          1344
31-35            33                      25          825
36-45            40.5                    16          648
46-60            53                      9           477
                                         Σf = 120    Σfx = 3800

x̄ = Σfx ÷ Σf = 3800 ÷ 120 = 31⅔

The mean speed was 31⅔ m.p.h.

Using the calculator to find the mean

You can use your calculator in ordinary computation mode to calculate the total and also do
the division. It is more useful, however, to work in the statistical mode, known as SD or STAT
mode. Your calculator may operate as in one of the examples below. If yours does not appear
to follow one of the patterns, you will need to consult your calculator manual.
Notice that once you put in the data you have access not only to the value for the mean, but
also to n and Σx.

Example 1.18
Find the mean of the numbers 33, 28, 26, 35, 38.

Solution 1.18

                   Casio 570W/85W/85WA             Sharp
Set SD mode        [MODE][MODE][1] or [MODE][2]    [MODE][1]
Clear memories     [SHIFT][Scl][=]                 [2ndF][CA]
Input data         33 [DT]                         33 [DATA]
                   28 [DT]                         28 [DATA]
                   26 [DT]                         26 [DATA]
                   35 [DT]                         35 [DATA]
                   38 [DT]                         38 [DATA]
To obtain
x̄ = 32             [SHIFT][x̄][=]                   [2ndF][x̄]
n = 5              [RCL][C]                        [2ndF][n]
Σx = 160           [RCL][B]                        [2ndF][Σx]
To clear SD mode   [MODE][1]                       [MODE][0]

On the Casio, C and B are red letters on the third row of the calculator.

From the calculator, the mean is 32.
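
The estimate of the mean for the grouped speeds in Example 1.17 can also be reproduced in Python. This is a sketch under one stated assumption: because speeds are recorded to the nearest mile per hour, each class boundary lies 0.5 beyond the stated class limit. The function name estimated_mean and the half_unit parameter are illustrative.

```python
# Each class is (lower limit, upper limit, frequency) from Example 1.17.
classes = [
    (21, 25, 22),
    (26, 30, 48),
    (31, 35, 25),
    (36, 45, 16),
    (46, 60, 9),
]


def estimated_mean(classes, half_unit=0.5):
    """Estimate the mean of grouped data using mid-interval values."""
    sum_f = 0
    sum_fx = 0.0
    for lower, upper, frequency in classes:
        # Class boundaries extend half a unit beyond the stated limits.
        mid_interval = ((lower - half_unit) + (upper + half_unit)) / 2
        sum_f += frequency
        sum_fx += frequency * mid_interval
    return sum_fx / sum_f, sum_f, sum_fx


mean_speed, sum_f, sum_fx = estimated_mean(classes)
print(sum_f, sum_fx, round(mean_speed, 2))
```

The totals match Σf = 120 and Σfx = 3800 in the table of Solution 1.17.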
Example 1.19
Find the mean number of children per family for the following frequency distribution.

Number of children per family, x   1   2   3   4   5
Frequency, f                       3   4   8   2   3

Solution 1.19

                   Casio 570W/85W/85WA             Sharp
Set SD mode        [MODE][MODE][1] or [MODE][2]    [MODE][1]
Clear memories     [SHIFT][Scl][=]                 [2ndF][CA]
Input data         1 [SHIFT][;] 3 [DT]             1 [×] 3 [DATA]
in the order       2 [SHIFT][;] 4 [DT]             2 [×] 4 [DATA]
x × f              3 [SHIFT][;] 8 [DT]             3 [×] 8 [DATA]
                   4 [SHIFT][;] 2 [DT]             4 [×] 2 [DATA]
                   5 [SHIFT][;] 3 [DT]             5 [×] 3 [DATA]
To obtain
x̄ = 2.9            [SHIFT][x̄][=]                   [2ndF][x̄]
Σf = 20            [RCL][C]                        [2ndF][n]
Σfx = 58           [RCL][B]                        [2ndF][Σx]
To clear SD mode   [MODE][1]                       [MODE][0]

On the Casio, C and B are red letters on the third row of the calculator.

From the calculator, the mean is 2.9 children per family.
Make sure that you input the data in the order x × f. Remember that x usually comes first in
the frequency table.

Example 1.20
The diagram shows a histogram of the distribution of masses of 50 first-year University
students. All the rectangles are there but the vertical axis has been torn off.

(a) Compile a grouped frequency table for the distribution.
(b) Use the values in your frequency table to find an approximate value for the mean mass of
the students.

[Figure: histogram of masses. Horizontal axis: Mass in kg, starting at 40. The vertical axis is missing.]

Solution 1.20
Let one small square be h on the vertical axis.
Remember that in a histogram, the area of each rectangle is proportional to the frequency.
The areas are

5h × 10, 10h × 10, 18h × 5, 22h × 5, 10h × 15
i.e. 50h, 100h, 90h, 110h, 150h.

So the total area = 500h.
But total frequency = the number of students = 50
so 500h = 50
h = 0.1
This means that the frequencies are 5, 10, 9, 11, 15, giving a total of 50.

(a) The frequency distribution is

Mass (kg)      Frequency, f
40 ≤ m < 50    5
50 ≤ m < 60    10
60 ≤ m < 65    9
65 ≤ m < 70    11
70 ≤ m < 85    15

(b) Take the mid-point of each interval to represent that interval. For example, the mid-point
of the interval 60 ≤ m < 65 is ½ (60 + 65) = 62.5.

mid-point, x   frequency, f   fx
45             5              225
55             10             550
62.5           9              562.5
67.5           11             742.5
77.5           15             1162.5
               Σf = 50        Σfx = 3242.5

i.e. x̄ = Σfx ÷ Σf = 3242.5 ÷ 50 = 64.85 kg

The mean mass is 64.85 kg.
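
The reasoning of Solution 1.20, where the vertical scale is unknown but the total frequency is given, can be sketched in Python. The bar heights are measured in small squares as in the solution; the variable names are illustrative, and rounding assumes that each frequency is a whole number of students.

```python
# Each bar is (lower boundary, upper boundary, height in small squares),
# using the heights quoted in Solution 1.20.
bars = [
    (40, 50, 5),
    (50, 60, 10),
    (60, 65, 18),
    (65, 70, 22),
    (70, 85, 10),
]
total_frequency = 50  # number of students

# Area of each rectangle in units of h (width x height in small squares).
areas = [(upper - lower) * height for lower, upper, height in bars]
total_area = sum(areas)

# Area is proportional to frequency, so frequency = area x (total frequency / total area).
frequencies = [round(area * total_frequency / total_area) for area in areas]

# Estimate the mean using the mid-point of each interval.
sum_fx = sum(f * (lower + upper) / 2 for (lower, upper, _), f in zip(bars, frequencies))
mean_mass = sum_fx / sum(frequencies)

print(areas)
print(frequencies)
print(sum_fx, mean_mass)
```

Here total_frequency / total_area plays the role of h = 0.1 in the solution.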
Using the calculator:

                   Casio 570W/85W/85WA             Sharp
Set SD mode        [MODE][MODE][1] or [MODE][2]    [MODE][1]
Clear memories     [SHIFT][Scl][=]                 [2ndF][CA]
Input data         45 [SHIFT][;] 5 [DT]            45 [×] 5 [DATA]
in the order       55 [SHIFT][;] 10 [DT]           55 [×] 10 [DATA]
x × f              62.5 [SHIFT][;] 9 [DT]          62.5 [×] 9 [DATA]
                   67.5 [SHIFT][;] 11 [DT]         67.5 [×] 11 [DATA]
                   77.5 [SHIFT][;] 15 [DT]         77.5 [×] 15 [DATA]
To obtain
x̄ = 64.85          [SHIFT][x̄][=]                   [2ndF][x̄]
Σf = 50            [RCL][C]                        [2ndF][n]
Σfx = 3242.5       [RCL][B]                        [2ndF][Σx]
To clear SD mode   [MODE][1]                       [MODE][0]

On the Casio, C and B are red letters on the third row of the calculator.

Practise this yourself. Make sure that you are familiar with the method on your calculator.

Exercise 1d The mean

1. Find the mean of each of the following sets of numbers,
(i) not using SD mode,
(ii) using SD mode.
Compare your answers.
(a) 5, 6, 6, 8, 8, 9, 11, 13, 14, 17
(b) 148, 153, 156, 157, 160
(c) 44!, 471, 48!, stt, 521, 54±, sst, S6i
(d) 1769, 1771, 1772, 1775, 1778, 1781, 1784
(e) 0.85, 0.88, 0.89, 0.93, 0.94, 0.96
(f)
x   1   2   3   4    5    6   7
f   4   5   8   10   17   5   1
(g)
x   27   28   29   30   31   32
f   30   43   51   49   42   35
(h)
x   121   122   123   124   125
f   14    25    32    23    6

2. A sample of 100 boxes of matches was taken and a record made of the number of matches per
box. The results were as follows:

Number of matches per box   47   48   49   50   51
Frequency                   4    20   35   24   17

Calculate the mean number of matches per box.

3. On a certain day the numbers of books on 40 shelves in a library were noted and grouped as
shown. Find the mean number of books on a shelf.

Number of books   Number of shelves
31-35             4
36-40             6
41-45             10
46-50             13
51-55             5
56-60             2

4. The amounts spent by 120 motorists at a petrol station were recorded.

Amount spent, £x   Number of motorists
x < 5              12
5 ≤ x < 10         38
10 ≤ x < 15        42
15 ≤ x < 20        20
20 ≤ x < 40        8

(a) Draw a histogram to represent the data.
(b) Estimate the mean amount spent.

5. The age distribution of the population of a small village is recorded in the table below.

Age (years)   Number of people
0-            54
15-           78
30-           120
50-           88
70-           60
100-          0

Draw, on graph paper, a histogram to represent these data.
Estimate the mean of this distribution. (C Additional)

6. Find the mean length for the data represented by the stem and leaf diagram.

Key 15 | 1 means 16 cm

Stem   Leaf
12     0 0
15     0 1 1
18     1 1 2
21     0 1 1 2 2 2
24     0 0 1 2
27     1 1
30     2

7. The height, correct to the nearest metre, was recorded for each of the 59 birch trees in an area
of woodland. The heights are summarised in the following table.

Height (m)        5-9   10-12   13-15   16-18   19-28
Number of trees   14    18      15      4       8

(a) A student was asked to draw a histogram to illustrate the data and produced the
following diagram.

[Figure: 'A histogram to illustrate the heights of birch trees'. Horizontal axis: Height (m), from 0 to 30. Vertical axis: Number of trees, from 0 to 20.]

Give two critical comments on this attempt at a histogram.
(b) Using graph paper, draw a correct histogram to illustrate the above data.
(c) Calculate an estimate of the mean height of the birch trees, giving your answer correct to
three significant figures. (C)

8. Telephone calls arriving at a switchboard are answered by the telephonist. The following table
shows the time, to the nearest second, recorded as being taken by the telephonist to answer the
calls received during one day.

Time to answer (to nearest second)   Number of calls
10-19                                20
20-24                                20
25-29                                15
30                                   14
31-34                                16
35-39                                10
40-59                                10

(a) Represent these data by a histogram. Give a reason to justify the use of a
histogram to represent these data.
(b) Calculate an estimate of the mean time taken to answer the calls. (L)
VARIABILITY OF DATA
Each of these sets of numbers has a mean of 7 but the spread of each set is different:
(a) 7, 7, 7, 7, 7
(b) 4, 6, 6.5, 7.2, 11.3
(c) -193, -46, 28, 69, 177
There is no variability in set (a), but the numbers in set (c) are obviously much more spread out than those in set (b).
There are various ways of measuring the variability or spread of a distribution, two of which are described here.

The range
The range is based entirely on the extreme values of the distribution.
In (a) the range = 7 - 7 = 0
In (b) the range = 11.3 - 4 = 7.3
In (c) the range = 177 - (-193) = 370
Note that there are also ranges based on particular observations within the data and these percentile and quartile ranges are considered on page 68.

Weighted means
In some situations it may not be suitable to calculate an ordinary mean. There may be times when you wish to place greater emphasis on some of the values, as illustrated in the following example.

Example 1.21
A candidate obtained the following results in her GCSE mathematics examination:
Paper 1: 72%, Paper 2: 64%, Coursework: 73%
The regulations state that the two written papers have equal weighting and count for 80% of the final result, whereas the coursework counts for 20%. What was the candidate's final mark?

Solution 1.21
The results are in the following ratio:
40% : 40% : 20% = 4 : 4 : 2 = 2 : 2 : 1.
For the final result, you have to take this weighting into account:

$$\text{weighted mean} = \frac{2(72) + 2(64) + 1(73)}{2 + 2 + 1} = \frac{345}{5} = 69$$

Therefore the final mark is 69%.
In general, if the numbers $x_1, x_2, \ldots, x_n$ are given weights $w_1, w_2, \ldots, w_n$, then

$$\text{weighted mean} = \frac{w_1x_1 + w_2x_2 + \cdots + w_nx_n}{w_1 + w_2 + \cdots + w_n}$$
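The weighted mean calculation can be sketched in a few lines of Python. This is only an illustration: the function name `weighted_mean` is an assumption made for the sketch and is not part of the text. The figures are those of Example 1.21.

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w) for paired values and weights.

    Hypothetical helper written for this illustration.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Example 1.21: Paper 1, Paper 2 and Coursework weighted 2 : 2 : 1
final_mark = weighted_mean([72, 64, 73], [2, 2, 1])
print(final_mark)  # 69.0
```

Only the ratio of the weights matters, so weights of 40, 40 and 20 give the same result as 2, 2 and 1.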
Exercise 1e Weighted means
1. Find the weighted mean of the numbers 8 and 12, if they are given the weights 2 and 3 respectively.

2. The final mark allocated to a student is calculated from her mark in each subject.
   (a) The class teacher worked out an ordinary mean.
   (b) The headteacher decided to weight the subjects in proportion to the number of lessons per week, as shown in the table.

   Subject       Mark   Number of lessons per week
   Mathematics   64%    5
   English       52%    4
   Science       71%    6
   French        75%    3
   History       82%    2

   Which method gave the higher mark and by how much?

3. The prices of articles A, B and C are £30, £42 and £65. Find the mean price, if the three articles are given weights of 5, 3 and 2 respectively.

4. The weighted mean of the two numbers 30 and 15 is 20. If the weightings are 2 and x respectively, find x.

5. Two students, Jack and Jill, take an examination in French, German and English. The table below shows the marks for each student and the weight to be applied to each subject.

   Subject          French   German   English
   Marks for Jack   80       72       46
   Marks for Jill   64       82
   Weight           2        x

   Calculate the value of x for which Jack and Jill have the same weighted mean mark and find the value of this mean. (C)

THE STANDARD DEVIATION, s, AND THE VARIANCE, s²
The standard deviation, s, is a very important and useful measure of spread. It gives a measure of the deviations of the readings from the mean, $\bar{x}$. It is calculated using all the values in the distribution. To calculate s:
• for each reading x, calculate $x - \bar{x}$, its deviation from the mean,
• square this deviation to give $(x - \bar{x})^2$ and note that, irrespective of whether the deviation was positive or negative, this is now positive,
• find $\sum (x - \bar{x})^2$, the sum of all these values,
• find the average by dividing the sum by n, the number of readings; this gives $\frac{\sum (x - \bar{x})^2}{n}$ and is known as the variance,
• finally take the positive square root of the variance to obtain the standard deviation, s.

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$

Each of the three sets of numbers on the previous page has mean 7, i.e. $\bar{x} = 7$.

(a) For the set 7, 7, 7, 7, 7
Since $x - \bar{x} = 7 - 7 = 0$ for every reading, s = 0, indicating that there is no deviation from the mean.
(b) For the set 4, 6, 6.5, 7.2, 11.3

$$\sum (x - \bar{x})^2 = (4 - 7)^2 + (6 - 7)^2 + (6.5 - 7)^2 + (7.2 - 7)^2 + (11.3 - 7)^2 = 28.78$$

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{28.78}{5}} = 2.4 \text{ (1 d.p.)}$$

(c) For the set -193, -46, 28, 69, 177

$$\sum (x - \bar{x})^2 = (-193 - 7)^2 + (-46 - 7)^2 + (28 - 7)^2 + (69 - 7)^2 + (177 - 7)^2 = 75\,994$$

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{75\,994}{5}} = 123.3 \text{ (1 d.p.)}$$

Notice that set (c) has a much higher standard deviation than set (b), confirming that it is much more spread about the mean.

Remember that

Standard deviation $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}}$

Variance $s^2 = \dfrac{\sum (x - \bar{x})^2}{n}$

NOTE:
• The standard deviation gives an indication of the lowest and highest values of the data as follows. In most distributions, the bulk of the distribution lies within two standard deviations of the mean, i.e. within the interval $\bar{x} \pm 2s$ or $(\bar{x} - 2s, \bar{x} + 2s)$. This helps to give an idea of the spread of the data.
• The units of standard deviation are the same as the units of the data.
• Standard deviations are useful when comparing sets of data; the higher the standard deviation, the greater the variability in the data.

Example 1.22
Two machines, A and B, are used to pack biscuits. A random sample of ten packets was taken from each machine and the mass of each packet was measured to the nearest gram and noted.
Find the standard deviation of the masses of the packets taken in the sample from each machine. Comment on your answer.

Machine A (mass in g)   196, 198, 198, 199, 200, 200, 201, 201, 202, 205
Machine B (mass in g)   192, 194, 195, 198, 200, 201, 203, 204, 206, 207

Solution 1.22
Machine A: $\bar{x} = \frac{\sum x}{n} = \frac{2000}{10} = 200$        Machine B: $\bar{x} = \frac{\sum x}{n} = \frac{2000}{10} = 200$

Since the mean mass for each machine is 200, $x - \bar{x} = x - 200$.
To calculate s, put the data into a table:

Machine A                          Machine B
x     x - 200   (x - 200)²         x     x - 200   (x - 200)²
196   -4        16                 192   -8        64
198   -2         4                 194   -6        36
198   -2         4                 195   -5        25
199   -1         1                 198   -2         4
200    0         0                 200    0         0
200    0         0                 201    1         1
201    1         1                 203    3         9
201    1         1                 204    4        16
202    2         4                 206    6        36
205    5        25                 207    7        49
                56                                240

Machine A: $s^2 = \frac{\sum (x - 200)^2}{10} = \frac{56}{10} = 5.6$, so $s = \sqrt{5.6} = 2.37$ (2 d.p.)
Machine B: $s^2 = \frac{\sum (x - 200)^2}{10} = \frac{240}{10} = 24$, so $s = \sqrt{24} = 4.90$ (2 d.p.)

Machine A: s.d. = 2.37 g (2 d.p.)    Machine B: s.d. = 4.90 g (2 d.p.)
Machine A has less variation, indicating that it is more reliable than machine B.

Alternative form of the formula for standard deviation
The formula given above is sometimes difficult to use, especially when $\bar{x}$ is not an integer, so an alternative form is often used. This is derived as follows:

$$
\begin{aligned}
s^2 &= \frac{1}{n}\sum (x - \bar{x})^2 \\
    &= \frac{1}{n}\sum (x^2 - 2x\bar{x} + \bar{x}^2) \\
    &= \frac{1}{n}\left(\sum x^2 - 2\bar{x}\sum x + \sum \bar{x}^2\right) \\
    &= \frac{\sum x^2}{n} - 2\bar{x}\,\frac{\sum x}{n} + \frac{n\bar{x}^2}{n} \\
    &= \frac{\sum x^2}{n} - 2\bar{x}(\bar{x}) + \bar{x}^2 \qquad \text{since } \frac{\sum x}{n} = \bar{x} \\
    &= \frac{\sum x^2}{n} - \bar{x}^2
\end{aligned}
$$

so that

$$s = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2}$$

NOTE: It is useful to remember that $\frac{\sum x^2}{n} - \bar{x}^2$ can be thought of as
'the mean of the squares minus the square of the mean'.
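The two forms of the formula can be compared numerically. The following Python sketch is an illustration only; the function names `sd_from_deviations` and `sd_from_squares` are assumptions made here, not names from the text. Both divide by n, as in the formulae above.

```python
from math import sqrt


def sd_from_deviations(data):
    """s = sqrt( sum((x - mean)^2) / n ), the first form of the formula."""
    n = len(data)
    mean = sum(data) / n
    return sqrt(sum((x - mean) ** 2 for x in data) / n)


def sd_from_squares(data):
    """s = sqrt( sum(x^2)/n - mean^2 ): mean of the squares minus square of the mean."""
    n = len(data)
    mean = sum(data) / n
    variance = sum(x * x for x in data) / n - mean ** 2
    # Guard against a tiny negative value caused by floating-point rounding.
    return sqrt(max(variance, 0.0))


numbers = [2, 3, 5, 6, 8]
print(round(sd_from_deviations(numbers), 2))  # 2.14
print(round(sd_from_squares(numbers), 2))     # 2.14
```

Python's standard library function `statistics.pstdev` also divides by n and should agree with both functions.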
Example 1.23
The mean of the five numbers 2, 3, 5, 6, 8 is 4.8. Calculate the standard deviation.

Solution 1.23

Method 1: using $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}}$        Method 2: using $s = \sqrt{\dfrac{\sum x^2}{n} - \bar{x}^2}$

x    $x - \bar{x}$   $(x - \bar{x})^2$          x    x²
2    -2.8     7.84                 2     4
3    -1.8     3.24                 3     9
5     0.2     0.04                 5    25
6     1.2     1.44                 6    36
8     3.2    10.24                 8    64
             22.80                     138

$s^2 = \frac{22.80}{5} = 4.56$                     $s^2 = \frac{138}{5} - (4.8)^2 = 4.56$
$s = \sqrt{4.56} = 2.14$ (2 d.p.)                 $s = \sqrt{4.56} = 2.14$ (2 d.p.)

The working for method 2 is less involved.

Using the calculator to find the standard deviation
The standard deviation can be found directly using the calculator in SD mode. The numbers are entered in the same way as when you are finding the mean.
To find the standard deviation of the five numbers 2, 3, 5, 6, 8 used in Example 1.23, follow these steps on a Casio 570W/85W/85WA or a Sharp calculator:
• Set SD mode.
• Clear memories.
• Input data: enter each number followed by the DT key (Casio) or the DATA key (Sharp).
• Obtain s = 2.135...
• You can check $\bar{x} = 4.8$, $\sum x = 24$, $\sum x^2 = 138$ and n = 5.
• Clear SD mode.

For a frequency distribution,

$$s = \sqrt{\frac{\sum f(x - \bar{x})^2}{\sum f}}$$

or in the alternative form

$$s = \sqrt{\frac{\sum fx^2}{\sum f} - \bar{x}^2}$$

Consider again the data given in Example 1.19, on page 32, which shows the number of children in 20 families. The mean is 2.9.

Number of children per family, x   1   2   3   4   5
Frequency, f                       3   4   8   2   3

You could use one of these three methods for finding the standard deviation. Method 2 is more popular than Method 1.

Method 1: using $s = \sqrt{\dfrac{\sum f(x - \bar{x})^2}{\sum f}}$

x    x - 2.9   (x - 2.9)²   f    f(x - 2.9)²
1    -1.9      3.61         3    10.83
2    -0.9      0.81         4     3.24
3     0.1      0.01         8     0.08
4     1.1      1.21         2     2.42
5     2.1      4.41         3    13.23
                    $\sum f = 20$   $\sum f(x - \bar{x})^2 = 29.80$

$$s^2 = \frac{\sum f(x - 2.9)^2}{\sum f} = \frac{29.80}{20} = 1.49$$

$$s = \sqrt{1.49} = 1.22 \text{ (2 d.p.)}$$

The standard deviation of the number of children per family is 1.22 (2 d.p.).

Method 2: using $s = \sqrt{\dfrac{\sum fx^2}{\sum f} - \bar{x}^2}$

x    f    x²    fx²
1    3     1     3
2    4     4    16
3    8     9    72
4    2    16    32
5    3    25    75
$\sum f = 20$      $\sum fx^2 = 198$
$$s^2 = \frac{\sum fx^2}{\sum f} - (2.9)^2 = \frac{198}{20} - (2.9)^2 = 1.49$$

$$s = \sqrt{1.49} = 1.22 \text{ (2 d.p.)}$$

The standard deviation is 1.22 (2 d.p.), as before.

Method 3: using the calculator in SD mode.
This time you need to take account of the frequencies, and this is done in exactly the same way as when finding the mean. On a Casio 570W/85W/85WA or a Sharp calculator:
• Set SD mode.
• Clear memories.
• Input data. Do this in the order x × f.
• Obtain $\bar{x} = 2.9$ and s = 1.220...
• You can recall $\sum f = 20$, $\sum fx = 58$ and $\sum fx^2 = 198$.
• Clear SD mode.
Therefore the standard deviation is 1.22 (2 d.p.), as before.

In a grouped frequency distribution, the mid-interval value is taken as representative of the interval, as in the following example.

Example 1.24
An intelligence test was taken by 115 candidates. For each candidate the time taken to complete the test was recorded, and the times were summarised in a histogram (see diagram).
Write down the frequency for each of the class intervals 0-1, 1-2, 2-3, 3-5 and 5-10 minutes.
Calculate estimates of the mean and standard deviation of the times taken to complete the test. (C)

Solution 1.24
Frequency = frequency density × interval width. Note that the interval 2-3, for example, represents 2 ≤ time < 3.

Time (min)   0-1   1-2   2-3   3-5   5-10
Frequency    10    15    25    40    25

To calculate estimates for the mean and standard deviation, use mid-interval values, x.

Time (min)   x     f     fx      fx²
0-1          0.5   10      5        2.5
1-2          1.5   15     22.5     33.75
2-3          2.5   25     62.5    156.25
3-5          4     40    160      640
5-10         7.5   25    187.5   1406.25
$\sum f = 115$    $\sum fx = 437.5$    $\sum fx^2 = 2238.75$

$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{437.5}{115} = 3.8 \text{ (2 s.f.)}$$

$$s = \sqrt{\frac{\sum fx^2}{\sum f} - \bar{x}^2} = \sqrt{\frac{2238.75}{115} - 3.80\ldots^2} = 2.2 \text{ (2 s.f.)}$$

The mean time is 3.8 minutes and the standard deviation is 2.2 minutes.
[You could have calculated these directly using the calculator in SD mode. Check them yourself.]
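The frequency-table form of the calculation, including the use of mid-interval values for grouped data, can be sketched in Python as follows. This is an illustrative sketch only; the function name `mean_and_sd` and the variable names are assumptions made here. The data are the children-per-family table from Example 1.19 and the test times from Example 1.24.

```python
from math import sqrt


def mean_and_sd(values, freqs):
    """Mean and standard deviation of a frequency distribution.

    Uses mean = sum(fx)/sum(f) and s = sqrt(sum(fx^2)/sum(f) - mean^2).
    Hypothetical helper written for this illustration.
    """
    total_f = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / total_f
    mean_of_squares = sum(f * x * x for x, f in zip(values, freqs)) / total_f
    # max(..., 0.0) guards against a tiny negative value from rounding.
    return mean, sqrt(max(mean_of_squares - mean ** 2, 0.0))


# Number of children per family (ungrouped frequency distribution)
mean_children, sd_children = mean_and_sd([1, 2, 3, 4, 5], [3, 4, 8, 2, 3])
print(round(mean_children, 1), round(sd_children, 2))  # 2.9 1.22

# Times for the intelligence test: each interval is represented by its mid-value
intervals = [(0, 1), (1, 2), (2, 3), (3, 5), (5, 10)]
frequencies = [10, 15, 25, 40, 25]
mid_values = [(low + high) / 2 for low, high in intervals]
mean_time, sd_time = mean_and_sd(mid_values, frequencies)
print(round(mean_time, 1), round(sd_time, 1))  # 3.8 2.2
```

For grouped data the results are estimates, because every reading in an interval is treated as if it were equal to the mid-interval value.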

[Diagram for Example 1.24: histogram of frequency density against time (minutes), with the time axis marked from 0 to 11 and the frequency density axis marked from 0 to 30.]

If you are given summary information, rather than the raw data or frequency distribution, you cannot use the calculator in SD mode. You will have to use the formulae to calculate the mean and standard deviation, as in the following example.

Example 1.25
(a) Cartons of orange juice are advertised as containing 1 litre. A random sample of 100 cartons gave the following results for the volume, x.

    $\sum x = 101.4$, $\sum x^2 = 102.83$

    Calculate the mean and the standard deviation of the volume of orange juice in these 100 cartons.

(b) A machine is supposed to cut lengths of rod 50 cm long.
    A sample of 20 rods gave the following results for the length, x.

    $\sum fx = 997$, $\sum fx^2 = 49\,711$

    (i) Calculate the mean length of the 20 rods.
    (ii) Calculate the variance of the lengths of the 20 rods. State the units of the variance in your answer.

Solution 1.25
(a) $\sum x = 101.4$, $\sum x^2 = 102.83$, n = 100

    $$\bar{x} = \frac{\sum x}{n} = \frac{101.4}{100} = 1.014$$

    The mean volume is 1.014 litres.

    $$s = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2} = \sqrt{\frac{102.83}{100} - 1.014^2} = 0.0102$$

    The standard deviation of the volume is 0.010 litres (2 s.f.)

(b) $\sum fx = 997$, $\sum fx^2 = 49\,711$, $\sum f = 20$

    (i) $$\bar{x} = \frac{\sum fx}{\sum f} = \frac{997}{20} = 49.85$$

        The mean length of the rods is 49.85 cm.

    (ii) $$\text{Variance} = \frac{\sum fx^2}{\sum f} - \bar{x}^2 = \frac{49\,711}{20} - 49.85^2 = 0.5275$$

        The variance is 0.5275 cm².

Exercise 1f
1. Do not use the statistical program on your calculator for this question.
   (i) For each of the following sets of numbers, calculate the mean and the standard deviation. Try using both forms of the formula for the standard deviation in parts (a) to (c). In parts (d) to (f) choose one of the methods.
       (a) 2, 4, 5, 6, 8
       (b) 6, 8, 9, 11
       (c) 11, 14, 17, 23, 29
       (d) 5, 13, 7, 9, 16, 15
       (e) 4.6, 2.7, 3.1, 0.5, 6.2
       (f) 200, 203, 206, 207, 209
   (ii) Now check your answers using your calculator in SD (STAT) mode.

2. The table shows the weekly wages in £ of each of 100 factory workers.
   (a) Draw a histogram to illustrate this information.
   (b) Calculate the mean wage and the standard deviation.

   Wage £           Number of workers
   200 ≤ x < 250    10
   250 ≤ x < 300    16
   300 ≤ x < 375    40
   375 ≤ x < 400    26
   400 ≤ x < 500     8

3. Do this question
   (a) without using SD mode,
   (b) using SD mode on your calculator.
   The score for a round of golf for each of 50 club members was noted. Find the mean score for a round and the standard deviation.

   Score, x   Frequency, f
   66          2
   67          5
   68         10
   69         12
   70          9
   71          6
   72          4
   73          2

4. The scores in an IQ test for 60 candidates are shown in the table. Find the mean score and the standard deviation.

   Score      Frequency
   100-106     8
   107-113    13
   114-120    24
   121-127    11
   128-134     4

5. The stemplot shows the times, recorded to the nearest second, of 12 people in a race. Calculate the mean time and the standard deviation.

   Key 1|5 means 15 seconds
   Stem | Leaf
   1    |
   1    | 5 5 6 6 6
   1    | 7 9 9
   2    | 0 1

6. A vertical line graph for a set of data is shown below. Calculate the mean and standard deviation of the data.

   [Vertical line graph: horizontal axis marked 5, 6, 7, 8, 9.]

7. The following table shows the duration of 40 telephone calls from an office via the switchboard.
   (a) Obtain an estimate of the mean length of a telephone call and the standard deviation.
   (b) Illustrate the data graphically.

   Duration in minutes   Number of calls
   < 1                    6
   1-2                   10
   2-3                   15
   3-5                    5
   5-10                   4
   > 10                   0
   (O&C)

8. For a set of ten numbers $\sum x = 290$ and $\sum x^2 = 8469$. Find the mean and the variance.

9. For a set of nine numbers $\sum (x - \bar{x})^2 = 234$. Find the standard deviation of the numbers.

10. For a set of nine numbers $\sum (x - \bar{x})^2 = 60$ and $\sum x^2 = 285$. Find the mean of the numbers.

11. A group of 20 people played a game. The table below shows the frequency distribution of their scores.

    Score              2 4 x
    Number of people   2 5 7 6

    Given that the mean score is 5, find
    (a) the value of x,
    (b) the variance of the distribution.
    (C Additional)

12. From the information given about each of the following sets of data, work out the missing values in the table:

    Columns: n, $\sum x$, $\sum x^2$, $\bar{x}$, s
    (a) 63   7623   924 800
    (b) 152.6   10.9   1.7
    (c) 52   57 300   33
    (d) 18   57   4
13. At a bird observatory, migrating willow warblers are caught, measured and ringed before being released. The histogram below illustrates the lengths, in millimetres, of the willow warblers caught during one migration season.

    [Histogram: horizontal axis Length (mm), marked from 100 to 125 in steps of 5; vertical axis marked from 0 to 16 in steps of 4.]

    (a) Explain how the histogram shows that the total number of willow warblers caught at the observatory during the migration season is 118.
    (b) State briefly how it may be deduced from the histogram (without any calculation) that an estimate of the mean length is 111 mm. Explain briefly why this value may not be the true mean length of the willow warblers caught.
    (c) Given that the lengths, x mm, of the willow warblers caught during this migration season were such that $\sum x = 13\,099$ and $\sum x^2 = 1\,455\,506$, calculate the standard deviation of the lengths. (C)

14. For a particular set of observations $\sum f = 20$, $\sum fx^2 = 16\,143$, $\sum fx = 563$. Find the values of the mean and the standard deviation.

15. For a given frequency distribution $\sum f(x - \bar{x})^2 = 182.3$, $\sum fx^2 = 1025$, $\sum f = 30$. Find the mean of the distribution.

16. The speeds of cars passing a speed camera are shown in the histogram. Calculate estimates of the mean speed and the standard deviation.

    [Histogram: horizontal axis Speed (m.p.h.), marked from 25 to 55; vertical axis marked 2, 4, 6.]

Calculations involving the mean and standard deviation

Example 1.26
(a) Calculate the mean and the standard deviation of the four numbers 2, 3, 6, 9.
(b) Two numbers, a and b, are to be added to this set of four numbers, such that the mean is increased by 1 and the variance is increased by 2.5. Find a and b. (L Additional)

Solution 1.26
(a) $$\bar{x} = \frac{2 + 3 + 6 + 9}{4} = 5$$

    $$s^2 = \frac{\sum x^2}{n} - \bar{x}^2 = \frac{4 + 9 + 36 + 81}{4} - 5^2 = 7.5, \qquad s = \sqrt{7.5} = 2.7 \text{ (2 s.f.)}$$

(b) New mean = 5 + 1 = 6

    $$\frac{2 + 3 + 6 + 9 + a + b}{6} = 6$$
    $$\frac{20 + a + b}{6} = 6$$
    $$20 + a + b = 36$$
    $$a + b = 16 \quad \ldots\ldots \text{(1)}$$

    Variance of original set = s² = 7.5. So new variance = 7.5 + 2.5 = 10

    $$\frac{4 + 9 + 36 + 81 + a^2 + b^2}{6} - 6^2 = 10$$
    $$\frac{130 + a^2 + b^2}{6} - 36 = 10$$
    $$\frac{130 + a^2 + b^2}{6} = 46$$
    $$130 + a^2 + b^2 = 276$$
    $$a^2 + b^2 = 146 \quad \ldots\ldots \text{(2)}$$

    From (1), b = 16 - a. Substituting in (2):

    $$a^2 + (16 - a)^2 = 146$$
    $$a^2 + 256 - 32a + a^2 = 146$$
    $$2a^2 - 32a + 110 = 0$$
    $$a^2 - 16a + 55 = 0$$
    $$(a - 11)(a - 5) = 0$$

    so a = 11 or a = 5.
    If a = 11, b = 16 - 11 = 5
    If a = 5, b = 16 - 5 = 11

    So the two numbers are 5 and 11.

COMBINING SETS OF DATA

Example 1.27
The number of errors, x, on each of 200 pages of typescript was monitored. The results when summarised showed that

$\sum x = 920$, $\sum x^2 = 5032$

(a) Calculate the mean and the standard deviation of the number of errors per page.
A further 50 pages were monitored and it was found that the mean was 4.4 errors and the standard deviation was 2.2 errors.
(b) Find the mean and the standard deviation of the number of errors per page for the 250 pages. (L)

Solution 1.27
(a) $$\bar{x} = \frac{\sum x}{n} = \frac{920}{200} = 4.6$$

    $$s^2 = \frac{\sum x^2}{n} - \bar{x}^2 = \frac{5032}{200} - 4.6^2 = 4$$

    $$s = \sqrt{4} = 2$$

    The mean is 4.6 errors per page and the standard deviation is 2 errors.

(b) For the errors, y, on the further 50 pages:

    Mean = 4.4, so $4.4 = \frac{\sum y}{50}$ and $\sum y = 50 \times 4.4 = 220$

    The standard deviation = 2.2, so $2.2^2 = \frac{\sum y^2}{50} - 4.4^2$ and $\sum y^2 = 50(2.2^2 + 4.4^2) = 1210$

    For the combined set of 250 pages:

    Total number of errors = $\sum x + \sum y = 920 + 220 = 1140$

    $$\text{Mean} = \frac{1140}{250} = 4.56$$

    $$(\text{Standard deviation})^2 = \frac{\sum x^2 + \sum y^2}{250} - 4.56^2 = \frac{5032 + 1210}{250} - 4.56^2 = 4.1744$$

    $$\text{Standard deviation} = \sqrt{4.1744} = 2.04 \text{ (3 s.f.)}$$

In general, for a combined set of numbers made up of $n_1$ values of x and $n_2$ values of y:

$$\text{mean} = \frac{\sum x + \sum y}{n_1 + n_2}$$

$$\text{variance} = \frac{\sum x^2 + \sum y^2}{n_1 + n_2} - (\text{mean})^2$$

Remember that standard deviation = $\sqrt{\text{variance}}$

Example 1.28
Three statistics students, Ali, Les and Sam, spent the day fishing. They caught three different types of fish and recorded the type and mass (correct to the nearest 0.01 kg) of each fish caught. At 4 p.m., they summarised the results as follows.

        Number of fish by type     All fish caught
        Perch   Tench   Roach      Mean mass (kg)   Standard deviation (kg)
Ali     2       3       7          1.07             0.42
Les     6       2       8          0.76             0.27
Sam     1       0       1          1.00             0

(a) State how it may be deduced from the data that the mass of each fish caught by Sam was 1.00 kg.
(b) The winner was the person who had caught the greatest total mass of fish by 4 p.m. Determine who was the winner, showing your working.
(c) Before leaving the waterside, Sam catches one more fish and weighs it. He then announces that, if this extra fish is included with the other two fish he caught, the standard deviation is 1.00 kg. Find the mass of this extra fish. (C)

Solution 1.28
(a) If the standard deviation is 0, there is no deviation from the mean. All the readings must be exactly the same as the mean.
    Since the mean is 1.00 kg, both fish must have weighed 1.00 kg.

(b)        Number of fish   Mean      Total mass
    Ali    12               1.07 kg   12 × 1.07 = 12.84 kg
    Les    16               0.76 kg   16 × 0.76 = 12.16 kg
    Sam     2               1.00 kg    2 × 1.00 = 2.00 kg

    The winner was Ali.

(c) Sam: let mass of extra fish be x, so masses of his three fish are 1, 1, x.

    $$\bar{x} = \frac{2 + x}{3}$$

    $$s = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2}$$

    $$1.00 = \sqrt{\frac{2 + x^2}{3} - \left(\frac{2 + x}{3}\right)^2}$$

    $$1 = \frac{2 + x^2}{3} - \frac{(2 + x)^2}{9} \quad \text{(squaring both sides)}$$

    $$1 = \frac{2 + x^2}{3} - \frac{4 + 4x + x^2}{9}$$

    $$9 = 3(2 + x^2) - (4 + 4x + x^2) \quad \text{(multiplying by 9)}$$

    $$9 = 6 + 3x^2 - 4 - 4x - x^2$$

    $$0 = 2x^2 - 4x - 7$$

    $$x = \frac{4 \pm \sqrt{16 - 4(2)(-7)}}{4} = \frac{4 \pm \sqrt{72}}{4}$$

    x = 3.121... (ignoring negative value for x)

    Mass of Sam's extra fish is 3.12 kg (2 d.p.)
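The method used in Solution 1.27 for combining two sets of data, when only the size, mean and standard deviation of each set are known, can be sketched in Python as follows. This is an illustrative sketch; the function names `sums_from_summary` and `combine` are assumptions made here and do not come from the text.

```python
from math import sqrt


def sums_from_summary(n, mean, sd):
    """Recover (sum of x, sum of x^2) from n, the mean and the standard deviation.

    Rearranges mean = sum(x)/n and sd^2 = sum(x^2)/n - mean^2.
    """
    return n * mean, n * (sd ** 2 + mean ** 2)


def combine(n1, mean1, sd1, n2, mean2, sd2):
    """Mean and standard deviation of two sets of data taken together."""
    sum_x, sum_x2 = sums_from_summary(n1, mean1, sd1)
    sum_y, sum_y2 = sums_from_summary(n2, mean2, sd2)
    n = n1 + n2
    mean = (sum_x + sum_y) / n
    variance = (sum_x2 + sum_y2) / n - mean ** 2
    return mean, sqrt(variance)


# Example 1.27: 200 pages (mean 4.6, s.d. 2) combined with 50 pages (mean 4.4, s.d. 2.2)
mean, sd = combine(200, 4.6, 2.0, 50, 4.4, 2.2)
print(round(mean, 2), round(sd, 2))  # 4.56 2.04
```

Note that the standard deviations themselves are never averaged; it is the totals $\sum x$ and $\sum x^2$ that are added.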

Exercise 1g
1. The mean of ten numbers is 8. If an eleventh number is now included in the results, the mean becomes 9. What is the value of the eleventh number?

2. The mean of four numbers is 5, and the mean of three different numbers is 12. What is the mean of the seven numbers together?

3. The mean of n numbers is 5. If the number 13 is now included with the n numbers, the new mean is 6. Find the value of n.

4. The mean of the numbers 3, 6, 7, a, 14, is 8. Find the standard deviation of the set of numbers.

5. The numbers a, b, 8, 5, 7 have mean 6 and variance 2. Find the values of a and b, if a > b.

6. For a set of 20 numbers $\sum x = 300$ and $\sum x^2 = 5500$. For a second set of 30 numbers $\sum x = 480$ and $\sum x^2 = 9600$. Find the mean and the standard deviation of the combined set of 50 numbers.

7. If the mean of the following frequency distribution is 3.66, find the value of a.

   x   1   2   3   4    5   6
   f   3   9   a   11   8   7

8. A bag contained five balls each bearing one of the numbers 1, 2, 3, 4, 5. A ball was drawn from the bag, its number noted, and then replaced. This was done 50 times in all and the table below shows the resulting frequency distribution.

   Number      1   2    3   4   5
   Frequency   x   11   y   8   9

   If the mean is 2.7, determine the values of x and y.

9. Parplan Opinion Polls Ltd conducted a nationwide survey into the attitudes of teenage girls. One of the questions asked was 'What is the ideal age for a girl to have her first baby?' In reply, the sample of 165 girls from the Northern zone gave a mean of 23.4 years and a standard deviation of 1.6 years. Subsequently, the overall sample of 384 girls (Northern plus Southern zones) gave a mean of 24.8 years and a standard deviation of 2.2 years.
   Assuming that no girl was consulted twice, calculate the mean and standard deviation for the 219 girls from the Southern zone. (AEB)

10. The manager of a car showroom monitored the numbers of cars sold during two successive five-day periods. During the first five days the numbers of cars sold per day had mean 1.8 and variance 0.56. During the next five days the numbers of cars sold per day had mean 2.8 and variance 1.76. Find the mean and variance of the numbers of cars sold per day during the full ten days. (NEAB)

11. Prior to the start of delicate wage negotiations in a large company, the unions and the management take independent samples of the work force and ask them at what percentage level they believe a settlement should be made. The results are as follows:

    Sample         Size   Mean    Standard deviation
    'management'   350    12.4%   2.1%
    'union'        237    10.7%   1.8%

    Assuming that no individual was consulted by both sides, calculate the mean and standard deviation for these 587 workers. (AEB)

12. In a germination experiment, 200 rows of seeds, with ten seeds per row, were incubated. The frequency distribution of the number of seeds which germinated per row is shown below.

    Number of seeds germinated   Frequency
     0                            4
     1                           10
     2                           16
     3                           28
     4                           34
     5                           44
     6                           32
     7                           16
     8                           10
     9                            6
    10                            0

    (a) Calculate the mean and the standard deviation of the number of seeds germinating per row.
    For another 50 rows an analysis shows that the mean is 4.4 seeds and the standard deviation is 2.2 seeds.
    (b) Determine the mean and, to two decimal places, the standard deviation for the 250 rows.

13. The figures in the table below are the ages, to the nearest year, of a random sample of 30 people negotiating a mortgage with a bank.

    29  26  31  42  38
    38  38  45  35  37
    36  39  49  40  32
    32  34  27  61  29
    33  31  33  52  44
    32  30  38  42  33

    Copy and complete the following stem and leaf diagram. Use the diagram to identify two features of the shape of the distribution.

    25 | 4 1
    30 | 1
    35 |

    Find the mean age of the 30 people. Given that 18 of them are men and that the mean age of the men is 37.72, find the mean age of the 12 women. (MEI)

14. A travel agency has two shops, R and S. The number of holidays purchased in a particular week and the mean and standard deviation of the costs of these holidays at each shop are shown in the following table.

             Number of holidays   Mean cost (£)   S.D. (£)
    Shop R   32                   190.35          10.4
    Shop S   24                   202.25          15.5

    Calculate the mean, and, to the nearest penny, the standard deviation of the costs of all the 56 holidays purchased. (L)

15. Three random samples of 50, 30 and 20 bags respectively are taken from the production line of '12 kg bags' of cat litter. The contents of each bag are then weighed. A summary of the results is shown in the table.

    Sample   Size   Mean wt. (kg)   S.D. (kg)
    1        50     11.8            0.5
    2        30     12.1            0.9
    3        20     11.7            1.1

    Find, in kilograms to two decimal places, the mean weight per bag and the standard deviation for the 100 bags. (L)

16. The average height of 20 boys is 160 cm, with a standard deviation of 4 cm. The average height of 30 girls is 155 cm, with a standard deviation of 3.5 cm. Find the standard deviation of the whole group of 50 children.

SCALING SETS OF DATA

Example 1.29
Sweets are packed into bags with a nominal mass of 75 g. Ten bags are picked at random from the production line and weighed. Their masses, in grams, are

76, 74.2, 75.1, 73.7, 72, 74.3, 75.4, 74, 73.1, 72.8

(a) Use your calculator to find the mean mass and the standard deviation.
It was later discovered that the scales were reading 3.2 g below the correct weight.
(b) What was the correct mean mass of the ten bags and the correct standard deviation?
(c) Compare your answers to (a) and (b) and comment.

Solution 1.29
(a) According to the scales, with measurements being given in grams,
    $\bar{x} = 74.06$, s = 1.166... = 1.17 (2 d.p.)

(b) The correct readings are:
    79.2, 77.4, 78.3, 76.9, 75.2, 77.5, 78.6, 77.2, 76.3, 76
    $\bar{x} = 77.26$, s = 1.166... = 1.17 (2 d.p.)

(c) Notice that 77.26 - 74.06 = 3.2, i.e. correct mean - original mean = 3.2
    So correct mean = original mean + 3.2; correct s.d. = original s.d.
    If each reading is increased by 3.2, then the mean is increased by 3.2. The standard deviation, however, remains unaltered.

Showing the two sets of readings on a graph helps to show that although the mean increased, the spread of the data about the mean remained the same.

[Diagram: the original data and the new data marked on a number line from 72 to 79, with the original mean and the new mean indicated.]

Now consider what happens when each number in a set of readings is multiplied by a constant.

For the four numbers 2, 3.5, 5, 6: $\bar{x} = 4.125$, $s_x = 1.515\ldots$
Multiplying each number by 3 to obtain y, where y = 3x, gives the numbers 6, 10.5, 15, 18.
For these, $\bar{y} = 12.375$, $s_y = 4.546\ldots$
Now 12.375 ÷ 4.125 = 3, so $\bar{y} = 3\bar{x}$
and 4.546... ÷ 1.515... = 3, so $s_y = 3s_x$

[Diagram: the original data and the new data marked on a number line from 2 to 18, with the original mean and the new mean indicated.]

You can see from the diagram that the new set of data is much more spread out.

In general, if $y = ax + b$, where a and b are constants, then

$$\bar{y} = a\bar{x} + b \quad \text{and} \quad s_y = |a|\,s_x$$

where $|a|$ is the positive value of a. For example, if $y = -\tfrac{1}{2}x$, since $|-\tfrac{1}{2}| = \tfrac{1}{2}$, $s_y = \tfrac{1}{2}s_x$.

Example 1.30
Joe's mean mark for the physics tests for the term was 72. His teacher decided to scale all the marks according to the formula y = 2x - 6, where y is the new mark and x the original mark.
Find Joe's new mean mark.

Solution 1.30
y = 2x - 6
so $\bar{y} = 2\bar{x} - 6 = 2 \times 72 - 6 = 138$
Joe's new mean mark is 138.

Example 1.31
The standard deviation of three numbers a, b, c is 3.2.
(a) State the standard deviation of the three numbers 3a, 3b, 3c.
(b) State the standard deviation of the three numbers a + 2, b + 2, c + 2.
(c) State the standard deviation of the three numbers 2a + 5, 2b + 5, 2c + 5. (C)
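The effect of a linear scaling y = ax + b on the mean and the standard deviation can be checked numerically. The Python sketch below is an illustration only; the function names `mean_sd` and `scale` are assumptions made here. It uses the four numbers 2, 3.5, 5, 6 discussed in this section and compares the directly calculated values with those given by the rule that the mean becomes a × mean + b and the standard deviation becomes |a| × standard deviation.

```python
from math import sqrt


def mean_sd(data):
    """Mean and standard deviation (dividing by n) of a list of numbers."""
    n = len(data)
    mean = sum(data) / n
    return mean, sqrt(sum((x - mean) ** 2 for x in data) / n)


def scale(data, a, b):
    """Apply the linear scaling y = a*x + b to every reading."""
    return [a * x + b for x in data]


x = [2, 3.5, 5, 6]
mean_x, sd_x = mean_sd(x)

# Try multiplying by 3, adding 3.2, and a scaling with a negative multiplier.
for a, b in [(3, 0), (1, 3.2), (-0.5, 4)]:
    mean_y, sd_y = mean_sd(scale(x, a, b))
    # The rule predicts mean_y = a*mean_x + b and sd_y = |a|*sd_x.
    print(a, b, round(mean_y, 3), round(a * mean_x + b, 3),
          round(sd_y, 3), round(abs(a) * sd_x, 3))
```

Adding a constant shifts the mean but leaves the standard deviation unchanged, while multiplying by a constant changes both.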


Solution 1.31
(a) If y = 3x, then $s_y = 3s_x = 3 \times 3.2 = 9.6$
(b) If y = x + 2, then $s_y = s_x = 3.2$
(c) If y = 2x + 5, then $s_y = 2s_x = 2 \times 3.2 = 6.4$

Comparing data by scaling
If you wish to compare two sets of data, for example examination marks in two papers, you can scale one of the sets of data so that the two means are the same and the two standard deviations are the same.

Example 1.32
For students on an Electronics course the assessment consists of two components: a written examination paper and a project. The marks for the examination paper are distributed with a mean of 62 and a standard deviation of 16. Those for the project have a mean of 37 and a standard deviation of 6. Anna, a student on the course, scored 80 marks on the examination paper and 46 marks for her project.
(a) Transform each of Anna's marks into a standardised score, such that, for each component, the mean and standard deviation for all students on the course are 50 and 20, respectively.
(b) Hence compare Anna's relative performance in the two assessment components. (NEAB)

Solution 1.32
(a) Standardised values: $\bar{y} = 50$, $s_y = 20$

    Examination: $\bar{x} = 62$, $s_x = 16$
    Let y = ax + b
    then $\bar{y} = a\bar{x} + b$
    50 = 62a + b ...... (1)
    Now $s_y = as_x$
    20 = a × 16
    a = 1.25
    Substituting in (1): 50 = 62 × 1.25 + b
    b = -27.5
    The transformation for the examination paper is y = 1.25x - 27.5
    When x = 80, y = 1.25 × 80 - 27.5 = 72.5
    Anna's standardised mark for the examination is 72.5.

    Project: $\bar{x} = 37$, $s_x = 6$
    Let y = cx + d
    then $\bar{y} = c\bar{x} + d$
    50 = 37c + d ...... (2)
    Now $s_y = cs_x$
    20 = c × 6
    $c = 3\tfrac{1}{3}$
    Substituting in (2): $50 = 37 \times 3\tfrac{1}{3} + d$
    $d = -73\tfrac{1}{3}$
    The transformation for the project is $y = 3\tfrac{1}{3}x - 73\tfrac{1}{3}$
    When x = 46, $y = 3\tfrac{1}{3} \times 46 - 73\tfrac{1}{3} = 80$
    Anna's standardised mark for the project is 80.

(b) Relatively, Anna performed better on the project than in the examination.

Exercise 1h Scaling sets of data
1. (a) Find the mean and the standard deviation of the set of numbers 4, 6, 9, 3, 5, 6, 9.
   (b) Deduce the mean and the standard deviation of the set of numbers 514, 516, 519, 513, 515, 516, 519.
   (c) Deduce the mean and the standard deviation of the set of numbers 52, 78, 117, 39, 65, 78, 117.

2. A set of numbers has a mean of 22 and a standard deviation of 6. If 3 is added to each number of the set, and each resulting number is then doubled, find the mean and standard deviation of the new set. (C Additional)

3. A set of values of a variable X has a mean μ and a standard deviation σ. State the new value of the mean and of the standard deviation when each of the variables is (a) increased by k, (b) multiplied by p. Values of a new variable Y are obtained by using the formula Y = 3X + 5. Find the mean and the standard deviation of the set of values of Y. (C Additional)

4. Show that the standard deviation of the integers 1, 2, 3, 4, 5, 6, 7 is 2.
   Using this result find the standard deviation of the numbers
   (a) 101, 102, 103, 104, 105, 106, 107.
   (b) 100, 200, 300, 400, 500, 600, 700.
   (c) 2.01, 3.02, 4.03, 5.04, 6.05, 7.06, 8.07.
   (d) Write down seven integers which have mean 5 and standard deviation 6. (L Additional)

5. It is proposed to convert a set of marks whose mean is 52 and standard deviation is 4 to a set of marks with mean 61 and standard deviation 3. The equation for the transformation necessary to convert the marks is y = ax + b. Find
   (a) the values of a and b,
   (b) the value of the scaled mark which corresponds to a mark of 64 in the original data,
   (c) the value in the original data if the scaled mark is 79.

6. The marks of five students in a mathematics test were 27, 31, 35, 47, 50.
   (a) Calculate the mean mark and the standard deviation.
   (b) The marks are scaled so that the mean and standard deviation become 50 and 20 respectively. Calculate, to the nearest whole number, the new marks corresponding to the original marks of 31 and 50. (C Additional)

7. It is proposed to convert a set of values of a variable X, whose mean and standard deviation are 20 and 5 respectively, to a set of values of a variable Y whose mean and standard deviation are 42 and 8 respectively. If the conversion formula is Y = aX + b, calculate the values of a and of b. (C Additional)

8. In order to compare the performances of candidates in two schools a test was given. The mean mark at school A was 45, and the mean mark at school B was 31 with a standard deviation of 5. The marks of school A are scaled so that the mean and standard deviation are the same as school B and a mark of 85 at school A becomes 63. Find the values of a and b if the transformation used is y = ax + b. Find also the original standard deviation of the marks from school A.

Example 1.34
(c) Using the frequency table estimate the mean
and standard deviation of the marks. Use the coding y ~ 200
x-25 000 to find the mean and standard d evtatwn
000 . . of the foll owmg:
.
9. The fo\lowing is a set of 109 examination marks (d) The marks are to be scaled linearly by the
relation Y =a+ bX where X is the old mark
ordered for convenience. and Y the new mark. The new mean and
20 150 000 175 000 200 000 225 000 250 000 275 000
16 17 18 X 125 000
11 11 12 13 14 2.1 25 26 26 standard deviation are to be 50 and 10
6 respectively. Using your estimates in (c) 3
24 25 25 31 19 27 35 24 12
21 21 23 28 28 29 29 29 30 36 calculate suitable values for a and b. f 5
27 27 28 32 33 33 34 34 35 39
31 32 32 38 38 39 40 10. The mean of the marks scored by candidates in
37 37 38
36 37 37 39 39 39 40 40 40 43 an examination is 45. These marks are scaled Solution 1.34
39 39 39 42 42 42 linearly to give a mean of 50 and a standard
40 41 41 41 42 47 47 47 47
40
45 46 46 53 deviation of 15. Given that the scaled mark of 80 y ~ x- 200 000
43 43 44 52 52 53 62 corresponds to an original mark of 70, calculate
50 50 51 51 52 59 61 (a) the standard deviation of the original marks, 25 000
48 59
57 58 58 82
54 54 5.1 67 70 76 77 (b) the mark which is unchanged by the scaling. so 25 OOOy ~ x - 200 000
66
63 64 66 I.e. ": ~ 25 OOOy + 200 000
Given that the greatest and least scaled marks are
(a) Construct a grouped frequency distribution x ~ 25 OOOy + 200 000 and sx 25 OOOs y
using a class width of 10 and starting with 92 and 2 respectively, calculate the
corresponding original marks. (C Additional)
o-9.
(b) Draw a histogram and comment on the shape of the distribution.

USING A METHOD OF CODING TO FIND THE MEAN AND STANDARD DEVIATION

Example 1.33
Salt is packed in bags which the manufacturer claims contain 25 kg each. Eighty bags are examined and the mass, x kg, of each is found. The results are $\sum (x - 25) = 27.2$, $\sum (x - 25)^2 = 85.1$. Find the mean and the standard deviation of the masses.

Solution 1.33
You do not know the actual masses and a coding has been used to summarise the results. The coding is $y = x - 25$, where $\sum y = 27.2$ and $\sum y^2 = 85.1$.

$\bar{y} = \frac{\sum y}{n} = \frac{27.2}{80} = 0.34$

$s_y^2 = \frac{\sum y^2}{n} - \bar{y}^2 = \frac{85.1}{80} - 0.34^2 = 0.948\,15$

$s_y = 0.9737\ldots$

Now if $y = x - 25$, then $x = y + 25$, so $\bar{x} = \bar{y} + 25$.
Therefore $\bar{x} = 0.34 + 25 = 25.34$
Also $s_x = s_y$, so $s_x = 0.9737\ldots$

The mean mass is 25.34 kg and the standard deviation is 0.97 kg (2 d.p.).
NOTE: The value 25 used here is sometimes known as the assumed mean.

Working for Solution 1.34, using the coding $y = \frac{x - 200\,000}{25\,000}$:

x          y     f     fy     fy²
125 000   -3     5    -15     45
150 000   -2    19    -38     76
175 000   -1    27    -27     27
200 000    0    35      0      0
225 000    1    24     24     24
250 000    2    12     24     48
275 000    3     3      9     27
               Σf = 125   Σfy = -23   Σfy² = 247

$\bar{y} = \frac{\sum fy}{\sum f} = \frac{-23}{125} = -0.184$

$s_y^2 = \frac{\sum fy^2}{\sum f} - \bar{y}^2 = \frac{247}{125} - (-0.184)^2 = 1.942\ldots$

$s_y = 1.393\ldots$

$\bar{x} = 25\,000\bar{y} + 200\,000 = 25\,000 \times (-0.184) + 200\,000 = 195\,400$

$s_x = 25\,000 s_y = 25\,000 \times 1.393\ldots = 34\,840.207\ldots = 34\,800$ (3 s.f.)

The mean is 195 400 and the standard deviation is 34 800 (3 s.f.).

In general, if the set of numbers $x_1, x_2, \ldots, x_n$ is transformed to the set of numbers $y_1, y_2, \ldots, y_n$ by means of the coding $y = \frac{x - a}{b}$, then $x = a + by$,

so $\bar{x} = a + b\bar{y}$ and $s_x = b s_y$.
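The coding method can also be checked numerically. The following is only a minimal sketch, assuming the frequency table of Example 1.34 and a positive scale factor b; the function name `coded_mean_sd` is a hypothetical helper, not something taken from the text.

```python
from math import sqrt

def coded_mean_sd(xs, fs, a, b):
    """Mean and standard deviation of x found through the coding y = (x - a) / b.

    Assumes b > 0, so that s_x = b * s_y.
    """
    ys = [(x - a) / b for x in xs]                  # coded values
    n = sum(fs)                                     # total frequency
    mean_y = sum(f * y for f, y in zip(fs, ys)) / n
    var_y = sum(f * y * y for f, y in zip(fs, ys)) / n - mean_y ** 2
    sd_y = sqrt(var_y)
    # Undo the coding: x = a + b*y
    return a + b * mean_y, b * sd_y

# Data of Example 1.34
xs = [125_000, 150_000, 175_000, 200_000, 225_000, 250_000, 275_000]
fs = [5, 19, 27, 35, 24, 12, 3]

mean_x, sd_x = coded_mean_sd(xs, fs, a=200_000, b=25_000)
print(round(mean_x), round(sd_x, 1))
```

The result should agree with the hand calculation because the coding is a linear change of scale: the mean is shifted and scaled, while the standard deviation is only scaled.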

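CUMULATIVE FREQUENCY

The cumulative frequency is the total frequency up to a particular item. A cumulative frequency distribution can be formed from a frequency distribution and can be illustrated

(a) when the data are discrete and ungrouped - by drawing a step diagram,
(b) when the data are continuous or in the form of a grouped discrete distribution - by drawing a cumulative frequency polygon or curve.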

1\ d deviation of the
Time {min)
Frequenc_!_
nd the stan d ar 0 (a) Cumulative frequency - step diagrams for discrete ungrouped data
1. Find the mean af d using the coding -15
following sets o ata, 3
-20 The table shows the number of attempts needed to pass the driving test by 100 candidates at a
indicated: x- 312 2
-25
X f y~~ 6 particular test centre.
(a) -30
10 2 3 4 5 6
304 1 -35 Number of attempts 1
7
308 5 -40 4 2
2 Frequency 33 42 13 6
312 9 -45
1 (Number of candidates)
316 4 -50
320 4 . taken to feed the The cumulative frequency distribution is formed as follows:
t the mean ume .
324 2 CaIcuIa e hod of codmg.
animals, using a met <:1 <:2 <3 <;4 <:5 <;6
x-450 Number of attempts
y~~ h times taken on
Interval f 5 The table shows t e f oach to complete Cumulative- frequency 33 75 88 94 98 100
(b) · 30 consecuuv · e days or ac · h ve
. ular route. T1mes a
3
100 <:x < 200 one journey on a part;~st minute. Find the_ m~an T T T
200 <;X< 300
7 been given t~ the nea d the standard devmtton, 33 +42 33 + 42 + 13 total number
12 time for the JOurney ad-
300<:x<400 using a method of co mg. of candidates
18
400 <:x < 500
12 Frequency Plot the cumulative frequency against the number of attempts and decide how to join the
500 <;x < 600 Time (min)
6 1 points.
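As a rough illustration of how the running totals for the driving test data are built up, here is a short sketch. It assumes that the count stays constant between whole numbers of attempts; the helper name `candidates_up_to` is hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate

attempts = [1, 2, 3, 4, 5, 6]
frequency = [33, 42, 13, 6, 4, 2]

# Running totals: 33, 33 + 42, 33 + 42 + 13, ...
cumulative = list(accumulate(frequency))

def candidates_up_to(k):
    """Number of candidates who needed at most k attempts."""
    i = bisect_right(attempts, k)   # how many listed values are <= k
    return cumulative[i - 1] if i > 0 else 0

for a, c in zip(attempts, cumulative):
    print(f"<= {a} attempts: {c}")
```

Because the number of attempts can only be a whole number, a call such as `candidates_up_to(1.4)` returns the same count as `candidates_up_to(1)`.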
600 <;x <700
60-63
X- 0.0225 3
y- 64-67
f 0.005 12
(c) Interval 68-71
10
5 72-75
0- 4
10 76-79
o.oo5-
0.01- 13
tudents timed how long it
Q.Q15- 18 6. In a practical d~ss ~f their saliva to break dow~
0.02- 12 took for a samp e. The times to the neares
6 a 2% starch solutt<;>ll·h ble b~low. Find the
0.025- hown lD t eta
6 second _are s . a method of coding.
0.03- mean tlme, usmg
0
o.Q35-
Time (seconds)
Frequency

1
ii
· 1 t of data 11-20
2. For a partlCU ar se I 50)2 = 238.4 2
~'x- 50)~ 123.5, :E,x- 21-30
n = 100, .:..., d d deviation of x. 5
e stan ar 31-40
Find the mean and th 11
41-50
8
· nee of x it 51-60
3 Find t he vana 2 2593 2
· 7 Lf'x 100) ~ '
:E{(x-100)~12, 1
- 61-70
1
:E f ~ 20 71-90
onth the owner ot a i:
4 Each morning tor am I it took to feed the
. h ld. timed how ong
small o mg
animals. The resu1ts
were as shown: _:\ i i h ~
l_ r:
Plotting the values (1, 33), (2, 75), (3, 88), ... tells you that 33 people took 1 attempt, 75 people took ≤ 2 attempts, 88 people took ≤ 3 attempts.
If you join the points with straight lines, such as from A to B, and consider a cumulative frequency of 50, this would suggest that 50 people took ≤ 1.4 attempts, which is nonsense.
Clearly it is not sensible to join the points directly.
When data are discrete (and usually integer) values and also are ungrouped, the points can be joined by a series of steps as shown:

[Step diagram: Cumulative frequency graph of number of attempts]

The step diagram is necessary because the values are not distributed evenly throughout the intervals 1 to 2, 2 to 3, ... but 'jump' or 'step up' from 1 to 2, then 2 to 3 and so on.

(a) Consider a cumulative frequency of 50. From the graph, the number of attempts is two.
This means that when the data are placed in ascending order of size, the 50th item is 2, i.e. the 50th person took two attempts.
(b) To find how many took up to four attempts, go up from 4 on the horizontal axis, to the top of the step, then go left to the cumulative frequency axis.
This shows that 94 candidates took up to four attempts.
If you go to the bottom of the step, this tells you the number of candidates who took fewer than four attempts (88 in this case).

Notice that it only makes sense when you read from the discrete values on the horizontal axis. It would be silly to consider 3.6 attempts, for example.
Note that in a step diagram, the mode is given by the value of the variable that gives the 'steepest' step.
From the graph above, the mode is two.

(b) Cumulative frequency polygons and curves for grouped data
Consider this situation:
Six weeks after planting, the heights of 30 broad bean plants were measured and the frequency distribution formed as shown.

Height, x cm   3≤x<6   6≤x<9   9≤x<12   12≤x<15   15≤x<18   18≤x<21
Frequency        1       2       11        10         5         1

The cumulative frequency is calculated up to each upper class boundary.
The upper class boundaries are 6, 9, 12, 15, 18, 21.
The lower boundary of the first class is 3. This is inserted for completeness.

Height, x cm           <3   <6   <9   <12   <15   <18   <21
Cumulative frequency    0    1    3    14    24    29    30

The final cumulative frequency, 30, is the total number of plants.

[Cumulative frequency curve: cumulative frequency against height (cm)]

Values can be estimated from the graph. Note that the graph can be read in either direction.

(i) To find the number of plants that were less than 10.5 cm tall:
• Find the height 10.5 cm on the horizontal axis
• Draw a vertical line up from 10.5 to meet the curve
• Draw a horizontal line to the cumulative frequency axis and read the value
From the graph, seven plants were less than 10.5 cm tall.

(ii) To find x where 90% of the plants were less than x cm tall:
• 90% of 30 = 27
• Find 27 on the vertical axis and draw a horizontal line to meet the curve
• Draw a vertical line to the horizontal or 'height' axis and read the value
From the graph, 27 plants were less than 16.5 cm tall, so x = 16.5

Example 1.35
A survey is carried out to determine the numbers of pupils in various age groups who are attending nurseries, schools and colleges within a certain area. The results are summarised in the following histogram.

[Histogram: frequency density against age in years]

(a) Copy and complete the following table showing the ages of the pupils and the corresponding cumulative frequencies.

Age in years up to      3   5   7   11   16   18
Cumulative frequency    0

(b) Draw a cumulative frequency diagram for the distribution.
(c) Use your cumulative frequency diagram to estimate the age exceeded by 30% of the pupils in the survey. (NEAB)

Solution 1.35
Frequency = frequency density × width, so for 3 ≤ age < 5, f = 200 × 2 = 400.
Calculating the other frequencies gives

Age                            3≤x<5   5≤x<7   7≤x<11   11≤x<16   16≤x<18
Frequency (number of pupils)    400     800     1800      2000       600

(a) The cumulative frequency table is

Age in years up to      3    5     7     11     16     18
Cumulative frequency    0   400   1200   3000   5000   5600

(c) If 30% of the pupils exceed a certain age, then 70% of pupils are younger than this age.
Now 70% of 5600 = 3920.
From the graph, 3920 pupils have an age up to 13.3 years, i.e. approx 13 years 4 months,
so 30% of the pupils are older than 13 years 4 months.
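The first step of Example 1.35, turning histogram heights back into frequencies, can be sketched as follows. Only the first frequency density (200) is quoted in the worked solution; the other densities below are back-calculated from the stated frequencies and should be treated as assumptions made for this illustration.

```python
from itertools import accumulate

# (lower boundary, upper boundary, frequency density) for each age class.
# Densities other than the first are assumed values, back-calculated
# from the frequencies 800, 1800, 2000 and 600 given in the solution.
classes = [(3, 5, 200), (5, 7, 400), (7, 11, 450), (11, 16, 400), (16, 18, 300)]

# frequency = frequency density x class width
frequencies = [density * (upper - lower) for lower, upper, density in classes]

# Cumulative frequency up to each class boundary, starting from 0 at age 3
cumulative = [0] + list(accumulate(frequencies))

# The age exceeded by 30% of pupils is the age below which 70% of pupils lie
target = 70 * cumulative[-1] // 100

print(frequencies)
print(cumulative)
print(target)
```

The value of `target` is the cumulative frequency to look up on the diagram in part (c).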
Example 1.36
Students were asked how long it took them to travel to college on a particular morning. A cumulative frequency distribution was formed.

Time taken (minutes)   Cumulative frequency
<5                      28
<10                     45
<15                     81
<20                    143
<25                    280
<30                    349
<35                    374
<40                    395
<45                    400

(a) Draw a cumulative frequency polygon.
(b) Estimate how many students took less than 18 minutes.
(c) Taking equal class intervals of 0-, 5-, 10-, ..., construct a frequency distribution and draw a histogram.

Solution 1.36
(a) [Cumulative frequency polygon to show the times taken to travel to college: cumulative frequency against time (minutes)]

(b) Estimating from the graph, 114 students took less than 18 minutes.

(c) Form the frequency distribution and calculate the frequency density for each interval, where frequency density = frequency ÷ interval width.
Note that the width of each interval is 5.

Upper boundary   Cumulative frequency   Time (min)   Frequency          Frequency density
 5                28                    0-           28                  5.6
10                45                    5-           45 - 28 = 17        3.4
15                81                    10-          81 - 45 = 36        7.2
20               143                    15-          143 - 81 = 62      12.4
25               280                    20-          280 - 143 = 137    27.4
30               349                    25-          349 - 280 = 69     13.8
35               374                    30-          374 - 349 = 25      5
40               395                    35-          395 - 374 = 21      4.2
45               400                    40-(45)      400 - 395 = 5       1
                                                     Total = 400

[Histogram: frequency density against time (minutes)]

CUMULATIVE PERCENTAGE FREQUENCY DIAGRAMS

These are particularly useful when two or more distributions are to be compared. For example, suppose we have the examination marks of 200 boys and 300 girls in Year 8.

Mark    Cumulative frequency (boys)   Cumulative frequency (girls)
<10       6                             0
<20      22                             6
<30      60                            12
<40     140                            24
<50     172                            42
<60     188                            75
<70     196                           120
<80     198                           246
<90     200                           294
<100    200                           300

Obtain the cumulative percentage frequencies as follows:

        Boys (total 200)                           Girls (total 300)
Mark    Cumulative frequency   Cumulative % frequency   Cumulative frequency   Cumulative % frequency
<10       6                    6/200 = 3%                 0                    0/300 = 0%
<20      22                    22/200 = 11%               6                    6/300 = 2%
<30      60                    60/200 = 30%              12                    12/300 = 4%
<40     140                    140/200 = 70%             24                    24/300 = 8%
<50     172                    172/200 = 86%             42                    42/300 = 14%
<60     188                    188/200 = 94%             75                    75/300 = 25%
<70     196                    196/200 = 98%            120                    120/300 = 40%
<80     198                    198/200 = 99%            246                    246/300 = 82%
<90     200                    200/200 = 100%           294                    294/300 = 98%
<100    200                    200/200 = 100%           300                    300/300 = 100%

[Cumulative percentage frequency curves: cumulative % frequency against mark, for boys and girls]

Great care must be taken when comparing these curves. A common mistake is to say that the boys have done better than the girls because the boys' graph is above that of the girls. If you calculate the corresponding percentage frequencies and draw the histograms you will see that this is not the case.

Boys
Mark    Cumulative % frequency   Mark   % Frequency   Frequency density
<10       3%                     0-       3%           0.3
<20      11%                     10-      8%           0.8
<30      30%                     20-     19%           1.9
<40      70%                     30-     40%           4.0
<50      86%                     40-     16%           1.6
<60      94%                     50-      8%           0.8
<70      98%                     60-      4%           0.4
<80      99%                     70-      1%           0.1
<90     100%                     80-      1%           0.1
<100    100%                     90-      0%           0
                                        Total 100%

Girls
Mark    Cumulative % frequency   Mark   % Frequency   Frequency density
<10       0%                     0-       0%           0
<20       2%                     10-      2%           0.2
<30       4%                     20-      2%           0.2
<40       8%                     30-      4%           0.4
<50      14%                     40-      6%           0.6
<60      25%                     50-     11%           1.1
<70      40%                     60-     15%           1.5
<80      82%                     70-     42%           4.2
<90      98%                     80-     16%           1.6
<100    100%                     90-      2%           0.2
                                        Total 100%

[Histograms: boys' results and girls' results, frequency density against mark]
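The percentage frequencies behind this comparison can be worked out directly from the cumulative frequencies of the 200 boys and 300 girls. The sketch below is one way of doing so; the function name `percentage_summary` and the list names are hypothetical, chosen for this illustration.

```python
boys_cum = [6, 22, 60, 140, 172, 188, 196, 198, 200, 200]
girls_cum = [0, 6, 12, 24, 42, 75, 120, 246, 294, 300]
class_labels = ["0-", "10-", "20-", "30-", "40-", "50-", "60-", "70-", "80-", "90-"]

def percentage_summary(cum):
    """Return (cumulative % frequencies, % frequency in each class)."""
    total = cum[-1]
    cum_pct = [round(100 * c / total) for c in cum]
    # % frequency in a class = difference of successive cumulative percentages
    pct = [cum_pct[0]] + [b - a for a, b in zip(cum_pct, cum_pct[1:])]
    return cum_pct, pct

boys_cum_pct, boys_pct = percentage_summary(boys_cum)
girls_cum_pct, girls_pct = percentage_summary(girls_cum)

# Modal class: the class holding the largest percentage of the marks
boys_modal = class_labels[boys_pct.index(max(boys_pct))]
girls_modal = class_labels[girls_pct.index(max(girls_pct))]

print(boys_pct, boys_modal)
print(girls_pct, girls_modal)
```

Since every class here has width 10, dividing each percentage frequency by 10 gives the frequency densities used for the histograms.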


It is easy to see that the girls have done better than the boys. The modal class for the boys' marks is 30-39, whereas the modal class for the girls' marks is 70-79. The type of distribution for the boys' marks is said to be positively skewed and that for the girls' marks is said to be negatively skewed.

MEDIAN, QUARTILES AND PERCENTILES

The median is an average that is unaffected by extreme values. It is often described as the middle value.
For a set of observations arranged in order of size, the median is the value 50% of the way through the distribution.

Other quantities that are unaffected by extreme values are the quartiles and percentiles. These are useful in giving an idea of the variability or spread of the data.
For data arranged in order of size:
• the lower quartile, $Q_1$, is the value 25% of the way through the distribution,
• the upper quartile, $Q_3$, is the value 75% of the way through the distribution,
• the nth percentile, $P_n$, is the value n% of the way through the distribution.
Therefore the median (sometimes called $Q_2$) is the 50th percentile, the lower quartile is the 25th percentile and the upper quartile is the 75th percentile. The quartiles, together with the median, split the distribution into four equal parts.

Interquartile range
The difference between the quartiles, $Q_3 - Q_1$, is known as the interquartile range. It tells you the range of the middle 50% of the distribution and so is also unaffected by extreme values.

Interquartile range = upper quartile - lower quartile = $Q_3 - Q_1$

Interpercentile range
Ranges between various percentiles can be found. For example, the range giving the middle 80% of the readings is found by subtracting the 10th percentile from the 90th percentile, i.e. $P_{90} - P_{10}$.
When finding the median and percentiles, it is important to take note of whether the data are grouped or ungrouped.

Ungrouped data - median
For ungrouped data consisting of n observations arranged in order of size, the median is the $\tfrac{1}{2}(n + 1)$th observation.

(a) Consider this set of numbers: 7, 7, 2, 3, 4, 2, 7, 9, 31.
There are nine numbers, so the median is the $\tfrac{1}{2}(9 + 1)$th observation, i.e. the 5th observation.
Arranging them in order gives 2, 2, 3, 4, [7], 7, 7, 9, 31 (the 5th observation is shown in brackets).
The median is 7.

(b) Consider this set of numbers: 36, 41, 27, 32, 29, 39, 39, 43.
There are eight numbers, so the median is the $\tfrac{1}{2}(8 + 1)$th observation, i.e. the 4.5th observation.
As this does not exist, find the value that is half-way between the 4th and 5th observations.
Arranging the numbers in order of size gives 27, 29, 32, 36 | 39, 39, 41, 43 (the bar marks the 4.5th observation).
The median is half-way between 36 and 39, so median = $\tfrac{1}{2}(36 + 39) = 37.5$

Note that
- if there is an odd number of observations, the median is the middle value,
- if there is an even number of observations, the median is halfway between the two middle values (if these are c and d, the median is $\tfrac{1}{2}(c + d)$).

Ungrouped data - quartiles
The quartiles should divide in half the two distributions either side of the median, for example (a value in brackets is itself a quartile; a bar marks a quartile lying between two values):

(a) 3  3  [5]  6  8  [9]  12  14  [19]  20  24
$Q_1 = 5$ (lower quartile), $Q_2 = 9$ (median), $Q_3 = 19$ (upper quartile)
Interquartile range = $Q_3 - Q_1 = 19 - 5 = 14$

(b) 20  [23]  23 | 26  [27]  28
$Q_1 = 23$, $Q_2 = \tfrac{1}{2}(23 + 26) = 24.5$, $Q_3 = 27$
Interquartile range = $Q_3 - Q_1 = 27 - 23 = 4$

(c) 147  150 | 154  158 | 159  162 | 164  165
$Q_1 = \tfrac{1}{2}(150 + 154) = 152$, $Q_2 = \tfrac{1}{2}(158 + 159) = 158.5$, $Q_3 = \tfrac{1}{2}(162 + 164) = 163$
Interquartile range = $Q_3 - Q_1 = 163 - 152 = 11$

(d) 10  12 | 13  15  [19]  19  24 | 26  26
$Q_1 = \tfrac{1}{2}(12 + 13) = 12.5$, $Q_2 = 19$, $Q_3 = \tfrac{1}{2}(24 + 26) = 25$
Interquartile range = $Q_3 - Q_1 = 25 - 12.5 = 12.5$

Sometimes the following rule is used to find the quartiles:
$Q_1 = \tfrac{1}{4}(n + 1)$th value, $Q_3 = \tfrac{3}{4}(n + 1)$th value.
This rule agrees with the above method when n is odd, but there is a discrepancy when n is even. It does not, however, make a great deal of difference which method is used.
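The rule of halving the two distributions either side of the median can be written out as a short sketch. It assumes at least two observations, and the function names are hypothetical. Note that library routines for quartiles often follow a different convention, so their answers may differ slightly from the ones produced here.

```python
def median(sorted_vals):
    """Median of a list that is already in order of size."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2:                       # odd: the middle value
        return sorted_vals[mid]
    # even: halfway between the two middle values
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2

def quartiles(values):
    """Return (Q1, Q2, Q3), halving the data either side of the median."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # For odd n the median itself is left out of both halves
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), median(data), median(upper)

q1, q2, q3 = quartiles([3, 3, 5, 6, 8, 9, 12, 14, 19, 20, 24])
print(q1, q2, q3, "interquartile range =", q3 - q1)
```

Applied to examples (b), (c) and (d) of the ungrouped quartile illustrations, the same function reproduces the quartiles found by hand.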
Therefore Q, is 0 · 135 second s.
Q,
0.2 isseconds.
Th the
f 18.5th value . Th"JS Js
· halfway between the 18th and 19 t h va 1ues, whJch
. are both

Example 1.37 time experiment was performed first with 21 girls and then with 24 boys. The
A reaction ere ore Q, is 0.2 seconds.
results are shown on the stem and leaf diagram. The interquartile range ~ Q 3 - Q 1 -- 0 • 2 - 0.135-0 065
Summary of results - · seconds
Reaction times
Key (Boys)
1\8 means 0.18 sec. Girls Boys
Key (Girls)
6 \ 1 means 0.16 sec. Median 0.19 s 0
Boys Interquartile range 0 06 .17 s
Girls Th . s 000
2 4
4
3 3 2 2 2 2 2
0 0 0 1 1
ese results confirm what th
slower than the boys to react, eb~:::;!~e:f
r tsdiagram
.
sho":s, that
more vanabtlity the
in th
s
e bgirls
oys ' generally
results. are
1 0 0 2
1 8 8 8
9 9 @) 8 8 6 6 7 7
7 7 6 1
1 4 5 5
5 5 5 4 4 2 3 Ungrouped data in a frequency distribution - .
1
0 1 1 To find the median and . median and quartiles
t~:~;:~uled
1
0 9 it is useful to find t he cumulative
item. quarules frequency
of data in the form
as this of a
gives frequency
a requency up todistribution,
a particular
Find the median and the interquartile range for both sets of reaction times. Comment on your

answers. Example 1.38


The table
median shows of
number thechild
number of child
ren per famtly, . the
. renand
m interquartJ"l
thefamily for 35e range.
famTI tes m
. a certam
. area. Find the
Solution 1.37
For theare
There girls:
21 girls, so the median is the :1(21 + 1)th value, i.e. the 11th value. This value has 0 1 2 3
Number of children

1.3t:8~ __:___:_~:::_::.:_::::"'~-_3 ~--~'__~==~==~ 5 12 9


been ringed on the diagram and is obtained by counting up from the bottom, 14, 14, 15,
15, ... or counting down from the top, 24, 23, 23, 22, ... until the 11th value is reached.
Frequency (number of families) 3 4
4 2
5
Solution _ __ I

The median is 0.19 seconds. 2


Find the quartiles by dividing in half the two distributions either side of Q . The cumulative fre quency d"Istnbution
. is formed as fo II ows·
So Q is the 5.5th value. This is halfway between the 5th and 6th valnes. On this stem and <;5
Number of children 0 <;1 . <;4
leaf diagram, connt from the bottom up, so the 5th valne is 0.15 and the 6th value is 0.16.
1
Cumul attve
· f requency (families) 3 8 20 29 33 35
Therefore Q 1 is 0.155 seconds.
Q is the 16.5th value. This is halfway between the 16th and 17th values, i.e. between 0.21
2 Since there are 35 I . 1 S ; s J2
and 0.22. are ei gh t famJhes
. . with
va ues
<;1the medJan is th e 1(35
child 2. + l)th value, i.e. the 18th .
Therefore th d" and 20 famJhes with <;2 childre h value. Smce there
Therefore Q 3 is 0.215 seconds.
range~ ~ 0.215- 0.155 ~ 0.06 seconds.
. . . e me Jan number of childr f . . n, t e 18th value must be 2.
The interquartile Q3- Q1 . liKe n IS odd: en per amdy Js 2.
S
For theare
There boys:
24 boys, so the median is the 1(24 + l)th value, i.e. the 12.5th value. This is halfwaY Q, ~ ~(35 + l)th va l ue ~ 9th value ~ 2
between the 12th and 13th values which are both 0.17 seconds. 0 - "(35 +1)thvalue~27th I
.--1-4
Therefore . va ue = 3.
So the median reaction time for the boys is 0.17 seconds. , mterqua rt1·1e range~ 3 - 2- - 1 ch"ld
1 per family.
Q is the 6.5th value. This is halfway between the 6th and 7th values which are 0.13
1
and 0.14.
DP.:r.t. 73

Illustrating this on a 'step' diagram showing the cumulative frequency:

[Step diagram of the cumulative frequency distribution]

(a) To find the median, i.e. the 18th value, read across from 18 on the vertical axis, then down to 2.
Median = 2

(b) To find the lower quartile, i.e. the 9th value, read across from 9 on the vertical axis.
Lower quartile = 2

(c) To find the upper quartile, i.e. the 27th value, read across from 27 on the vertical axis.
Upper quartile = 3

Therefore interquartile range = 3 - 2 = 1

[Figure: Cumulative frequency graph of the number of novels read]

Solution 1.39
(a) The 'steepest' step occurs when x = 3, so mode = 3 novels.

(b) There are 30 pupils, so the median is the $\tfrac{1}{2}(30 + 1)$th value, i.e. the 15.5th value. This is half-way between the 15th value and the 16th value.
From the graph, 15th value = 3, 16th value = 4, so median = 3.5 novels.

(c) From the graph, the number that read ≤ 5 novels is 21. The 22nd person must have read 6 novels.
Number who read more than 5 novels = 30 - 21 = 9
Percentage that read more than 5 novels = $\frac{9}{30} \times 100\% = 30\%$
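For the children-per-family data of Example 1.38, the same readings can be taken from the cumulative frequencies without a diagram. This is only a sketch: the helper `kth_value` is hypothetical, and it assumes that n is odd and that n + 1 is divisible by 4, so that the median and quartile positions are whole numbers (as they are for n = 35).

```python
from bisect import bisect_left
from itertools import accumulate

def kth_value(values, freqs, k):
    """Value of the k-th item (counting from 1) when the data are in order."""
    cum = list(accumulate(freqs))
    # First position whose cumulative frequency reaches k
    return values[bisect_left(cum, k)]

children = [0, 1, 2, 3, 4, 5]
families = [3, 5, 12, 9, 4, 2]
n = sum(families)                                           # 35 families

median = kth_value(children, families, (n + 1) // 2)        # 18th value
lower_q = kth_value(children, families, (n + 1) // 4)       # 9th value
upper_q = kth_value(children, families, 3 * (n + 1) // 4)   # 27th value

print(median, lower_q, upper_q, upper_q - lower_q)
```

This mirrors reading across from the cumulative frequency axis to the step and then down to the horizontal axis.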


Exerrise
~
ungrouped'· data rnulative fr"que
lJ e . median
ncy, . and quartiles --

Example 1.39 r. Fm d t Ile median of each


numbers: , of the following sets of 2. The
is thrown 60 ti~~she :~odreshobtair:ed when a die
table show
· m t c med1an score.
A teacher asked her class of thirty 14-year-olds how many novels they bad read during the
(a) 4 6 18 2
term. The results are illustrated in the cumulative frequency graph overleaf. (b) 192: 2~1? is~' i6, 22, s, zo, 4, s Score, x 1 2 3 4 5 6
(c) 1267 1 , ' 10,214,204
0 6
(d) 0.7 4 8 6,; 8 95,3457,2164
, . , . 5, 0.78, 0.45, 0.32, 1.9, 0.0078 Frequency, f 1l 9 8 13 9 9
(a) Write down the mode.
(b) Find the median number of novels read.
(c) What percentage of the class read more than 5 novels?
_-_-_ -_-

Grouped data - median and quartiles


table shows the
This cumulative frequf ency h of a class of 32
7. number of absences or eac When data have been grouped into intervals, the original information has been lost, so it is
These are the test marks of 11 students. children during one term.
3. only possible to make estimates of the median and quartiles. One way of doing this is to use a
52,61, 78,49,47, 79,54,58,62, 73,72 cumulative frequency graph, or cumulative percentage frequency graph as follows:
Times
Find 0 <1 <2 <3
(a) the median . absent
(b) the lower quartt_le
32 n i 100%
(c) the upper qua~ttle
(d) the interquarttle range.
Cumulative
frequency 5 11 20 23 27 28 31
i
]"
~

.1'
d. umber of absences. -i-n --->----· 75% --">"
4.
. d the median and interquartile range of the
Fm . (a) Find the me mn fn h .ddle 50% of the ~ "~
following distributwns: j (b) Find the range o t e rot 1'
(a) Sten; te~f [Key 512 means 52 observations. ber of absences. 8
in --·->- '
I
~
0
50% ->-· I l
(c) Calculate the mean num . .
(d) Calculate the standard devtatton. . I y I y
2
3
4
3 4 4
2 8 8
15667
d . the effectiveness of Famtly i-n ->" v 25% -~>-, v
5
6
2 3 3
5 7 8 8
8. A researcher, stu ymg . d out a survey of
Income Supplem:n~, ca~l\enefit. As part of the
120 families recetvmg t e d d the number of
v v
7 2 4 survey th_e researfhe~tec~~eeresults are illustrated
0
a, a3 a, "'o;· Q3

children m e ac~ afl ~ency graph below.


"'o;·
ro ro
~

1
~
8 0 ,
(b) Stem Leaf [Key 112 means 1.2 J in the cumu attve req
,.,,
"'Ff
,
3 6
;:.-- 120
,,, t<
3
2
1 2
5 7
0 3 4 4
~
5- 110
!t1\I1 Grouped data
Cumulative
frequency curve
Cumulative percentage
frequency curve
2 .!'
1 6 7 8 8 9 9
1 2 2 3 4 "
~ 100 Lower quartile, Q 1 ! nth reading 25% reading
0 5 5 'S
E Median, Q 2 ! nth reading SO% reading
0 1 3 3 8 0 9
i nth reading 75% reading
Upper quartile, Q 3
(c) Stem Leaf [Key 22 11 means 23] 80
6 0 2 2 -
1 1 2 3 Cumulative frequency curve Cumulative % frequency curve
10 70
14 0 2 2 3 3
0 2 3 3 3 3 3 :'.'~'' \\ Note that the i(n + 1)th reading is not used for the median. If you used this value you would
not arrive at the same point on the cumulative frequency axis when you worked down from
18 60 I,
22 3 3 3 3 the top of the scale as you would when you worked up from the bottom of the scale. The !nth
26 0 0 2
50 or 50% value is needed for the median.
30 1 3

5.
Find the median td
requenc
interid~:t~~~~~~~:of each 40 Note also, that if preferred, a cumulative frequency polygon or cumulative percentage
frequency polygon can be drawn. The values obtained for the median and quartiles will not
of th e following 30
(a) 8 9 10 vary greatly from those obtained from curves.
5 6 7
X
5 20
15 18 6
6 11
f Example 1.40
14 15 16
10 1.\',\
',\\ :n lj
\.',!
i\\\ ~~!
8 The table gives the cumulative distribution of the heights (in centimetres) of 400 children in a
(b) 12 13 7
X o 23456
15 7 0 Number of children
3 9 11
f Height (em) <100 <110 <120 <130 <140 <150 <160 <170
h ws the number of goals
(a) Write down the mode ~nd the median of the
395 400
number of children per ~~rrnly. of the number Cumulative 0 27 85 215 320 370
6. The frequency tables o. . 25 games played.
scored in netball by Jemtma tn (b) Find the interquartl_ e range frequency
of children per fa~tly · t' le range is only a
Number of goals ) Ex lain why the mterquar 1 . e of
certain school:
8 6 (c ro~gh measure of spread for thts typ (NEAB)
1 3 2 5
0 distribution. (a) Draw a cumulative frequency curve.
Frequency
(b) Find an estimate of the median height.
umulative frequency table.
(a) Construct a cd_ t illustrate the table. (c) Determine the interquartile range.
b) Draw a step tagram o f \
( . d the median number o goa s.
(c) Fm '\ nge
(d) Find the interquartl e ra .
(d) Determine the 10 to 90 percentile range.

Solution 1.40
(a) [Cumulative frequency curve to show the heights of 400 children: cumulative frequency against height (cm)]

(b) For the median, find the $\tfrac{1}{2}(400)$th value, i.e. the 200th value.
From the graph, an estimate of the median is 129 cm.

(c) For the lower quartile, $Q_1$, find the 100th value and for the upper quartile, $Q_3$, the 300th value.
From the graph, $Q_1 = 121.5$ cm and $Q_3 = 137.5$ cm
The interquartile range = $Q_3 - Q_1$
= 137.5 - 121.5
= 16 cm
Note that this is the range of the middle 50% of the readings.

(d) For the 10th percentile (written $P_{10}$) find the value which is 10% of the way through the readings, the $\tfrac{10}{100}(400)$th value, i.e. the 40th value. The 90th percentile is the $\tfrac{90}{100}(400)$th value, i.e. the 360th value.
$P_{90} = 147$ cm, $P_{10} = 113$ cm
The 10 to 90 percentile range = $P_{90} - P_{10}$
= 147 - 113
= 34 cm.

Example 1.41
The masses, measured to the nearest kilogram, of 50 boys are noted and a cumulative percentage frequency distribution formed.

Mass (kg)                 <59.5   <64.5   <69.5   <74.5   <79.5   <84.5   <89.5
Cumulative % frequency      0       4      16      40      68      88     100

Draw a cumulative percentage frequency curve and use it to estimate the median mass and the interquartile range.

Solution 1.41
[Cumulative % frequency curve to show masses of 50 boys: cumulative % frequency against mass (kg)]

The median is the 50% reading. From the graph this is 76.3 kg.
The lower quartile, $Q_1$, is the 25% reading, so $Q_1 = 71.5$ kg.
The upper quartile, $Q_3$, is the 75% reading, so $Q_3 = 80.5$ kg.
The interquartile range = $Q_3 - Q_1$ = 80.5 - 71.5 = 9 kg.
It is interesting to note that if the data are represented by a histogram, the median divides the area exactly in half.

[Histogram to show the masses of 50 boys, plotted against mass (kg)]
Example 1.43
The ages of 160 members of a bridge club are grouped as shown in the table.

Example 1.42 30 40 50 60 70 90
Examinations in English, Mathematics and Science were taken by 400 students. Each Age
examination was marked out of 100 and the cumulative frequency graphs illustrating the 61 37 15 0
Number of members 5 42
results are shown below. ~ 400
Without drawing a cumulative frequency curve, estimate
(;' 400 E
~ ~ (a) the median age
(b) tbe number of ~e b
. (c) the 20th percentil~ ers aged 67 or over,
~ 200
E il
\zoo
u o+-~~-4---L--1-
100
50 Solution 1.43
100
o Mark
o+---~--~£-~--"­
100
50
Mark Form a cumulative frequency distribution.
o 50 Science
Mark
Mathematics <40 <50 <60 <70 <90
Age
English 160
Cumulative frequency 5 47 108 145
(a) In which subject was the median mark the highest? (C)
(b) In which subject was the interquartile range of the marks the greatest?
F
(a) Smce ob servatwns the med· h
(c) In which subject did approximately 75% of the students score 50 marks or more? rom there are 160 under. 50 'and 108 tan ts t e 80tb observation.
the table,
the mterval 50-60
47 are
. are under 60, so the 80th p
==u=¥~
h

60 years
Solution 1.42 50 years Median
3
Showing
Median Qtheisworking on the diagrams:
200th reading, lower quartile Q 1 is 100th, upper quartile Q is the
2 47 people
300th reading.
I
,, .. i 400
1os-p~~PI~-~ ---..
. E \-·--'-···--~-~---~
----------1'
... \ 1400

I ~~ zoo +·······+····""-! 108-47-61


80-47 = . 61 peop1em
- 33 'soth ere are . the interval50-60
-~
~zoo
\
il 33 ' so, assummg that the a
':of thed way along the iuterval which g:~sa:e:~~~ly
.
£distributed, the median value will be
il ! · · me 1an =50+ ll x ~ 55 o ten years.
o+-~~-+tJ-J-L-~ 61 10 - • 4 years

/ 0
/' y
Q2Q3 100
0 ° °
03
150 2 100
Mark . e 67 IS
(b) Theag · lm t h e interval 60~ 70 h.
· of
5 ' 67 IS located th h w tch has a width of 10
0
0 Marl< 10 e way t rough this interval. .
0 Science
Mathematics 60 67 70
English
(a) The median, Q 2, is the highest in English.
(b) The interquartile range, Q 3 - Q 1 , is greatest for Science.
(c) The subject in which 300 students scored 50 or more, i.e. 75% scored 50 or more is x people
--i45P~~-pj~~ - --·---------- ----r

Science.
Using linear int~rp~:/n th~ mterval60-67
The number of eo l . ,
~~~~'
is 145-108 = 37
to
Now of 73 = 25.9 j]j of the 37 people will be under 67 years old
, . number of peo 1 d .
So number of people e6;n er 67 years old = 108 + 26 = 134
Using linear interpolation or over= 160- 134 = 26
It is possible to estimate the median, quartiles or other percentiles for grouped data withont
drawing the cumulative frequency graph. The method is known as linear interpolation.
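To make the idea concrete, linear interpolation can be sketched in a few lines, using the bridge club ages of Example 1.43. The function names are hypothetical, and the sketch assumes that the ages are evenly spread within each class and that no member is under 30.

```python
def estimate_percentile(bounds, cum, p):
    """Estimate the value p% of the way through a grouped distribution."""
    target = p * cum[-1] / 100
    for i in range(len(cum) - 1):
        lo, hi = cum[i], cum[i + 1]
        if hi > lo and lo <= target <= hi:
            # Fraction of the way through the class containing the target
            fraction = (target - lo) / (hi - lo)
            return bounds[i] + fraction * (bounds[i + 1] - bounds[i])
    raise ValueError("p must lie between 0 and 100")

def estimate_count_below(bounds, cum, x):
    """Estimate how many observations are less than x."""
    for i in range(len(bounds) - 1):
        if bounds[i] <= x <= bounds[i + 1]:
            fraction = (x - bounds[i]) / (bounds[i + 1] - bounds[i])
            return cum[i] + fraction * (cum[i + 1] - cum[i])
    raise ValueError("x lies outside the class boundaries")

ages = [30, 40, 50, 60, 70, 90]      # class boundaries
cum = [0, 5, 47, 108, 145, 160]      # cumulative frequency below each boundary

median_age = estimate_percentile(ages, cum, 50)
p20 = estimate_percentile(ages, cum, 20)
aged_67_or_over = cum[-1] - estimate_count_below(ages, cum, 67)

print(round(median_age, 1), round(p20, 1), round(aged_67_or_over))
```

Each estimate is the same proportional calculation as in the worked solution: find the class containing the required observation, then move the matching fraction of the way along that class.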
(c) The 20th percentile is the value 20% of the way through the distribution.
There are 160 observations and 20% of 160 = 32, so the age of the 32nd person is needed.
Five people were under 40 and 47 people were under 50, so the 32nd person is in the interval 40-50.
The number of people in this interval = 47 - 5 = 42.

[Diagram: the interval from 40 to 50, with 5 people below 40, 32 people below x and 47 people below 50.]

32 - 5 = 27, so x will be 27/42 of the way through the interval 40-50.
x = 40 + (27/42) × 10 = 46.4...
The 20th percentile is 46.4 years (1 d.p.)

Example 1.44
The distribution of the lengths of time of a large number of telephone calls made from an
office in a given week was such that the median was 100 seconds and the 80th percentile was
190 seconds. Without drawing the cumulative frequency curve, estimate
(a) the upper quartile,
(b) the number of calls, out of 500, that lasted less than a minute.

Solution 1.44
(a) Show the information on a diagram, denoting the upper quartile by Q3.

[Diagram: 50% of the distribution lies below 100 seconds, 75% below Q3 and 80% below 190 seconds.]

Percentage of distribution in interval 100-190 is 30%.
Percentage of distribution in interval 100-Q3 is 25%.
So Q3 is 25/30 of the way through the interval 100-190.
This interval has a width of 90, so Q3 = 100 + (25/30) × 90 = 175
The upper quartile is 175 seconds.

(b) [Diagram: x% of the distribution lies below 60 seconds and 50% below 100 seconds.]

x% of calls lasted less than 60 seconds and 50% lasted less than 100 seconds.
Using ratios:
x/50 = 60/100, so x = (60/100) × 50 = 30
The number of calls = 30% of 500 = 150
150 calls lasted less than a minute.

Exercise 1k  Cumulative frequency, median and quartiles - grouped data

1. The table below shows the frequency distribution of the masses of 52 women students at a college.
Measurements have been recorded to the nearest kilogram.

Mass (kg)    Frequency
40-44        3
45-49        2
50-54        7
55-59        18
60-64        18
65-69        3
70-74        1

(a) Construct a cumulative frequency table and draw a cumulative frequency curve.
(b) How many students weighed less than [illegible] kg?
(c) How many students weighed more than 61 kg?
(d) 2[illegible]% were heavier than x kg. Find the value of x.
(e) Estimate the median.
(f) Estimate the interquartile range.

2. Fifty soil samples were collected in an area of woodland and the pH value for each sample was
found. The cumulative frequency distribution was constructed as shown in the table.

pH value    Cumulative frequency
<4.8        1
<5.2        2
<5.6        5
<6.0        10
<6.4        19
<6.8        38
<7.2        43
<7.6        46
<8.0        49
<8.4        50

(a) Draw a cumulative frequency curve.
(b) What percentage of the samples had a pH value less than 7?
(c) 50% of the samples had a pH value greater than x. Find x. What name is given to this value?
(d) Taking equal class intervals of 4.4 ≤ x < 4.8, 4.8 ≤ x < 5.2, etc., construct the frequency
distribution and draw a histogram. Show the median on the histogram.

3. Eggs laid at Hill Farm are weighed and the results grouped as shown:

Mass (g)    Frequency
-50         3
-54         2
-58         5
-62         12
-66         10
-70         6
-74         2

Construct a cumulative frequency table and draw a cumulative frequency curve. Use the curve to
estimate the median mass.

4. [Figure: cumulative frequency curve, with Time (minutes) from 0 to 40 on the horizontal axis and cumulative frequency from 0 to 50 on the vertical axis.]
and travels to a second ci f~rts. from one city
13. Every day at 08:28 a train de
Use your curve to estimate
(a) the median distance the journey were recordeY>10 ~
tnnes taken for
l~l :~~ 1
;e~:!~~~~ ~[~;~~t~~~~e ~~tances, ![
(a) Draw the cumulative frequency curve. certain period and mmutes over a
(b) Use your curve to estimate the median table. were grouped as shown in the
The cumulative frequency curve has been drawn travel more than I 30 o need to
<ill. IC)
temperature.
from information about the amount of time (c) In a particular house it was found that the
spent by 50 people in a supermarket on a central heating was turned on when the Time Frequency
weekly maximum temperature fell 10. The prices, on a particular da 0 f
the London Stock E h Y' 53 stocks on
particular day.
(a) Construct the cumulative frequency table, below 17 oc. Use your curve to estimate the table below. xc ange are summarised in -80 0
the number of weeks when the heating -85 6
taking boundaries .;;;;5, .;;;;10, ... 12
-90
(b) How many people spent between 17 and was turned on. Number of
(d) A week is classified as extremely warm -95 22
27 minutes in the supermarket? 31
(c) 60% of the people spent less than or equal when the weekly maximum is greater Price £x stocks -100
than 21 oc. -105 15
to t minutes. Find t. Use your curve to estimate the percentage of 75<x<;95 6 7
(d) 60% of the people spent longer than -110
weeks that are classified as extremely warm.
(C) 95<x<;100 10 -115 4
s minutes. Finds. 2
100<x<;105 12 -120
(e) Estimate the median.
(f) Find the interquartile range. -125 1
105<x<;110 13
8. The times, to the nearest minute, taken by a group Over 125 0
5. In a quality-control survey, the length of life, in of 120 students to write a particular essay, were 110<x<;120 7
hours, of 50 light bulbs is noted. recorded and are grouped in the table below. 120 <X<; 135 5
The results are summarised in the table.
(The
than interval upl~
' 90'
85 minut:s · d.Jcadte~ alll ti~es greater
0
Using linear interpolation, calculate estimates of minutes.) an me udmg 90
C_on~truct the
d1stnbution cumulative
and dr h fr equency
- table for this
th" draw a cu~u 1atJve
Ten curve. aw t e cumulative frequency From . frequency
(a) the median, (minutes) 40-44 45-49 so-54 ss-59 60-64 curve these figures
and from
(b) the interquartile range. . IS curve esttmate

Frequency 26
Use your curve to estimate a ) t h emed1anti
I(b) . me f or t h e JOUrney
·
Length of life (h) Number 34 30 (a) the ~edian price, t e mterquartile range '
22
bet~:~:S ~:~~ha:-di~~~l~.
8 (b) the mterquartile range
3 of students Ic) the number oft · 'h.
650" h <670
7 (c) :~t;1~~~r of stocks ~osting between £89 second city the
670<:h <680 Construct the cumulative frequency table for this (C) IC Additional)
20 distribution and draw the cumulative frequency
680,;; h <690
17 11. The masses, measured to the
690d <700 curve. 80 eggs were recorded d nearest gram, of 14. Two hundred and fift A
following heights.
.
Y rmy recrmts have the
3 Use your curve to estimate table below. an are grouped in the
700 "h < 702 (a) the interquartile range of the times,
(b) the percentage of these students who spent
over 62 minutes in writing the essay. 65 69 70 79 Height (em) No. of recruits
6. A factory produces a certain component. The Mass (g) 50-59 60 64
masses of 500 of these components were Another group of 30 students wrote the same 165- 18
measured to the nearest gram and are grouped in Number
essay and all took over 65 minutes to 20 y 37
of eggs 18 X 170-
the following table. complete
Use your it.
curve to estimate the median time of all 175- 60
60-69 70-74 75-79 80-84 85-89 Assuming that the read" · 65
150 students. (C) linearly distributed dm?s m each group are 180-
Mass (g) these c s h an gtven that 60% of all 185- 48
Number of 196 53 9. Each of 50 sportsmen was asked to state the calcula~~ th:::l:ctuafl masdses below 66.5 g,
e o x an of y. \C) 190-195 22
93 120 distance, x km, he needs to travel to obtain
components 38
access to suitable training facilities. The results
Without drawing a cumulative frequency curve, are summarised in the table below. s~rength, measur~d i~ ~~el a_:; tested for tensile
12. 30 specimens of she ij~~u~~cda~a in the form of a cumulative
estimate gives the distribution 0 f t hm . The table below
e measurements. (a) the ~e~r:~-h~:;~~c curve to estimate
(a) the 60th percentile, Number of sportsmen
(b) the number of components whose mass is Distance (x km) (b) the lower quartile height
less than 78 grams. (C) 1 Tensile strength Number of specimens The tallest 40% of the re ..
into a special squad E . crmts are to be formed
o<x<4 2 · stJmate
405 415 4
7. The weekly maximum temperatures in a 4<;x<10 Ic) t he median
certain town were recorded, to the nearest 6 415-425 3 (d) the upper ~uartile of the heights f h
degree Celsius, over a period of two years and 10<:x<20 19 425-435 6 members of this squad. o t e
grouped in the following table. 20<:x<35 12 435-445 10
445-455 5
task was o f ~h e ttmds
Number of weeks 35<:x<60 10 15. The distribution ·
455-465 2 certain taken when a
Temperature (°C) 60<:x<100 number of e ter orme by each of a large
8 percentile !a~~; w~s such. that its twentieth
-5 to -1
12
Construct the cumulative frequency table for the d_ts~nbution.
D_ra';' a cumulative fr equency dtagram
. of this was 50 minutes it~~?ut~esh,Its fortie.th percentile
Oto4 distribution and draw the cumulative frequencY 6 . •. x tet percent 1le was
17
5 to 9
btnnate the med"tan and the 10th and 90th
percentiles. 744 :~~~~~~-and tts eightieth percentile was
31 curve. (O&C)
10 to 14 23
15 to 19 9
20 to 24 4
25 to 29
Notice that when distributions are skew, the median generally lies between the mode and the
Number of children mean, and the following relationship is satisfied
Amount raised,·£
. l 'on to estimate (a) the . 70 mean ~· mode ""' 3 im<car
Use linear mterRo ~tl . (b) the upper quartile
1-5
median of t_he ~tstnbut~~~' ercentage of persons 36 One measure of skewness is given by Pearson's coefficient of skewness.
of the distnbuttOn, (c) k. pf t minutes or less. 6-10 19
who performed thetas tn or y 11-15 mean mode
(NEAB) Pearson's coefficient of ske;vncss
1 . ossible amount which ma~
oco

the nearest second, for standard deviation


f a running track State the sma 1 est P h'ld Without drawmg
16. The times, correct to
tOO athletes to cover onhe lap~ the table below. have been raised by one c : e. estimate the
If mean > mode, the skew is positive.
a cumulative frequ~ncy cu v '
were recorded and are s own m
median amount ratsed. ount raised and If mean< mode, the skew is negative.
Number of athletes Also estimate the mea~ ~ml er than the If mean = mode, the skew is zero and the distribution is symmetrical.
Recorded time (s) explain briefly why thts JS arg (c Additional)
0
65-69 median. Alternatively
8
70-74 borehole the thickness, in millimetres, of the
20 J(mcan -·-median)
75-79 18. 1n a · the table Pearson's coefficient of skc\\'11CSS ·"' ·---------·--..---·--·----·-·~------------·--:­
25 lS strata are shown m ·
stancbrd deviation
80-84 31 Number of strata
85-89 Thickness (mm)
10 Generally skewness can take any value between 3 and -3.
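A short sketch of the two forms of Pearson's coefficient of skewness, (mean - mode)/standard deviation and 3(mean - median)/standard deviation. The function names and the sample figures are illustrative assumptions, chosen only to show the sign convention.

```python
def pearson_skew_mode(mean, mode, sd):
    """Pearson's coefficient using the mode: (mean - mode) / standard deviation."""
    return (mean - mode) / sd


def pearson_skew_median(mean, median, sd):
    """Alternative form using the median: 3(mean - median) / standard deviation."""
    return 3 * (mean - median) / sd


# mean above mode: positive skew
print(pearson_skew_mode(mean=86, mode=78, sd=4))
# mean below median: negative skew
print(pearson_skew_median(mean=16, median=20, sd=5))
```

A positive result corresponds to a tail pulled in the positive direction, a negative result to a tail pulled in the negative direction, and zero to a symmetrical distribution.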
90-94 2
6 0- 5 For example, the measure of skewness for these distributions might be as shown:
95-99 20-
h and hence 9
Draw a cumulative freq~ency gra P
. h · t rquarule range.
determme t e m e . · a runner
30-
40-
8
1
_/\-15 ~-003 ~+07 ~+23
To qualify for an athle_ttcs j:e;~~~~onds or so- 0
needs to r~cord a ~apn~.:be~ of athletes who 60-
under. Esttmate t e . time for these
qualified and the medtan (c Additional) U trate these data.
Draw a htstogram to t 'fs ncy table and draw
qualifiers. Construct a cumulatiVe re~u~n Hence or Example 1.45
17. A group o f
125 children raised money for a
. · · The amount
a cumulattve frequency po \g . d the'
otherwtse, estimate the mheoadntaan
charity by sponso.red actl;~~l~:ded These 1 gfortesea· Electric fuses, nominally rated at 30 amperes (30A), are tested by passing a gradually
raised by each cluldhwas rest£ ~re grouped in tnterquartt e ran e f the strata that are less increasing electric current through them and recording the current, X amperes, at which they
amounts, taken tot e nea ' Fmd the proportton o tL)
than 28 mm thick. blow. The results of this test on a sample of 125 such fuses are shown in the following table.
the table below.

Current (x A)    Number of fuses
25 ≤ x < 28      6
28 ≤ x < 29      12
29 ≤ x < 30      27
30 ≤ x < 31      30
31 ≤ x < 32      18
32 ≤ x < 33      14
33 ≤ x < 34      9
34 ≤ x < 35      4
35 ≤ x < 40      5

Draw a histogram to represent these data.
For this sample calculate
(a) the median current,
(b) the mean current,
(c) the standard deviation of current.
A measure of the skewness (or asymmetry) of a distribution is given by

3(mean - median) / standard deviation

Calculate the value of this measure of skewness for the above data.
Explain briefly how this skewness is apparent in the shape of your histogram. (L)

Solution 1.45
Frequency density = frequency / interval width

Current        Interval width    Frequency    Frequency density
25 ≤ x < 28    3                 6            2
28 ≤ x < 29    1                 12           12
29 ≤ x < 30    1                 27           27
30 ≤ x < 31    1                 30           30
31 ≤ x < 32    1                 18           18
32 ≤ x < 33    1                 14           14
33 ≤ x < 34    1                 9            9
34 ≤ x < 35    1                 4            4
35 ≤ x < 40    5                 5            1

[Histogram to show the current at which fuses blow: frequency density against Current (A), from 25 to 40.]

(a) For grouped data, the median is the ½n th value.
Since there are 125 observations, the median is the 62.5th. This can be found by linear
interpolation as follows:
Since 45 fuses blew at a current less than 30 A and 75 fuses blew at a current less than
31 A, the median lies in the interval, of width 1 A, from 30 A to 31 A.

[Diagram: 45 fuses below 30 A and 75 fuses below 31 A, with the median between them.]

The interval contains 30 fuses and 62.5 - 45 = 17.5, so
Median = 30 + (17.5/30) × 1 = 30.58... = 30.6 A (1 d.p.)

Mid-point (x)    f
26.5             6
28.5             12
29.5             27
30.5             30
31.5             18
32.5             14
33.5             9
34.5             4
37.5             5
                 Σf = 125

(b) x̄ = Σfx / Σf = 3861.5 / 125 = 30.892

(c) s² = Σfx² / Σf - x̄² = 119 905.25 / 125 - 30.892² = 4.926...
    s = 2.219...
[Check these on your calculator, using SD mode.]

Therefore the mean is 30.892 A and the standard deviation is 2.22 A (2 d.p.)

Now skewness = 3(mean - median) / standard deviation
             = 3(30.892 - 30.58...) / 2.219...
             = 0.42 (2 d.p.)

Since skewness > 0, the distribution is positively skewed.
Note that the resulting frequency polygon confirms that the distribution is skewed to the
right, i.e. positively skewed.

SKEWNESS
On page 20 you considered the shape of various distributions.
There are mathematical ways of expressing the degree of skewness of a distribution:

In a negatively-skewed distribution the tail of the distribution is pulled in the negative direction.
mean < median < mode

In a symmetrical distribution,
mean = mode = median

In a positively-skewed distribution the tail of the distribution is pulled in the positive direction.
mode < median < mean
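The grouped-data calculations for the fuse data of Example 1.45 can be checked with a short script. This is a sketch under the usual assumptions (mid-interval values represent each class, and the median is found by linear interpolation); the variable and function names are illustrative.

```python
from math import sqrt

# Class boundaries and frequencies from the fuse table in Example 1.45.
bounds = [25, 28, 29, 30, 31, 32, 33, 34, 35, 40]
freqs = [6, 12, 27, 30, 18, 14, 9, 4, 5]

n = sum(freqs)
mids = [(lo + hi) / 2 for lo, hi in zip(bounds, bounds[1:])]

mean = sum(f * x for f, x in zip(freqs, mids)) / n
variance = sum(f * x * x for f, x in zip(freqs, mids)) / n - mean ** 2
sd = sqrt(variance)


def grouped_median(bounds, freqs):
    """Median of grouped data: the (n/2)th value, by linear interpolation."""
    half = sum(freqs) / 2
    cumulative = 0
    for lo, hi, f in zip(bounds, bounds[1:], freqs):
        if cumulative + f >= half:
            return lo + (half - cumulative) / f * (hi - lo)
        cumulative += f


median = grouped_median(bounds, freqs)
skewness = 3 * (mean - median) / sd

print(n, round(mean, 3), round(median, 1), round(sd, 2), round(skewness, 2))
```

Working from class boundaries rather than typed-in mid-points keeps the mean, standard deviation and median consistent with one another.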
Quartile coefficient of skewness
Another measure of skewness is defined in terms of the quartiles. Writing Q1 for the lower
quartile, Q2 the median and Q3 the upper quartile,

Quartile coefficient of skewness = ((Q3 - Q2) - (Q2 - Q1)) / (Q3 - Q1)

Positively skewed distribution: Q3 - Q2 > Q2 - Q1, quartile skewness > 0
Symmetrical distribution: Q3 - Q2 = Q2 - Q1, quartile skewness = 0
Negatively skewed distribution: Q3 - Q2 < Q2 - Q1, quartile skewness < 0

Example 1.46
31 students tried to estimate the length of a line. The line was actually 60 mm long. These are
their results, in millimetres.
61 70 46 44 26 23 30 83 52 44 38
37 49 59 58 63 31 29 37 48 76 61
46 31 38 41 49 52 56 75 61
Find the median and the quartiles of this distribution and use the quartiles to estimate the
skewness.
Draw a histogram with equal class intervals 20 ≤ l < 30, 30 ≤ l < 40, ...

Solution 1.46
Arrange the results in order (the values in brackets are Q1, Q2 and Q3).
23 26 29 30 31 31 37
(37) 38 38 41 44 44 46 46
(48) 49 49 52 52 56 58 59
(61) 61 61 63 70 75 76 83
There are 31 results, so the median, Q2, is the ½(31 + 1)th value, i.e. the 16th value.
So median = 48.
To find the quartiles, since n is odd (see page 69)
Q1 = ¼(31 + 1)th value = 8th value = 37
Q3 = ¾(31 + 1)th value = 24th value = 61
Now Q3 - Q2 = 61 - 48 = 13
    Q2 - Q1 = 48 - 37 = 11
Since Q3 - Q2 > Q2 - Q1 the distribution is positively skewed.

Quartile coefficient of skewness = (13 - 11) / (61 - 37) = 0.083...
This indicates a positive skew.

The frequency distribution is as shown below together with the histogram. Since each interval
width is 10, frequency density = frequency / 10.

Length (mm)    Frequency    Frequency density
20 ≤ l < 30    3            0.3
30 ≤ l < 40    7            0.7
40 ≤ l < 50    8            0.8
50 ≤ l < 60    5            0.5
60 ≤ l < 70    4            0.4
70 ≤ l < 80    3            0.3
80 ≤ l < 90    1            0.1

[Histogram: frequency density against Length (mm), from 20 to 90.]

The histogram confirms the positive skew.

THE NORMAL DISTRIBUTION
There is a special symmetrical distribution known as the normal distribution. This is
bell-shaped, centred around the mean.

Here are two normal distributions with the same mean, but different standard deviations.
[Figure: two bell-shaped curves centred on the same mean, one more spread out than the other.]

Here are two normal distributions with the same standard deviation but with different means.
[Figure: two identical bell-shaped curves centred on different means.]

In a normal distribution:
Approximately 68% of the distribution lies within one standard deviation (s) of the mean, i.e. between x̄ - s and x̄ + s.
Approximately 95% of the distribution lies within two standard deviations (2s) of the mean, i.e. between x̄ - 2s and x̄ + 2s.
Over 99% of the distribution (nearly all!) lies within three standard deviations (3s) of the mean, i.e. between x̄ - 3s and x̄ + 3s.
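The 68%, 95% and 'over 99%' figures quoted for the normal distribution can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `within` is an illustrative choice.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1


def within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(within(k), 4))

# Number of standard deviations from the mean to the upper quartile
print(round(standard_normal.inv_cdf(0.75), 4))
```

Because the proportions depend only on the number of standard deviations from the mean, the same figures apply to any normal distribution, whatever its mean and standard deviation.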
90 I\ Construct (b)
these data.a suitable ptctonal
. . representation of

State
mean the
andmodal val ud a~d_calculate the median
standard
cheques with errors l·nevtatwnkof the number of'
The quartiles are a wee

asymmetry) of a dist .b ~he skewness {or


approximately Some textbooks measure . (c)
1; x standard deviation n utton by
either side of the mean. 3(mean- median)
standard deviation
The normal distribution is studied in greater detail in Chapter 7. and others measure it by
(mean- mode) 8. The following table gives
60 students. . the blood pressure of
standard deviation
Blood pressure Frequency
measures of skewn~=r~ thilivalues of these two -
Calculate and com
how this skewness is srefl:ct eda_bovhe data. State 95- 2
your graph. e m t e shape of 105- 5
6 · (AEB) 110- 6
4. The following table shows the time, to the
distnbution represented of s~ewness for the
J\ nearest minute, spent reading during a particular . F:nd_Pearson's coefficient 115- 9
1. Calculate Pearson's coefficient of skewness for day by a group of school children. plot, which gives k ?Y
thts stem and leaf
mar s m an examination
120- 14
the following frequency distributions where 125- 3
Stern Leaf .

skewness- _
mean-mode
_c_cc'-'-ccc'-~'-'c­
Time
Number of children

8
i i 8 I
Key 3[7 means 37 I 130-
135-
6
5
standard deviation 10-19 15 3 3 7 140- 4
20-24 25 4 5 150-180 6
(a) 25-29 5 2 5 5 7
18 6 1 1 6 6 8 8 8 (a) Find
30-39
14. 12 7 3 5 5 (i)
ii) Pearson's coefficient of sl
40-49
i 12 50-64
7 8
9
2 9
1
(
(b) D
the quartile coeff . <ewness
tctent of skewness
£ 10 65-89
5 raw the histogram. .

8 9. The following grou ed f. .


\a) Represent these data by a histogram. summarises the timp rhquency dtstribution
6 (b) Comment on the shape of the distribution.(L) waiting by a sampl:'o~o t ~ nea~est minute, spent
surgery. patients m a doctor's
4
2 '\
\
\ 5. Over a period of four years a bank keeps a
~ Waiting time
20 21 ' weekly record of the number of cheques with
0 16 17 18 19 Number of patients
12 13 14 15 errors that are presented for payment. The {to the nearest minute)
results for the 200 accounting weeks are as
3 or less 6
(b) X f follows. 4-6 15
Number of
20 2 Number of cheques 7-8 27
weeks (f)
21 1 with errors (x) 9 49
22 4 5 10 52
0 22
23 5 11-12 29
7 1 46 13
24 13-15
The~~~~-----------------
8 2 38
25 16 or more 9
4 3 31
:bth the cumula~~~e ~;~ency curves associated
26 se are the th f
st(a)ndar_d deviation was 3a~j-6~ minutes and the
1 4 23 The mean of the ttmes w
27 < ove. Label each f uency curves A B C
5 16 appropriate lett requency curve with' th~ a Usmg interpolation . . mmutes.
and semi-interqua ;-tsttmate the median
6
Semi-interquartile :: e r~?e of these data.
2. For a skewed distribution, the roean is 16, the 11 (a) . er.
median is 20 and the standard deviation is 5.
Calcubte Pearson's coefficient of skewness and
7
8
6 (\\
I
range + 2 nge - mterquartile
2
sketch the curve. 9 I \ For _a_normal distributio f
semHnterquartile .
.
n o the ratto of the
/ \.'-.. deviation would b range to ~he standard
3. For a skewed distribution, the mean is 86, the (1: fx ~ 706, 1: fx' ~ 3280) e approximately 0.67.
mode is 78 and the variance is 16.
Calculate Pearson's coefficient of skewness and
sketch the curve.
Girls' marks

r
(b)
(b) Calculate the corresponding value for the
above data. Comment on your result.
i 100

For a normal distribution, 90% of times would I 80


~
be expected to lie in the interval ~ """ """ .""' ·-·
(mean± 1.645 standard deviations).
(c) Find the theoretical limits for these data. i 60
(d) Using appropriate percentiles, estimate ~ i
comparable limits. Comment on your result. (L) ~ I
2 8 40 I
I
10. Calculate the quartile coefficient of skewness for I I
20 I I I
each of the following distributions: I I
I
I I I
(a)
10 14 18 22 30 40 '
0
o,t dO,
0 02 80
0 20 40 60 100
(c) Stem o oi<ey
Lea£ [ 5 \ 1 means 6 \ Mark
2
5
0 1 1 ------__J
8
11
1 1 2 2 2
001222
1 1 2 2
I-~-
-CTI----·-1
0
Box plot for girls' marks

14
17 0 1 Th b ' op,
20 1 e ox plots can be drawn h onzontally
. ' as sh own above, or vertically
. like th'!S.

Boys' marks Girls' marks


40 50 60 70 '
30
0 10 ~ 100
iii

BOX AND WHISKER DIAGRAMS (BOX PLOTS)

Consider the cumulative percentage frequency curves for girls' and boys' marks, drawn on
page 66. These are shown below, together with the median Q2 and quartiles Q1, Q3.
Below each diagram is a box and whisker diagram, or box plot.

[Figure: cumulative percentage frequency curve for boys' marks (Mark, 0 to 100), with the box plot for boys' marks drawn beneath it.]

The box and whisker diagram, or box plot, illustrates the dispersion, or spread, of the
distribution. It uses the highest and lowest values of the data, the quartiles (Q1 and Q3) and
the median (Q2). For example:

Vertically
[Diagram: vertical box plot labelled, from top to bottom, highest value, upper quartile Q3, median, lower quartile Q1, lowest value.]

Notice that the 'box' extends from Q1 to Q3 and so encloses the middle 50% of the data.
The 'whiskers' extend from the box to the highest and lowest values and illustrate the
range of the data.

Horizontally
[Diagram: horizontal box plot labelled, from left to right, lowest value, Q1, Q2, Q3, highest value.]

A box plot for a symmetrical distribution would look like this:
[Diagram: box plot with the median in the centre of the box.]
The whiskers are of equal length and the median is in the middle of the box.

For a positively skewed distribution:
[Diagram: box plot with a long right-hand whisker.]
The right-hand whisker is longer and the median is nearer to the lower quartile.

For a negatively skewed distribution:
[Diagram: box plot with a long left-hand whisker.]
The left-hand whisker is longer and the median is nearer to the upper quartile.

Example 1.47
A class of pupils played a computer game which tested how quickly they reacted to a visual
instruction to press a particular key. The computer measured their reaction times in tenths of
a second and stored a record of the sex and reaction time of each pupil. Finally it displayed
the following summary statistics for the whole class.

         Median    Lower quartile    Upper quartile    Min    Max
Girls    10        8                 15                6      19
Boys     10        7                 13                4      16

(a) Draw two box plots suitable for comparing the reaction times of boys and girls.
(b) Write a brief comparison of the performance of boys and girls in this game. (NEAB)

Solution 1.47
(a) [Figure: box plots for girls and boys, each marked with Q1, Q2 and Q3, drawn on a common scale of reaction times (0.1 s) from 4 to 19.]

(b) The 'typical' (median) reaction time for boys and girls is the same (10 × 0.1 = 1 second).
However the times for the boys are more evenly distributed, with a smaller range. There is
a bigger spread of times for girls and their distribution is positively skewed.
In general, the boys have the faster reaction time.
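A box plot is drawn from five values: the lowest value, Q1, Q2, Q3 and the highest value. The sketch below finds them for raw data, assuming the convention used in the worked examples (the quartiles are the medians of the values on either side of the median position), and uses the quartile coefficient of skewness to describe the shape. The function names are illustrative; the data are the 31 line-length estimates of Example 1.46 and the summary statistics of Example 1.47.

```python
from statistics import median


def five_number_summary(data):
    """Return (lowest, Q1, Q2, Q3, highest).

    Assumes Q1 and Q3 are the medians of the lower and upper halves,
    leaving out the middle value when the number of observations is odd.
    """
    values = sorted(data)
    n = len(values)
    lower = values[: n // 2]
    upper = values[(n + 1) // 2:]
    return values[0], median(lower), median(values), median(upper), values[-1]


def quartile_skew(q1, q2, q3):
    """Quartile coefficient of skewness: ((Q3 - Q2) - (Q2 - Q1)) / (Q3 - Q1)."""
    return ((q3 - q2) - (q2 - q1)) / (q3 - q1)


lengths = [61, 70, 46, 44, 26, 23, 30, 83, 52, 44, 38,
           37, 49, 59, 58, 63, 31, 29, 37, 48, 76, 61,
           46, 31, 38, 41, 49, 52, 56, 75, 61]

low, q1, q2, q3, high = five_number_summary(lengths)
print(low, q1, q2, q3, high)
print(round(quartile_skew(q1, q2, q3), 3))

# Summary statistics from Example 1.47 (lower quartile, median, upper quartile)
print(round(quartile_skew(8, 10, 15), 3))   # girls
print(round(quartile_skew(7, 10, 13), 3))   # boys
```

A positive coefficient matches a box plot whose median sits nearer the lower quartile; a coefficient of zero matches a median in the middle of the box.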
Example 1.49

8 f 50 ackets. The1r A group of athletes frequently run round a cross-country course in training. The box and
Example 1 ·4 f eets in each 0
P whisker plots below represent the times taken by athletes A, B, C and D to complete the
of the numbers o sw
out a survey .
A group of chlldren carrLI\owmg stem and leaf d>a•tg~r:a:m:·------:;-;~=::;in-;;~~~ course.
results are shown m the -Key 20\4 means 24 sweets in a packet. A--_.j

9
778899 9 4 4 4 8-----------------
3
20 14 6 6 7 7 1 1 2 2 2 3 8 9 9
30 0 0 0 0 ~ 7 7 7 8 8 8 8
30 5 5 5 6 4 4 c-----
0 1 1 2 . . (NEAB)
40 0 . f this distribunon.
1
d' n and the quartl es o
(a) Calculate the ~e ::r the distribution.
(b) Draw a box pot

27 28 29 30 31 32 33 34 35
Solution 1.48 . ' 50+ 1 )th value, i.e. the 25 .5th value. Time (minutes)
. the median lS the 2( h' h are 33 and 34.
(a) There are 50 ltems, so h 25th and 26th value w lC (a) Compare the times taken by athletes C and D.
This is half-way between t e
. n Q = 33.5 sweets. Assume that the distributions shown above are representative of the times the athletes would
So me d1a , 2 take in a race over the same course.
(13 values) (b) Which of the athletes A orB would you choose if you were asked to select one of them to
(15 values) win a race against
7 7 8 8 9 9
20
30
14 6 6 7 6
1 1 2 2 2 3
0 0 0 0 6 7 7 7 @) 8 8
4 4
9 9
(15 values)
(7 values) (i)
(ii)
c
D?
5 6
30 5 5 4 4 T
40 0 0 1 1 2 Q, Q2 Give a reason for each answer.
I . 0 = 29 sweets.
0 so Q 1 is the 13th va ue, l.e. ~Ql - 38 sweets. (c) Which athlete would be most likely to win a race between A and B? (AEB)
items to the left o f ~2' Q . the 38th value, l.e. 3-
There are 25 · h f Q so 3 lS d.
There are 25 items to the ng t o 2• d' 'b tions either side of the me >an. Solution 1.49
. 'd . half the two lstn u
th at the quartiles dlVl e m (a) Dis always faster than C.
Remem b er
. easy to see: C's times are more variable than D's.
Note that the pattern lS 12 items C's times are positively skewed.
a,
12 items D's times are negatively skewed.
12 items
12 items t 34 35
38
38 38. . 44

(b) (i) B's median average time is faster than C's, but B's tin~es are more variable. It is
30 30 . . 33
24 26. . 29 probable that B would win against C.
Although A's slowest time of approximately 32 minutes appears to be slightly greater
than C's fastest time. A will almost certainly win against C.
(b) Box plot Therefore choose A to win a race against C.
44
a, (ii) A has a small chance of winning against D, but B has a slightly greater chance of
va\u·-· a, t winning against D.
Therefore choose B to win against D.

==:
ic) 1\'s average time is faster than B's and A's times are not as variable as B's.
1 hcrcfore choose A to win a race between A and B.
40
35
25
20
temperature.).;~; :~:e
It wron 1that. the .te ~perature recorded as 94 op .
would appear
recorded
It IS most
perature unusual
of 57 to have.
op how
.
just~~:~
outhehr.
ay Wit an(It was probably
extremely high
' ever lS not an outli

outliers labelled as an outlier, as s~=n~o 57 op and up to 81 op and the temperature of 94 op is


The whiskers are drawn d er.
Sometimes unusually high or low values occur in a set of data.
There may be good reason for these unusual results but quite often they occur because an
error was made when the data were recorded. Boxplot to show temperatures
To investigate extreme values you would use the mean and standard deviation, or the
/Outlier (94)
quartiles and interquartile range (IQR).

As a guide, the term 'outlier' can be applied to data which are
(a) at least 2 standard deviations from the mean, i.e. less than x̄ - 2s or greater than x̄ + 2s, or
(b) at least 1.5 × interquartile range beyond the nearer quartile.
Interquartile range = Q3 - Q1, so illustrating on a diagram gives:
outliers
(b) Using calculator in SD mode:
I

. - - - - 1.5 X (Q3- Ql) _ __,..

I
I
I
I
I
0. X~ 71, and
x-2s~56.6,
5 ~ 7.J1, so
x+2s=85.3
I
I Q, II

~..............
I Since outliers lie outside these values 57 op . ' 1s an outher.
I
I
Boundary
. ....... ' lS not an outlier but 94 op . .
I
I
I
Boundary
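The two guidelines for outliers can be sketched as follows. The function names are illustrative assumptions; the quartiles 66 °F and 73 °F and the temperatures 57 °F, 81 °F and 94 °F are those quoted in Example 1.50, and 'at least' is read as including the boundary itself.

```python
def iqr_boundaries(q1, q3, k=1.5):
    """Boundaries lying k x interquartile range beyond each quartile."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def sd_boundaries(mean, sd, k=2):
    """Boundaries lying k standard deviations either side of the mean."""
    return mean - k * sd, mean + k * sd


def find_outliers(data, low, high):
    """Values on or beyond either boundary."""
    return [x for x in data if x <= low or x >= high]


low, high = iqr_boundaries(66, 73)
print(low, high)
print(find_outliers([57, 81, 94], low, high))
```

The two guidelines need not agree on every value, so it is worth stating which one has been used.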

Example 1.50
A class of 31 children recorded the maximum daily temperature for the month of July with
the following results. The median and quartiles are shown on the stem and leaf diagram.
Key 6 \ 8 means 68°F l J 1. The table below gives the length .
50 telephone calls from a sch ools,office.
.
m mmutes, of Key 4 12 means Key 214 means
24 hundredths 24 hundredths
9 4 of a second
of a second
8 Length of Number of
8 1 1 call (min) calls Group 1 Group 2
7 7 7 9 9 3 3 3 G) 6 6 2
2 2 2
7 0 0 @ 0 <;1 8 5 4 4 2 4 5
2 2 2 3 3
6 (6) 8 8 8 9 9 1-2 11 333322
11000
2
2 0 0 1
4 4
6 1 3 4 4 2-3 17 8 1 8 9 9 9 9
5 7 3-5 8 7 6 6 1 6 6 7 7
5-10 6 5 4 4 1 4 4
Identify any outliers ;;.1o 0 1 2
1
(a) by using the values of the quartiles and illustrating your results on a boxplot, 0 9
(b) by using the mean and the standard deviation. (a) Draw a cumulative fr
(b) Estimate the med' eq~ency polygon.
(c) Draw a box lot lan an the quartiles. 3. !wenty-one
m millimetresgirls
Thestimated
I t h e Iength of a line
distribution. P and comment on the · e resu ts were '

Solution 1.50
(a) The values ringed are 66, 70 and 73, so Q 1 ~ 66 op, Q2 ~ 70 op, Q
3
~ 73 op, 2. ,I."wo groups of e
tnning experim~n ople t~:>Ok part in a reaction-
51
85 45
62 31 43 51
20 22 97 ~~ 4198 23
22 34 35 35
18 27
,~
hundredth o f a second
t. Theu results
h ' to the nearest
.
lnterquartile range~ Q 3 - Q 1 ~ 73°- 66° ~ 7 op ~~;l:r~e box plot and use it to identify any
~ Q 3 + 1.5 x T ~ 73o + 10.5" ~ 83.5"
'Jonstruct b ' are s own below
Upper boundary distribution~)X 1 to represent the
pdots .
Lower boundary ~ Q -1.5 x T ~ 66°- 10.5" ~ 55.5°
1
' an comment.

Outliers therefore lie outside the interval55.5° to 83.5°.


11. In a test on the protein quality of a new strain of
9. Draw box plots to represent the following
l
d a jigsaw in the frequency distributions. corn, a farmer fed 20 new born chicks with the
5. Thirty-one peopl(~ co~p etetes) new corn and observed how much weight they
f olygons and then following times m mmu . (a) gained after three weeks. The results are given
1 2 3 4
4. Draw cumulativedreqh~~ncydfagrams to represent 48 49 39 87 73 23 120 X 0
below.
construct box an w lS zer 11 53 i~ ~~ 67 86 79 65 47 36 133 f 4 12 6 2 1 Weight gain (grams)
the following histograms. Q d Q _Q 24 ~i 70 75 53 42 42 72 144
For each one, calculate QJ- 2 an 2 J· . 360,445,403,376,434,402,397,425,407,369
78
What do you notice? . nd standard deviation, identify 462,399,427,420,410,391,430,369,410,397
Usmg the mean a
Hint: remember that any outliers. (b) f (a) Make an ordered stem and leaf display of
....
14 I i I these data.
frequency The box plots show the dis_tributld.o~s of marks +...... .. The farmer also fed a further 20 new-born chicles
frequency density class width 12 I . .... . ...

6. db l in Enghsh an m
obtaine Y a c ass
Mathematics. Comment on
'the distributions of
10 , ......
on the standard strain of corn he had previously
used and he recorded their weight gains after
i '' 8 I I ... i ..... three weeks. The results for this control group
i marks.
••''
L
I
(a) 3 are given in the ordered stem and leaf display in
-~ L I i English 6 .. . .....
I ....

i the table below.


~ •'''
marks
4 I I ....
I······
- '. ••''
c

i 2 l
( ....

•''
\
i\l--- 2
I
-\········ I I
Weight gain (grams)
Unit is 1 gram
~ \ --L___j____J
Mathematics

1
'•''

' marks 0
2 3 1 ? 7 X
32 1 5
I 33
( ( 70 80 10. A frequency diagram for a set of data is shown
'

50 50 60 34 5 9
0 30 40 30 40 below.
10 20 20
0 Lengtll (em) 35 0 6 6
Key 6\5 means 36 0 1 6
7. Key 4\6 means 6.5 hours 6 7
6.4 hours ~ 6 37 1 2 2 3
(bl 0.6 December
july ,!5 38 0 6
-~ 0 1 4 39 9
12
~ 11 1 3 3 4 3 40 2
i 0.4 10
9 2 2 8 8 41
~ 8 : +-r--+1--+-+-'i--+--+-+--HII-+-1-,---,-,--+1 42 1 3
0.2 7 3 4 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
420 6 55666 (b) On a single diagram draw two box-and-
{a) Find the median and the mode of the data. whisker plots, one for the weight gains of
1 5 ooz84 {b) Given that the mean is 5.95 and the
0 80 100 the chicks fed the new strain of corn and the
40 60 9 1 4 1 3 standard deviation is 2.58, explain why the
0 20 other for the weight gains of the control
Length (em) 1 3 5 5 value 15 may be regarded as an outlier. group fed the standard strain of corn.
764433 2 6 (c) Explain how you would treat the outlier if (c) Use your box and whisker plots to compare
9 8 8 6 0 0 1 1 2 the diagram represents and contrast the two distributions. (0)
887332000000 0 34 (i) the ages {in completed years) of
· daily hours of children at a party, 12. A random sample of 51 people was asked to
This back-to-bade stempl at glVes (ii) the sums of the scores obtained when record the number of miles they travelled by car
sunshine in December and July throwing a pair of dice. in a given week. The distances, to the nearest
{d) Find the median and the mode of the data mile, are shown below.
Find the median and quartiles for each month
after the outlier is removed.
and construct the box plots. {e) Without doing any calculations state what 67 76 85 42 93 48 93 46 52
Comment on the distributions. effect, if any, removing the outlier would n 77 53 41 48 86 78 56 80
have on the mean and on the standard 70 70 66 62 54 85 60 58 43
58 74 44 52 74 52 82 78 47
These are the times of t~e pastil .delivery to my deviation.
66 50 67 87 78 86 94 63 72
(f) Docs ~he diagram exhibit positive skewness,
8. house over four successive wee cs. 63 44 47 57 68 81
negative skewness or no skewness?
9o01 9o22 9o30 9o19 9o15 929 How is the skewness affected by removing (a) Construct a stem and leaf diagram to
9A5 9o53 9o02 9o05 9o31 9A7 the outlier? (MEI) represent these data.
9A8 9o29 9o09 9o29 9o02 (b) Find the median and the quartiles of this
~~i6 9o12 9o25 9o10 9o13 9o19 distribution.
(c) Draw a box plot to represent these data.
Draw a stem and leaf diagram. (d) Give one advantage of using
(a)
(b)
Find the median time. (i) a stem and leaf diagram,
Find the quartiles. .
(c) Draw a box and whisker dJagram. (iii a box plot,
(d) to illustrate data such as that given above. (L)

Pie charts
Area = frequency
C A
To compare sets of data with total
frequencies Fl and F2, draw circles with
Vertical line graphs- ungrouped discrete data
B radii in the ratio ·'f.·r;;-F
'H't· 'IJ:i2.

radius r1 radius r2
Height represents frequency.
Mode is denoted by the tallest line. Mean, .X
LX
Raw data X=- When data are grouped
n the mid-interval value '
_ L(x 1 1ower boundary+ upper
2( ' boundary)
Frequency distributions x=-- IS taken to represent the interval.
Lf
Stem and leaf diagrams (stemplot) Standard deviation 5
'
l Key 2\7 means 27 J Raw data s= JL(x:x)2 or s~ ~-x2
s~
2
Leaf = JL((x- x) or JLZ2 -xl
Stem The stem plot must have a key. Frequency distributions 5
L(
1 0 4 5 9 Intervals are 10-19,20-29,30-39,40-49,
variance = 52
2 2 2 3 5 6 7 7
S0-59 s
~--

="'variance
3 1 2 2 7 8 Equal width intervals must be chosen.
4 3 3 4 6 Scaling data
2 3 7 If y ~ax+ b, where a and b are constants
5
then y=ax+ band sy~ Ia I sx
Histograms - grouped data Area oc frequency Coding data
frequency
x-a
Frequency density interval width If y=- then x=a+by
b
,- Modal class is represented by tallest x~a+by

r- rectangle. sx=lbls,
1- Interval width =
I upper class boundary -lower class boundaty. Combining sets of numbers, x and y
' new mean Lx + Ly
n1 +nl
Frequency polygons - grouped data LX 2 +Ly 2
new variance (new mean) 2
n1 +nz
Plot frequency density against the Weighted means
mid-interval value Ifx 1' x z, ... , xn are given
. weightings w 1' w z, ... ,W 11 t hen
Join with straight lines
weighted mean- w1x1 + WzXz + ... + wnxn L.wixi
Wt+Wz+···+Wn Lwi
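
As a quick numerical check of the mean, standard deviation, scaling and combining rules summarised above, here is a minimal Python sketch. The data values and the helper names `mean_sd` and `combine` are illustrative assumptions, not taken from the text. The standard deviation uses the divisor n (or the total frequency), as in the formulas above.

```python
from math import sqrt

def mean_sd(values, freqs=None):
    """Return (mean, standard deviation) using divisor n, or sum of f if freqs are given."""
    if freqs is None:
        freqs = [1] * len(values)          # raw data: every value has frequency 1
    total = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / total
    variance = sum(f * x * x for x, f in zip(values, freqs)) / total - mean ** 2
    return mean, sqrt(variance)

def combine(sum_x, sum_x2, n1, sum_y, sum_y2, n2):
    """Mean and variance of two sets combined, from their sums and sums of squares."""
    new_mean = (sum_x + sum_y) / (n1 + n2)
    new_variance = (sum_x2 + sum_y2) / (n1 + n2) - new_mean ** 2
    return new_mean, new_variance

# Hypothetical raw data, chosen so that the arithmetic is easy to follow.
x = [2, 4, 4, 4, 5, 5, 7, 9]
x_mean, x_sd = mean_sd(x)

# Scaling y = ax + b: the mean obeys the same rule, the spread is multiplied by |a|.
a, b = -3, 10
y_mean, y_sd = mean_sd([a * v + b for v in x])

# Combining x with a second hypothetical set, using only sums and sums of squares.
second = [1, 3]
comb_mean, comb_var = combine(sum(x), sum(v * v for v in x), len(x),
                              sum(second), sum(v * v for v in second), len(second))

print("x:", x_mean, x_sd)
print("y = ax + b:", y_mean, y_sd)
print("combined:", comb_mean, comb_var)
```

The combined result can be compared with `mean_sd(x + second)`, which works from the pooled raw values instead of the totals.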
"' Skewness

" Cumulative frequency is the total frequency up to a particular observation.


In a symmetrical distribution
(b) Grouped data- cumulative '
frequency curve or polygon. mean =mode =median
(a) Ungrouped data- step diagram.
Q3- Qz = Qz- Q 1

Mean
Mode
Median

In a positively-skewed
distribution the tail of the
d!stnbution
. . is pulled in the
posttlve direction.
mode < median <mean
., plot cumulative frequency against
., steepest step denotes the Q3- Qz > Q 2 - Q 1
upper class boundary.
mode. ., join with a curve (or with straight
lines for a cumulative frequency
In a negatively-skewed
polygon). distribution the tail of the
distribution
. is pulled in the
negative direction.
" Median, quartiles and percentiles mean < median< mode
For n observations arranged in order of size Q3- Qz < Q 2 - QI
" the median Q is the value 50% of the way through the distribution,
2
., the lower quartile, Q , is the value 25% of the way through the distribution,
1
<> the upper quartile, Q , is the value 7 5% of the way through the distribution, -:::--~....,.._:.::"':::::__~
mean- mode _ J( mean- median)
3 Pearson's coefficient of skewness
standard deviation - standard deviatlon . . .
<> the xth percentile, Px, is the value x% of the way through the distribution.
Grouped data - (Q3-Qz)-(Q 2 -0 1)
Ungrouped date Quartile coefficient of skewne·ss
Q3-QI
!nth value= SO% value
\c(n + l)th value
*nth value= 25% value ., Box and w h.IS ker diagrams (boxplots)
Divides the distribution
either side of the -lnth value = 75% value
Q3 \
median in half Lowest
Value
I Highest
value
Symmetrical distribution

Positively skewed distribution


., Ranges
Range= highest value -lowest value
3 1
Interquartile range= upper quartile -lower quartile= Q - Q Negatively skewed distribution

Middle 80% of readings= P90 - P10
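
The two skewness coefficients summarised above reduce to one-line calculations. The sketch below is only an illustration: the function names and the summary values passed to them are hypothetical, and the sign of each coefficient is read as the direction of skew.

```python
def pearson_skewness(mean, median, sd):
    """Pearson's coefficient in the form 3(mean - median) / standard deviation."""
    return 3 * (mean - median) / sd

def quartile_skewness(q1, q2, q3):
    """Quartile coefficient: ((Q3 - Q2) - (Q2 - Q1)) / (Q3 - Q1)."""
    return ((q3 - q2) - (q2 - q1)) / (q3 - q1)

def describe(coefficient):
    """Translate the sign of a skewness coefficient into words."""
    if coefficient > 0:
        return "positive skew"
    if coefficient < 0:
        return "negative skew"
    return "symmetrical"

# Hypothetical summary values (not from the text).
print(describe(quartile_skewness(20, 30, 50)))  # Q3 - Q2 > Q2 - Q1
print(describe(quartile_skewness(20, 40, 50)))  # Q3 - Q2 < Q2 - Q1
print(describe(quartile_skewness(20, 35, 50)))  # Q3 - Q2 = Q2 - Q1
print(describe(pearson_skewness(32, 30, 8)))    # mean > median
```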


Outliers - as a rough guide:
• Points at least two standard deviations from the mean.
• Points lying more than 1.5 times the interquartile range above $Q_3$ or below $Q_1$.

Miscellaneous worked examples

Example 1.51
The times, $t$ (in seconds), taken by an athlete to run 400 metres on ten successive days were
53.2, 55.7, 54.2, 52.7, 53.6, 56.8, 54.0, 53.7, 59.3, 53.8.
[If required, you may use $\sum t = 547.0$, $\sum t^2 = 29957.48$.]
(a) Calculate the mean of the times.
(b) Calculate the standard deviation of the times.
(c) Determine the median of the times. (C)

Solution 1.51
(a) Mean $\bar{t} = \dfrac{\sum t}{n} = \dfrac{547}{10} = 54.7$ seconds
(b) $s = \sqrt{\dfrac{29957.48}{10} - 54.7^2} = \sqrt{3.658} = 1.9$ seconds (2 s.f.)
(c) There are 10 values, so the median is the $\frac{1}{2}(10 + 1)$th value, i.e. the 5.5th value.
Re-arranging the times in order of size:
52.7, 53.2, 53.6, 53.7, 53.8, 54.0, 54.2, 55.7, 56.8, 59.3
The median is half way between 53.8 and 54.0, i.e.
median $= \frac{1}{2}(53.8 + 54.0) = 53.9$ seconds

Example 1.52
Applicants for an assembly job are required to take a test of manual dexterity. The times, in
seconds, taken to complete the task by 19 applicants were as follows:
63, 229, 165, 77, 49, 74, 67, 59, 66, 102, 81, 72, 59, 74, 61, 82, 48, 70, 86.
For these data find
(a) the median,
(b) the upper and lower quartiles.
An outlier here is defined as any observation less than $Q_1 - 1.5(Q_3 - Q_1)$ or greater than
$Q_3 + 1.5(Q_3 - Q_1)$, where $Q_1$ is the lower quartile and $Q_3$ is the upper quartile.
(c) Identify any outliers in the data.
(d) Illustrate the data by a box and whisker plot. Outliers, if any, should each be denoted by a
'*' and should not be included in the whiskers. (AEB)

Solution 1.52
Arranging the data in order, with $Q_1$, $Q_2$ and $Q_3$ shown in brackets:
48 49 59 59 [61] 63 66 67 70 [72] 74 74 77 81 [82] 86 102 165 229
(a) $Q_2 = \frac{1}{2}(n + 1)$th item $= \frac{1}{2}(19 + 1)$th item = 10th item = 72 seconds
(b) $Q_1 = 61$, $Q_3 = 82$
(c) $Q_3 - Q_1 = 82 - 61 = 21$
$Q_1 - 1.5(Q_3 - Q_1) = 61 - 1.5 \times 21 = 29.5$
$Q_3 + 1.5(Q_3 - Q_1) = 82 + 1.5 \times 21 = 113.5$
So outliers are less than 29.5 or greater than 113.5.
Therefore the outliers are 165 and 229.
(d) Box and whisker plot to show times taken to complete the task.
[Figure: box and whisker plot with the two outliers each marked *; horizontal axis: Time in seconds, 0 to 240]

Example 1.53
Whig and Penn, solicitors, monitored the time spent on consultations with a random sample
of 120 of their clients. The times, to the nearest minute, are summarised in the following
table.

Time      Number of clients
10-14       2
15-19       5
20-24      17
25-29      33
30-34      27
35-44      25
45-59       7
60-89       3
90-119      1
Total     120

(a) By calculation, obtain estimates of the median and quartiles of this distribution.
(b) Comment on the skewness of the distribution.
(c) Explain briefly why these data are consistent with the distribution of times you might
expect in this situation.
(d) Calculate estimates of the mean and variance of the population of times from which these
data were obtained.
The solicitors are undecided whether to use the median and quartiles, or the mean and
standard deviation to summarise these data.
(e) State, giving a reason, which you would recommend them to use.
(f) Given that the least time spent with a client was 12 minutes and the longest time was 116
minutes, draw a box plot to represent these data. Use graph paper and show your scale
clearly.
Law and Court, another group of solicitors, monitored the times spent with a random sample
of their clients. They found that the least time spent with a client was 20 minutes, the longest
time was 40 minutes and the quartiles were 24, 30 and 36 minutes respectively.
(g) Using the same graph paper and the same scale draw a box plot to represent these data.
(h) Compare and contrast the two box plots. (L)

Solution 1.53
(a) For grouped continuous data, with $n = 120$:
$Q_1$ is the $\frac{1}{4}n$th value, i.e. the 30th value
$Q_2$ is the $\frac{1}{2}n$th value, i.e. the 60th value
$Q_3$ is the $\frac{3}{4}n$th value, i.e. the 90th value

Time       Cumulative frequency
< 14.5       2
< 19.5       7
< 24.5      24
< 29.5      57
< 34.5      84
< 44.5     109
< 59.5     116
< 89.5     119
< 119.5    120

$Q_1$ lies in the interval 24.5-29.5 (width 5), where the cumulative frequency rises from 24 to 57.
There are 33 items in this interval and the 30th value is the 6th of them,
so $Q_1 = 24.5 + \frac{6}{33} \times 5 \approx 25.4$ min

$Q_2$ lies in the interval 29.5-34.5 (width 5), where the cumulative frequency rises from 57 to 84.
There are 27 items in this interval and the 60th value is the 3rd of them,
so $Q_2 = 29.5 + \frac{3}{27} \times 5 \approx 30$ min

$Q_3$ lies in the interval 34.5-44.5 (width 10), where the cumulative frequency rises from 84 to 109.
There are 25 items in this interval and the 90th value is the 6th of them,
so $Q_3 = 34.5 + \frac{6}{25} \times 10 \approx 36.9$ min

(b) $Q_3 - Q_2 = 6.9$ min, $Q_2 - Q_1 = 4.6$ min
So $Q_3 - Q_2 > Q_2 - Q_1$. This implies a positive skew.

(c) Very few consultations take over an hour.
Most take just under half an hour.

(d)
Mid-point x      f
  12             2
  17             5
  22            17
  27            33
  32            27
  39.5          25
  52             7
  74.5           3
 104.5           1

Using the calculator:
$\bar{x} = 32.6$ (1 d.p.)
$s^2 = 160.7$ (1 d.p.)

(e) It is better to use the median and quartiles because the distribution is skewed.

(f), (g) [Figure: box plots for Whig and Penn and for Law and Court, drawn on the same time scale]

(h) Both have a small interquartile range, i.e. small variability.
Both have the same median (30 min).
Whig and Penn times have a much greater range of values, so Law and Court would
appear more efficient.
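
The quartile work in Solutions 1.52 and 1.53 can be retraced with a short Python sketch. The function names are hypothetical, and the lowest class boundary of 9.5 is an assumption inferred from the times being recorded to the nearest minute. The ungrouped rule takes the quartiles as the medians of the halves either side of the median, and the grouped rule uses linear interpolation within the class containing the required position.

```python
def ungrouped_quartiles(data):
    """Return (Q1, Q2, Q3): the median, and the medians of the halves either side of it."""
    xs = sorted(data)
    n = len(xs)

    def median(values):
        m = len(values)
        mid = m // 2
        return values[mid] if m % 2 else (values[mid - 1] + values[mid]) / 2

    half = n // 2
    lower = xs[:half]                              # values below the median position
    upper = xs[half + 1:] if n % 2 else xs[half:]  # values above the median position
    return median(lower), median(xs), median(upper)

def find_outliers(data):
    """Fences at 1.5 x interquartile range beyond the quartiles, and the values outside them."""
    q1, _, q3 = ungrouped_quartiles(data)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return low, high, [x for x in sorted(data) if x < low or x > high]

def grouped_quantile(boundaries, freqs, position):
    """Linear interpolation for the value at a given cumulative position (e.g. n/4)."""
    cumulative = 0
    for lower, upper, f in zip(boundaries, boundaries[1:], freqs):
        if cumulative + f >= position:
            return lower + (position - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("position exceeds the total frequency")

# Example 1.52: times in seconds for 19 applicants.
times = [63, 229, 165, 77, 49, 74, 67, 59, 66, 102,
         81, 72, 59, 74, 61, 82, 48, 70, 86]
q1, q2, q3 = ungrouped_quartiles(times)
low, high, found = find_outliers(times)
print(q1, q2, q3, low, high, found)

# Example 1.53: class boundaries (9.5 is an assumed lower boundary) and frequencies.
boundaries = [9.5, 14.5, 19.5, 24.5, 29.5, 34.5, 44.5, 59.5, 89.5, 119.5]
freqs = [2, 5, 17, 33, 27, 25, 7, 3, 1]
n = sum(freqs)
g1, g2, g3 = (grouped_quantile(boundaries, freqs, n * p) for p in (0.25, 0.5, 0.75))
print(round(g1, 1), round(g2, 2), round(g3, 1))
```

The interpolated median comes out slightly above 30, which the solution quotes as 30 min.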
6. A school entered 88 student f . . Daily expenditure (£) Frequency
The results of thee . .s or an exammatton.
table below. xammatlOn are shown in the
0.66-0.90 11
Miscellaneous exercise ln (b) Another set of 10 numbers is such that their
sum is 130 and the sum of their squares Mark (x) Frequency
0.91-1.15 28
1.16-1.30 38
1. In 1798 the English scientist Henry Cavendish is 2380. This set is combined with the
measured the specific gravity of the earth by 0 <X~ 10 3 1.31-1.45 34
original20 numbers. Calculate the mean
careful work with a torsion balance. He obtained and standard deviation of all 30 numbers. 10<x~20 6 1.46-1.70 27
(C Additional)
the 29 measurements given below. 20<x.;;30 9 1.71-2.00 12
5.29 5.30 30<x~40 10
4.88 5.10 5.26 5.27 5.29 5.46 5.47 4. A grouped frequency distribution of the ages of
4.07 5.39 5.42 5.44 358 employees in a factory is shown in the table 40<x.;;50 12 8. The hourly wages, £x, of the 15 .
5.34 5.34 5.36 5.57 5.58 5.61 5.62 5.63 below. Estimate, to the nearest month, the mean 18
small factory are as follows: workers m a
50<x~60
5.50 5.53 5.55 5.85 5.86 and the standard deviation of the ages of these
5.65 5.75 5.79 60<x.;;70 14 £6.60,
£7 25 £3.40, £6.45, £52
. 0, £3.60,
employees. 70<x.;;80 11 £5.7 , £9.60, £3.75, £4.20, £8.75,
The sum of these measurements is 157.17 and Graphically, or otherwise, estimate
(a) the median and the interquartile range of the 80<x.;;90 5 . 5, £4.50, £3.95, £4.75, £12.25.
the sum of their squares is 855.0227.
(a) Calculate the mean measurement and the ages, each to the nearest month, (a) ~~ustrate th~ data in a stem and leaf
standard deviation of the measurements. (b) the percentage, to one decimal place, of the (a) Calculate, showing your working a d .. tagram, usmg pounds for the stem and
Obtain for these data the range, the median employees who are over 27 years old and your answers correct to tw 0 1 . nlgltvmg
and the quartiles. under 55 years old. (L)
an estimate of c ectma p aces,
m~dian
wage. State .the ran~;n
pence for the leaves Clearl . d'
reate the
Draw a box plot and use it to identify the Number of (i) the mean mark (b) Grvcn that :Ex~ 90.00 and ~2 ~ 631 25
outlier in Cavendish's data set.
(b) If the data were analysed without this
Age
(last birthday)
employees (ii) the variance '
(iii) the standard deviation.
~~l~ulat~ the mean and standard devi~ti~n
our y wages of the workers.
outlier, calculate the new values of 36 (b) Copy an.d complete the followin
f nego~tahttons,
16-20 After
ti) the median 56 cumulattve frequency table. g offereddelicate wage
a choice . . the workers are
21-25 . o one o t e following pay rises:
(ii) the mean 58
(iii) the standard deviation. (NEAB) 26-30
52 Mark (x) Cumulative frequency (A) an mcrease of 30 pence h
31-35 (B) 5% · . per our
46 a o nse m hourly rates. '
2. The table shows the distribution of the lifetimes 36-40 x< 10
38
(measured to the nearest hour) of a sample of 41-45 (c) Use your answers in part (b) to deduce th
batteries.
46-50
36
36
x<20 mean and standard deviation of the h t"
51-60 x<30 wahges of the 15 workers under both our y
0 x<40 sc emes.
Frequency 61-
Lifetime {to nearest hour) x.;; 50 (d) :~~~h%tr t~e manage~ent would not
3 5. 200 candidates sat an examination and the sc erne was tmpleme t d b
690-709 distribution was obtained as shown in the table. x.;; 60 t h e workers might . n e ,(MEI)
ut
7
710-719 (a) If the limits of class 40-49 are 39.5 to 49.5, x< 70
15
no-729 38
what is the mid-interval value of this class? x< 80 9. The table below shows the len th d' .
730-739 (b) Calculate the mean of the marks explaining pebbles from th b d f . g tstnbution of
41 x< 90 e c o a nvcr.
any limitations of your calculation.
740-744 35 (c) Plot a cumulative frequency curve and use it
745-749 to estimate the upper and lower quartiles. (c) Usmg
h :, 2 em l to. represent 10 marks on the Length, x millimetres Frequency
21 onzonta axts and 2 em to represent
750-754 {d) Assuming that your estimates are exact, find 10 st d
16 ~ ents on the vertical axis draw o
freq~enc
755-759 values for a and b correct to two significant O<x<5 10
14 grap paper, a cumulative ' n
760-769 figures, in order that the above marks can be 5 <x< 10 8
10 polyg:m t_o illustrate the distributi y f h
scaled by the equation y =ax+ b, where y is 12
770-789 d cxammatlon marks on o t e 10~x<20
the new mark, so that the mean becomes 45
( ) Use') your graph t o esttmate
· . 20<x<50 25
and the lower quartile becomes 35.
{a) Draw a histogram to represent the data. (:. the median mark so.;;x<100 30
(e) State, with reason, whether the quartiles of
{b) Draw a cumulative frequency polygon. the original marks will scale into the
Th {n) the interquartile ;ange.
(c) Calculate the mean and standard deviation. theeexamination . d to o btain a grade A in
owest mark req u7rr5e
< 5 .thleOfreqt~ency density for
quartiles of the scaled marks. (a) You are given
(d) Estimate the median and the quartiles. (e) E . was . the class 0 < x that
{e) Calculate Pearson's coefficient of skewness. Frequency stnnate wh
students from your graph the number of f · IS · · Wnte down the
(f) Calculate the quartile coefficient of
Marks (x)
this examin~:~~e awarded a grade A for (b) ~equency densities for the other clas
10 (C) (c) epresent the data in a histogram ses.
skewness. 10-19
18 Us~ your histogram, or otherwise . to
d) ~tnlnalte the ~odallcngth of a pebble.
(g) Draw a box and whisker diagram to 20-29
20 7. Dat,\ were collected f
illustrate the distribution. 30-39 students m a ll rom a survey of 150
30 expenditure oc~ e~ ~anteen. Their datly ( a cu ate estimates of
40-49
49 the table oppost:~ - ay meals IS summansed Ill
(~! the mean length of a pebble
3. The sum of 20 numbers is 320 and the sum of 50-59 (u) the median length of a pebble. (0)
their squares is 5840. Calculate the mean of the 46
20 numbers and the standard deviation.
60-69
20
Illustrate
hcquencythegra~h
dat aHy
b means of a cumulative
70-79 value of the d l. ence estnnate the medtan
{a) Another number is added to these 20 so that 5 (C)
80-89 at Y expenditure.
the mean is unchanged. Show that the 2
standard deviation is decreased. 90-99
Draw, on graph paper ' a 1ltstogram
.
t~c diagram represents regard the outlier if
(d) Explain how you would these data. to represent

( l the
whendifference
th . of th e scores obtained Estimate
Calculate the mean mark for all 50 candidates rowmg a pa' f r
and show that the standard deviation of all (B) ~he number of child:~no or~mary dice, ~~~ t~c mea~l of the distribution
10. An advertising campaign to promote electric m a neighbourh d per ousehold less than 50age of the a PP l"tcants
t e medtan ' who are
50 marks is 9. years of age . (C)
showers consists of a mailshot which includes a lt is suggested that the original marks of the (e) Calculate new valu ~o survey.
pre-paid postcard requesting further details. candidates from Set A should be linearly scaled standard deviation~£ thr t?e mlcan and
Prospective customers who return the postcard so that their scaled marks would have a standard removed. e smg e outlier is
(MEI)
16. The cumulat'Jve 1requency t bl b l
the lengths, in minutes of 4a00e ow refers tot
are then· contacted by one of five sales staff: deviation of 9 and a mean mark equal to the made from a certain h~useholdte e~hone calls
Gideon, Magnus, }emma, Pandora or Muruvet. mean mark of all 50 candidates. 14. Data of three months. dunng a period
The pie charts below represent the number of {a) What effect would this have on an original 4320 collected
houses inf:o;n a survey of the cost of
sales completed and the number of potential table below. own arc summarised in the
mark of 60 obtained by a candidate from
customers contacted during a one-month period. Length of call
Set A? Number of houses in minutes Number of calls
(b) Given that the original marks of the Cost(£)
Sales completed candidates in Set A were all integers, explain
Pandora why no mark would remain unchanged. 20 001 50 000 540 <;1 20
(C Additional)
50 001-60 000 1150 <;2 67
Magnus 60 001-70 000 1320 ~2} 118
12. Machine A is set to cut lengths of wood 100 mm
70 001-100 000 860 <;3 177
long. To test the accuracy of the machine, a
random sample is taken from the output. The 100 001-150 000 450 <;5 315
sample size is denoted by n and the length in <; 10 400
mtllimetres of each piece of wood is denoted On graph paper, tllustrate th d
Gideon cumulative frequency h e ata by means of a
by x. The results arc summarised by to estimate the d. grap ' and use your graph Construct the correspondin f
me mn cost an d t h e mterquartile
· ' draw a histogram to ·n g requency table and
n ~50, :Ex~ 5035, ~ 507 033. range.
2
Potential customers contacted :Ex Use linear inter ola _t ustrate_the data.
Calculate the mean and standard deviation of the length of call a~d e tt~n_to ~sttmate the median
Muruvet
lengths in the sample, giving your answers
15. The age d'_IStn·bution of the a l' . (C) significance of a v ~p a m .t e geometrical
recorded m the table below. pp Jcants for a JOb is . erttca1 1me dra h
correct to one decimal place. tstogram at this val ue. < wn t rough the
(C Additional)
h
Machine B is also set to cut lengths of wood Age (years) 20- 35 40 45 50 60
100 rom long. A random sample of 50 items
from this machine has mean 100.2 mm and Number of
standard deviation 1.1 mm. Giving your reasons, 8 9 0
applicants 14 12 7
comment briefly on the accuracy of the two
Pandora
machines. l C)
17. ~he . fr e ep lone calls from my hous e during the
13. A frequency diagram for a set of data is shown in ftrst figure showsofa last
six months cumulative
year equency curve for the length oft I 1
the figure. No scale is given on the frequency
Gideon axis, but summary statistics are given for the
distribution:
160 ill !
11 :Lj LllJ :;t
1 il H Iii
fl li' nu_tt 'I
TiT
i '

~n
Magnus
' 1'I
,.
(a) The total number of potential customers 20
::I
I i
I'
contacted is 1100. Find, approximately, the
total number of sales completed. :, IT i i I

(b) Describe the main features of the data I! T y


j

revealed by the pie charts.


(c) The manager wishes to compare the sales 80
; T
staff according to the number of sales
completed. What type of diagram would i !/ i
j I T'
you recommend, in place of a pie chart, so
that this comparison could be made easily? ! !
lA
(AEB)
I5 6 7
I

8 '
40

I;
rr~~~
11. An examination is taken by two sets of 2 3 4 !T:
1
candidates from the same school. The number of (a) State the mode and the mid-range value of IT
~~ ~~1-i· 1
candidates in each set, the mean marks and the i
?i'
variances are shown below.
the data.
(b) Identify two features of the distribution.
(c) Calculate the mean and standard deviation
0
0
_...I.T
'T iT
10
!i i
1
15
;
20 25 30
5
Number of Variance of the data and explain why the value 8, Time in minutes
Mean mark which occurs just once, may be regarded as
candidates
66 9 an outlier.
SetA 20 39
51
Set B 30

(c) for
Dataa sample
on the number (a) State the type of dia ram a .
{A) the distribution of these call times is of OO of wor d s per sentence illustrating the data.g ppropnate for
second m . 1 sentences taken from a
negatively skewed, . 'l agazme were presented in a t bl (b) Ca!culations using the data in the tabl .
(a) Find the median and inter-quartile range. (B) the majority of the calls last longer than stml ar to Table 2 and . h h a e esttmates as follows e g1ve
(b) Construct a histogram with six equal intervals. From this ne:tt bi c same _class
6 minutes, for the mean numb cr o f wor ta de,s an mean time of the experiments
intervals to illustrate the data. (C) the majority of the calls last between peresttmate standard deviation 69.64
6.37 s.s,
(c) Use the frequency distribution associated 5 and 10 minutes, was calculated to be 9 14 sentence
with your histogram to estimate the mean (D) the majority of the calls are shorter than sentence in this sa 1. ?hin fact, the only E~xp lainvalues.
precise why thes e are estimates
· rather than
the mean length. (MEl) words was one w:pl e wtt more than 25
length of call.
(d) State whether each of the following is true
or false
C I 1 JC 1 was 32 words 1
rna cu afteha_n improved estimate for
ean o t ts second sample .
th~ng.
(C)
range of the times t
experiments.
tn
(c) Estimate the median d h .
ft e mterquartile
a <en or completing the

5. The following table shows d . (d) It was subsequently revealed that th


taken, in seconds to th ata about the time expenments m the 5 0-60 l e four
com leti e nearest second for taken 57 59 59 d 60 c ass had actually
P. ng each one of a series of 75 ? "1 ' ' an seconds
Mixed test lA Two new people joined the orchestra and the
number of hours of individual practice they did
ch emtcal experiments. · stmJ ar respecttvely. State, without further
1. One hundred runners competed in a in the first week of June were p - 2a and p + 2a. Number of
b~1~~~~~~~:!:~c~f~efctth(tfe me
andy! there would
tan
half-marathon race. The table below shows N, (c) State, giving your reasons, whether the effect
the number of runners who completed the course Time (s) experiments mterquartile range and mean If thts
of including these two members was to (C)
mformatJOn were taken mto account.
within T minutes of the start. increase, decrease or leave unchanged the 50 60 4
155 mean and standard deviation. (L) 61-65 13
105 115
65 85 95 66-70 26
T 100
53 73 90 4. A newsagent carried out a survey to gather 71-75 22
0 25 general information about her customers, and 76-86 10
N
the readability of her magazines. Table 1 shows a
Construct the corresponding frequency table and

pr~~gt·hdof ~thranhdom
classification of the customers during one hour
6. As part of aand
employees detailed theof1its wor k f?rce, a large company select l
study
recorded
use of trading and Table 2 shows the number of sample· Th
of e100
h'tstogram
(a) this
to draw a histogram to represent the data, words per sentence for a sample of 100 sentences
tllustrates the distribution uce . ttme each employee had beenecWI t e company male
(b) to estimate the mean value ofT. (C)
taken from a magazine.
2. The following stem and leaf diagram summarises
the blood glucose level, in mmol/1, of a patient, Table 1
Time employed by
J; ;:LL~LJ_c'-'=i';
. the comp an y for. a random sample of 100 f. 1
o Its rna e employees
measured daily over a period of time. Adult Adult
Totals Child/
Blood glucose level 5 \ 0 means 5.0 female male
(12) Student
5 0 0 1 1 1 2 2 3 3 3 4 4 ( 9)
5 5 5 6 6 7 8 8 9 9 (10) Number of 22
6 0 1 1 1 2 3 4 4 4 4 ( ) 5 28
customers
6 5 5 6 7 8 9 9 ( )
7 1 1 2 2 2 3 ( )
7 5 7 9 9 ( ) Table 2
8 1 1 1 2 2 3 34 ( 3)
8 7 9 9 ( 4)
9 0 1 1 2 ( 3) Words per
1-5 6-10 11-15 16-25 26-45
9 5 7 9 sentence
(a) Write down the numbers required to
complete the stem and leaf diagram. Number of 14 14
18 32 22
(b) Find the median and quartiles of these data. sentences
(c) On graph paper, construct a box plot to
represent these data. Show your scale (a) State a suitable type of diagram which could 30
Time (yearn)
represent the data in Table 1.
clearly. (b) The survey was carried out on a Monday
(d) Comment on the skewness of the (L)
morning. Give one possible reason why Copy and complete the f 0 II owmg
· table. 8.6 ~ears for the quartile times Th 1
distribution.
conclusions based upon the results of
T' (a) servmg woman in the
the compan for
l h e ongest
samp e ad been with
included a ;oma y~ah. Jhe sample also
3. The 30 members of the Darton town orchestra Table 1 should be treated with caution. I me (years) 0- 2 5 10- 15 20 30 20
joined the compa~; Do a dv~ry recently
each recorded the amount of individual practice, (c) Represent the data in Table 2 by means of
x hours, they did in the first week of June. The an accurately drawn histogram on graph Number of males 3S plots to · raw a Jacent box
results are summarised as follows: paper.
in the sample length c~~pare the distributions of the
1:x ~ 225, 1:x 2 ~ 1755.
(d) Use the figures in Table 2 to calculate,
correct to three significant figures, estimates (b) Calculate estimates f h . emplo;;es t~~~ be~~ :7tbl~hyees and female
The mean and standard deviation of the number (c) quartile times f ho t e median and (d) List . re~ d'ff
. th e company
An equivalent r~r t e sample.
of the mean and standard deviation of the I erences between the two .

employees gave c:~~m sampl~ of 100 female


of hours of practice undertaken by the members number of words per sentence in the sample. dtstnbutwns as illustrated by th e b oxplots.
(NEAE)
of the Darton orchestra in this week were ?t and
3.2 years fa . h u~ated esttmates of
1 t emed~an ' an d 1 · 8 years and
a respectively.
(a) Find!'·
(b) Find a.

6.

g 5 00
lfiJ ;x Hl .
1

i!ll ''[J[IHI'JiiTI'IU' 'ITf' !IV


L 1 H lr f[ I! Iii I 11, r
Mixed test lB IT [T
(b) Calculate, to the nearest degree, the angle of
t ~! I
1. In a transport survey, the number of passengers
in each of 523 cars travelling into a town centre
the corresponding sector in the 1990 pie
chart. (C)
E
8
0 ll ITI' i: ·I' li 1: 'r : H II
il
;;
~' 4 00
on a particular morning was recorded. The 1:
results are summarised in the following table. 4. A school cleaner is approaching pensionable age.
She lives halfway between two post offices, A
and B, and has to decide from which of the two
z
UH p I'
I1
I I; :;
[,II'
:j, I' I; i+:, lfl i! I ii
Number of passengers
3 4 5 she will arrange to collect her pension. For a few
ll ~~~ u Iii
0 1 2
in a car
183 160 108 63 8 1
months she has deliberately used the two post
offices alternately when she has required postal
30 o H 'F :, h 1: t !\I' ir li I' H-H
Number of cars
(a) Calculate the mean number of passengers in
services. On each of these visits she has recorded
the time taken between entering the post office
II I i I, I:li 1
'
[1
t
i!
I, h
a car, giving your answer correct to three
significant figures.
and being served.
The boxplots below show these waiting times for 20 0 I' H
(b) State the mode of the number of people (i.e. the two post offices. The symbol * represents an f:
passengers plus driver) in a car in the survey.
Htu H li
(c) It is given that, correct to three significant outlier.
figures, the standard deviation of the PostOificeA ~ 100 'I I u fl : :i l! ~~ IT
number of passengers in a car in the survey
is 1.09. State the standard deviation of the
number of people in a car in the survey. (C)
Post Office B -------cTI----
10 20 Time (minutes) : 11ir , I, I: II i! )j I ; H rI 11
1l H!:
· !, lr lrli !.
I
1

0
2. A student collected some data on the heights,
0
I!!:
x em, of plants of a particular species. She chose (a) Compare, in words, the distributions of the '
0 1 2 3 4 5 6 7
to represent the data in a stem and leaf display, waiting times in the two post offices.
Time in hours
as shown below. (b) Advise the cleaner which post office to use if

I Unit is 1 em the outliers were due to The diagram shows a cumulative frequenc (b) Cobpl Y and complete the following frequency
9 polygon for the numbers of competitors who
2 2 3 4 4 4 5 6 7 7 (i) a cable laying company having severed
completed a marathon within 2 2! 3 4 d 7
ta e.
12 1 1 1 2 5 5 7 the electricity supply to the post office,
hours of the start. ' 2• ' an
3 1 2 2 5 5 9 (ii) the post office being short~staffed. Time in hours 2-2} 2}-3 3 4 4 7
(AEB) (a) "l!se the diagram to estimate
4 1 3 4 5 (1) the median No. of competitors 200
(a) (i) Explain why the data might be better 5. Thirty children were given a task to perform and (ii) the quartil;s
represented by a two-part stem and leaf
the times taken were recorded, each to the next of the times takei; by the 500 competitors (c) Calculate an estimate of
display. who completed the run.
whole number of minutes above the actual time. (i) the mean
(ii) Rewrite the above data in such a
display. The results were as follows: (ii) the stand~rd deviation
(b) Calculate an estimate of the mean height, 12 20 14 17 17 8 19 13 27 13 of the 500 competitors' tim'es. (NEAB)
22 16 11 18 13 6
in centimetres, of plants of this species. 16 18 10 7
8 10 17 16 19
(c) Calculate the median of the data given in the 16 12 14 23 15
display. (a)
Copy and complete the following stem and
(d) State which of the mean and median would leaf diagram to illustrate the above data.
be a better measure of location for the
heights of these 29 plants. Give a reason for Key 10 \ 5 represents a time of 15 minutes
your answer. (0)
7 8 8
3. A pie chart was drawn, for each of the years 0 1 2 2

l! Il
1990 and 1995, to illustrate the amounts spent 6 6
by a householder on electricity, gas, water and 2 3
telephone, and to compare the total amounts
spent in the two years.
(a) Given that the radii of the 1990 and 1995 charts were 15 cm and 18 cm respectively,
calculate the percentage increase in the total amount spent.
The amount spent on water in 1995 was twice the amount spent in 1990. In the 1995 chart the
amount spent on water was represented by an angle of 47°.

(b) Use your diagram to estimate the median and the quartiles of the distribution of times
taken to complete the task.
(c) Draw a box plot to illustrate the distribution. (NEAB)
Regression and correlation

In this chapter you will learn how to
• interpret scatter diagrams for bivariate data
• calculate the equations of the least squares regression lines and use them to estimate values
• calculate and interpret the value of the product-moment correlation coefficient
• calculate and interpret the value of Spearman's rank correlation coefficient

SCATTER DIAGRAMS
Suppose you wish to investigate the relationship between two variables x and y, for example
• the weight at the end of a spring (x) and the length of the spring (y),
• the number of hours spent studying for an examination (x) and the mark achieved (y),
• a student's mark in a French test (x) and the mark in a German test (y),
• the diameter of the stem of a plant (x) and the average length of leaf of the plant (y).

Data connecting two variables are known as bivariate data. When pairs of values are plotted,
a scatter diagram is produced. Here are some examples:

[Figure: example scatter diagrams, y plotted against x]

Dependent and independent variables
If one of the variables has been controlled, it is called the independent or explanatory variable.
The other variable is then the dependent or response variable.

For example, if you place weights of 10 g, 20 g, 25 g, 30 g, 35 g, 50 g, etc on the end of a
spring and record the length of the spring for each weight, the weight is controlled so it is the
independent variable. The length of the spring is the dependent variable.

REGRESSION FUNCTION
Having drawn a scatter diagram, you can then look for a mathematical relationship between
the variables, y = f(x), where the function f, known as the regression function, is to be
determined.

LINEAR CORRELATION AND REGRESSION LINES
Consider the simplest type of regression function, where y = f(x) is a straight line.
If the points on the scatter diagram appear to lie near a straight line, called a regression line,
you would say that there is linear correlation between x and y. Here are some examples:

[Figure: three scatter diagrams. Positive linear correlation: y tends to increase as x increases, with a regression line drawn. Negative linear correlation: y tends to decrease as x increases, with a regression line drawn. No correlation: no relationship between x and y.]

Common sense and care are needed when interpreting scatter diagrams.
• Mathematically, there may appear to be a relationship, but this does not imply that there
is a relationship in reality. You might find, for example, that over a period of time in a
particular city there has been an increase in the number of robberies and an increase in
the number of health food shops. It would however be foolish to imply that there is a
relationship between these two variables.
• The appearance of a mathematical relationship does not imply that there is a causal
relationship. An increase in one variable does not necessarily cause an increase, or decrease,
in the other variable.

If it appears from the scatter diagram that a linear relationship is a sensible interpretation, you
may then attempt to find a model for the relationship in the form of a regression line.

[Figure: a line of 'best fit' drawn 'by eye']

In previous work you may have drawn a line of best fit on the scatter diagram, attempting to
draw it so that there are as many points above the line as below it, or as many points to the
left of the line as to the right of it. The line should also go through the point
$(\bar{x}, \bar{y})$, the means of the two sets of data.
This method, known as drawing 'by eye', is rather haphazard. There is a mathematical way of fitting the regression line, known as the method of least squares, and this is illustrated in the following example.

Consider the situation in which the mass, y g, of a chemical is related to the time, x minutes, for which the chemical reaction has been taking place, according to the table:

Time, x min     5     7    12    16    20
Mass, y g       4    12    18    21    24

These results can be illustrated on a scatter diagram.

[Figure: scatter diagram of mass y against time x, with the point $(\bar{x}, \bar{y})$ marked]

From the scatter diagram there seems to be a positive linear correlation between the mass and the time.

The line of best fit must pass through the means of both sets of data, i.e. the point $(\bar{x}, \bar{y})$. You should find, by calculation, that this is the point (12, 15.8). It has been plotted on the diagram.

Diagrams 1 and 2 show attempts at drawing the line of best fit.

[Figure: Diagram 1 and Diagram 2, two attempted lines of best fit]

In each of the attempts, the dotted line goes through $(\bar{x}, \bar{y})$ and there are three points above the line and two points below it. Yet neither of these lines is correct.

Note that the times, x, are chosen by the person holding the stopwatch, so x is the independent variable. The values of the mass, y, depend on the results of the chemical process at these times, therefore y is the dependent variable. If you were to repeat the experiment with the same values of x, you would almost certainly get a different set of values of y. So for a fixed value of x you could have several different values of y, all in the same vertical line on the scatter diagram.

Least squares regression line of y on x

To find the equation of the least squares regression line of y on x for the chemical experiment data, consider vertical distances $m_1, m_2, m_3, m_4, m_5$ drawn from each point to the regression line. These distances will be positive or negative according to whether the points are above or below the line, so instead work with the squares of these values and consider their sum,

$$m_1^2 + m_2^2 + m_3^2 + m_4^2 + m_5^2$$

A shorthand way of writing this is $\sum m_i^2$, where $i = 1, 2, 3, 4, 5$.

A line that fits the data well is one that makes $\sum m_i^2$ as small as possible, i.e. it is drawn so that $\sum m_i^2$ is minimised.

This line is called the least squares regression line of y on x.

Consider our three attempts at drawing the line of best fit. The vertical distances have been shown and you can see that $\sum m_i^2$ is least in diagram 3.

[Figure: Diagrams 1, 2 and 3 with the vertical distances $m_1, \ldots, m_5$ marked]
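The idea that the correct line is the one with the smallest sum of squared vertical distances can be checked numerically. The following is a minimal Python sketch, not the book's method: it takes the chemical reaction data, forces each candidate line through $(\bar{x}, \bar{y})$, and compares the sums of squares. The gradients 1.0 and 1.5 are hypothetical 'by eye' guesses chosen only for illustration (they are not the lines of Diagrams 1 and 2), and 1.2208 is the gradient this chapter quotes for the least squares line.

```python
# Chemical reaction data from the table in the text.
times = [5, 7, 12, 16, 20]      # x, minutes
masses = [4, 12, 18, 21, 24]    # y, grams

x_bar = sum(times) / len(times)
y_bar = sum(masses) / len(masses)


def sum_of_squares(gradient):
    """Sum of squared vertical distances from the data points to the
    line through (x_bar, y_bar) with the given gradient."""
    intercept = y_bar - gradient * x_bar
    return sum((y - (intercept + gradient * x)) ** 2
               for x, y in zip(times, masses))


# 1.0 and 1.5 are hypothetical guesses; 1.2208 is the chapter's
# least squares gradient for these data.
results = {g: sum_of_squares(g) for g in (1.0, 1.2208, 1.5)}
for gradient, total in results.items():
    print(f"gradient {gradient}: sum of squares = {total:.2f}")
```

Under these assumptions the least squares gradient should give the smallest of the three totals, even though all three lines pass through the mean point.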
Diagram 3 shows the true line of best fit. It has equation

$$y = 1.15 + 1.22x$$

This equation has been calculated by using the method of least squares and the calculations are shown on page 123.

[Figure: Diagram 3, the true line of best fit y = 1.15 + 1.22x]

Useful formulae when calculating regression lines

Before looking at how to find the equation of the regression line, here is a reminder of the formulae for the mean (page 28) and the variance (page 37) of a set of data together with a new formula that connects the x and y data, the covariance.

For the x data:

The mean of the x data is $\bar{x}$ where $\bar{x} = \frac{\sum x}{n}$.

The variance is usually written $s^2$, but to distinguish that it is the variance of the x data, you could write $s_x^2$. Usually, however, when working in the context of regression and correlation, the variance of the x data is written $s_{xx}$.

Remember that there are alternative formats of the variance:

$$s_{xx} = \frac{1}{n}\sum (x - \bar{x})^2 \quad\text{or}\quad s_{xx} = \frac{1}{n}\sum x^2 - \bar{x}^2 = \frac{\sum x^2}{n} - \bar{x}^2$$

For the y data:

$$\bar{y} = \frac{\sum y}{n}$$

$$s_{yy} = \frac{1}{n}\sum (y - \bar{y})^2 \quad\text{or}\quad s_{yy} = \frac{1}{n}\sum y^2 - \bar{y}^2 = \frac{\sum y^2}{n} - \bar{y}^2$$

For the x and y data:

The covariance, $s_{xy}$, connects the x and y data and the formula is

$$s_{xy} = \frac{1}{n}\sum (x - \bar{x})(y - \bar{y}) \quad\text{or}\quad s_{xy} = \frac{1}{n}\sum xy - \bar{x}\bar{y} = \frac{\sum xy}{n} - \bar{x}\bar{y}$$

In some textbooks and formulae booklets you might see the notation $S_{xx}$, $S_{yy}$ and $S_{xy}$. These are known as the 'big S' formulae and are derived from the 'small s' formulae above as follows:

$$S_{xx} = \sum (x - \bar{x})^2 \quad\text{or}\quad S_{xx} = \sum x^2 - n\bar{x}^2 = \sum x^2 - \frac{(\sum x)^2}{n}$$

$$S_{yy} = \sum (y - \bar{y})^2 \quad\text{or}\quad S_{yy} = \sum y^2 - n\bar{y}^2 = \sum y^2 - \frac{(\sum y)^2}{n}$$

$$S_{xy} = \sum (x - \bar{x})(y - \bar{y}) \quad\text{or}\quad S_{xy} = \sum xy - n\bar{x}\bar{y} = \sum xy - \frac{\sum x \sum y}{n}$$

The big S formulae are useful in calculations where the factor of n cancels, but it should be remembered that they are not the formulae for the variance and covariance.

The equation of the regression line y on x

You are probably familiar with the equation of a straight line in the form

$$y = mx + c$$

where m is the gradient and c is the y-intercept.

When writing the equation of the regression line, a slightly different format is usually used in which the constant term is written before the x-term and the letters used are a and b. The format is

$$y = a + bx$$

where b is the gradient and a is the y-intercept.

To define the regression line, you need to find the value of a and b for a particular set of data. This is done as follows:

For the regression line y on x written in the form y = a + bx, the gradient, b, can be calculated as follows:

$$b = \frac{s_{xy}}{s_{xx}} \quad\text{or}\quad b = \frac{S_{xy}}{S_{xx}}$$

Note that b is known as the regression coefficient of y on x.

To find a, use the fact that $(\bar{x}, \bar{y})$ lies on the line.

If $y = a + bx$ then $\bar{y} = a + b\bar{x}$.

Rearranging, $a = \bar{y} - b\bar{x}$.

To find the equation of the regression line y on x for the chemical experiment data on page 120:

   x      y      x²     y²     xy
   5      4      25     16     20
   7     12      49    144     84
  12     18     144    324    216
  16     21     256    441    336
  20     24     400    576    480
  Totals
  60     79     874   1501   1136

So $\sum x = 60$, $\sum y = 79$, $\sum x^2 = 874$, $\sum y^2 = 1501$, $\sum xy = 1136$.

There are five pairs of data, so n = 5.

$$\bar{x} = \frac{\sum x}{n} = \frac{60}{5} = 12 \quad\text{and}\quad \bar{y} = \frac{\sum y}{n} = \frac{79}{5} = 15.8$$

$$s_{xy} = \frac{1}{n}\sum xy - \bar{x}\bar{y} = \frac{1}{5} \times 1136 - 12 \times 15.8 = 37.6$$

$$s_{xx} = \frac{1}{n}\sum x^2 - \bar{x}^2 = \frac{1}{5} \times 874 - 12^2 = 30.8$$

For the regression line y on x in the form y = a + bx:

$$b = \frac{s_{xy}}{s_{xx}} = \frac{37.6}{30.8} = 1.2207\ldots = 1.22 \text{ (2 d.p.)}$$

and

$$a = \bar{y} - b\bar{x} = 15.8 - 1.2207 \times 12 = 1.150\ldots = 1.15 \text{ (2 d.p.)}$$

So the equation of the regression line y on x is y = 1.15 + 1.22x.

If you use the big S formulae:

$$S_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 1136 - \frac{60 \times 79}{5} = 188$$

$$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 874 - \frac{60^2}{5} = 154$$

$$b = \frac{S_{xy}}{S_{xx}} = \frac{188}{154} = 1.2207\ldots = 1.22 \text{ (2 d.p.)}$$

and a is calculated as above to give a = 1.15 (2 d.p.).

An alternative way of working out the equation is to use the following format for the equation of a straight line:

If m is the gradient and the line goes through a fixed point (h, k), the equation of the line can be written

$$y - k = m(x - h)$$

In the case of the regression line y on x, the gradient is b and the fixed point is $(\bar{x}, \bar{y})$, so the equation of the regression line y on x can be written

$$y - \bar{y} = b(x - \bar{x}) \quad\text{where } b = \frac{s_{xy}}{s_{xx}} \text{ or } b = \frac{S_{xy}}{S_{xx}}$$

For the above data relating to the chemical experiment, the equation is

$$y - 15.8 = 1.2207(x - 12)$$
$$y - 15.8 = 1.2207x - 14.648$$
$$y = 1.2207x + 1.152$$
$$y = 1.22x + 1.15 \text{ (2 d.p.), as before}$$

Summarising:

The least squares regression line y on x is

$$y = a + bx \quad\text{where } b = \frac{s_{xy}}{s_{xx}} = \frac{S_{xy}}{S_{xx}} \text{ and } a = \bar{y} - b\bar{x}$$

or

$$y - \bar{y} = b(x - \bar{x})$$

b is the gradient of the line and is known as the regression coefficient of y on x.
a is the y-intercept.

Note that any of the formats described above can be used to calculate the regression line. It is a good idea, however, to make sure that you are familiar with the one used in your examination formulae booklet, which may be any of these, or one written in a slightly different form.

Making predictions using the regression line y on x

The regression line y on x gives you the average value of y for a given value of x, so in certain circumstances it can be used to predict or estimate missing values. This is known as interpolating from the given information.

The regression line y on x is used

• when x is the independent variable and you want to estimate y for a given value of x, or you want to estimate x for a given value of y.
• when neither variable is controlled and you want to estimate y for a given value of x.

For the chemical reaction data, in which x is the independent variable, you can use the regression line y = 1.15 + 1.22x to estimate (a) y when x = 10, (b) x when y = 20, as follows:

(a) The estimate of y when x = 10, written $\hat{y}$, is given by
    $\hat{y} = 1.15 + 1.22 \times 10 = 13.35$

(b) The estimate for x when y = 20, written $\hat{x}$, is given by
    $20 = 1.15 + 1.22\hat{x}$
    $1.22\hat{x} = 18.85$
    $\hat{x} \approx 15.4$ (working with the unrounded values of a and b)

Warning: you must take care, though, as estimating outside the range of your data is unreliable. For example, for the chemical reaction data, when the reactants have formed their product, the reaction ceases and the mass would not continue to increase. Going outside the range of data is known as extrapolating from the given information.

Important note: In the situation where neither variable is controlled and you want to estimate x for a given value of y, you would use a different regression line, the least squares line x on y. You would also use the regression line x on y if y is the independent variable. This is described more fully on page 130.

Using a calculator to find the regression line y on x

Linear regression (LR) mode on the calculator enables you to input the pairs of data $(x_i, y_i)$ and then obtain the values of a and b and also $\bar{x}$, $\bar{y}$, $\sum x$, $\sum x^2$, $\sum y$, $\sum y^2$, $\sum xy$ and n. On the calculator, the value of a is usually denoted by A and the value of b by B.
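For readers working in software rather than on a calculator, the small s formulae can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `regression_y_on_x` is an assumption introduced here, and the data are the chemical reaction values from the text.

```python
def regression_y_on_x(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x,
    using the small s formulae for s_xy and s_xx."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) / n - x_bar * y_bar
    s_xx = sum(x * x for x in xs) / n - x_bar ** 2
    b = s_xy / s_xx          # regression coefficient of y on x
    a = y_bar - b * x_bar    # the line passes through (x_bar, y_bar)
    return a, b


times = [5, 7, 12, 16, 20]      # x, minutes
masses = [4, 12, 18, 21, 24]    # y, grams
a, b = regression_y_on_x(times, masses)
print(f"y = {a:.2f} + {b:.2f}x")

# Interpolation: x = 10 lies inside the observed range 5 to 20.
y_hat = a + b * 10
print(f"estimate of y when x = 10: {y_hat:.2f}")
```

Because the sketch keeps the unrounded values of a and b, its estimate at x = 10 is about 13.36, slightly different from the 13.35 obtained from the rounded equation y = 1.15 + 1.22x.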
Your calculator may follow a similar procedure to that outlined below. If not, you should consult your calculator manual.

Casio 85W/85WA/570W

Set LR mode          MODE, followed by the keys that select linear regression
                     (the sequence differs between models)
Clear memories       SHIFT Scl =
Input data           5 , 4 DT
                     7 , 12 DT
                     12 , 18 DT
                     16 , 21 DT
                     20 , 24 DT

You now have access to
A = 1.1506...        SHIFT A =
B = 1.2207...        SHIFT B =

Equation of regression line: y = A + Bx, so y = 1.15 + 1.22x.

You can check the following, using the red letters A, B, C, D, E and F on the third row of the calculator:

Σx² = 874            RCL A
Σx = 60              RCL B
n = 5                RCL C
Σy² = 1501           RCL D
Σy = 79              RCL E
Σxy = 1136           RCL F
x̄ = 12               SHIFT x̄ =
s_xx = 30.8          SHIFT xσn = x² =
ȳ = 15.8             SHIFT ȳ =
s_yy = 50.56         SHIFT yσn = x² =

To clear LR mode     MODE, followed by the key for normal calculation mode

To estimate y when x = 10, key in 10 SHIFT ŷ to give 13.35...

To estimate x when y = 20, key in 20 SHIFT x̂ to give 15.44...

Example 2.1

One measure of personal fitness is the time taken for an individual's pulse rate to return to normal after strenuous exercise; the greater the fitness, the shorter the time. Reg and Norman have the same normal pulse rates. Following a short programme of strenuous exercise they both recorded their pulse rates P at time t minutes after they had stopped exercising. Norman's results are given in the table below.

t     0.5    1.0    1.5    2.0    3.0    4.0    5.0
P     125    113    102     94     81     83     71

(a) Draw a scatter diagram to represent this information.

The equation of the regression line of P on t for Norman's data is P = 122.3 - 11.0t.

(b) Use the above equation to estimate Norman's pulse rate 2.5 minutes after stopping the exercise programme.

Reg's pulse rate 2.5 minutes after stopping the exercise was 100.
The full data for Reg are summarised by the following statistics:

n = 8, Σt = 19.5, Σt² = 63.75, ΣP = 829, ΣPt = 1867

(c) Find the equation of the regression line of P on t for Reg's data.
(d) State, giving a reason, which of Reg or Norman you consider to be the fitter. (L)

Solution 2.1

(a) [Figure: scatter diagram to show Norman's data]

(b) P = 122.3 - 11.0t, so when t = 2.5, P = 122.3 - 11.0 × 2.5 = 94.8

(c) Regression line of P on t for Reg's data is P = a + bt

$$\text{where } a = \bar{P} - b\bar{t} \text{ and } b = \frac{s_{tP}}{s_{tt}} = \frac{S_{tP}}{S_{tt}}$$

$$\bar{P} = \frac{\sum P}{n} = \frac{829}{8} = 103.625, \qquad \bar{t} = \frac{\sum t}{n} = \frac{19.5}{8} = 2.4375$$

To find b using the small s format:

$$s_{tP} = \frac{1}{n}\sum tP - \bar{t}\bar{P} = \frac{1}{8} \times 1867 - 103.625 \times 2.4375 = -19.210\ldots$$

$$s_{tt} = \frac{1}{n}\sum t^2 - \bar{t}^2 = \frac{1}{8} \times 63.75 - 2.4375^2 = 2.027\ldots$$

$$b = \frac{s_{tP}}{s_{tt}} = \frac{-19.210}{2.027} = -9.4759\ldots$$

To find b using the big S format:

$$S_{tP} = \sum tP - \frac{\sum t \sum P}{n} = 1867 - \frac{19.5 \times 829}{8} = -153.6875$$

$$S_{tt} = \sum t^2 - \frac{(\sum t)^2}{n} = 63.75 - \frac{19.5^2}{8} = 16.21875$$

$$b = \frac{S_{tP}}{S_{tt}} = \frac{-153.6875}{16.21875} = -9.4759\ldots$$

To calculate a, use

$$a = \bar{P} - b\bar{t} = 103.625 - (-9.4759) \times 2.4375 = 126.72\ldots$$

Regression line P on t for Reg is P = 126.7 - 9.5t.

(d) Norman is fitter as his pulse rate decreases more rapidly. This can be seen from the gradients of the regression lines: the gradient for Norman is -11.0 and the gradient for Reg is -9.5.

Drawing a regression line on a scatter diagram

Example 2.2

The following data represent the lengths (x) and breadths (y) of 12 cuckoos' eggs measured in millimetres.

x   22.3  23.6  24.2  22.6  22.3  22.3  22.1  23.3  22.2  22.2  21.8  23.2
y   16.5  17.1  17.3  17.0  16.8  16.4  17.2  16.8  16.7  16.2  16.6  16.4

Draw a scatter diagram for the data.
Obtain the least squares regression line of y on x and plot this on the scatter diagram. (NEAB)

Solution 2.2

The scatter diagram is shown below together with the regression line.

To find the equation of the regression line use the formulae or find it directly on the calculator, where you should find that A = 11.473 122... and B = 0.232 717 9.... Giving values to four significant figures, the equation of the least squares regression line of y on x is

$$y = 11.47 + 0.2327x$$

To plot the line on the scatter diagram you need to work out three points on the line, including $(\bar{x}, \bar{y})$, the mean of each set of data.

From the calculator, $\bar{x} = 22.675$ and $\bar{y} = 16.75$, so plot $(\bar{x}, \bar{y})$ as accurately as you can.

Now choose two other x-coordinates and calculate the y value for each. The x-coordinates should be within the range of data, perhaps at the extremities.

Choosing x = 21.8 and x = 24.2:

When x = 21.8, y = 11.47 + 0.2327 × 21.8 = 16.54..., so plot (21.8, 16.54).
To obtain this directly on the calculator key in 21.8 SHIFT ŷ to give 16.546...

When x = 24.2, y = 11.47 + 0.2327 × 24.2 = 17.10..., so plot (24.2, 17.1).
Directly on the calculator: 24.2 SHIFT ŷ gives 17.1048...

Now draw the regression line, joining the three points, but do not take the line beyond the range of the data.

[Figure: scatter diagram to show the lengths (x) and breadths (y) of 12 cuckoo eggs, with the regression line y on x]
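The working for the cuckoo egg data can also be reproduced from the raw values. The sketch below is an illustration in Python, not part of the original solution: it uses the big S formulae and then evaluates the line at the mean and at the two extreme x values, which are the three points suggested for drawing the line. The variable names are assumptions made for this example.

```python
# Lengths (x) and breadths (y) of the 12 cuckoos' eggs, in millimetres.
lengths = [22.3, 23.6, 24.2, 22.6, 22.3, 22.3,
           22.1, 23.3, 22.2, 22.2, 21.8, 23.2]
breadths = [16.5, 17.1, 17.3, 17.0, 16.8, 16.4,
            17.2, 16.8, 16.7, 16.2, 16.6, 16.4]

n = len(lengths)
sum_x, sum_y = sum(lengths), sum(breadths)

# Big S formulae.
S_xy = sum(x * y for x, y in zip(lengths, breadths)) - sum_x * sum_y / n
S_xx = sum(x * x for x in lengths) - sum_x ** 2 / n

b = S_xy / S_xx                  # gradient of y on x
a = sum_y / n - b * sum_x / n    # y-intercept

# Three points for drawing the line, all inside the range of the data:
# the smallest x, the mean point and the largest x.
plot_points = [(x, a + b * x)
               for x in (min(lengths), sum_x / n, max(lengths))]
for x, y in plot_points:
    print(f"({x:.3f}, {y:.3f})")
```

Restricting the plotted points to the observed range of x mirrors the advice not to take the line beyond the range of the data.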
Least squares regression line x on y

In Example 2.2, the regression line y on x would be used to estimate the breadth of a cuckoo's egg, y, for a given value of the length, x. Note that neither the length nor the breadth of the cuckoo's egg is controlled, so there is no independent variable. If you wanted to estimate the length, x, for a given value of the breadth, y, you would use a different line, the regression line x on y.

The least squares regression line x on y is used

• when neither variable is controlled and you want to estimate x for a given value of y.
• when y is the controlled (independent) variable and you want to estimate x for a given value of y, or y for a given value of x.

This time the horizontal distances $n_1, n_2, n_3, \ldots$ from the points to the line are considered. The sum of their squares,

$$\sum n_i^2 = n_1^2 + n_2^2 + n_3^2 + \ldots$$

is made as small as possible, i.e. the line is drawn so that $\sum n_i^2$ is a minimum.

[Figure: least squares regression line x on y, with the horizontal distances $n_1, n_2, \ldots$ marked]

The equation of the regression line x on y

The equation of the regression line x on y is often written in the form

$$x = c + dy \quad\text{where } d = \frac{s_{xy}}{s_{yy}} = \frac{S_{xy}}{S_{yy}} \text{ and } c = \bar{x} - d\bar{y}$$

See page 122 for the formulae for $s_{xy}$, $s_{yy}$, $S_{xy}$ and $S_{yy}$.

Also, since the line goes through $(\bar{x}, \bar{y})$, the equation can be written

$$x - \bar{x} = d(y - \bar{y})$$

d is known as the regression coefficient of x on y.

Note, however, that d is not the gradient of the line. This can be seen by rearranging the equation:

$$x = c + dy$$
$$dy = x - c$$
$$y = \left(\frac{1}{d}\right)x - \frac{c}{d}$$

So the gradient of the regression line x on y is $\frac{1}{d}$ and the y-intercept is $-\frac{c}{d}$.

Considering the data of Example 2.2, the summary information is

Σx = 272.1, Σx² = 6175.69, Σy = 201, Σy² = 3368.08, Σxy = 4559.03 and n = 12.

To find the equation of the regression line x on y in the form x = c + dy, calculate c and d as follows:

$$\bar{x} = \frac{\sum x}{n} = \frac{272.1}{12} = 22.675, \qquad \bar{y} = \frac{\sum y}{n} = \frac{201}{12} = 16.75$$

To calculate d using the small s format:

$$s_{xy} = \frac{1}{n}\sum xy - \bar{x}\bar{y} = \frac{1}{12} \times 4559.03 - 22.675 \times 16.75 = 0.11291\ldots$$

$$s_{yy} = \frac{1}{n}\sum y^2 - \bar{y}^2 = \frac{1}{12} \times 3368.08 - 16.75^2 = 0.11083\ldots$$

$$d = \frac{s_{xy}}{s_{yy}} = \frac{0.11291\ldots}{0.11083\ldots} = 1.0187\ldots$$

To calculate d using the big S format:

$$S_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 4559.03 - \frac{272.1 \times 201}{12} = 1.355$$

$$S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 3368.08 - \frac{201^2}{12} = 1.33$$

$$d = \frac{S_{xy}}{S_{yy}} = \frac{1.355}{1.33} = 1.0187\ldots$$

To calculate c, use

$$c = \bar{x} - d\bar{y} = 22.675 - 1.0187 \times 16.75 = 5.6101\ldots$$

The equation of the regression line x on y is x = 5.61 + 1.02y.

It is interesting to plot this on the scatter diagram, together with the regression line y on x.

You know that the line must go through $(\bar{x}, \bar{y})$, so plot (22.675, 16.75). Now choose two other y-coordinates, say y = 16.4 and y = 17.0, and calculate the value of x.

When y = 16.4, x = 5.61 + 1.02 × 16.4 = 22.3
When y = 17.0, x = 5.61 + 1.02 × 17.0 = 22.95

Plot (22.3, 16.4) and (22.95, 17.0) and join the three points with a straight line.

[Figure: scatter diagram for the cuckoo egg data showing both regression lines, y on x and x on y]

Notice that the lines are not the same; in fact they are quite far apart. You will see later that this indicates that the correlation is not very strong (page 139).

Using a calculator to find the regression line x on y

The procedure, using linear regression mode, is similar to that described on page 126 for calculating the line y on x. This time, however, input the data with the y-coordinate first. For the equation x = c + dy, the value of c is given by A on the calculator, and the value of d by B.

The method is illustrated below using the data for the cuckoos' eggs given in Example 2.2 on page 128.

Casio 85W/85WA/570W

Set LR mode          MODE, followed by the keys that select linear regression
                     (the sequence differs between models)
Clear memories       SHIFT Scl =
Input data           16.5 , 22.3 DT
                     17.1 , 23.6 DT
                     17.3 , 24.2 DT
                     ...
                     16.4 , 23.2 DT

You now have access to
A = 5.610... (c)     SHIFT A =
B = 1.0187... (d)    SHIFT B =

Equation of regression line: x = c + dy, i.e. x = 5.61 + 1.02y.

You can check the following, using the red letters on the third row of the calculator:

Σy² = 3368.08        RCL A
Σy = 201             RCL B
n = 12               RCL C
Σx² = 6175.69        RCL D
Σx = 272.1           RCL E
Σxy = 4559.03        RCL F
ȳ = 16.75            SHIFT x̄ =
s_yy = 0.1108...     SHIFT xσn = x² =
x̄ = 22.675           SHIFT ȳ =
s_xx = 0.4852...     SHIFT yσn = x² =

Note that if your calculator shows what is being found on the display, you should read y for x and x for y when checking these.

To clear LR mode     MODE, followed by the key for normal calculation mode
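When only the summary statistics are available, the regression line x on y can be computed directly from the big S formulae. The following Python sketch is illustrative only; it uses the cuckoo egg totals quoted in the text and also shows why d is not the gradient of the line when it is drawn on the usual axes. The variable names are assumptions made for this example.

```python
# Summary statistics for the cuckoo egg data, as quoted in the text.
n = 12
sum_x, sum_y = 272.1, 201
sum_yy, sum_xy = 3368.08, 4559.03

S_xy = sum_xy - sum_x * sum_y / n
S_yy = sum_yy - sum_y ** 2 / n

d = S_xy / S_yy                  # regression coefficient of x on y
c = sum_x / n - d * sum_y / n    # constant term in x = c + d*y

# Rearranged as y = (1/d)*x - c/d for drawing with x horizontal.
gradient = 1 / d
y_intercept = -c / d

print(f"x = {c:.2f} + {d:.2f}y")
print(f"gradient on the scatter diagram = {gradient:.3f}")
print(f"y-intercept on the scatter diagram = {y_intercept:.3f}")
```

Here d is slightly greater than 1 while the gradient 1/d is slightly less than 1, so quoting d as the gradient would describe a different line.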

Example 2.3

A student found the following data for the female life expectancy, x years, and the Gross Domestic Production (GDP) per head, $y, in six countries in South Asia in 1988.

Country          x      y
Afghanistan     42    143
Bangladesh      50    179
Bhutan          47    197
India           58    335
Pakistan        57    384
Sri Lanka       73    423

[n = 6, Σx = 327, Σy = 1661, Σx² = 18 415, Σy² = 529 909, Σxy = 96 412]

It is required to estimate the value of x for Nepal, where the value of y was 160.

(a) (i) Find the equation of a suitable line of regression. Simplify your answer as far as possible, giving the constants correct to three significant figures.
    (ii) Use your equation to obtain the required estimate.
(b) Use your equation to estimate the value of x for North Korea, where the value of y was 858.
    Comment on your answer. (C)

Solution 2.3

(a) (i) Neither variable has been controlled in the given data. Since you want to estimate the life expectancy, x years, when the GDP per head, $y, is $160, it is sensible to use the regression line of x on y.

The least squares regression line of x on y has equation

$$x = c + dy \quad\text{where } c = \bar{x} - d\bar{y} \text{ and } d = \frac{s_{xy}}{s_{yy}} \text{ or } d = \frac{S_{xy}}{S_{yy}}$$

$$\bar{x} = \frac{\sum x}{n} = \frac{327}{6} \quad\text{and}\quad \bar{y} = \frac{\sum y}{n} = \frac{1661}{6}$$

Using small s format to find d:

$$s_{xy} = \frac{1}{n}\sum xy - \bar{x}\bar{y} = \frac{1}{6} \times 96\,412 - \frac{327}{6} \times \frac{1661}{6} = 981.25$$

$$s_{yy} = \frac{1}{n}\sum y^2 - \bar{y}^2 = \frac{1}{6} \times 529\,909 - \left(\frac{1661}{6}\right)^2 = 11\,681.47\ldots$$

$$d = \frac{s_{xy}}{s_{yy}} = \frac{981.25}{11\,681.47} = 0.08400\ldots$$

Using big S format to find d:

$$S_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 96\,412 - \frac{327 \times 1661}{6} = 5887.5$$

$$S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 529\,909 - \frac{1661^2}{6} = 70\,088.83\ldots$$

$$d = \frac{S_{xy}}{S_{yy}} = \frac{5887.5}{70\,088.83} = 0.08400\ldots$$

Calculate c using

$$c = \bar{x} - d\bar{y} = \frac{327}{6} - 0.08400\ldots \times \frac{1661}{6} = 31.24\ldots$$

Equation of regression line of x on y is x = 31.2 + 0.0840y (3 s.f.).

(ii) When y = 160, x = 31.2 + 0.0840 × 160 = 45 (2 s.f.)
The estimated value of the life expectancy in Nepal is 45 years.

(b) From the equation, when y = 858,

x = 31.2 + 0.0840 × 858 = 103 (3 s.f.)

This would give the life expectancy in North Korea as 103 years, which is clearly not sensible. The value of y = 858 is a long way outside the range of the data, and should not be used to estimate a value of x.

Note on using the calculator in LR mode

You should check whether the regulations of your examination board permit you to use the calculator in LR mode to find the equation of the regression lines without showing any supporting working. The equations are quick to find using the calculator, but a disadvantage is that if you make a slip when entering the data, your answer will be wrong, and this would result in the loss of all the marks. Supported by calculation, however, your answer, though wrong, would receive marks for method.

Sometimes data are presented in such a way that it is not possible to find the equations of the regression lines directly using the LR mode. This is the case when, for example, you do not know the raw data, but just the values of the summary statistics, Σx, Σx², Σy, Σy², Σxy and n. If data are presented just in this form, then the appropriate formula must be used and the values calculated.

Consider also when data are given as in the following two examples:

Example 2.4

For a given set of data it is known that $\bar{x} = 10$ and $\bar{y} = 4$. The gradient of the regression line y on x is 0.6.
Find the equation of this regression line and estimate y when x = 12.

Solution 2.4

The equation of the regression line is y = a + bx, where b = 0.6.

y = a + 0.6x

The regression line goes through $(\bar{x}, \bar{y})$, so $\bar{y} = a + 0.6\bar{x}$

4 = a + 0.6 × 10
a = -2

Equation of regression line is y = -2 + 0.6x

When x = 12, y = -2 + 0.6 × 12 = 5.2

Example 2.5

Find the equation of the regression line of x on y if the line goes through (1, 4) and has gradient 2.
Solution 2.5

Equation of regression line x on y is x = c + dy

Re-arranging
$$dy = x - c$$
$$y = \frac{1}{d}x - \frac{c}{d}$$

$$\text{Gradient} = \frac{1}{d}$$

$$2 = \frac{1}{d}$$
$$d = 0.5$$

So x = c + 0.5y

You are given that (1, 4) lies on the line

1 = c + 0.5 × 4
c = -1

The equation of the regression line x on y is x = -1 + 0.5y.

Exercise 2a Equations of least squares regression lines

Use the method you prefer for calculating the equations of the regression lines. It is a good idea to be able to use the calculator in LR mode and to be competent at using the formula.

1. For each set of data, find
   (a) the equation of the regression line of y on x,
   (b) the equation of the regression line of x on y.
   Plot them both on a scatter diagram and comment.

   Data set 1

   x    3    7    9   11   14   14   15   21   22   23   26
   y    5   12    5   12   10   17   23   16   10   20   25

   Data set 2

   x       y
   1      85
   5      82
   5      85
   5      89
   6      78
   7.5    66
   7.5    77
   7.5    81
   10     70
   11     74
   12.5   65
   14     69
   14.5   63

2. The following data show, in convenient units, the yield (y) of a chemical reaction run at various different temperatures (x):

   Temperature (x)   Yield (y)
   110               2.1
   120               4.3
   130               3.1
   140               3.4
   150               2.9
   160               5.5
   170               3.3

   (a) Plot the data. Comment on whether it appears that the usual simple linear regression model is appropriate.
   (b) Assuming that such a model is appropriate, estimate the regression line of yield on temperature.
   (c) Plot your estimated line on your graph, and indicate clearly on your graph the distances, the sum of whose squares is minimised by the linear regression procedure. (MEI)

3. In a certain heathland region there is a large number of alder trees where the ground is marshy but very few where the ground is dry. The number (x) of alder trees and the ground moisture content (y) are found in each of ten equal areas (which have been chosen to cover the range of x in all such areas). The following is a summary of the results of the survey:

   Σx = 500, Σy = 300, Σx² = 27 818, Σxy = 16 837, Σy² = 10 462

   Find the equation of the regression line of y on x. Estimate the ground moisture content in an area equal to one of the chosen areas which contains 60 alder trees. (O & C)

4. To test the effect of a new drug twelve patients were examined before the drug was administered and given an initial score (I) depending on the severity of various symptoms. After taking the drug they were examined again and given a final score (F). A decrease in score represented an improvement. The scores for the twelve patients are given in the table below.

   Patient   Initial (I)   Final (F)
   1         61            49
   2         23            12
   3          8             3
   4         14             4
   5         42            28
   6         34            27
   7         32            20
   8         31            20
   9         41            34
   10        25            15
   11        20            16
   12        50            40

   Calculate the equation of the line of regression of F on I.
   On the average what improvement would you expect for a patient whose initial score was 30? (MEI)

5. For a given set of data

   Σx = 15, Σx² = 55, Σy = 43, Σy² = 397, Σxy = 145, n = 5

   Find the equations of the regression lines y on x, and x on y.

6. The following table shows the marks (x) obtained in a Christmas examination and the marks (y) obtained in the following summer examination by a group of nine students.

   Student   Christmas (x)   Summer (y)
   A         57              66
   B         35              51
   C         56              63
   D         57              34
   E         66              47
   F         79              70
   G         81              84
   H         84              84
   I         52              53

   It is given that Σx = 567, Σy = 552, Σxy = 36 261, Σx² = 37 777, Σy² = 36 112.

   (a) Find the equation of the estimated least squares regression line of y on x.
   (b) A tenth student obtained a mark of 70 in the Christmas examination but was absent from the summer examination. Estimate the mark that this student would have obtained in the summer examination. (C)

7. For a period of three years a company monitors the number of units of output produced per quarter and the total cost of producing the units. The table below shows their results.

   Units of output (x) 1000's   Total cost (y) £1000
   14                           35
   29                           50
   55                           73
   74                           93
   11                           31
   23                           42
   47                           65
   69                           86
   18                           38
   36                           54
   61                           81
   79                           96

   (Use Σx² = 28 740; Σxy = 38 286)

   (a) Draw a scatter diagram of these data.
   (b) Calculate the equation of the regression line of y on x and draw this line on your scatter diagram.

   The selling price of each unit of output is £1.60.

   (c) Use your graph to estimate the level of output at which the total income and total costs are equal.
   (d) Give a brief interpretation of this value. (AEB)

8. From a set of pairs of observations of the variables x and y, it is found that the regression line of y on x passes through the point (0, 1.8). If the means of the x and y values are 5.0 and 8.3 respectively, find the equation of the regression line of y on x in the form y = a + bx. (L)
9. For a set of 20 pairs of observations of the variables x and y, it is known that Σx = 250, Σy = 140, and that the regression line of y on x passes through (15, 10). Find the equation of the regression line of y on x and use it to estimate y when x = 10.

10. The gradient of the regression line x on y is -0.2 and the line passes through (0, 3). If the equation of the line is x = c + dy, find the value of c and d and sketch the line on a diagram.

11. A small firm negotiates an annual pay rise with each of its twelve employees. In an attempt to simplify the process it is proposed that each employee should be given a score (x) based on his/her level of responsibility. The annual salary (y) will be £(a + bx) and the annual negotiations will only involve the values of a and b. The following table gives last year's salaries (which were generally accepted as fair) and the proposed scores.

    Employee   x     Annual salary (£), y
    A          10     5 750
    B          55    17 300
    C          46    14 750
    D          27     8 200
    E          17     6 350
    F          12     6 150
    G          85    18 800
    H          64    14 850
    I          36     9 900
    J          40    11 000
    K          30     9 150
    L          37    10 400

    (You may assume that Σx = 459, Σx² = 22 889, Σy = 132 600, Σxy = 6 094 750)

    (a) Plot the data on a scatter diagram.
    (b) Estimate values that could have been used for a and b last year by fitting the regression line y = a + bx to the data. Draw the line on the scatter diagram.
    (c) Comment on whether the suggested method is likely to prove reasonably satisfactory in practice.
    (d) Without recalculating the regression line find the appropriate values of a and b if every employee were to receive a rise of (i) £500 a year, (ii) 8%, (iii) 4% plus £300 per year.
    (e) Two employees, B and C, had to work away from home for a large part of the year. In the light of this additional information, suggest an improvement to the model. (AEB)

12. In a regression calculation for five pairs of observations one pair of values was lost when the data were filed. For the regression of y on x the equation was calculated as

    y = 2x - 0.1

    The four recorded pairs of values are

    x    0.1   0.2   0.3   0.4
    y    0.1   0.3   0.4   0.7

    Find the missing pair of values, using the following data for the four pairs above:

    Σx = 1, Σx² = 0.3, Σxy = 0.47, Σy = 1.5. (MEI)

13. In an attempt to increase the yield (kg/h) of an industrial process a technician varies the percentage of a certain additive used, while keeping all other conditions as constant as possible. The results are shown below.

    Yield, y   % additive, x
    127.6      2.5
    130.2      3.0
    132.7      3.5
    133.6      4.0
    133.9      4.5
    133.8      5.0
    133.3      5.5
    131.9      6.0

    You may assume that Σx = 34, Σy = 1057, Σxy = 4504.55, Σx² = 155.

    (a) Draw a scatter diagram of the data.
    (b) Calculate the equation of the regression line of yield on percentage additive and draw it on the scatter diagram.

    The technician now varies the temperature (°C) while keeping other conditions as constant as possible and obtains the following results.

    Yield, y   Temperature, t
    127.6      70
    128.7      75
    130.4      80
    131.2      85
    133.6      90

    He calculates (correctly) that the regression line is y = 107.1 + 0.29t.

    (c) Draw a scatter diagram of these data together with the regression line.
    (d) The technician reports as follows, 'The regression coefficient of yield on percentage additive is larger than that of yield on temperature, hence the most effective way of increasing the yield is to make the percentage additive as large as possible, within reason.'
        Criticise the report and make your own recommendations on how to achieve the maximum yield.

THE PRODUCT-MOMENT CORRELATION COEFFICIENT, r

The product-moment correlation coefficient, r, is a numerical value between -1 and 1 inclusive which indicates the linear degree of scatter.

$$-1 \le r \le 1$$

r = 1 indicates perfect positive linear correlation.
r = -1 indicates perfect negative linear correlation.
r = 0 indicates no correlation.

The nearer the value of r is to 1 or -1, the closer the points on the scatter diagram are to the regression line.

Here are some examples of the value of r:

[Figure: five scatter diagrams showing perfect negative correlation (r = -1), high negative correlation (r = -0.8), no correlation (r = 0), some positive correlation (r = 0.5) and perfect positive correlation (r = 1)]

Plotting the two regression lines, y on x and x on y, on a scatter diagram can also give a good idea of the value of r. The closer the two lines are together, the nearer r is to 1 or -1. This is illustrated in the following diagrams:

[Figure: seven diagrams showing the regression lines y on x and x on y for perfect positive correlation (r = 1, the lines coincide), strong positive correlation (r = 0.8), some positive correlation (r = 0.5), no correlation (r = 0), some negative correlation (r = -0.4), strong negative correlation (r = -0.9) and perfect negative correlation (r = -1, the lines coincide)]
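r is a very useful measure because it is independent of the units of scale of the variables. It is calculated as follows.

Using small s format:

$$r = \frac{s_{xy}}{s_x s_y} \quad \text{where} \quad s_{xy} = \frac{\sum xy}{n} - \bar{x}\bar{y}, \quad s_x = \sqrt{s_{xx}} = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2}, \quad s_y = \sqrt{s_{yy}} = \sqrt{\frac{\sum y^2}{n} - \bar{y}^2}$$

Using big S format:

$$r = \frac{S_{xy}}{S_x S_y} \quad \text{where} \quad S_{xy} = \sum xy - \frac{\sum x \sum y}{n}, \quad S_x = \sqrt{S_{xx}} = \sqrt{\sum x^2 - \frac{(\sum x)^2}{n}}, \quad S_y = \sqrt{S_{yy}} = \sqrt{\sum y^2 - \frac{(\sum y)^2}{n}}$$

Example 2.6
The following table shows the marks of ten candidates in Physics and Mathematics. Find the product-moment correlation coefficient and comment on your value.

Solution 2.6
There are ten pairs of data, so n = 10.

$$\bar{x} = \frac{\sum x}{n} = \frac{528}{10} = 52.8 \quad \text{and} \quad \bar{y} = \frac{\sum y}{n} = \frac{666}{10} = 66.6$$

To find r using small s format:

$$s_{xy} = \frac{\sum xy}{n} - \bar{x}\bar{y} = \frac{38\,640}{10} - 52.8 \times 66.6 = 347.52$$

$$s_{xx} = \frac{\sum x^2}{n} - \bar{x}^2 = \frac{34\,464}{10} - 52.8^2 = 658.56$$

$$s_{yy} = \frac{\sum y^2}{n} - \bar{y}^2 = \frac{46\,820}{10} - 66.6^2 = 246.44$$

$$\therefore \; r = \frac{s_{xy}}{s_x s_y} = \frac{347.52}{\sqrt{658.56} \times \sqrt{246.44}} = 0.8626\ldots$$

To find r using big S format:

$$S_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 38\,640 - \frac{528 \times 666}{10} = 3475.2$$

$$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 34\,464 - \frac{528^2}{10} = 6585.6$$

$$S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 46\,820 - \frac{666^2}{10} = 2464.4$$

$$\therefore \; r = \frac{S_{xy}}{S_x S_y} = \frac{3475.2}{\sqrt{6585.6} \times \sqrt{2464.4}} = 0.8626\ldots$$

The product-moment correlation coefficient is 0.86 (2 s.f.), indicating good positive correlation.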
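The two routes to r in Example 2.6 can be checked numerically. The following is a minimal Python sketch, assuming only the summary totals quoted in the solution (n = 10, Σx = 528, Σy = 666, Σxy = 38 640, Σx² = 34 464, Σy² = 46 820); the function name `pmcc_from_sums` is illustrative and not part of the text.

```python
from math import sqrt

def pmcc_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Return r computed in small s format and in big S format."""
    x_bar, y_bar = sum_x / n, sum_y / n

    # Small s format: divide the totals by n, then subtract products of means.
    s_xy = sum_xy / n - x_bar * y_bar
    s_xx = sum_x2 / n - x_bar ** 2
    s_yy = sum_y2 / n - y_bar ** 2
    r_small = s_xy / (sqrt(s_xx) * sqrt(s_yy))

    # Big S format: work with the totals directly.
    S_xy = sum_xy - sum_x * sum_y / n
    S_xx = sum_x2 - sum_x ** 2 / n
    S_yy = sum_y2 - sum_y ** 2 / n
    r_big = S_xy / (sqrt(S_xx) * sqrt(S_yy))

    return r_small, r_big

# Summary totals from Example 2.6.
r_small, r_big = pmcc_from_sums(10, 528, 666, 38640, 34464, 46820)
print(round(r_small, 4), round(r_big, 4))
```

Both formats give the same value because each big S quantity is n times the corresponding small s quantity, and the factor n cancels in the ratio.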

Using the calculator in LR mode to find r


The value of r can be found directly, for example:

Casio 85W/85WA/570W
Set LR mode
Clear memories
Input data: each x value, then the paired y value, then DT
Output r = 0.8626...
Clear LR mode
NOTE: The value of r should be considered in conjunction with a diagram.

By calculation, or using the data already in your calculator, you should find that the regression line y on x has equation y = 38.7 + 0.527x. See page 126 if you have forgotten how to obtain this.

Also, it can be shown that the regression line x on y has equation x = -41.1 + 1.41y. Check this yourself on your calculator.

The diagram shows the scatter diagram together with these two regression lines.

[Scatter diagram: y against Mark in Physics, showing the regression lines y on x and x on y]

As expected, since r is close to 1, the two regression lines are close together. The scatter diagram confirms good positive correlation.

Relationships between regression coefficients and r

The regression line y on x has equation y = a + bx where $b = \dfrac{s_{xy}}{s_{xx}}$ or $b = \dfrac{S_{xy}}{S_{xx}}$.

The regression line x on y has equation x = c + dy where $d = \dfrac{s_{xy}}{s_{yy}}$ or $d = \dfrac{S_{xy}}{S_{yy}}$.

Now

$$b \times d = \frac{s_{xy}^2}{s_{xx} s_{yy}} = r^2$$

In Example 2.6, b = 0.527, d = 1.41 so $r^2 = b \times d = 0.743\ldots$ and $r = \sqrt{b \times d} = 0.86$ (2 d.p.)

Example 2.7
Show that if r = ±1, the regression lines of y on x and x on y are identical. (This was illustrated in the diagrams on page 139.)

Solution 2.7
The regression line y on x, y = a + bx, has gradient b.
The regression line x on y, x = c + dy, has gradient $\dfrac{1}{d}$.
Now if r = ±1, then $r^2 = 1$.
Since $r^2 = bd$, bd = 1, so $b = \dfrac{1}{d}$.
Therefore the two regression lines have the same gradient. But you know that they both go through a common point $(\bar{x}, \bar{y})$, so the regression lines must be identical.

Example 2.8
If r = 0, show that the two regression lines are at right angles.

Solution 2.8
Since $r = \dfrac{s_{xy}}{s_x s_y}$, if r = 0, then $s_{xy} = 0$.
Now $b = \dfrac{s_{xy}}{s_{xx}}$, so b = 0; also $d = \dfrac{s_{xy}}{s_{yy}}$, so d = 0.
The equation of the regression line y on x is y = a + bx, but b = 0, therefore the equation is y = a.
The equation of the regression line x on y is x = c + dy, but d = 0, therefore the equation is x = c.
The line y = a is parallel to the x-axis and the line x = c is parallel to the y-axis, so the two regression lines are at right angles.

[Diagram: regression line x on y, x = c, and regression line y on x, y = a]
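The relationship r² = b × d can be checked with the totals from Example 2.6. The following is a minimal Python sketch in big S format; the variable names are illustrative. Note that the square root alone loses the sign, so the sketch takes the sign of r from b (b, d and r always share the sign of S_xy).

```python
from math import sqrt

# Summary totals from Example 2.6.
n, sum_x, sum_y = 10, 528, 666
sum_xy, sum_x2, sum_y2 = 38640, 34464, 46820

S_xy = sum_xy - sum_x * sum_y / n
S_xx = sum_x2 - sum_x ** 2 / n
S_yy = sum_y2 - sum_y ** 2 / n

b = S_xy / S_xx                      # gradient of y on x
d = S_xy / S_yy                      # regression coefficient of x on y
a = sum_y / n - b * sum_x / n        # a = y_bar - b * x_bar
c = sum_x / n - d * sum_y / n        # c = x_bar - d * y_bar

r = S_xy / sqrt(S_xx * S_yy)
r_from_coefficients = sqrt(b * d) if b >= 0 else -sqrt(b * d)

print(f"y on x: y = {a:.1f} + {b:.3f}x")
print(f"x on y: x = {c:.1f} + {d:.2f}y")
print(f"r = {r:.4f}, from b and d: {r_from_coefficients:.4f}")
```

If r = ±1 the product b × d equals 1, which is the condition used in Example 2.7; if r = 0 then both b and d are zero, as in Example 2.8.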

Important note:
The product-moment correlation coefficient r is a measure of linear correlation only. It is important to consider the value of r in conjunction with a scatter diagram. The following example illustrates this point.

Example 2.9
For each set of bivariate data, find the product-moment correlation coefficient, draw a scatter diagram and then comment on your value of r.

(a)
x    -2   -1    0    1    2
y     4    1    0    1    4

(b)
x     1    1    1    2    2    2    3    3    3    9
y     1    2    3    1    2    3    1    2    3    8

Solution 2.9
(a) Using a calculator for the first set of data, you should find that r = 0, indicating no linear correlation. But there could be some other relationship between the variables.

[Scatter diagram to illustrate (a)]

You may have noticed that the points all lie on the curve $y = x^2$.
There is a relationship between the variables: it is a quadratic one.

NOTE: r = 0 implies that either there is no correlation between the variables and they are independent, or the variables are related in a non-linear way.

(b) Using a calculator for the second set of data, you should find that r = 0.86 (2 d.p.), apparently indicating a strong degree of positive correlation.

[Scatter diagram to illustrate (b)]

But you can see from the scatter diagram that there is not strong positive correlation.
The value of r has been distorted by the point (9, 8), known as an outlier.

So a value of r close to 1, or -1, does not necessarily imply a strong degree of linear correlation. Always check by referring to a scatter diagram.

Exercise 2b Product-moment correlation coefficient

1. Calculate the value of the product-moment correlation coefficient for the following. Check using a calculator in LR mode if possible. Comment on your answers.

(a)
x    5     10    15    20    25
y    4.3   5.9   6.9   6.5   8.2

(b)
x    12    14    16    18    20    22
y    100   70    86    49    60    50

(c)
s    1      2      3      4      5      6      7     8
t    12.4   12.8   12.6   13.9   13.4   13.2   14    14.6

(d)
t    27    43    62    89    72
z    48    50    81    75    60

2. For a given set of data
$\sum x = 680$, $\sum y = 996$, $\sum x^2 = 20\,154$, $\sum y^2 = 34\,670$, $\sum xy = 24\,844$, n = 30.
Find the product-moment correlation coefficient.

3. The following data relate to the percentage unemployment and percentage change in wages over several years.

% Unemployment (x)    % Change in wages (y)
1.6                   5.0
2.2                   3.2
2.3                   2.7
1.7                   2.1
1.6                   4.1
2.1                   2.7
2.6                   2.9
1.7                   4.6
1.5                   3.5
1.6                   4.4

(a) Calculate the product-moment correlation coefficient between x and y.
(Use $\sum x = 18.9$, $\sum y = 35.2$, $\sum x^2 = 37.01$, $\sum y^2 = 132.22$, $\sum xy = 64.7$)
It has been suggested that low unemployment and a low rate of wage inflation cannot exist together.
(b) Without further calculation use your correlation coefficient to explain briefly whether or not you think the suggestion is justified. (L)

4. Twelve students were given a prognostic test at the beginning of a course and their scores x in the test were compared with their scores y obtained in an examination at the end of the course. The results were as follows:

Student    x    y
A          1    3
B          2    4
C          2    5
D          4    5
E          5    4
F          5    8
G          6    6
H          7    6
I          8    6
J          8    7
K          9    8
L          9    10

Determine the product-moment correlation coefficient.

5. Ten boys compete in throwing a cricket ball and the table shows the height of each boy (x cm) measured to the nearest centimetre and the distance (y m) to which he can throw the ball.

Boy    x      y
A      122    41
B      124    38
C      133    52
D      138    56
E      144    29
F      156    54
G      158    59
H      161    61
I      164    63
J      168    67

Calculate the product-moment correlation coefficient.
Calculate also the equations of the regression lines of y on x and x on y. (AEB)
NOTE: check your value of r by using the regression coefficients obtained in the equation of the regression lines.
6. The heights h, in centimetres, and weights W, in kilograms, of ten people are measured. It is found that $\sum h = 1710$, $\sum W = 760$, $\sum h^2 = 293\,162$, $\sum hW = 130\,628$ and $\sum W^2 = 59\,390$.
Calculate the correlation coefficient between the values of h and W.
What is the equation of the regression line of W on h? (O&C)

7. For a set of data, the equations of the least squares regression lines are
y = 0.648x + 2.64 (y on x) and
x = 0.917y - 1.91 (x on y)
Find the product-moment correlation coefficient for the data.

8. For a given set of data the equations of the least squares regression lines are
y = -0.219x + 20.8 (y on x) and
x = -0.785y + 16.2 (x on y)
Find the product-moment correlation coefficient for the data.

9. For a given set of data, the regression line y on x is y = 0.4 + 1.3x and x on y is x = -0.1 + 0.7y.
Find (a) the product-moment correlation coefficient, (b) $\bar{x}$ and $\bar{y}$.

10. The body and heart masses of fourteen ten-month-old mice are tabulated below:

Body mass (x g)    Heart mass (y mg)
27                 118
30                 136
37                 156
38                 150
32                 140
36                 155
32                 157
32                 114
38                 144
42                 159
36                 149
44                 170
33                 131
38                 160

(a) Draw a scatter diagram of these data.
(b) Calculate the equation of the regression line of y on x and draw this line on the scatter diagram.
(c) Calculate the product-moment coefficient of correlation. (AEB)

SPEARMAN'S COEFFICIENT OF RANK CORRELATION, $r_s$

You have used the product-moment correlation coefficient, r, as a measure of the strength of the correlation between the paired data $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. This is reasonable provided that both x and y can be measured. Sometimes it is not possible to measure certain variables, but it is possible to arrange them in order.

For example, if two wine experts were asked to place six wines in order of preference, they would rank the six wines in order, using the numbers 1, 2, 3, 4, 5, 6.
The wine they liked best would be ranked 1.
The wine they liked least would be ranked 6.

It is possible to measure the strength of the correlation between the two rankings by using Spearman's coefficient of rank correlation, $r_s$.

In general, this is obtained as follows:
• Assign ranks 1, 2, 3, ..., n to the values of each variable. This can be done by putting the values in descending order or in ascending order, but whichever you choose, you must use the same rule for both sets of data.
• For each pair of values, calculate d where d = rank x - rank y.
• Calculate $r_s$ using the formula

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

Consider this example: The five finalists in the County Dog Show were a Bulldog, a Poodle, a Red Setter, a Terrier and a Cocker Spaniel. Two judges ranked the dogs in order of preference. The dog they liked best was ranked 1 and the results are shown in the table:

Dog                      Bulldog   Poodle   Setter   Terrier   Spaniel
Judge X                  1         2        3        4         5
Judge Y                  3         2        4        1         5
d = rank x - rank y      -2        0        -1       3         0
d²                       4         0        1        9         0        Σd² = 14

To calculate Spearman's rank correlation coefficient, use

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} \quad \text{with } n = 5 \text{ and } \sum d^2 = 14$$

So

$$r_s = 1 - \frac{6 \times 14}{5 \times (25 - 1)} = 1 - \frac{7}{10} = 0.3$$

But what does this value of $r_s$ tell you? In fact Spearman's rank correlation coefficient is actually derived from the product-moment correlation coefficient, and is such that $-1 \le r_s \le 1$.

$r_s = 0.3$ indicates a weak positive correlation between the two rankings. To put it another way, it indicates a small degree of agreement between the two judges.

$r_s = +1$ means that the rankings are in perfect agreement.
$r_s = 0$ means that there is no correlation between the rankings.
$r_s = -1$ means that the rankings are in complete disagreement. In fact they are in exact reverse order.

To illustrate this, consider three different sets of judges at the Dog Show.

First pair of judges (perfect agreement):

        Bulldog   Poodle   Setter   Terrier   Spaniel
A       1         2        3        4         5
B       1         2        3        4         5
d       0         0        0        0         0
d²      0         0        0        0         0        Σd² = 0

$r_s = 1 - \dfrac{6\sum d^2}{n(n^2 - 1)} = 1 - 0 = 1$ and the rankings are in perfect agreement.

Second pair of judges (no correlation):

        Bulldog   Poodle   Setter   Terrier   Spaniel
C       1         2        3        4         5
D       4         1        3        5         2
d       -3        1        0        -1        3
d²      9         1        0        1         9        Σd² = 20

$r_s = 1 - \dfrac{6\sum d^2}{n(n^2 - 1)} = 1 - \dfrac{6 \times 20}{5 \times 24} = 1 - 1 = 0$ and there is no correlation between the rankings.

Third pair of judges (complete disagreement):

        Bulldog   Poodle   Setter   Terrier   Spaniel
E       1         2        3        4         5
F       5         4        3        2         1
d       -4        -2       0        2         4
d²      16        4        0        4         16       Σd² = 40

$r_s = 1 - \dfrac{6\sum d^2}{n(n^2 - 1)} = 1 - \dfrac{6 \times 40}{5 \times 24} = 1 - 2 = -1$ and the rankings are in exact reverse order.

NOTE: the difference between the ranks, d, could be positive or negative. Since you are going to square this value to obtain $d^2$, you could just write the numerical value for the difference in the table. This is written |d|, so in the table above, for Bulldog
Rank E - Rank F = 1 - 5 = -4, so |d| = 4 and $d^2 = 16$.

Example 2.10
The marks of eight candidates in English and Mathematics are:

Candidate          1    2    3    4    5    6    7    8
English (x)        50   58   35   86   76   43   40   60
Mathematics (y)    65   72   54   82   32   74   40   53

Rank the results and hence find Spearman's rank correlation coefficient between the two sets of marks. Comment on the value obtained.

Solution 2.10
There are eight pairs of data, so n = 8. Ranking the lowest mark 1 and the highest mark 8 gives the ranks as shown in the table.

English (x)    50   58   35   86   76   43   40   60
Maths (y)      65   72   54   82   32   74   40   53
Rank x         4    5    1    8    7    3    2    6
Rank y         5    6    4    8    1    7    2    3
|d|            1    1    3    0    6    4    0    3
d²             1    1    9    0    36   16   0    9        Σd² = 72

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6(72)}{8(64 - 1)} = 0.14 \text{ (2 d.p.)}$$

Spearman's coefficient of rank correlation is 0.14 (2 d.p.).
This appears to show a very weak positive correlation between the English and Mathematics rankings.

It is interesting to compare the value of $r_s$ with the value of r, the product-moment correlation coefficient.
Using your calculator in linear regression mode, or using the formula, you should find that r = 0.15 (2 d.p.).
The two values of the correlation coefficient are very similar in this example.
Plotting a scatter diagram of the marks does not appear to indicate much correlation.

[Scatter diagram of Mathematics mark (y) against English mark (x)]

Spearman's coefficient of rank correlation can be found when data have already been ranked as in the following example.

Example 2.11
Two judges rank the eight photographs in a competition as follows:

Photograph    A    B    C    D    E    F    G    H
1st judge     2    5    3    6    1    4    7    8
2nd judge     4    3    2    6    1    8    5    7

Calculate Spearman's coefficient of rank correlation for the data.

Solution 2.11
In this example, the data have already been ranked.

Rank x    2    5    3    6    1    4    7    8
Rank y    4    3    2    6    1    8    5    7
|d|       2    2    1    0    0    4    2    1
d²        4    4    1    0    0    16   4    1        Σd² = 30

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} \text{ where } n = 8, \quad \text{so} \quad r_s = 1 - \frac{6(30)}{8(64 - 1)} = 0.64 \text{ (2 d.p.)}$$

Spearman's coefficient of rank correlation for the data is 0.64, indicating some agreement between the judges.
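The steps used in Examples 2.10 and 2.11 (rank each variable, find d for each pair, apply the formula) can be sketched in Python as follows. The helper names `ranks` and `spearman` are illustrative, and the ranking helper assumes there are no tied values, which is the case for both sets of data here.

```python
def ranks(values):
    """Rank distinct values in ascending order: 1 for the lowest."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman(rank_x, rank_y):
    """Spearman's coefficient 1 - 6*sum(d^2) / (n(n^2 - 1)) from two lists of ranks."""
    n = len(rank_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Example 2.10: raw marks, so rank them first.
english = [50, 58, 35, 86, 76, 43, 40, 60]
maths = [65, 72, 54, 82, 32, 74, 40, 53]
r_s_marks = spearman(ranks(english), ranks(maths))

# Example 2.11: the data are already ranks.
judge_1 = [2, 5, 3, 6, 1, 4, 7, 8]
judge_2 = [4, 3, 2, 6, 1, 8, 5, 7]
r_s_photos = spearman(judge_1, judge_2)

print(round(r_s_marks, 2), round(r_s_photos, 2))
```

Ranking in ascending order for both variables follows the rule that the same ordering must be used for both sets of data; ranking both in descending order would give the same coefficient.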
Note on ranking:
The masses, in kilograms, of five men, 66, 68, 65, 69, 70, ranked in ascending order of magnitude gives

x         66    68    65    69    70
Rank x    2     3     1     4     5

If there are two equal values, as in 66, 68, 65, 68, 70, rank as follows:

x         66    68     65    68     70
Rank x    2     3.5    1     3.5    5

Since the 3rd and 4th places represent the same mass of 68 kg, assign the average rank of 3.5 to both places.
Note that if there are more than just a few equal values, this method is not appropriate.

Care must be taken when interpreting the value of the rank correlation coefficient as illustrated in the following example.

Example 2.12
Find Spearman's rank correlation coefficient for the following data and interpret the value.

x    1      2.5    6      7      4.5    3      6.5
y    0.5    1      3.5    6.5    3      2.5    5.5

Solution 2.12

Rank x    1    2    5    7    4    3    6
Rank y    1    2    5    7    4    3    6
|d|       0    0    0    0    0    0    0

It is obvious that $\sum d^2 = 0$.

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - 0 = 1$$

Spearman's coefficient of rank correlation is 1.
This indicates perfect agreement between the rankings, but if you draw a scatter diagram you will see that although there is good positive correlation between the data it is not perfect since the points do not lie on a straight line. In fact, if you calculate the product-moment correlation coefficient you will find that r = 0.944.

[Diagrams: scatter diagram of the data (y against x); diagram obtained by plotting the rankings (rank y against rank x)]

Note that a value of $r_s = 1$ will be obtained for any set of values for which the values of y increase as the values of x increase.
Similarly $r_s = -1$ will be obtained for any set of values for which y decreases as x increases.

Exercise 2c Spearman's coefficient of rank correlation

1. The table shows the marks awarded to six children in a competition. Calculate a coefficient of rank correlation for the data:

Child      A      B      C      D      E      F
Judge 1    6.8    7.3    8.1    9.8    7.1    9.2
Judge 2    7.8    9.4    7.9    9.6    8.9    6.9

2. At the end of a season a league of eight hockey clubs produced the following table showing the position of each club in the league and the average attendances (in hundreds) at home matches.

Club    Position    Average attendance
A       1           27
B       2           29
C       3           9
D       4           16
E       5           24
F       6           15
G       7           12
H       8           22

(a) Calculate the Spearman rank correlation coefficient between position in the league and average attendance.
(b) Comment on your results. (L)

3. A record magazine asked critics and readers to vote for the 'Record of the year' from a short list of nine. The numbers of votes cast were as follows.

Recording    Critics    Readers (hundred)
A            9          15
B            10         46
C            3          58
D            32         49
E            30         92
F            25         37
G            10         17
H            8          90
I            26         55

Calculate Spearman's rank correlation coefficient for the data. Explain what your result tells you about the opinions of these critics and readers.

4. These are the marks obtained by eight pupils in Mathematics and Physics. Calculate Spearman's coefficient of rank correlation.

Mathematics    Physics
67             70
42             59
85             71
51             38
39             55
97             62
81             80
70             76

Comment on your result.

5. In a skating competition one judge awards the same mark to all four competitors. Show that the coefficient of rank correlation (Spearman's) is 0.5, irrespective of the marks awarded to the competitors by the other judge.
6. In a study of population density in eight suburbs of a town the statistics shown in the table were obtained. The population density is denoted by p, and the distance of the suburb from the centre of the town by d.

Suburb    p (persons/hectare)    d (km)
A         55                     0.7
B         11                     3.8
C         68                     1.7
D         38                     2.6
E         46                     1.5
F         43                     2.6
G         21                     3.4
H         25                     1.9

(a) Plot p against d on a scatter diagram.
(b) Calculate and mark on the diagram the mean of the array.
(c) Calculate a coefficient of rank correlation between p and d, stating the system of ranking adopted for both quantities.
(d) State what conclusions can be drawn from your answers to (a) and (c) concerning the general trend of the results.
(e) Giving a reason for your answer, state which suburb in your opinion fits the general trend least well. (L Additional)

7. Mr and Mrs Brown and their son John all drive the family car. Before ordering a new car they decide to list in order their preferences for five optional extras independently. The rank order of their choices is as shown:

Optional extra            Mr Brown    Mrs Brown    John
Heated rear window        1st         2nd          3rd
Anti-rust treatment       2nd         4th          2nd
Headrests                 3rd         1st          1st
Inertia-reel seat belts   4th         5th          5th
Radio                     5th         3rd          4th

(a) Calculate coefficients of rank correlation between each pair of members of the Brown family.
(b) A salesman offered to supply three of these extras free with the new car. The family agreed to choose those three which were ranked highest by the two members who agreed most. Which three did they choose, and in what order? (L Additional)

8. Seven army recruits (A, B, ..., G) were given two separate aptitude tests. Their orders of merit in each test were

Order of merit    1st    2nd    3rd    4th    5th    6th    7th
1st test          G      F      A      D      B      C      E
2nd test          D      F      E      B      G      C      A

Find Spearman's coefficient of rank correlation between the two orders and comment briefly on the correlation obtained. (O & C)

9. A doctor asked ten of his patients, who were smokers, how many years they had smoked. In addition, for each patient, he gave a grade between 0 and 100 indicating the extent of their lung damage. The following table shows the results:

Patient    Number of years smoking    Lung damage grade
A          15                         30
B          22                         50
C          25                         55
D          28                         30
E          31                         57
F          33                         35
G          36                         60
H          39                         72
I          42                         70
J          48                         75

Calculate Spearman's coefficient of rank correlation between the number of years of smoking and the extent of lung damage.
Comment on the figure which you obtain. (C Additional)

10. Sketch scatter diagrams for which
(a) the product-moment correlation coefficient is -1,
(b) Spearman's correlation coefficient is +1, but the product-moment correlation coefficient is less than 1.
Five independent observations of the random variables X and Y were:

Find
(c) the sample product-moment correlation coefficient,
(d) Spearman's correlation coefficient. (C Additional)

11. Sketch two scatter diagrams illustrating the following situations:
(a) two variables having a large, negative correlation;
(b) two variables having a small, positive correlation.
The mean rainfall per day and the mean number of hours of sunshine per day observed at a weather station are given below.

Month        Rainfall (mm)    Sunshine (hours)
January      1.26             1.1
February     1.25             2.7
March        0.65             4.5
April        2.10             5.1
May          2.45             5.5
June         2.17             7.6
July         2.84             5.2
August       1.74             5.7
September    2.57             4.8
October      1.65             2.9
November     1.47             2.8
December     1.94             1.8

Calculate, correct to two decimal places, the rank correlation coefficient between rainfall and hours of sunshine.
What is the rank correlation coefficient between rainfall and minutes of sunshine?

12. (a) X and Y were judges at a beauty contest in which there were 10 competitors. Their rankings are shown below.

Competitor    X      Y
A             4      6
B             9      10
C             2      5
D             5½     8
E             3      1
F             10     9
G             5½     7
H             7      4
I             8      5
J             1      3

Calculate a coefficient of rank correlation between these two sets of ranks and comment briefly on your result.
(b) Illustrate by means of two scatter diagrams rank correlation coefficients of 0 and -1 between two variables X and Y. (O&C)

13. A company is to replace its fleet of cars. Eight possible models are considered and the transport manager is asked to rank them, from 1 to 8, in order of preference. A saleswoman is asked to use each type of car for a week and grade them according to their suitability for the job (A, very suitable, to E, unsuitable). The price is also recorded.

Model    Transport manager's ranking    Saleswoman's grade    Price (£10s)
S        5                              B                     611
T        1                              B+                    811
U        7                              D-                    591
V        2                              C                     792
W        8                              B+                    520
X        6                              D                     573
Y        4                              C+                    683
Z        3                              A-                    716

(a) Calculate Spearman's rank correlation coefficient between
(i) price and transport manager's rankings,
(ii) price and saleswoman's grades.
(b) Based on the results of (a) state, giving a reason, whether it would be necessary to use all three different methods of assessing the cars.
(c) A new employee is asked to collect further data and to do some calculations. He produces the following results.
The correlation coefficient between
(i) price and boot capacity is 1.2,
(ii) maximum speed and fuel consumption in miles per gallon is -0.7,
(iii) price and engine capacity is -0.9.
For each of his results say, giving a reason, whether you think it is reasonable.
(d) Suggest two sets of circumstances where Spearman's rank correlation coefficient would be preferred to the product-moment correlation coefficient as a measure of association. (AEB)

14.
Candidate    A     B     C     D     E     F
English      38    62    56    42    59    48
History      64    84    84    60    73    69

The table shows the original marks of six candidates in two examinations. Calculate a coefficient of rank correlation and comment on the value of your results.
The History papers are re-marked and one of the six candidates is awarded five additional marks. Given that the other marks, and the coefficient of rank correlation, are unchanged, state, with reasons, which candidate received the extra marks. (C Additional)
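Some of the data sets in this exercise contain tied values. The note on ranking assigns each tied value the average of the places it occupies, and this rule can be sketched in Python as follows. The function name `average_ranks` is illustrative; the data are the five masses used in the note on ranking.

```python
def average_ranks(values):
    """Ascending ranks; tied values share the mean of the places they occupy."""
    order = sorted(values)
    result = []
    for v in values:
        first_place = order.index(v) + 1   # first place occupied by this value
        tied = order.count(v)              # how many values share it
        # The places first_place, ..., first_place + tied - 1 average to:
        result.append(first_place + (tied - 1) / 2)
    return result

masses = [66, 68, 65, 68, 70]
print(average_ranks(masses))
```

For the two masses of 68 kg the places occupied are 3rd and 4th, so both receive rank 3.5. As the note on ranking points out, this approach is only appropriate when there are just a few equal values.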
• Least squares regression lines

Regression line of y on x: y = a + bx, for which $\sum m_i^2$ is a minimum.
Regression line of x on y: x = c + dy, for which $\sum n_i^2$ is a minimum.

• Useful formulae for regression and correlation work

Small s format:

$$s_{xx} = \frac{1}{n}\sum x^2 - \bar{x}^2 = \frac{\sum x^2}{n} - \bar{x}^2$$

$$s_{yy} = \frac{1}{n}\sum y^2 - \bar{y}^2 = \frac{\sum y^2}{n} - \bar{y}^2$$

$$s_{xy} = \frac{1}{n}\sum xy - \bar{x}\bar{y} = \frac{\sum xy}{n} - \bar{x}\bar{y}$$

Big S format:

$$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}, \qquad S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n}, \qquad S_{xy} = \sum xy - \frac{\sum x \sum y}{n}$$

• Least squares regression line y on x is

y = a + bx where $a = \bar{y} - b\bar{x}$ and $b = \dfrac{s_{xy}}{s_{xx}} = \dfrac{S_{xy}}{S_{xx}}$

Alternatively $y - \bar{y} = b(x - \bar{x})$ where $b = \dfrac{s_{xy}}{s_{xx}} = \dfrac{S_{xy}}{S_{xx}}$

• Least squares regression line x on y is

x = c + dy where $c = \bar{x} - d\bar{y}$ and $d = \dfrac{s_{xy}}{s_{yy}} = \dfrac{S_{xy}}{S_{yy}}$

Alternatively $x - \bar{x} = d(y - \bar{y})$ where $d = \dfrac{s_{xy}}{s_{yy}} = \dfrac{S_{xy}}{S_{yy}}$

• Linear correlation

The product-moment correlation coefficient, r, is a measure of the strength of the linear correlation. $-1 \le r \le 1$.

[Diagrams: scatter plots illustrating perfect negative correlation, r = -1; high negative correlation, r = -0.8; no correlation, r = 0; some positive correlation, r = 0.5; perfect positive correlation, r = 1]

• Formulae to calculate r

$r = \dfrac{s_{xy}}{s_x s_y}$ where $s_x = \sqrt{s_{xx}}$ and $s_y = \sqrt{s_{yy}}$

$r = \dfrac{S_{xy}}{S_x S_y}$ where $S_x = \sqrt{S_{xx}}$ and $S_y = \sqrt{S_{yy}}$

• Relationship between r and the regression coefficients

Regression coefficient of y on x is b where $b = \dfrac{s_{xy}}{s_{xx}} = \dfrac{S_{xy}}{S_{xx}}$

Regression coefficient of x on y is d where $d = \dfrac{s_{xy}}{s_{yy}} = \dfrac{S_{xy}}{S_{yy}}$

r can be found using $r^2 = b \times d$

• Spearman's coefficient of rank correlation, $r_s$

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

where n is the number of pairs of values and d = rank x - rank y.

$-1 \le r_s \le 1$ where $r_s = 1$ means that the rankings are in perfect agreement and $r_s = -1$ means that the rankings are in exact reverse order.

Miscellaneous worked examples

Example 2.13
An old film is treated with a chemical in order to improve the contrast. Preliminary tests on nine samples drawn from a segment of the film produced the following results.

Sample    A      B      C      D      E      F      G      H      I
x         1.0    1.5    2.0    2.5    3.0    3.5    4.0    4.5    5.0
y         49     60     66     62     72     64     89     90     96

The quantity x is a measure of the amount of chemical applied, and y is the contrast index, which takes values between 0 (no contrast) and 100 (maximum contrast).

(a) Plot a scatter diagram to illustrate the data.
(b) It is subsequently discovered that one of the samples of film was damaged and produced an incorrect result. State which sample you think this was.

In all subsequent calculations this incorrect sample is ignored. The remaining data can be summarised as follows:

$\sum x = 23.5$, $\sum y = 584$, $\sum x^2 = 83.75$, $\sum y^2 = 44\,622$, $\sum xy = 1883$, n = 8.

(c) Calculate the product-moment correlation coefficient.
(d) State, with a reason, whether it is sensible to conclude from your answer to part (c) that x and y are linearly related.
(e) The line of regression of y on x has equation y = a + bx. Calculate the values of a and b, each correct to three significant figures.
(f) Use your regression equation to estimate what the contrast index corresponding to the damaged piece of film would have been if the piece had been undamaged.
(g) State, with a reason, whether it would be sensible to use your regression equation to estimate the contrast index when the quantity of chemical applied to the film is zero. (C)

Solution 2.13
(a) [Scatter diagram of y against x]

(b) Sample F was damaged.

(c) $\bar{x} = \dfrac{\sum x}{n} = \dfrac{23.5}{8} = 2.9375$ and $\bar{y} = \dfrac{\sum y}{n} = \dfrac{584}{8} = 73$

To calculate r:

Using small s format

$$s_{xy} = \frac{1}{n}\sum xy - \bar{x}\bar{y} = \frac{1}{8} \times 1883 - 2.9375 \times 73 = 20.9375$$

$$s_{xx} = \frac{1}{n}\sum x^2 - \bar{x}^2 = \frac{1}{8} \times 83.75 - 2.9375^2 = 1.839\ldots$$

$$s_{yy} = \frac{1}{n}\sum y^2 - \bar{y}^2 = \frac{1}{8} \times 44\,622 - 73^2 = 248.75$$

$$\therefore \; r = \frac{s_{xy}}{s_x s_y} = \frac{20.9375}{\sqrt{1.839\ldots}\,\sqrt{248.75}} = 0.9787\ldots$$

Using big S format

$$S_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 1883 - \frac{23.5 \times 584}{8} = 167.5$$

$$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 83.75 - \frac{23.5^2}{8} = 14.71\ldots$$

$$S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 44\,622 - \frac{584^2}{8} = 1990$$

$$\therefore \; r = \frac{S_{xy}}{S_x S_y} = \frac{167.5}{\sqrt{14.71\ldots}\,\sqrt{1990}} = 0.9787\ldots$$

So r = 0.98 (2 s.f.).
(You should try this on your calculator, using LR mode.)

(d) Yes, it is sensible to conclude that x and y are linearly related. Since r is very close to 1, it would appear to indicate a very strong positive linear correlation.

(e) For the regression line y = a + bx, $a = \bar{y} - b\bar{x}$ and

$$b = \frac{s_{xy}}{s_{xx}} = \frac{20.9375}{1.839\ldots} = 11.38\ldots \quad \text{or} \quad b = \frac{S_{xy}}{S_{xx}} = \frac{167.5}{14.71\ldots} = 11.38\ldots$$

$$a = \bar{y} - b\bar{x} = 73 - 11.38\ldots \times 2.9375 = 39.57\ldots$$

y = 39.6 + 11.4x (3 s.f.)

(f) When x = 3.5, y = 39.57... + 11.38... × 3.5 = 79 (2 s.f.)
The contrast index would have been 79.

(g) No, it would not be sensible to use the regression equation when x = 0, since this is outside the range of data. Extrapolating outside the data is unreliable.

Example 2.14
The rules for a flower competition at a village fete are as follows.
Three judges each give a score out of 100 to each entry. The two judges whose rankings are in closest agreement are identified, and their scores for each entry are added. The three prize-winners are those whose total scores from these two judges are the highest. The scores of the third judge are ignored.

The judges awarded marks as shown in the table below.

Contestant    A     B     C     D     E     F     G
Judge X       89    83    80    72    69    54    41
Judge Y       77    84    85    65    79    72    69
Judge Z       73    83    89    80    67    75    69

The value of Spearman's rank correlation coefficient between X and Y is 0.5, and between X and Z is 0.46, correct to two decimal places. Calculate the value of Spearman's rank correlation coefficient between judges Y and Z, and hence establish which were the three prize-winners. (C)
Solution 2.14 (a) Comment on the suggested model.
c D E F G (b) Suggest, giving reasons, a better model to represent the relationship between y and h
A B 3 •

7 1 5 3 2
Rank Judge Y 4 6 The new variable x 10h000 was ca lculate d an d t h e values of x andy are given in the table
7 5 1 4 2
Rank Judge Z 3 6
4 1 0 below.
1 0 0 4
ldl 16 1 0
1 0 0 16 12.5 19.5 25.0 31.4 55.1 68.1 88.5
d' X

62,d 2 y 4.43 4.88 6.31 7.18 10.63 13.60 17.95


rs = 1 n(n 2 -1)
(c) On graph paper plot a scatter diagram of y against x and comment on h th 1·
6 X 34 · 11.1ce lY to provide a suitable model for the
relationship b et ween Y an d x 1s w relationship
e era mear
d---
7x48 b etween y an d x.
= 0.39 (2 d.p.) (d) Obtain the regression line of yon x.
Spearman's rank correlation coefficients are: [You may use l,x 2 = 17 653.33 and Z,xy = 3634.185]
between X andY: 0.5, between X and Z: 0.46, between Y and Z: 0.39. (e) Estimate the weight of the baby when it was 7 5 em long. (L)
The two judges whose rankiugs are in closest agreement are X andY. So Judge Z is ignored.
Solution 2.15
Adding together the scores for X andY, the final scores are:
(a) A model of the form y = p + qh suggests a linear relationship
F G
A B c D E The graph, however, appears to suggest that the data could be modelled b
110 though a straight line might be possible. y a curve,
165 137 148 126
166 167 (b) Since the data suggest a curve, then a curve such as y = kx' or y = kex might b b
model. e a etter
The three prize winners are A, B and C.
(c) y
,,, . i.
20
~~ IJr r i 1_
1

,
i , i Lie Ui ~~
i[.·.' [.[[ . • ,.:•:···
Hi ! I •
i

u
I
Example 2.15 I: II-
I
H .. I. I· . i'
[+
~ !'
I
~20
".
~
• 5
HL I I' I
I I' I I, Ii !I] 1 · I · : ,

~ 15
• 1
, I , :·i

10

n,
H I i•
iLIT~;jj IJ i'
[J
•• 10
I •i' I··· . i

5 •• lc L [j
1-' I I I•
... u fc
!•

i .

.
0 50 100
Length {em)
l?u I t~
A mother monitored the growth of her baby and recorded the length h em and weight y kg at
various stages in the baby's development. The results were as follows.
5
11 i1 i
i; !

I
i
i n:
h 50 58 63 68 82 88 96 1=: 'il! .l i i-1 .'
1i· 1 r· 1•
IJ [J
11 '
'
i;
17.95 0
6.31 7.18 10.63 13.60 20 30 40 50 60 70 so gox
y 4.43 4.88 0 10

The scatter diagram of y against x suggests that a linear model would be a reasonabl f't
The mother thought that a model of the form (d) LX= 300.1, Z,y = 64.98 e 1.
y=P +qh Equation of regression line y on x is y =a + bx
where p and q are constants, might be suitable to describe the relationship between y and h.
160 i\ CO~-ICIS'i:" CCUHSI::. !N Pd f~I/H ST!-ITiSTiCS

To find b using small s format

$$s_{xy} = \frac{\sum xy}{n} - \bar{x}\bar{y} = \frac{3634.185}{7} - \frac{300.1}{7} \times \frac{64.98}{7} = 121.1999\ldots$$

$$s_{xx} = \frac{\sum x^2}{n} - \bar{x}^2 = \frac{17\,653.33}{7} - \left(\frac{300.1}{7}\right)^2 = 683.944\ldots$$

$$\therefore \; b = \frac{s_{xy}}{s_{xx}} = \frac{121.1999\ldots}{683.944\ldots} = 0.1772\ldots$$

To find b using big S format

$$S_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 3634.185 - \frac{300.1 \times 64.98}{7} = 848.399\ldots$$

$$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 17\,653.33 - \frac{(300.1)^2}{7} = 4787.614\ldots$$

$$\therefore \; b = \frac{S_{xy}}{S_{xx}} = \frac{848.399\ldots}{4787.614\ldots} = 0.1772\ldots$$

$$a = \bar{y} - b\bar{x} = \frac{64.98}{7} - 0.1772\ldots \times \frac{300.1}{7} = 1.685\ldots$$

Giving values to three significant figures, the equation of the regression line is
y = 1.69 + 0.177x.

(e) When h = 75, $x = \dfrac{h^3}{10\,000} = \dfrac{75^3}{10\,000} = 42.1875$
When x = 42.1875, y = 1.685... + 0.1772... × 42.1875 = 9.16 (3 s.f.)
When the baby is 75 cm long, an estimate of the weight is 9.16 kg.

(a) Calculate the equation of a suitable regression line from which a value of t can be estimated for a given value of w. Simplify your answer as far as possible, giving the constants correct to three significant figures.
(b) Use your equation to estimate the perceived temperature when the wind speed is
(i) 38 miles per hour,
(ii) 55 miles per hour.
(c) Calculate the value of the product-moment correlation coefficient for the data, and state what this indicates about the data.
(d) Comment on the reliability of the two estimates found in (b). (C)

3. The following data were collected during a study, under experimental conditions, of the effect of temperature, x °C, on the pH, y, of skimmed milk.

Temperature (x °C)    pH (y)
4                     6.85
9                     6.75
17                    6.74
24                    6.63
32                    6.68
40                    6.52
46                    6.54
57                    6.48
63                    6.36
69                    6.33
72                    6.35
78                    6.29

(a) Making reference to the following scatter diagram for these data, explain what it reveals about the relationship between x and y.
(d) Estimate the pH of skimmed milk at 20 °C and at 95 °C. In each case indicate, with a reason but without further calculation, how reliable you think these estimates might be.
(e) Find the temperature at which you would expect skimmed milk to have a pH of 6.5. (NEAB)

4. The price £x of a certain cassette recorder is increased by £2 every six months. The number of recorders sold during the six months before the next increase is y thousand. The values covering eight consecutive periods are shown in the table.

x    40      42      44      46      48      50     52     54
y    12.8    11.6    11.3    10.3    10.7    9.1    8.9    9.2

[$\sum x = 376$, $\sum x^2 = 17\,840$, $\sum y = 83.9$, $\sum y^2 = 893.33$, $\sum xy = 3898.4$.]
(a) Plot a scatter diagram for the data.
(b) Obtain, in the form y = a + bx, the equation of the regression line of y on x, giving the values of a and b correct to three significant figures. Plot this line on your scatter diagram.
(c) Calculate an estimate of the number of recorders sold when the price is £58, and comment on the reliability of your estimate.
(d) Without further calculation, state whether the regression line of x on y will be the same as the line plotted in part (b). Give a reason for your answer. (C)

5. Explain, briefly, your understanding of the term 'correlation'.
Describe how you used, or could have used, correlation in a project or in classwork.
Twelve students sat two Biology tests, one theoretical and one practical. Their marks are shown in the table.

Marks in theoretical    Marks in practical

Miscellaneous exercise 2d

w    t
1. A set of bivariate data can be summarised as
follows: 25 -"', ' i ....

..
....
0 ..... ' ...
test (T) test (P)
n~6, Lx~21, Ly~43, 5 21
I
a'· " . ..

....

Lx' ~ 91, Ly' ~ 335, Lxy ~ 171. 10 9 '' 5 6


1 '·" 9 8
(a) Calculate the equation of the regression line 15 " •
of y on x. Give your answer in the form 20 -4 ' 7 9
.. ..
y =a + bx where the values of a and b 25 -7 ' . 11 13
should be,stated correct to three significant '
......
i
30 -11 ... ...
20 20
figures. 35 -13 "'
0.0 ·•· · l
(b) It is required to estimate the value of y for a o 10 m ~ ~ ~ ~ ro M w ~ 4 9
given value of x. State circumstances under 40 -15 Temperature {xDC) 6 8
which the regression line of x on y should 45 -17 (b) Determine the equation of the least squares 17 17
be used, rather than the regression line of?~:) 50 -17 regression line of y on x.
12 14
onx. [You may make use the following
[n~11, Lw~275, r.w 2 = 9625, information. 10 8
2. It is known that the wind causes a 'chill factor', Lt~ -28, L,t 2 = 2306, Lwt ~ -3045.] 15 17
so that the human body feels the temperature to Lx ~ 511, Ly ~ 78.52, Lx 2 ~ 28 949,
Lxy ~ 3291.88] 16 18
be lower than the actual temp~rature. The
following table gives the percelVed t~mperature (c) Interpret your values for the gradient and (a) Draw a scatter diagram to represent these
(t 0 f) for different wind speeds (W IDlles per intercept of the regression line found in (b).
hour) when the actual temperature is 25 oF. data.
r
   (b) Find, to three decimal places, the product-moment correlation coefficient.
   (c) Using evidence from (a) and (b) explain why a straight line regression model is
       appropriate for these data.
   Another student was absent from the practical test but scored 14 marks in the theoretical
   test.
   (d) Find the equation of the appropriate regression line and use it to estimate a mark in
       the practical test for this student. (L)

6. (a) State the quantity which is minimised when using the method of least squares. Use a
       sketch to illustrate your answer.
   The heat output of wood is known to vary with the percentage moisture content. The table
   below shows, in suitable units, the data obtained from an experiment carried out to assess
   this variation.

   Moisture content (x%)    Heat output (y)
   50                       5.5
   8                        7.4
   34                       6.2
   22                       6.8
   45                       5.5
   15                       7.1
   74                       4.4
   82                       3.9
   60                       4.9
   30                       6.3

   (b) Obtain the equation of the regression line for heat output on percentage moisture
       content, giving the values of the coefficients to two decimal places.
   (c) Use your equation to estimate the heat output of wood with 40% moisture content.
       State any reservations you would have about making an estimate from the regression
       equation of the heat output for a 90% moisture content.
   (d) Explain briefly the main implication of your analysis for a person wishing to use
       wood as a form of heating. (L)

7. In the machine sewing section of a factory making high fashion clothes, a score is assigned
   to each finished item on the basis of its quality (the better the quality, the higher the
   score). Each seamstress's pay is, in part, dependent upon the number of items she finishes.
   The number of items finished by each of 12 seamstresses on a particular day and their mean
   quality score are shown.

   Seamstress    Number of items finished, x    Mean quality score, y
   1             14                             7.2
   2             13                             7.3
   3             17                             6.9
   4             16                             7.3
   5             17                             7.5
   6             18                             7.6
   7             19                             6.8
   8             32                             3.7
   9             18                             6.5
   10            15                             7.9
   11            15                             6.8
   12            19                             7.1

   Σx = 213, Σy = 82.6, Σxy = 1414.1, Σx² = 4043, Σy² = 581.28.
   (a) Calculate the value of the product-moment correlation coefficient between x and y,
       and interpret your value.
   (b) Plot these data on a scatter diagram. Discuss, briefly, whether or not your
       interpretation in (a) should now be amended.
   (c) When the results were presented at a Board meeting, the Personnel Manager explained
       that Seamstress 8 had been experiencing severe financial difficulties at home.
       Explain, briefly, the implications of this additional information on your conclusions.
       (NEAB)

8. A purchasing manager of a London-based company believes that the time in transit of goods
   sent by road depends upon the distance between the supplier and the company. In an attempt
   to measure this dependence, twelve packages, sent from different parts of the country, have
   their transit times (y days) accurately recorded, together with the distance (x miles) of
   the supplier from the company. The results are summarised as follows:
   Σx = 1800, Σy = 36.0, Σxy = 6438.6, Σx² = 336 296, Σy² = 126.34.
   Obtain the least squares straight line regression equation of y on x.
   Explain the significance of the regression coefficient.
   Predict the transit time of a package sent from a supplier 200 miles away from the company.
   Give two reasons why you would not use the equation to predict transit time for a package
   sent from a supplier 1500 miles away.
   Calculate the product-moment correlation coefficient between x and y.
   Explain why the value you have obtained supports the purchasing manager's attempt to
   establish a regression equation of y on x. (AEB)

9. The government of a country considered making an investment to decrease the number of
   members of the population per doctor in order to try to reduce its infant mortality rate.
   (Infant mortality is measured as the number of infants per 1000 who die before reaching the
   age of five.) A study was made of several other similar countries and the variables x,
   population per doctor, and y, infant mortality, were examined. The data are summarised by
   the following statistics:
   x̄ = 440.57, ȳ = 8.00, s_xx = 174 567.71.
   (a) Calculate the equation of the regression line of y on x.
   (b) Given that the country at present has 380 people per doctor, estimate the infant
       mortality.
   (c) Comment on the coefficient of x in the light of the government's plans. (L)

10. Students on a French course were given an oral test, a listening test and a written test.
    The test results for the eight students on the course are given in the table. For the oral
    test, students were given a grade on a scale ranging from A, through A−, B+, B etc down to
    D−. For the listening test they were given a mark out of 25, and for the written test they
    were given a mark out of 100.

    Student                    1    2    3    4    5    6    7    8
    Oral test grade            C−   C+   B−   A−   C    B    D+   C
    Listening test mark (x)    10   21   22   19   17   14   13   16
    Written test mark (y)      34   76   74   60   68   44   45   53

    Σx = 132, Σx² = 2296, Σy = 454, Σy² = 27 402, Σxy = 7909.
    (a) Calculate the value of the most appropriate measure of correlation between the results
        in the oral and listening tests, justifying your choice of measure. Interpret the value
        you obtain.
    (b) Calculate the value of the most appropriate measure of correlation between the results
        in the listening and written tests, justifying your choice of measure. Interpret the
        value you obtain.
    (c) The appropriate measure of correlation between the results in the oral and written
        tests has a value of 0.339. Comment on the indications given by the values of the three
        correlation coefficients about the performances of the students in the tests. (NEAB)

11. The table below shows the names of five toy construction kits which were bought from a
    catalogue, the numbers of pieces, n, found in each, and the corresponding prices paid, £p.

    Name    Set 1   Set 3   Set 4   Set 5   Set 6
    n       11      21      28      37      75
    p       11      26      34      41      88

    [Σn = 172, Σp = 200, Σn² = 8340, Σp² = 11 378, Σnp = 9736.]
    (a) Plot a scatter diagram of the data, with n on the horizontal axis and p on the vertical
        axis.
    (b) Calculate the equation of the regression line of p on n, and plot this line on your
        scatter diagram. Use your equation to estimate the price of Set 2, which is not listed
        in the catalogue, but is thought to have 15 pieces. Give your answer correct to the
        nearest pound.
    (c) Calculate the product moment correlation coefficient for the given data, giving your
        answer correct to three decimal places, and interpret the result in terms of your
        scatter diagram. (C)

12. The number of hours x (correct to the nearest half-hour) spent studying for an examination
    by 12 students, together with the marks y achieved in the examination, are given in the
    following table.

    x      y
    2      44
    3      50
    4      60
    4.5    54
    5      65
    6      73
    6.5    81
    8      89
    8.5    84
    9      90
    9.5    103
    10     120

    [Σx = 76, Σx² = 560, Σy = 913, Σy² = 75 153, Σxy = 6425.]
    (a) Calculate the product moment correlation coefficient r for the data.
    (b) State what the value of r indicates about the relation between x and y.
    (c) The value of Spearman's rank correlation coefficient for the above data is 0.986,
        correct to three decimal places. For the next examination the students each increased
        their study time by one hour and there was an increase of five marks in each of their
        examination scores. Without further calculation, state whether the new value of the
        rank correlation coefficient, correct to three decimal places, is less than, equal to
        or greater than 0.986. Give a reason for your answer. (C)
13. Over a period of ten years a survey was done on the number of cars owned per person in a
    particular county. The results are given in the table below.

    Year    x = year − 1984    y = no. of cars per person
    1984    0                  0.33
    1985    1                  0.35
    1986    2                  0.36
    1987    3                  0.37
    1988    4                  0.38
    1989    5                  0.39
    1990    6                  0.39
    1991    7                  0.40
    1992    8                  0.41
    1993    9                  0.41

    You are given that Σxy = 17.76, Σx = 45 and Σy = 3.79.
    (a) Calculate the covariance of x and y, giving your answer correct to three decimal
        places.
    You are also given that the variance of the x-values is 8.25.
    (b) Calculate the equation of the regression line of y on x.
    (c) State the value of y which the regression equation found in part (b) predicted for the
        year 2000.
    (d) Comment on the reliability of this prediction. (O)

14. The table is a summary of the maximum temperature recorded in Plymouth during each of the
    seven months from June to December 1986 inclusive.

    Month    x    Maximum temp °C, y
    Jun      1    22.3
    Jul      2    20.2
    Aug      3    17.9
    Sep      4    16.1
    Oct      5    16.8
    Nov      6    12.6
    Dec      7    10.9

    (a) Plot a scatter diagram of the data using as x coordinates the coding shown in the table
        and the maximum temperature as the y coordinate. Mark the mean point of the data on
        your graph.
    (b) Given that Σxy = 416.7, demonstrate that the gradient of the line of regression of y on
        x is −1.80 (to three significant figures). What is the physical meaning of this
        gradient?
    (c) Calculate the full equation of regression of maximum temperature on month.
    (d) Use your equation to predict the maximum temperature in May 1987. The actual maximum
        temperature was 15.3 °C. Why is your predicted value so different from reality? (O)

15. The following table gives the daily output of the substance creatinine from the body of
    each of ten nutrition students together with the student's body mass.

    Output of creatinine (grams)    Body mass (kilograms)
    1.32                            55
    1.54                            48
    1.45                            55
    1.06                            53
    2.13                            74
    1.00                            44
    0.90                            49
    2.00                            68
    2.70                            78
    0.75                            51

    Draw a scatter diagram for the data.
    Calculate, correct to two decimal places, the product-moment correlation coefficient.
    Comment on any relationship which is indicated by the scatter diagram and the correlation
    coefficient. (NEAB)

16. The yield of a particular crop on a farm is thought to depend principally on the amount of
    rainfall in the growing season. The values of the yield y, in tons per acre, and the
    rainfall x, in centimetres, for seven successive years are given in the table below.

    x    12.3   13.7   14.5   11.2   13.2   14.1   12.0
    y    6.25   8.02   8.42   5.27   7.21   8.71   5.68

    [Σxy = 654.006, Σx = 91, Σx² = 1191.72, Σy = 49.56, Σy² = 362.1628]
    (a) Find the linear (product-moment) correlation coefficient between x and y.
    (b) Find the equation of the least squares regression line of y on x and also that of
        x on y.
    (c) Given that the rainfall in the growing season of a subsequent year was 14.0 cm,
        estimate the yield in that year.
    (d) Given that the yield in a subsequent year was 8.08 tons per acre, estimate the rainfall
        in the growing season of that year. (C)

17. Following a leak of radioactivity from a nuclear power station an index of exposure to
    radioactivity was calculated for each of seven geographical areas close to the power
    station. In the subsequent five years the incidence of death due to cancer (measured in
    deaths per 100 000 person-years) was recorded. The data were as follows:

    Area    Index (x)    Deaths (y)
    1       7.6          62
    2       23.2         75
    3       3.2          51
    4       16.6         72
    5       5.2          39
    6       6.8          43
    7       5.0          55

    [Σx = 67.6, Σx² = 980.08, Σy = 397, Σy² = 23 649, Σxy = 4339.8.]
    (a) Find the estimated regression line of y on x.
    (b) In another geographical area close to the power station the index of exposure was 6.0.
        Use the estimated regression line to predict the incidence, in this area, of death due
        to cancer (in deaths per 100 000 person-years).
    (c) Estimate the incidence of death due to cancer (in deaths per 100 000 person-years)
        there would have been if there had been no leak from the power station (i.e. if the
        index of exposure to radioactivity were zero). (C)

18. Suggest a value for the product-moment correlation coefficient between x and y in each of
    the following cases.
    [Figure: four scatter diagrams of y against x, labelled (a), (b), (c) and (d)]

19. For twelve consecutive months a factory manager recorded the number of items produced by
    the factory and the total cost of their production. The following table summarises the
    manager's data.

    Number of items (x) thousands    Production cost (y) £1000
    18                               37
    36                               54
    45                               63
    22                               42
    69                               84
    72                               91
    13                               33
    33                               49
    59                               79
    79                               98
    10                               32
    53                               71

    (a) Draw a scatter diagram for the data.
    (b) Give a reason to support the use of the regression line
        (y − ȳ) = b(x − x̄)
        as a suitable model for the data.
    (c) Giving the values of x̄, ȳ and b to three decimal places, obtain the regression equation
        for y on x in the above form.
        (You may use Σx² = 27 963, Σxy = 37 249.)
    (d) Rewrite the equation in the form
        y = a + bx
        giving a to three significant figures.
    (e) Give a practical interpretation of the values of a and b. (L)

20. An electric fire was switched on in a cold room and the temperature of the room was noted
    at five-minute intervals.

    Time, minutes, from switching on fire, x    Temperature, °C, y
    0                                           0.4
    5                                           1.5
    10                                          3.4
    15                                          5.5
    20                                          7.7
    25                                          9.7
    30                                          11.7
    35                                          13.5
    40                                          15.4

    You may assume that Σx = 180, Σy = 68.8, Σxy = 1960, Σx² = 5100.
    (a) Plot the data on a scatter diagram.
    (b) Calculate the regression line y = a + bx and draw it on your scatter diagram.
    (c) Predict the temperature 60 minutes from switching on the fire.
        Why should this prediction be treated with caution?
Mixed test 2A

1. The following table shows the amount of water, in centimetres, applied to seven similar
   plots on an experimental farm. It also shows the yield of hay in tonnes per acre.

   Amount of water (x)    Yield of hay (y)
   30                     4.85
   45                     5.20
   60                     5.76
   75                     6.60
   90                     7.35
   105                    7.95
   120                    7.77

   (Use Σx² = 45 675; Σxy = 3648.75)
   (a) Find the equation of the regression line of y on x in the form y = a + bx.
   (b) Interpret the coefficients of your regression line.
   (c) What would you predict the yield to be for x = 28 and for x = 150? Comment on the
       reliability of each of your predicted yields. (L)

2. In a physics experiment, a bottle of milk was brought from a cool room into a warm room.
   Its temperature, y °C, was recorded at t minutes after it was brought in, for 11 different
   values of t. The results are summarised as:
   Σt = 44, Σt² = 180.4, Σty = 824.5, Σy = 205.
   (a) Calculate the equation of the line of regression of y on t in the form y = a + bt.
   (b) Explain the practical significance of the value of a.
   (c) Use your equation to estimate the values of y at t = 4.5 and t = 20.0.
   (d) State, with a reason, which of these estimates is likely to be the more reliable.
   The experimenter plotted a graph of y against t, but used only the data in the table below.

   Time (minutes), t      3    3.4    3.8    4.2    4.6    5
   Temperature (°C), y    17   18.3   18.6   18.9   19.3   19.4

   (e) Plot this graph, and on it draw the line of regression.
   (f) State why the linear model could not be valid for very large values of the time.
   (g) Using your graph, comment on whether the model is a reasonable one, and state, giving
       a reason, whether you consider that a more refined model could be found. (L)

3. Two people, X and Y, were asked to give marks out of 20 for seven brands of fish finger.
   The results are recorded in the table.

   Brand       A    B    C    D    E    F    G
   X's mark    8    10   18   2    1    4    15
   Y's mark    5    14   12   9    4    1    19

   Construct a table of ranks and calculate Spearman's rank correlation coefficient. (C)

4. Values of x and y for a set of bivariate data are given in the following table.

   x      y
   0.1    1.97
   0.2    1.94
   0.3    1.89
   0.4    1.82
   0.5    1.73
   0.6    1.62
   0.7    1.49
   0.8    1.34
   0.9    1.17

   [n = 9, Σx = 4.5, Σy = 14.97, Σx² = 2.85, Σy² = 25.5309, Σxy = 6.885.]
   (a) Calculate the product moment correlation coefficient for this data and state what its
       value tells you about the relationship between x and y.
   [Figure: scatter diagram of y against x]
   The scatter diagram representing this data is shown above.
   (b) State the value of Spearman's rank correlation coefficient for this data, and state
       what further information its value gives about the relationship between x and y.
   (c) State which of the following best indicates the relationship between x and y.
       (i) The product moment correlation coefficient.
       (ii) Spearman's rank correlation coefficient.
       (iii) The scatter diagram.
       Give a reason for your answer. (C)

Mixed test 2B

1. The average trade-in value of a particular make of used car depreciates with time according
   to the following table, in which the values of x may be assumed to be exact.

   Age (x years)    Value (£y thousand)
   2.0              6.10
   2.5              5.55
   3.0              5.09
   3.5              4.65
   4.5              3.89
   5.0              3.51
   6.0              3.31
   7.0              2.50

   [n = 8, Σx = 33.5, Σy = 34.6, Σx² = 161.75, Σy² = 160.2014, Σxy = 130.035.]
   (a) Calculate the product moment correlation coefficient between x and y, and state what
       its value tells you about a scatter diagram illustrating the data.
   (b) It is required to estimate the value of y when x is 4.0. Calculate the equation of a
       suitable line of regression, and use it to obtain the required estimate.
   (c) Interpret the gradient of the line of regression in the context of this situation.
   (d) State, with a reason in each case, whether you could use your equation to obtain a
       reliable estimate of
       (i) y when x = 10.0,
       (ii) x when y = 3.00. (C)

2. The following table gives x, the number of hours of sunshine, and y, the mid-day temperature
   in °C, at Springtown on the first seven days in May.

   Date       Hours of sunshine, x    Mid-day temperature, y °C
   May 1st    10                      17
   May 2nd    11                      21
   May 3rd    2                       12
   May 4th    7                       13
   May 5th    5                       18
   May 6th    6                       16
   May 7th    12                      15

   [Σx = 53, Σy = 112, Σx² = 479, Σy² = 1848, Σxy = 882.]
   Plot the data on a scatter diagram.
   Calculate the product moment correlation coefficient.
   The regression line of x on y has equation x = 0.607y − 2.14, and the regression line of y
   on x has equation y = 0.438x + 12.7 where the coefficients are correct to three significant
   figures. Using the equation of the appropriate regression line, estimate the number of hours
   of sunshine expected on a day in May when the mid-day temperature is 18 °C.
   Give a reason why this estimate differs from the actual number of hours of sunshine on
   May 5th.
   Explain the concept of least squares by reference to your scatter diagram and the regression
   line of y on x. (C)

3. A car manufacturer is testing the braking distance for a new model of car. The table shows
   the braking distance, y metres, for different speeds, x km/h, when the brakes were applied.

   Speed of car, x km/h                                       30   50   70   90    110   130
   Braking distance, y metres (to the nearest 5 metres)       25   50   85   155   235   350

   Σx = 480, Σx² = 45 400, Σy = 900, Σy² = 212 100, Σxy = 94 500.
   (a) Plot a scatter diagram.
   (b) Calculate the equation of the regression line of y on x and draw the line on your
       scatter diagram.
   (c) Use your regression equation to predict values of y when x = 100 and x = 150.
       Comment, with reasons, on the likely accuracy of these predictions.
   (d) Discuss briefly whether the regression line provides a good model or whether there is a
       better way of modelling the relationship between y and x. (MEI)

4. In the two rounds of a show-jumping competition, seven riders recorded times, in seconds,
   given in the following table.

   Rider      A     B     C     D     E     F     G
   Round 1    127   131   133   139   140   141   146
   Round 2    132   130   140   137   133   138   142

   (a) Calculate Spearman's rank correlation coefficient between the times for the two rounds.
   (b) It was subsequently discovered that rider G had broken the rules of the competition and
       10 seconds was added to his Round 2 time as a penalty. State, with a reason, what can
       be said about the value of Spearman's rank correlation coefficient calculated from the
       revised data.
   (c) Later still it was discovered that, in Round 2, riders A and B had to have their times
       interchanged. State, with a reason but without further calculation, whether, as a
       result of this change, the value of Spearman's rank correlation coefficient would
       increase, decrease or stay the same. (C)
Probability

In this chapter you will learn
• about different ways of estimating probabilities
• how to use probability notation
• about the probability laws including
    the rule for combined events
    the 'or' rule for mutually exclusive events,
    the 'and' rule for independent events
• about conditional probability
• how to use tree diagrams
• about arrangements, selections, permutations and combinations and their application to
  probability

The probability of an event is a measure of the likelihood that it will happen and it is given on
a numerical scale from 0 to 1. The numbers representing probabilities can be written as
percentages, fractions or decimals.
A probability of 0 indicates that the event is impossible.
A probability of 1 (i.e. 100%) indicates that the event is certain to happen.
All other events have a probability between 0 and 1.

For example
There is an evens chance of a coin coming down heads when tossed;
the probability is 1/2 or 0.5 or 50%.
There is a 1 in 4 chance of cutting a pack of cards at a diamond;
the probability is 1/4 or 0.25 or 25%.
The weather forecaster may say that there is a 70% chance of rain.
The likelihood of winning the lottery with one ticket can be shown to be approximately
1 in 14 million so the probability is 1/14 000 000 ≈ 0.000 000 07.

These probabilities can be shown on a probability scale:
[Figure: probability scale from 0 to 1 marking winning the lottery jackpot (close to 0),
cutting a pack at a diamond (1/4, 25%), coin coming down heads (1/2, 50%) and rain (70%)]

There are different ways of assigning numbers to the probabilities of events, depending on the
situation being considered.

EXPERIMENTAL PROBABILITY
When you drop a drawing pin from a height it lands in one of two positions: point-up or
point-down.
Suppose you want to estimate the probability that a drawing pin will land with point-up.
To do this
(a) take ten identical drawing pins and drop them from a height, say 30 cm, onto a flat surface,
(b) count the number out of the ten with points in the air,
(c) repeat the experiment so that it is carried out a total of 20 times, noting the cumulative
    number of 'points-up' after each time,
(d) calculate the relative frequency of 'points-up' each time, where

    relative frequency = (number of 'points-up') / (total number of pins thrown)

Here is a table showing the results when this experiment was performed.

Number of 'points-up'    Cumulative number    Cumulative number    Relative frequency
in 10 drawing pins       of 'points-up'       of pins thrown       of 'points-up' (2 d.p.)
3                        3                    10                   3/10 = 0.30
8                        11                   20                   11/20 = 0.55
5                        16                   30                   16/30 = 0.53
5                        21                   40                   21/40 = 0.53
7                        28                   50                   28/50 = 0.56
6                        34                   60                   34/60 = 0.57
6                        40                   70                   40/70 = 0.57
5                        45                   80                   45/80 = 0.56
3                        48                   90                   48/90 = 0.53
7                        55                   100                  55/100 = 0.55
7                        62                   110                  62/110 = 0.56
7                        69                   120                  69/120 = 0.58
5                        74                   130                  74/130 = 0.57
4                        78                   140                  78/140 = 0.56
8                        86                   150                  86/150 = 0.57
7                        93                   160                  93/160 = 0.58
8                        101                  170                  101/170 = 0.59
7                        108                  180                  108/180 = 0.60
7                        115                  190                  115/190 = 0.61
7                        122                  200                  122/200 = 0.61
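The cumulative columns of the drawing-pin table can be rebuilt from the 20 batch counts alone. The following is a minimal sketch of that bookkeeping, assuming the counts are typed in by hand in the order recorded; the variable names are illustrative and not part of the text.

```python
from itertools import accumulate

# Number of 'points-up' in each batch of 10 drawing pins, in the order recorded.
points_up = [3, 8, 5, 5, 7, 6, 6, 5, 3, 7, 7, 7, 5, 4, 8, 7, 8, 7, 7, 7]
PINS_PER_BATCH = 10

# Running total of 'points-up' after each batch.
cumulative_up = list(accumulate(points_up))

# Relative frequency = cumulative 'points-up' / cumulative pins thrown.
relative_freq = [
    up / (PINS_PER_BATCH * batch)
    for batch, up in enumerate(cumulative_up, start=1)
]

for batch, (up, rf) in enumerate(zip(cumulative_up, relative_freq), start=1):
    print(f"{PINS_PER_BATCH * batch:3d} pins thrown, {up:3d} points-up, "
          f"relative frequency {rf:.3f}")
```

Printing to three decimal places makes it easier to see how slowly the relative frequency changes once a few hundred pins have been thrown.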
The results can be illustrated on a graph.
[Figure: graph of relative frequency of 'points-up' (vertical axis) against total number of
pins thrown (horizontal axis, 0 to 200)]

From the graph it appears that when the experiment is repeated a large number of times, the
relative frequency approaches a limiting value which is around 0.6.
This limiting value is taken as an estimate of the probability, so

P(drawing pin lands point-up) = 0.6

If an experiment is repeated n times and a particular event occurs r times, the relative
frequency r/n is an estimate of the probability of this event. This is known as the
experimental probability.

Note that the accuracy of the estimate increases as n increases.

Writing P(A) for the probability of event A:

P(A) = lim (r/n) as n → ∞

where 'lim' means the limiting value to which r/n settles as n increases indefinitely.

Experimental probability practicals

DOMINOES
Place a set of dominoes in a large bag. Use the relative frequency method to estimate the
probability of drawing out of the bag at random two dominoes that have a number in common on
one of their halves.

COUNTERS
You will need a supply of counters of two different colours. Ask someone to mix them up
in a bag in a ratio known only to them.
Use relative frequency methods to estimate the proportion of each colour in the bag. Then
check with the actual values to see how close your estimate was.

THREE COINS
Toss three coins a large number of times and use relative frequency methods to estimate the
probability that on any given throw two tails and one head will be obtained.

PROBABILITY WHEN OUTCOMES ARE EQUALLY LIKELY
When asked the probability of obtaining a head when a fair coin is tossed, you would
probably give the answer 1/2 (or 0.5 or 50%) without bothering to toss a coin a large number
of times and working out the limiting value of the relative frequency of heads occurring.
Intuitively you would have used the definition of probability that applies when the possible
outcomes are equally likely.

probability = (number of successful outcomes) / (number of outcomes)

When tossing a coin there are two possible outcomes, a head or a tail, and if the coin is fair
these are equally likely to occur. Only one of the outcomes is successful (obtaining a head)
so P(head) = 1/2.

SUBJECTIVE PROBABILITIES
When you cannot estimate a probability using experimental methods or equally likely
outcomes, you may need to employ a subjective method.
For example, you may wish to estimate the probability that it will snow on Christmas Day, or
the likelihood that a particular make of car will be stolen. In these cases you have to form a
subjective probability which you might base on past experience, such as weather records or
crime figures, on expert opinion or on other factors. This method is, of course, open to error;
two people faced with the same evidence may give different estimates of the probability. It is
sometimes, however, the only method available.

PROBABILITY NOTATION AND PROBABILITY LAWS
When deriving mathematical rules for probability it is useful to use the definition based on
equally likely outcomes, but remember that the results hold for probability in general.
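To see the equally likely definition and the relative frequency method side by side, the sketch below takes the three-coins situation as its example. It lists the 8 equally likely outcomes to count those with two tails and one head, then simulates repeated throws. The simulation is only an illustration: it assumes fair coins and uses Python's pseudo-random generator with a fixed seed, and the function name `relative_frequency` is made up for this example.

```python
import random
from fractions import Fraction
from itertools import product

# Equally likely outcomes for three fair coins: HHH, HHT, ..., TTT.
outcomes = list(product("HT", repeat=3))

# Successful outcomes: exactly two tails (and therefore one head).
successes = [o for o in outcomes if o.count("T") == 2]
p_exact = Fraction(len(successes), len(outcomes))


def relative_frequency(trials, seed=0):
    """Estimate the same probability by simulating `trials` throws of three coins."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        tails = sum(rng.choice("HT") == "T" for _ in range(3))
        if tails == 2:
            hits += 1
    return hits / trials


print("exact probability:", p_exact)
print("relative frequency after 10 000 throws:", relative_frequency(10_000))
```

Counting outcomes gives 3 successful outcomes out of 8. The simulated relative frequency should settle close to this value as the number of throws grows, although any single run will differ slightly.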
You need some preliminary definitions:
Any statistical experiment or trial has a number of possible outcomes.
The set of all possible outcomes is called the possibility space S.
An event A of the experiment is defined to be a subset of S.

Here are some examples:
• When a die is thrown, the outcomes are the numbers 1 to 6.
  So S = {1, 2, 3, 4, 5, 6}.
  Define A to be the event 'the score is less than 3'.
  Then A = {1, 2}.
  [Figure: Venn diagram showing A inside S]
• When two dice are thrown, there are 36 possible outcomes, shown by dots on the possibility
  space diagram.
  Define A to be the event 'the sum of the two scores is 6'. These outcomes are shown ringed
  in the diagram.
  [Figure: possibility space diagram for two dice, with First die on the horizontal axis]

In general terms a Venn diagram is often used to show A and S.
[Figure: Venn diagram showing A inside S]

The number of outcomes in the possibility space is denoted by n(S).
The number of outcomes in event A is denoted by n(A).
Writing P(A) for the probability of A,

P(A) = n(A)/n(S)

A is a subset of S, so 0 ≤ n(A) ≤ n(S).
Dividing throughout by n(S) gives

0 ≤ P(A) ≤ 1

Remember that
P(A) = 0 means that event A is impossible,
P(A) = 1 means that event A is certain to happen.

The complementary event A'
A' denotes the event A does not occur.
n(A') = n(S) − n(A)
so P(A') = (n(S) − n(A))/n(S) = 1 − n(A)/n(S) = 1 − P(A)
Therefore

P(A') = 1 − P(A)   or   P(A) + P(A') = 1

Note that sometimes Ā is written for the complementary event instead of A'.

Example 3.1
A group of 20 university students contains eight who are in their first year of study. A student
is picked at random to represent the group at a meeting. Find the probability that the student
is not in the first year of study.

Solution 3.1
Event A: student is in the first year of study.
P(A) = 8/20 = 0.4
so P(A') = 1 − P(A) = 1 − 0.4 = 0.6.
The probability that the student is not in the first year of study is 0.6.

Example 3.2
Two fair coins are tossed. Show the possible outcomes on a possibility space diagram and find
the probability that two heads are obtained.

Solution 3.2
Each coin is equally likely to show a head or a tail.
The possibility space for the outcomes is shown in the diagram, indicating that n(S) = 4.
[Figure: possibility space diagram with First coin (H, T) on the horizontal axis and Second
coin (H, T) on the vertical axis, showing the outcomes HH, TH, HT, TT]
Event A: Two heads are obtained.
There is just one outcome for this so n(A) = 1.
Therefore P(A) = n(A)/n(S) = 1/4
The probability that two heads are obtained is 1/4.

Exercise 3a Probability

1. An ordinary die is thrown. Find the probability that the number obtained is
   (a) a multiple of 3,
   (b) less than 7,
   (c) a factor of 6.

2. In a box of highlighters there are eight which have dried up and will not write. The box
   contains 10 red, 15 blue, 5 green and 10 yellow highlighters.
   A highlighter is picked at random from the box.
   Find the probability that
   (a) it is blue,
   (b) it is neither green nor yellow,
   (c) it is not yellow,
   (d) it is purple,
   (e) it will write.

3. The possibility space consists of the integers from 1 to 20 inclusive.
   A is the event 'the number is a multiple of 3'.
   B is the event 'the number is a multiple of 4'.
   An integer is picked at random.
   Find (a) P(A), (b) P(B').

4. Dan carried out an experiment in which 16 coins were tossed together. The number of tails
   obtained from tossing the coins was counted.
   This procedure was carried out ten times in all and the results were
   Number of tails: 9, 7, 8, 6, 10, 7, 5, 5, 8, 9
   (a) Use Dan's data to calculate the probability of obtaining a tail.
14. Two fair cubical dice are thrown simultaneously (a) Calculate (i) P(9) (ii) P(4) (iii) P(14).
hose who took part were then
The names o f t and the scores multiplied. P(n) denotes the 1
The experiment was co_ntinued until the 16 coins placed in a prize draw· . probability that the number n will be obtained. (b) If P(t) = 9'' find the possible values oft.
were each tossed 100 tunes. F d the probability th~t some<?ne who satd
(b) Calculate the total nu_mber of tails that Dan ' tn . . costs' will wm the pnze.
servtcmg
would expect to obtam.
10 The durations of 60 telephone calls arc
The probability of an event ~ccu~ring is 0?7; . summarised in the table below. ILLUSTRATING TWO OR MORE EVENTS USING VENN DIAGRAMS
5. What is the probability that It wtlt not occur.
9- 18- 27- 36- 45-
Duration (minutes) 0 Suppose A and B are two events associated with the same experiment. Consider the outcmnes
card is drawn at random from an ordinary
6. A d 6 10 21 20 3 0 described below
pack of 52 playing car s. . Number of calls
(a) Find the probability that the card drawn ts (a) AUB
Use linear interpolation t~ estimate the In set language, the set that contains the outcomes that are in A or B or both is called the
(i) the four of spades, . d
bability that the duratton of a call, d
(")
n
the four of spades or any dmmon '
d (] k or Queen or ;~~cted at random from the 60 calls, excee s (C) union of A and B and is written AU B.
(iii) not a picture ca_r ac
King) of any sutt. 30 minutes. To represent AU Bon the Venn diagram, shade the
(b) The card drawn is the three of diamon~s,- It The table summarises the results of d_ll ~he th whole of the coloured 'figure-of-eight' shape.
is laced on the table and a second car ts 11.
driving tests taken at a Test Centre unng e
p
drawn.
What is the probability that the
d. nd? first week of September.
Remember that although this outcome is written
second card drawn is not a wmo . .A or H it includes the events that are in both A and B
Female as well. A u 8 means A orB or both.
7. The pupils in a junior sch?ol cla~s w~rd_ a~~~r
Male
how many brothers. and ststers t ey a . 32 43 (b) A nB
answers are shown tn the table. Pass 15
Fail 8 In set language, the set that contains the outcomes that are in both A and B is called the
2 3 4 5 intersection of A and B and is written An B.
Number of brothers 0 1
A person is chosen at random from those who
and sisters took their test that week. To represent A n B on the Venn diagram, shade the
8 3 2 1
Number of pupils 4 12 (a) Find the probability that the person overlap of A and B. This outcome is often written
(i) passed the driving test, .. /,and H.
Find the probability that a chi}d chos~n a~ly with (ii) w~s a female who failed her dnvmg
random from the class comes rom a am A nBmeansAandB.
test.
three children.
(b) A male is chosen. What is the probability
. I die numbered 1 to 6, is weighted so that he did not pass the test? PROBABILITY RULE FOR COMBINED EVENTS
8. ~~~~t~~ is t~vice as likel~ .to occur as any other components gave the following
number. Find the probabtbty of 12. Wear tests on 100 f l'f 1 h s
grouped frequency distribution o t e engt ,
(a) a six occurring, . A 8
(b) an odd number occurnng. Number of components If the number of outcomes in A is n(A) and the number of
Life length (x hours)
. d yin which outcomes in B is n(B), then for two overlapping sets A and B,
9. A car manufacturder cha.rrhtef otuotrafrsou~;~he 15
eo le were aske w tc ac · 500 <:;X< 530 if you add n(A) and n(B) together you will count the overlap
foll~wing list influenced them most when buymg 24
530<;x<550 twice.
33
a car: 550<;x<570
A_ the colour range available,
21
570 <:;X< 600 AnB
B - the servicing costs, 7
600 <:;X< 650 So to find the number of outcomes in A U B you have to take one overlap away like this:
c- driver air bag,
D- fuel economy,
E _range of optional extras.
Use linear interpolation to estimate the dom n(A n B) ~ n(A) + n(B)- n(A n B)
bability that a component drawn at ran d
The pie chart shows the results from 90 people. r:~m the 100 has a life length between 540 an (C) Dividing by n(S), this becomes
580 hours.
A
Two ordinary unbiased dice are thrown.
13. Alternatively
B Find the probability that
) the sum on the two dice is 3, {)_J_ and lf:
(~) the sum on the two dice exceeds 9, T
(c) the two dice show the samd.e nJ.ff~:~y more Remember that the word
(d) the numbers on the two JCe 1 or means A or B or both.
than 2.
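The probability rule for combined events, P(A ∪ B) = P(A) + P(B) − P(A ∩ B), can be checked by direct counting. The following is a minimal Python sketch, not part of the original text. The possibility space (the integers 1 to 30) and the two events are chosen purely for illustration, and the helper name prob is an assumption.

```python
from fractions import Fraction

# Illustrative possibility space S: the integers from 1 to 30 inclusive,
# each equally likely to be picked.
S = set(range(1, 31))

def prob(event):
    """P(event) = n(event) / n(S) for equally likely outcomes."""
    return Fraction(len(event), len(S))

# Illustrative events (not taken from the text).
A = {n for n in S if n % 2 == 0}   # the number is a multiple of 2
B = {n for n in S if n % 5 == 0}   # the number is a multiple of 5

direct = prob(A | B)                        # count the union directly
by_rule = prob(A) + prob(B) - prob(A & B)   # add, then remove one overlap

print(direct, by_rule)
```

Both values come out the same because the multiples of 10, which lie in both A and B, are counted twice by P(A) + P(B) and then subtracted once.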
Example 3.3
In a class of 20 children, 4 of the 9 boys and 3 of the 11 girls are in the athletics team. A person from the class is chosen to be in the egg and spoon race on Sports Day. Find the probability that the person chosen is
(a) in the athletics team,
(b) female,
(c) a female member of the athletics team,
(d) a female or in the athletics team.

Solution 3.3
Possibility space S: the class of 20 people.
(a) Event A: a member of the athletics team is chosen, P(A) = 7/20 = 0.35
(b) Event F: a female is chosen, P(F) = 11/20 = 0.55
(c) P(female and in the athletics team) = P(A and F)
There are three girls in the athletics team, so
    P(A and F) = 3/20 = 0.15
(d) P(A or F) = P(A) + P(F) − P(A and F)
              = 0.35 + 0.55 − 0.15
              = 0.75

Example 3.4
Events C and D are such that P(C) = 19/30, P(D) = 2/5 and P(C ∪ D) = 4/5.
Find P(C ∩ D).

Solution 3.4
    P(C ∪ D) = P(C) + P(D) − P(C ∩ D)
    4/5 = 19/30 + 2/5 − P(C ∩ D)
    P(C ∩ D) = 19/30 + 2/5 − 4/5
             = 7/30

Other useful results relating two events A and B

    P(A ∩ B) = P(B ∩ A)
    i.e. P(A and B) = P(B and A)

    P(A) = P(A ∩ B) + P(A ∩ B')
    i.e. P(A) = P(A and B) + P(A but not B)

    P(B) = P(B ∩ A) + P(B ∩ A')
    i.e. P(B) = P(B and A) + P(B but not A)

    P(neither A nor B) = 1 − P(A or B)
    i.e. P(A' ∩ B') = 1 − P(A ∪ B)

Example 3.5
In a survey, 15% of the participants said that they had never bought lottery tickets or premium bonds, 73% had bought lottery tickets and 49% had bought premium bonds.
Find the probability that a person chosen at random from those taking part in the survey
(a) had bought lottery tickets or premium bonds,
(b) had bought lottery tickets and premium bonds,
(c) had bought lottery tickets only.

Solution 3.5
L: person has bought lottery tickets, P(L) = 0.73.
B: person has bought premium bonds, P(B) = 0.49.
P(neither L nor B) = 0.15
(a) P(L or B) = 1 − P(neither L nor B)
              = 1 − 0.15
              = 0.85
(b) Use P(L or B) = P(L) + P(B) − P(L and B)
    0.85 = 0.73 + 0.49 − P(L and B)
    P(L and B) = 0.73 + 0.49 − 0.85
               = 0.37
(c) P(L only) = P(L) − P(L and B)
              = 0.73 − 0.37
              = 0.36
Showing all the percentages on a Venn diagram:
[Venn diagram: L only 36%, L and B 37%, B only 12%, neither L nor B 15%]
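The bookkeeping in Example 3.5 can be retraced in a few lines of Python. This is only a sketch, not part of the original text. It uses exact fractions so that no rounding creeps in, and the variable names are assumptions made for the illustration.

```python
from fractions import Fraction

# Figures given in Example 3.5, held as exact fractions.
p_L = Fraction(73, 100)        # bought lottery tickets
p_B = Fraction(49, 100)        # bought premium bonds
p_neither = Fraction(15, 100)  # bought neither

p_L_or_B = 1 - p_neither           # P(L or B) = 1 - P(neither L nor B)
p_L_and_B = p_L + p_B - p_L_or_B   # combined events rule, rearranged
p_L_only = p_L - p_L_and_B         # P(L) = P(L and B) + P(L but not B)
p_B_only = p_B - p_L_and_B         # P(B) = P(B and L) + P(B but not L)

# The four regions of the Venn diagram should account for everyone.
regions = [p_L_only, p_L_and_B, p_B_only, p_neither]
print([float(r) for r in regions])
print(sum(regions))
```

The four regions are L only, L and B, B only and neither. Checking that they sum to 1 is a quick way to catch an arithmetic slip.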
Example 3.6
Events A and B are such that P(A) = 0.3, P(B) = 0.4, P(A ∩ B) = 0.1.
Find (a) P(A ∩ B'), (b) P(A' ∩ B').

Solution 3.6
(a) P(A) = P(A ∩ B) + P(A ∩ B')
    0.3 = 0.1 + P(A ∩ B')
    P(A ∩ B') = 0.2
(b) P(A' ∩ B') = 1 − P(A ∪ B)
    P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
             = 0.3 + 0.4 − 0.1
             = 0.6
    P(A' ∩ B') = 1 − P(A ∪ B)
               = 1 − 0.6
               = 0.4

Example 3.7
A group of 50 people was asked which of three newspapers, A, B or C, they read. The results showed that 25 read A, 16 read B, 14 read C, 5 read both A and B, 4 read both B and C, 6 read both C and A and 2 read all 3.
(a) Represent these data on a Venn diagram.
Find the probability that a person selected at random from this group reads
(b) at least 1 of the newspapers,
(c) only 1 of the newspapers,
(d) only A. (L)

Solution 3.7
(a) Draw 3 overlapping sets to represent A, B and C and fit in the numbers given.
[Venn diagram: A only 16, B only 9, C only 6, A and B only 3, B and C only 2, C and A only 4, all three 2, none 8]
The numbers in the sets add up to 42, showing that 8 people read none of the newspapers.
(b) P(reads at least one) = 1 − P(reads none)
                          = 1 − 8/50 = 42/50 = 0.84
(c) P(reads only one) = P(reads only A) + P(reads only B) + P(reads only C)
                      = 16/50 + 9/50 + 6/50 = 31/50 = 0.62
(d) P(reads only A) = 16/50 = 0.32

EXCLUSIVE (OR MUTUALLY EXCLUSIVE) EVENTS

Consider events A and B of the same experiment.
A and B are said to be exclusive (or mutually exclusive) if they cannot occur at the same time.
For example, with one throw of a die you cannot score a three and a five at the same time, so the events 'scoring a 3' and 'scoring a 5' are exclusive events.
If A and B are exclusive, then P(A ∩ B) = 0 since A ∩ B is an impossible event. There is no overlap of A and B.
For exclusive events, the rule for combined events becomes
    P(A ∪ B) = P(A) + P(B)
This is known as the addition rule for exclusive events.
It is also known as the 'or' rule for exclusive events:
    P(A or B) = P(A) + P(B)
Extending this result to n exclusive events,
    P(A1 ∪ A2 ∪ ... ∪ An) = P(A1) + P(A2) + ... + P(An)
or
    P(A1 or A2 or ... or An) = P(A1) + P(A2) + ... + P(An)

Example 3.8
In a race in which there are no dead heats, the probability that John wins is 0.3, the probability that Paul wins is 0.2 and the probability that Mark wins is 0.4.
Find the probability that
(a) John or Mark wins,
(b) John or Paul or Mark wins,
(c) someone else wins.

Solution 3.8
Since only one person wins, the events are mutually exclusive.
(a) P(John or Mark wins) = P(John wins) + P(Mark wins)
                         = 0.3 + 0.4 = 0.7
(b) P(John or Paul or Mark wins) = P(John wins) + P(Paul wins) + P(Mark wins)
                                 = 0.3 + 0.2 + 0.4 = 0.9
(c) P(someone else wins) = 1 − 0.9 = 0.1
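The way the Venn diagram in Example 3.7 is filled in, working outwards from the centre, can be sketched in Python as follows. This is an illustrative sketch, not part of the original text. The variable names are assumptions, and the counts are the ones stated in the example.

```python
from fractions import Fraction

# Counts stated in Example 3.7.
total = 50
n_A, n_B, n_C = 25, 16, 14
n_AB, n_BC, n_CA = 5, 4, 6
n_ABC = 2

# Work outwards from the centre of the diagram.
ab_only = n_AB - n_ABC          # read A and B but not C
bc_only = n_BC - n_ABC          # read B and C but not A
ca_only = n_CA - n_ABC          # read C and A but not B
a_only = n_A - ab_only - ca_only - n_ABC
b_only = n_B - ab_only - bc_only - n_ABC
c_only = n_C - bc_only - ca_only - n_ABC

inside = a_only + b_only + c_only + ab_only + bc_only + ca_only + n_ABC
none = total - inside           # read none of the newspapers

p_at_least_one = 1 - Fraction(none, total)
p_only_one = Fraction(a_only + b_only + c_only, total)
p_only_A = Fraction(a_only, total)

print(float(p_at_least_one), float(p_only_one), float(p_only_A))
```

The three printed values correspond to parts (b), (c) and (d) of the example.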

Example 3.9
A card is drawn from an ordinary pack of 52 playing cards. Find the probability that the card is
(a) a club or a diamond,
(b) a club or a King.

Solution 3.9
Possibility space S: the pack of 52 cards, so n(S) = 52
C: a club is drawn, so P(C) = n(C)/n(S) = 13/52 = 1/4.
D: a diamond is drawn, so P(D) = n(D)/n(S) = 13/52 = 1/4.
(a) Since a card cannot be both a club and a diamond, the events C and D are mutually exclusive.
Therefore P(C or D) = P(C) + P(D)
                    = 1/4 + 1/4 = 1/2
(b) Event K: a King is drawn, so P(K) = n(K)/n(S) = 4/52 = 1/13.
The events C and K are not mutually exclusive since a card can be both a King and a club.
    P(C and K) = P(King of clubs) = 1/52.
Therefore
    P(C or K) = P(C) + P(K) − P(C and K)
              = 13/52 + 4/52 − 1/52 = 16/52 = 4/13.

EXHAUSTIVE EVENTS

If two events A and B are such that between them they make up the whole of the possibility space, then A and B are said to be exhaustive events. For exhaustive events, P(A ∪ B) = 1.
For example, if
S = (the integers from 1 to 10 inclusive),
A = (the integers below 7) = (1, 2, 3, 4, 5, 6),
B = (the integers above 5) = (6, 7, 8, 9, 10)
then A ∪ B = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10) = S.

Special case:
Consider an event A and its complementary event A'.
    P(A ∩ A') = 0
    P(A ∪ A') = P(A) + P(A') = 1
The event A and its complementary event A' are both mutually exclusive and exhaustive.
Extending this to n events:
If A1, A2, ..., An are n events which between them make up the whole possibility space without overlapping, then
    P(A1) + P(A2) + ... + P(An) = 1
and the n events are both mutually exclusive and exhaustive.

Exercise 3b Probability combined events

1. An ordinary die is thrown. Find the probability that the number obtained is
(a) even, (b) prime, (c) even or prime.

2. In a group of 30 students all study at least one of the subjects Physics and Biology. 20 attend the Physics class and 21 attend the Biology class. Find the probability that a student chosen at random studies both Physics and Biology.

3. From an ordinary pack of 52 playing cards the seven of diamonds has been lost. A card is dealt from the well-shuffled pack. Find the probability that it is (a) a diamond, (b) a Queen, (c) a diamond or a Queen, (d) a diamond or a seven.

4. For events A and B it is known that P(A) = [illegible], P(A ∪ B) = [illegible] and P(A ∩ B) = [illegible]. Find P(B).

5. For events C and D,
P(C) = 0.7, P(D ∪ C) = 0.9, P(C ∩ D) = 0.3.
Find (a) P(D),
(b) P(D' ∩ C),
(c) P(D ∩ C'),
(d) P(D' ∩ C').

6. Tests are carried out on three machines A, B and C to assess the likelihood that each machine will produce a faulty component. The results are summarised in the table.

             Faulty    Not faulty
Machine A       3          12
Machine B       2           8
Machine C       5          15

A component is chosen at random from those tested.
(a) Find the probability that the component chosen
(i) is from Machine A,
(ii) is a faulty component from Machine C,
(iii) is not faulty or is from Machine A.
(b) It is known that the component chosen is faulty. Find the probability that it is from Machine B.

7. It is known that P(X) = [illegible] and P(Y) = [illegible]. Given that X and Y are mutually exclusive, find
(a) P(X ∪ Y),
(b) P(Y ∩ X), (c) P(Y ∩ X').

8. For events A and B it is known that P(A) = P(B), P(A ∩ B) = 0.1 and P(A ∪ B) = 0.7.
Find P(A').

9. The probability that a boy in Class 2 is in the football team is 0.4 and the probability that he is in the chess team is 0.5. If the probability that a boy in the class is in both teams is 0.2, find the probability that a boy chosen at random is in the football or the chess team.
10. Two ordinary dice are thrown. Find the probability that the sum of the scores obtained
(a) is a multiple of 5,
(b) is greater than 9,
(c) is a multiple of 5 or is greater than 9,
(d) is a multiple of 5 and is greater than 9.

11. Given that P(A') = [illegible], P(B) = [illegible] and P(A ∩ B) = [illegible], find P(A ∪ B).

12. Two ordinary dice are thrown. Find the probability that
(a) at least one six is thrown,
(b) at least one three is thrown,
(c) at least one six or at least one three is thrown.

13. A and B are two events such that P(A) = [illegible], P(B) = [illegible] and P(A ∩ B) = [illegible]. Are A and B exhaustive events?

14. Give two examples of events which are both mutually exclusive and exhaustive.

15. Two coins are tossed. A is the event 'at least one head is obtained'. Describe an event B such that A and B are exhaustive events.

16. In a large garden there are seven fruit trees and 13 other types of tree. Six of the trees have birds nesting in them but only two of these are fruit trees.
(a) Copy and complete the table below to illustrate this information.

               Fruit tree    Other tree    Total
Bird's nest        2                          6
No nest
Total              7             13

The owner of the garden has given permission for Abdul to play in the garden but has instructed him not to climb any fruit trees or trees that have birds nesting in them. Abdul selects a tree at random to climb.
(b) Find the probability that Abdul will obey the owner's instructions.
Given that Abdul climbs a fruit tree,
(c) find the probability that the tree has birds nesting in it.

CONDITIONAL PROBABILITY

If A and B are two events of the same experiment, then the conditional probability that A occurs, given that B has already occurred, is written P(A, given B) or P(A | B).
In the Venn diagram, the possibility space is reduced to just B, since B has already occurred.
    P(A, given B) = n(A ∩ B)/n(B)
                  = [n(A ∩ B)/n(S)] / [n(B)/n(S)]
                  = P(A ∩ B)/P(B)
So
    P(A | B) = P(A ∩ B)/P(B)
Rearranging:
    P(A ∩ B) = P(A | B) × P(B)
It is also true that
    P(A ∩ B) = P(B | A) × P(A)

Example 3.10
When a die was thrown the score was an odd number. What is the probability that it was a prime number?

Solution 3.10
    P(prime, given odd) = P(prime and odd)/P(odd)
                        = (2/6)/(3/6)
                        = 2/3
There are two numbers, 3 and 5, that are prime and odd. There are three odd numbers, 1, 3 and 5.
    P(prime, given odd) = 2/3
It is possible to deduce this straightaway, since the possibility space has been reduced to the odd numbers 1, 3, 5 and two of these, 3 and 5, are prime.

Example 3.11
In a certain college
65% of the students are full-time students,
55% of the students are female,
35% of the students are male full-time students.
Find the probability that
(a) a student chosen at random from all the students in the college is a part-time student,
(b) a student chosen at random from all the students in the college is female and a part-time student,
(c) a student chosen at random from all the female students in the college is a part-time student. (NEAB)

Solution 3.11
Define events as follows:
F: student is female, P(F) = 0.55
M: student is male, P(M) = 1 − 0.55 = 0.45
Full: student is full-time, P(Full) = 0.65
(a) P(student is part-time) = 1 − 0.65 = 0.35
(b) Given that 35% are male, full-time students,
    P(M ∩ Full) = 0.35
Also P(Full) = P(M ∩ Full) + P(F ∩ Full)
    0.65 = 0.35 + P(F ∩ Full)
    P(F ∩ Full) = 0.30
    P(F) = P(F ∩ Full) + P(F ∩ Part)
    0.55 = 0.30 + P(F ∩ Part)
    P(female and part-time) = 0.25
(c) P(Part, given F) = P(Part and F)/P(F)
                     = 0.25/0.55 = 0.45 (2 d.p.)
P(student chosen from female students is part-time) = 0.45

Example 3.12
X and Y are two events such that P(X | Y) = 0.4, P(Y) = 0.25 and P(X) = 0.2.
Find
(a) P(Y | X) (b) P(X ∩ Y) (c) P(X ∪ Y)

Solution 3.12
(a) P(Y | X) × P(X) = P(X | Y) × P(Y)
    P(Y | X) × 0.2 = 0.4 × 0.25
    P(Y | X) = 0.5
(b) P(X ∩ Y) = P(X | Y) × P(Y)
             = 0.4 × 0.25
             = 0.1
    P(X ∩ Y) = 0.1
(c) P(X ∪ Y) = P(X) + P(Y) − P(X ∩ Y)
             = 0.2 + 0.25 − 0.1
             = 0.35
    P(X ∪ Y) = 0.35

Example 3.13
A group of girls at a school is entered for Advanced Level Mathematics modules.
Each girl takes only module M1 or only module M2 or both M1 and M2.
The probability that a girl is taking M2 given that she is taking M1 is 1/5.
The probability that a girl is taking M1 given that she is taking M2 is 1/3.
Find the probability that
(a) a girl selected at random is taking both M1 and M2,
(b) a girl selected at random is taking only M1. (L)

Solution 3.13
Events
M1: a girl takes module M1
M2: a girl takes module M2
You are given that P(M2 | M1) = 1/5, P(M1 | M2) = 1/3
Since each girl takes one or both, P(M1 ∪ M2) = 1
(a) Let P(M1 ∩ M2) = x
    P(M2 | M1) = P(M2 ∩ M1)/P(M1)
    1/5 = x/P(M1)
    P(M1) = 5x
Also
    P(M1 | M2) = P(M1 ∩ M2)/P(M2)
    1/3 = x/P(M2)
    P(M2) = 3x
    P(M1 ∪ M2) = P(M1) + P(M2) − P(M1 ∩ M2)
But M1 and M2 are exhaustive events, so P(M1 ∪ M2) = 1
    1 = 5x + 3x − x
    1 = 7x
    x = 1/7
P(a girl is taking M1 and M2) = 1/7
(b) P(M1) = 5x = 5/7, P(M2) = 3x = 3/7
    P(taking only M1) = P(M1) − P(M1 ∩ M2)
                      = 5/7 − 1/7
                      = 4/7
P(a girl is taking only M1) = 4/7
[Venn diagram: M1 only 4/7, M1 and M2 1/7, M2 only 2/7]

INDEPENDENT EVENTS

If either of the events A and B can occur without being affected by the other, then the two events are independent.
If A and B are independent, then P(A, given B has occurred) is precisely the same as P(A), since A is not affected by B,
i.e. P(A | B) = P(A).
It is also true that P(B | A) = P(B).
Now, since P(A ∩ B) = P(A | B) × P(B), for independent events this becomes
    P(A ∩ B) = P(A) × P(B)
This is the multiplication rule for independent events.
It is also known as the 'and' rule for independent events.
    P(A and B) = P(A) × P(B)
So there are three conditions for A and B to be independent and any one of them may be used as a test for independence:
    P(A | B) = P(A)
    P(B | A) = P(B)
    P(A ∩ B) = P(A) × P(B)
The multiplication law can be extended to any number of independent events:
    P(A1 ∩ A2 ∩ ... ∩ An) = P(A1) × P(A2) × ... × P(An)
i.e. P(A1 and A2 and ... and An) = P(A1) × P(A2) × ... × P(An)

Example 3.14
A fair die is thrown twice. Find the probability that two fives are thrown.

Solution 3.14
On one throw, P(5) = 1/6
The two throws are independent events.
On two throws, P(5 on first and 5 on second) = P(5 on first) × P(5 on second)
                                             = 1/6 × 1/6
                                             = 1/36
P(two fives are thrown) = 1/36

Example 3.15
In a group of 60 students, 20 study History, 24 study French and 8 study both History and French. Are the events 'a student studies History' and 'a student studies French' independent?

Solution 3.15
From the information given:
P(History) = 20/60 = 1/3, P(French) = 24/60 = 2/5, P(History and French) = 8/60 = 2/15
Now P(History) × P(French) = 1/3 × 2/5 = 2/15
So P(History and French) = P(History) × P(French)
The two events are independent.

Example 3.16
Events A and B are independent and P(A) = 1/3, P(A ∩ B) = 1/12.
Find (a) P(B) (b) P(A ∪ B).

Solution 3.16
(a) Since A and B are independent
    P(A ∩ B) = P(A) × P(B)
    1/12 = 1/3 × P(B)
    P(B) = 1/4
(b) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
             = 1/3 + 1/4 − 1/12
             = 1/2
    P(A ∪ B) = 1/2

Example 3.17
The events A and B are such that P(A | B) = 0.4, P(B | A) = 0.25, P(A ∩ B) = 0.12.
(a) Calculate the value of P(B).
(b) Give a reason why A and B are not independent.
(c) Calculate the value of P(A ∩ B'). (L)

Solution 3.17
(a) P(A | B) = P(A ∩ B)/P(B)
    0.4 = 0.12/P(B)
    P(B) = 0.12/0.4 = 0.3
(b) P(B | A) = 0.25
             ≠ P(B)
A and B are not independent.
(c) P(A) = P(A ∩ B) + P(A ∩ B')
Also P(B | A) = P(B ∩ A)/P(A)
    0.25 = 0.12/P(A)
    P(A) = 0.48
So 0.48 = 0.12 + P(A ∩ B')
    P(A ∩ B') = 0.36
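The conditional probability formula and the three tests for independence can be tried out on the data of Examples 3.10 and 3.15. The following Python sketch is an illustration only, not part of the original text. The helper conditional and the variable names are assumptions.

```python
from fractions import Fraction

def conditional(n_a_and_b, n_b):
    """P(A | B) = n(A and B) / n(B): the possibility space is reduced to B."""
    return Fraction(n_a_and_b, n_b)

# Example 3.10: one throw of a die, given that the score is odd.
odd = {1, 3, 5}
prime = {2, 3, 5}
p_prime_given_odd = conditional(len(prime & odd), len(odd))

# Example 3.15: 60 students, 20 study History, 24 French, 8 both.
n_S, n_H, n_F, n_HF = 60, 20, 24, 8
p_H = Fraction(n_H, n_S)
p_F = Fraction(n_F, n_S)
p_HF = Fraction(n_HF, n_S)

# Any one of these three conditions may be used as the test.
tests = {
    "P(H | F) = P(H)": conditional(n_HF, n_F) == p_H,
    "P(F | H) = P(F)": conditional(n_HF, n_H) == p_F,
    "P(H and F) = P(H) x P(F)": p_HF == p_H * p_F,
}

print(p_prime_given_odd)
for name, holds in tests.items():
    print(name, holds)
```

For the History and French data all three conditions hold together, which is why any one of them is enough to establish independence.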

Example 3.18
The events A and B are such that
P(A) = 0.45, P(B) = 0.35 and P(A ∪ B) = 0.7.
(a) Find the value of P(A ∩ B).
(b) Explain why the events A and B are not independent.
(c) Find the value of P(A | B). (L)

Solution 3.18
(a) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
    0.7 = 0.45 + 0.35 − P(A ∩ B)
    P(A ∩ B) = 0.8 − 0.7 = 0.1
(b) P(A) × P(B) = 0.45 × 0.35
                = 0.1575
                ≠ P(A ∩ B)
A and B are not independent.
(c) P(A | B) = P(A ∩ B)/P(B)
             = 0.1/0.35
             = 0.286 (3 d.p.)

It can be shown that if A and B are independent, then A' and B' are also independent.
For independent events A and B
    P(A' and B') = P(A') × P(B')
and P(A' | B') = P(A')
    P(B' | A') = P(B')

Example 3.19
The events A and B are independent and are such that P(A) = x, P(B) = x + 0.2, and P(A ∩ B) = 0.15.
(a) Find the value of x.
For this value of x, find
(b) P(A ∪ B),
(c) P(A' | B'). (L)

Solution 3.19
(a) Using the rule for independent events
    P(A ∩ B) = P(A) × P(B)
    0.15 = x(x + 0.2)
By guesswork, x = 0.3, since 0.3 × 0.5 = 0.15.
Alternative algebraic method:
    x² + 0.2x = 0.15
    (x + 0.1)² − 0.01 = 0.15   (completing the square)
    (x + 0.1)² = 0.16
    x + 0.1 = ±0.4   (taking the square root)
Either x = 0.3 or x = −0.5
The negative value is impossible for a probability, so x = 0.3
P(A) = 0.3 and P(B) = 0.5
(b) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
             = 0.3 + 0.5 − 0.15
             = 0.65
(c) Since A and B are independent, so are A' and B'.
    P(A' | B') = P(A')
               = 1 − P(A)
               = 0.7

Example 3.20
The probability that a certain type of machine will break down in the first month of operation is 0.1. If a firm has two such machines which are installed at the same time, find the probability that, at the end of the first month, just one has broken down.
Assume that the performances of the two machines are independent.

Solution 3.20
M1: machine 1 breaks down, P(M1) = 0.1, P(M1') = 0.9
M2: machine 2 breaks down, P(M2) = 0.1, P(M2') = 0.9
If just one machine breaks down, then
either machine 1 has broken down and machine 2 is still working (M1 ∩ M2')
or machine 1 is still working and machine 2 has broken down (M1' ∩ M2)
Now M1 and M2' are independent, as are M1' and M2,
so P(M1 ∩ M2') + P(M1' ∩ M2) = P(M1) × P(M2') + P(M1') × P(M2)
                             = 0.1 × 0.9 + 0.9 × 0.1
                             = 0.18
The probability that after one month just one machine has broken down is 0.18.

Example 3.21
Three people in an office decide to enter a marathon race. The respective probabilities that they will complete the marathon are 0.9, 0.7 and 0.6.
Assuming that their performances are independent, find the probability that
(a) they all complete the marathon,
(b) at least two complete the marathon.

Solution 3.21
A: the first person completes the marathon, P(A) = 0.9, P(A') = 0.1
B: the second person completes the marathon, P(B) = 0.7, P(B') = 0.3
C: the third person completes the marathon, P(C) = 0.6, P(C') = 0.4
(a) P(all three complete) = P(A) × P(B) × P(C)
                          = 0.9 × 0.7 × 0.6
                          = 0.378
(b) If at least two complete the marathon then either two of them do, or all three do.
P(all three complete) = 0.378 from part (a)
P(two complete) = P(A) × P(B) × P(C') + P(A) × P(B') × P(C) + P(A') × P(B) × P(C)
                = 0.9 × 0.7 × 0.4 + 0.9 × 0.3 × 0.6 + 0.1 × 0.7 × 0.6
                = 0.456
P(at least two complete) = 0.378 + 0.456
                         = 0.834

It is important not to confuse the terms 'mutually exclusive' and 'independent'.
Mutually exclusive events are events that cannot happen together. They are usually the outcomes of one experiment.
Independent events are events that can happen simultaneously or can be seen to happen one after the other.
These three results are particularly useful. Learn them.
For any two events:             P(A and B) = P(A) × P(B | A)
For mutually exclusive events:  P(A and B) = 0
For independent events:         P(A and B) = P(A) × P(B)

Example 3.22
The three events E1, E2 and E3 are defined in the same sample space. The events E1 and E3 are mutually exclusive. The events E1 and E2 are independent.
Given that P(E1) = 2/5, P(E3) = 1/3 and P(E1 ∪ E2) = 5/8, find
(a) P(E1 ∪ E3),
(b) P(E2).

Solution 3.22
(a) Since E1 and E3 are mutually exclusive,
    P(E1 ∪ E3) = P(E1) + P(E3)
               = 2/5 + 1/3
               = 11/15
(b) P(E1 ∪ E2) = P(E1) + P(E2) − P(E1 ∩ E2)
               = 2/5 + P(E2) − P(E1) × P(E2)
    5/8 = 2/5 + P(E2) − (2/5)P(E2)
    9/40 = (3/5)P(E2)
    P(E2) = 9/40 × 5/3
          = 3/8

Example 3.23
Two ordinary fair dice, one red and one blue, are to be rolled once.
(a) Find the probabilities of the following events:
Event A: the number showing on the red die will be a 5 or a 6,
Event B: the total of the numbers showing on the two dice will be 7,
Event C: the total of the numbers showing on the two dice will be 8.
(b) State, with a reason, which two of the events A, B and C are mutually exclusive.
(c) Show that the events A and B are independent. (NEAB)

Solution 3.23
(a) [Possibility space diagram: score on red die (1 to 6) against score on blue die (1 to 6), with events A, B and C marked]
There are 36 equally likely outcomes, so n(S) = 36
    n(A) = 12, so P(A) = 12/36 = 1/3
    n(B) = 6, so P(B) = 6/36 = 1/6
    n(C) = 5, so P(C) = 5/36
(b) It is not possible to score 7 and 8 with one throw of the dice, so events B and C do not overlap.
Events B and C are mutually exclusive.
(c) There are two ways to score 7 with the red die showing 5 or 6. These are (5, 2) and (6, 1).
So n(A and B) = 2 and P(A and B) = 2/36 = 1/18
But P(A) × P(B) = 1/3 × 1/6 = 1/18
So P(A and B) = P(A) × P(B)
Events A and B are independent.
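The distinction between 'mutually exclusive' and 'independent' in Example 3.23 can be checked by listing all 36 outcomes. This is a short Python sketch offered as an illustration, not part of the original text. The helper prob and the ordering (red, blue) of each outcome are assumptions.

```python
from fractions import Fraction
from itertools import product

# Possibility space: 36 equally likely outcomes, written as (red, blue).
S = set(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) = n(event) / n(S) for equally likely outcomes."""
    return Fraction(len(event), len(S))

A = {(r, b) for r, b in S if r >= 5}       # red die shows a 5 or a 6
B = {(r, b) for r, b in S if r + b == 7}   # total is 7
C = {(r, b) for r, b in S if r + b == 8}   # total is 8

# Mutually exclusive: the events share no outcome, so P(B and C) = 0.
b_c_exclusive = len(B & C) == 0

# Independent: P(A and B) = P(A) x P(B).
a_b_independent = prob(A & B) == prob(A) * prob(B)
a_c_independent = prob(A & C) == prob(A) * prob(C)

print(prob(A), prob(B), prob(C))
print(b_c_exclusive, a_b_independent, a_c_independent)
```

The same test applied to A and C fails, since P(A and C) = 2/36 whereas P(A) × P(C) = 5/108. Independence is a property of a particular pair of events, not of the experiment.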

16. Two events A and Bare such that (c) Given also that P(C U D) = !. find the
P(A) ~ fs, P(B) ~ J, P(A IB) ~ t. valueofp. (0)
Exercise 3c Combined events Calculate the probabilities that
(a) Copy and complete the table. (a) both events occur, 19. Events A and Bare such that P(A) = 0.4 and
1. A number is picked at randombfro~ the dli~itf f {b) only one of the two events occurs, P(B) = 0.25. If A and Bare independent events,
1, 2, ... , 9. Given that the num er ts a ~u ttp eo Boys Girls (c) neither event occurs. (NEAB) find
3 find the probability that the number IS (a) P(A n B), (b) P(A n B'), (c) P(A' n B').
(~)even, (b) a multiple of 4. 16 8 17. All the answers to this question should be given
Passed driving test
6 either as fractions in their lowest terms or as 20. Two tetrahedral dice, with faces labelled 1, 2, 3
l. In a large group of people it is knohwn thath10%d Taken driving test, but failed decimals correct to three significant figures. and 4, are thrown and the number on which
have a hot breakfast, 20% have a ot 1unc ~· an Learning, but not yet taken a
{a) A man draws one card at random from a each lands is noted. The score is the sum of these
lS% have a hot breakfast or a hot lunch. Fmd
driving test complete pack of 52 playing cards, replaces two numbers. Find the probability that (a) the
the probability that a person chosen at random
T 00 young to take a driving test it and then draws another card at random score is even, given that at least one die lands on
from this group a three, (b) at least one die lands on a three,
{a) has a hot breakfast and a hot lunch, from the pack.
given that the score is even.
(b) has a hot lunch, given that the person Use your table to find the probability that Calculate the probability that
chosen had a hot breakfast. (L) (b) a student chosen at random has failed a (i) both cards are clubs, 21. Events C and Dare such that P(C) = 1.
J. If events A and B are such that they are . driving test, d .. (ii) exactly one of the cards is a Queen, J,
P(C n D') ~ P(CI D)~ f,.
(c) a girl chosen at random has taken a nvmg
independent and P(A) ~ 0.3, P(B) ~ 0.5, fmd (iii) the two cards are identical. Find (a) P(C n D), (b) P(D), (c) P(D I C).
(a) P(A n B), (b) P(A U B). test,
(d) a boy chosen at random has not yet taken a (b) On another occasion the man draws
Are events A and B mutually exclusive? simultaneously two cards at random from 22. Two athletes, A and B, are attempting to qualify
driving test,
(e) 2 students, chosen at random, are both too the pack of cards. for an international competition in both the
4 _ If P(A 1B) ~ }, P(B) ~ t, P(A) ~ \, find young to take a driving test, Calculate the probability that
5000 m and 10 000 m races. The probabilities of
(a) P(B I A), (b) P(A n B). (f) a boy and a girl, each chosen at random, each qualifying arc shown in the following table.
(i) exactly one of the cards is a Queen,
have both passed their driving test. (C)
5. A die is thrown twice. Find the probability of (ii) the two cards are identical. (C)
Given that two events, A and B, are such Athlete 5000 m 10 000 m
obtaining a number less than three on both
11. (a ) that I'(A and B)~ P(A) x p (B) , state w ha t
throws. 18. (a) The probability that an event A occurs is A 3 1
you can say about the events A and B. 5 4
P(A) = 0.4. B is an event independent of A
6. Events A and B are such that P(A) = ~, If event A is 'obtaining a 6 on a s~n~le throw and the probability of the union of A and B B 2
J t
P(AIB) ~ ~, P(B) ~ t· of a die', suggest a possible descnptton for is P(A u B)~ 0.7.
Find (a) P(B I A), (b) P(A n B). Assuming that the probabilities are independent,
event B. FindP(B).
calculate the probability that
7. A card is picked at random from a pac~ of ~0 (b) Given that two events, C and D, are (b) C and Dare two events such that
cards numbered 1, 2, 3, ... , 20. Given t at t ~. such that I'(C or D)~ P(C) + I'(D), state P(DIC) ~ tandP(CID) ~ j. (a) athlete A will qualify for both races,
card shows an even number, find the probabihty what you can say about the two events (b) exactly one of the athletes qualifies for the
Given that P{C n D)= p, express in terms 5000 m race,
that it is a multiple of 4. Cand D.
of p (c) both athletes qualify only for the 10 000 m
Write down the value of P(C and D). (C)
8. In a group of 100 people, 40 own a ~at, 25 own (i) P(C), (ii) P(D). race. (C)
a dog and 15 own a cat and a dog. Fmd the
probability that a person chosen at random 12. The probability that a person in a particular
evening class is left-handed is !:. From a class of
(a) owns a dog or a cat,
(b) owns a dog or a cat, but not both, 15 women and 5 men a person is chosen, ~t
random. Assuming that 'left-handedness ts
(c) owns a dog, given that he owns a cat,
(d) does not own a cat, given that he owns a dog.
independent of the sex of a person, find the probability that the person chosen is a man or is left-handed.

9. A card is picked from a pack containing 52 playing cards. It is then replaced and a second card is picked. Find the probability that
(a) both cards are the seven of diamonds,
(b) the first card is a heart and the second a spade,
(c) one card is from a black suit and the other is from a red suit,
(d) at least one card is a Queen.

10. A student investigating success in driving tests gathered information from 60 students in her school. Of these students, 25 were girls and 35 were boys. She found that 37 of the students had already taken a driving test, whilst 5, including 3 girls, were too young to take a driving test. Of the 37 who had taken a test, 16 boys and 8 girls had passed their test. The remainder, including 6 girls, had failed their test.

13. A and B are exhaustive events and it is known that P(A | B) = [illegible] and P(B) = [illegible]. Find P(A).

14. A bag contains four red counters and six black counters. A counter is picked at random from the bag and not replaced. A second counter is then picked. Find the probability that
(a) the second counter is red, given that the first counter is red,
(b) both counters are red,
(c) the counters are of different colours.

15. A and B are two independent events such that P(A) = 0.2 and P(B) = 0.15. Evaluate the following probabilities.
(a) P(A | B), (b) P(A ∩ B), (c) P(A ∪ B). (L)

PROBABILITY TREES

A useful way of tackling many probability problems is to draw a probability tree. The method is illustrated in the following example.

Example 3.24

In a certain selection of flower seeds 2/3 have been treated to improve germination and 1/3 have been left untreated. The seeds which have been treated have a probability of germination of 0.8, whereas the untreated seeds have a probability of germination of 0.5.

(a) Find the probability that a seed, selected at random, will germinate.

The seeds were sown and given time to germinate.

(b) Find the probability that a seed selected at random had been treated, given that it had germinated. (L)
Solution 3.24

Events
T: seed is treated, P(T) = 2/3, P(T') = 1/3
G: seed germinates, P(G | T) = 0.8, P(G | T') = 0.5

Tree diagram (first set of branches: treated or not; second set of branches: germinates or not), giving the end results:

P(T and G) = P(T) × P(G | T) = 2/3 × 0.8
P(T and G') = P(T) × P(G' | T) = 2/3 × 0.2
P(T' and G) = P(T') × P(G | T') = 1/3 × 0.5
P(T' and G') = P(T') × P(G' | T') = 1/3 × 0.5

How to use the tree:

(i) Multiply the probabilities along the branches to get the end results, so for the first outcome, use the fact that P(T and G) = P(T) × P(G given T).
(ii) On any set of branches that meet at a point, the probabilities must add up to 1 (e.g. 0.8 + 0.2 = 1).
(iii) Check that all the end results add up to 1.
(iv) To answer any questions find the relevant end results. If more than one satisfy the requirements, add these end results together.

In practice you would usually label your tree more simply as follows.

P(T ∩ G) = 2/3 × 0.8
P(T ∩ G') = 2/3 × 0.2
P(T' ∩ G) = 1/3 × 0.5
P(T' ∩ G') = 1/3 × 0.5

(a) P(G) = P(T ∩ G) + P(T' ∩ G)
= 2/3 × 0.8 + 1/3 × 0.5
= 0.7

(b) P(T, given G) = P(T and G) / P(G)
= (2/3 × 0.8) / 0.7
= 0.762 (3 d.p.)

Example 3.25

A manufacturer makes writing pens. The manufacturer employs an inspector to check the quality of his product. The inspector tested a random sample of the pens from a large batch and calculated the probability of any pen being defective as 0.025.
Carmel buys two of the pens made by the manufacturer.
(a) Calculate the probability that both pens are defective.
(b) Calculate the probability that exactly one of the pens is defective. (C)

Solution 3.25

D: a pen is defective, P(D) = 0.025, P(D') = 1 − 0.025 = 0.975.

Tree diagram (first pen, second pen), giving the end results:

P(D ∩ D) = 0.025 × 0.025
P(D ∩ D') = 0.025 × 0.975
P(D' ∩ D) = 0.975 × 0.025

(a) P(both pens are defective) = P(D ∩ D)
= 0.025 × 0.025
= 0.000 625

(b) P(exactly one pen is defective) = P(D ∩ D') + P(D' ∩ D)
= 0.025 × 0.975 + 0.975 × 0.025
= 0.048 75

Example 3.26

Events X and Y are such that P(X') = 3/5, P(Y | X') = 1/3, P(Y' | X) = 1/4.
By drawing a tree diagram, find
(a) P(Y) (b) P(X' | Y)

Solution 3.26

Draw a tree diagram, showing event X followed by event Y, and write in all the given probabilities. Then work out the missing probabilities using the fact that probabilities on all the branches from a point add up to 1.

P(X ∩ Y) = 2/5 × 3/4 = 3/10
P(X' ∩ Y) = 3/5 × 1/3 = 1/5

(a) P(Y) = P(X ∩ Y) + P(X' ∩ Y)
= 3/10 + 1/5
= 1/2

(b) P(X' | Y) × P(Y) = P(Y | X') × P(X')
P(X' | Y) × 1/2 = 1/3 × 3/5
P(X' | Y) = 2/5

Alternatively
P(X' | Y) = P(X' ∩ Y) / P(Y)
= (3/5 × 1/3) / (1/2)
= 2/5

Example 3.27

When a person needs a minicab, it is hired from one of three firms, X, Y and Z. Of the hirings 40% are from X, 50% are from Y and 10% are from Z. For cabs hired from X, 9% arrive late, the corresponding percentages for cabs hired from firms Y and Z being 6% and 20% respectively. Calculate the probability that the next cab hired
(a) will be from X and will not arrive late,
(b) will arrive late.
Given that a call is made for a minicab and that it arrives late, find, to three decimal places, the probability that it came from Y. (L)

Solution 3.27

Events                  Probabilities
X: cab is from X        P(X) = 0.4
Y: cab is from Y        P(Y) = 0.5
Z: cab is from Z        P(Z) = 0.1
L: cab is late          P(L | X) = 0.09, P(L | Y) = 0.06, P(L | Z) = 0.2

Tree diagram (firm, then late or not), giving the end results:

P(X ∩ L) = 0.4 × 0.09 = 0.036
P(X ∩ L') = 0.4 × 0.91 = 0.364 *
P(Y ∩ L) = 0.5 × 0.06 = 0.03 ✓
P(Y ∩ L') = 0.5 × 0.94 = 0.47
P(Z ∩ L) = 0.1 × 0.2 = 0.02
P(Z ∩ L') = 0.1 × 0.8 = 0.08

(a) P(from X and not late) = P(X ∩ L') = 0.364 (marked * on the diagram)

(b) P(arrives late) = P(X and late) + P(Y and late) + P(Z and late)
= P(X ∩ L) + P(Y ∩ L) + P(Z ∩ L) (shaded in the diagram)
= 0.036 + 0.03 + 0.02
= 0.086

The possibility space is now reduced to the outcomes when the cab arrives late, where P(L) = 0.086 (part b).

P(from Y given it was late) = P(Y and late) / P(late)
i.e. P(Y | L) = P(Y ∩ L) / P(L)
= 0.03 / 0.086 (0.03 is marked ✓ in the diagram)
= 0.349 (3 d.p.)

BAYES' THEOREM

P(Y | L) is easy to find from the tree diagram once you realise that the sample space has been reduced to the outcomes in which L occurs. This is a useful method when you want to 'reverse the conditions', as in Example 3.27, when you know P(L | Y) and you wanted P(Y | L).
It is interesting to write out the full formulae used:

P(Y and L) = P(L | Y) × P(Y)
also P(Y and L) = P(Y | L) × P(L)
so P(Y | L) × P(L) = P(L | Y) × P(Y)

P(Y | L) = P(L | Y) × P(Y) / P(L)

But P(L) = P(X ∩ L) + P(Y ∩ L) + P(Z ∩ L)
= P(L | X) × P(X) + P(L | Y) × P(Y) + P(L | Z) × P(Z)

so P(Y | L) = P(L | Y) × P(Y) / [P(L | X) × P(X) + P(L | Y) × P(Y) + P(L | Z) × P(Z)]

This is an example of Bayes' Theorem, which can be written in general format as follows:

For i = 1, 2, 3, ..., n, where A_1, A_2, ..., A_n are mutually exclusive and exhaustive events,

P(A_i | B) = P(B | A_i) × P(A_i) / [P(B | A_1) × P(A_1) + P(B | A_2) × P(A_2) + ... + P(B | A_n) × P(A_n)]

The formula has been included here for reference. It is however easier to work from the format

P(A_i | B) = P(A_i ∩ B) / P(B)

especially when you have a tree diagram to illustrate the situation!

Example 3.28

A computer program generates random questions in arithmetic that children have to answer within a fixed time. The probability of the first question being answered correctly is 0.8. Whenever a question is answered correctly, the next question generated is more difficult, and the probability of a correct answer being given is reduced by 0.1. Whenever a question is answered wrongly, the next question is of the same standard, and the probability of a correct answer being given remains unchanged. The following tree diagram shows this information for the first two questions generated.

[Tree diagram: 1st question (Correct / Wrong), followed by 2nd question (Correct / Wrong)]

(a) Find the probability that the second question is answered correctly.
(b) By extending the tree diagram, or otherwise, find the probability that the second question is answered correctly given that the third question is answered correctly. (C)

Solution 3.28

P(C1 ∩ C2) = 0.8 × 0.7 = 0.56
P(C1 ∩ W2) = 0.8 × 0.3 = 0.24
P(W1 ∩ C2) = 0.2 × 0.8 = 0.16
P(W1 ∩ W2) = 0.2 × 0.2 = 0.04

(a) P(2nd question answered correctly) = 0.56 + 0.16 = 0.72

(b) Extending the tree to the 3rd question:

P(C1 ∩ C2 ∩ C3) = 0.8 × 0.7 × 0.6 = 0.336 *
P(C1 ∩ W2 ∩ C3) = 0.8 × 0.3 × 0.7 = 0.168
P(W1 ∩ C2 ∩ C3) = 0.2 × 0.8 × 0.7 = 0.112 *
P(W1 ∩ W2 ∩ C3) = 0.2 × 0.2 × 0.8 = 0.032

P(C2 | C3) = P(C2 and C3) / P(C3)
= (0.336 + 0.112) / (0.336 + 0.168 + 0.112 + 0.032) (the numerator uses the results marked * above)
= 0.448 / 0.648
= 0.69 (2 s.f.)
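The tree calculations in Examples 3.27 and 3.28 can be checked numerically. The following is a minimal Python sketch, not part of the original text: the function names `posterior` and `path_probability` are illustrative assumptions, and exact fractions are used so that the results can be compared with the worked answers.

```python
from fractions import Fraction
from itertools import product


def posterior(priors, likelihoods, target):
    """Return (P(target | B), P(B)) from P(A_i) and P(B | A_i).

    Each joint probability P(A_i and B) is one end result of the tree;
    P(B) is the sum of the end results in which B occurs.
    """
    joint = {name: priors[name] * likelihoods[name] for name in priors}
    p_b = sum(joint.values())
    return joint[target] / p_b, p_b


# Example 3.27: firms X, Y, Z and the event L "cab is late".
priors = {"X": Fraction(2, 5), "Y": Fraction(1, 2), "Z": Fraction(1, 10)}
late = {"X": Fraction(9, 100), "Y": Fraction(3, 50), "Z": Fraction(1, 5)}
p_y_given_late, p_late = posterior(priors, late, "Y")
print(float(p_late))                    # P(L)
print(round(float(p_y_given_late), 3))  # P(Y | L) to 3 d.p.


def path_probability(outcomes, first=Fraction(4, 5), drop=Fraction(1, 10)):
    """Probability of one path through the tree of Example 3.28.

    `outcomes` is a sequence of booleans (True = answered correctly).
    After a correct answer the chance of success falls by `drop`;
    after a wrong answer it stays the same.
    """
    p_correct, prob = first, Fraction(1)
    for correct in outcomes:
        prob *= p_correct if correct else 1 - p_correct
        if correct:
            p_correct -= drop
    return prob


# Enumerate all eight paths for three questions.
paths = {o: path_probability(o) for o in product([True, False], repeat=3)}
p_c3 = sum(p for o, p in paths.items() if o[2])
p_c2_and_c3 = sum(p for o, p in paths.items() if o[1] and o[2])
print(round(float(p_c2_and_c3 / p_c3), 2))  # P(C2 | C3) to 2 s.f.
```

Enumerating every path and adding the relevant end results mirrors rule (iv) for using a tree; the conditional probability is then the ratio of two such sums.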

Exercise 3d Tree diagrams

Section A

1. The probability that I am late for work is 0.05. Find the probability that, on two consecutive mornings, (a) I am late for work twice, (b) I am late for work once.

2. A mother and her daughter both enter the cake competition at a show. The probability that the mother wins a prize is [illegible] and the probability that her daughter wins a prize is [illegible]. Assuming that the two events are independent, find the probability that
(a) either the mother, or the daughter, but not both, wins a prize,
(b) at least one of them wins a prize.

3. In a restaurant 40% of the customers choose steak for their main course. If a customer chooses steak, the probability that he will choose ice cream to follow is 0.6. If he does not have steak, the probability that he will choose ice cream is 0.3. Find the probability that a customer picked at random will choose
(a) steak and ice cream,
(b) ice cream.

4. A box contains six red pens and three blue pens.
(a) A pen is selected at random, the colour is noted and the pen is returned to the box. This procedure is performed a second, then a third time. Find the probability of obtaining
(i) three red pens,
(ii) two red pens and one blue pen, in any order,
(iii) more than one blue pen.
(b) Repeat (a) but this time find the probabilities if, at each selection, the pen is not returned to the box.

5. Mass-produced glass bricks are inspected for defects. The probability that a brick has air bubbles is 0.002. If a brick has air bubbles the probability that it is also cracked is 0.5 while the probability that a brick free of air bubbles is cracked is 0.005. What is the probability that a brick chosen at random is cracked? The probability that a brick is discoloured is 0.006. Given that discolouration occurs independently of the other two defects, find the probability that a brick chosen at random has no defects. (O & C)

6. In each round of a certain game a player can score 1, 2 or 3 only. Copy and complete the table which shows the scores and two of the respective probabilities of these being scored in a single round.

Score          1    2    3
Probability    [values illegible]

Draw a tree diagram to show all the possible total scores and their respective probabilities after a player has completed two rounds.
Find the probability that a player has (a) a score of 4 after two rounds, (b) an odd number score after two rounds. (L Additional)

7. The probability that I have to wait at the traffic lights on my way to school is 0.25. Find the probability that, on two consecutive mornings, I have to wait on at least one morning.

8. A die is thrown three times. What is the probability of scoring a two on just one occasion?

9. A coin is tossed four times. Find the probability of obtaining less than two heads.

10. Two golfers, Smith and Jones, are attempting to qualify for a golf championship. It is estimated that the probability of Jones qualifying is 0.8, and that the probability of both Smith and Jones qualifying is 0.6. Given that the probability of Smith qualifying and the probability of Jones qualifying are independent, find the probability that only one of them will qualify. (C)

11. Whether or not Jonathan gets up in time for school depends on whether he remembers to set his alarm clock the evening before.
For 85% of the time he remembers to set the clock; the other 15% of the time he forgets.
If the clock is set, he gets up in time for school on 90% of the occasions.
If the clock is not set, he does not get up in time for school on 60% of the occasions.
On what proportion of the occasions does he get up in time for school? (NEAB)

12. In a game, a steel ball is dropped onto a set of nails arranged in three levels as shown.
When a ball hits a nail, the probability of it moving right or left before reaching the next level is 1/2.

[Diagram: nails arranged in three levels, with positions A and B and slot C marked]

Calculate the probability of a ball
(a) reaching A,
(b) reaching B,
(c) dropping into slot C. (NEAB)

13. A team needs to win at least two of its remaining three games to secure the championship. The probabilities that the team will win the games are assessed to be 0.6, 0.7 and 0.8, respectively. Calculate the probability, based on these assessed values, that the team will secure the championship. (C)

14. In the game of tennis a player has two serves. If the first serve is successful the game continues.
If the first serve is not successful the player serves again. If this second service is successful the game continues.
If both serves are unsuccessful the player has served a 'double fault' and loses the point.
Gabriella plays tennis. She is successful with 60% of her first serves and 95% of her second serves.
(a) Calculate the probability that Gabriella serves a double fault.
If Gabriella is successful with her first serve she has a probability of 0.75 of winning the point. If she is successful with her second serve she has a probability of 0.5 of winning the point.
(b) Calculate the probability that Gabriella wins the point. (MEG)

15. In a group of 12 international referees there are three from Africa, four from Asia and five from Europe. To officiate at a tournament, three referees are chosen at random from the group. Calculate the probability that
(a) a referee is chosen from each continent,
(b) exactly two referees are chosen from Asia,
(c) the three referees are chosen from the same continent. (C)

16. A bag contains seven black and three white marbles. Three marbles are chosen at random and in succession, each marble being replaced after it has been taken out of the bag.
Draw a tree diagram to show all possible selections.
From your diagram, or otherwise, calculate, to two significant figures, the probability of choosing
(a) three black marbles,
(b) a white marble, a black marble and a white marble in that order,
(c) two white marbles and a black marble in any order,
(d) at least one black marble.
State an event from this experiment which together with the event described in (d) would be both exhaustive and mutually exclusive. (L)

17. Alec and Bill frequently play each other in a series of games of table tennis. Records of the outcomes of these games indicate that whenever they play a series of games, Alec has the probability 0.6 of winning the first game and that in every subsequent game in the series, Alec's probability of winning the game is 0.7 if he won the preceding game but only 0.5 if he lost the preceding game. A game cannot be drawn. Find the probability that Alec will win the third game in the next series he plays with Bill. (NEAB)

18. Three men, A, B and C agree to meet at the theatre. The man A cannot remember whether they agreed to meet at the Palace or the Queen's and tosses a coin to decide which theatre to go to. The man B also tosses a coin to decide between the Queen's and the Royalty. The man C tosses a coin to decide whether to go to the Palace or not and in this latter case he tosses again to decide between the Queen's and the Royalty. Find the probability that
(a) A and B meet,
(b) B and C meet,
(c) A, B and C all meet,
(d) A, B and C all go to different places,
(e) at least two meet. (C)

Section B

1. I travel to work by route A or route B. The probability that I choose route A is [illegible]. The probability that I am late for work if I go via route A is [illegible] and the corresponding probability if I go via route B is [illegible].
(a) What is the probability that I am late for work on Monday?
(b) Given that I am late for work, what is the probability that I went via route B?

2. A box contains 20 chocolates, of which 15 have soft centres and five have hard centres. Two chocolates are taken at random, one after the other. Calculate the probability that
(a) both chocolates have soft centres,
(b) one of each sort of chocolate is taken,
(c) both chocolates have hard centres, given that the second chocolate has a hard centre. (C)

3. (a) Explain in words the meaning of the symbol P(A | B) where A and B are two events. State the relationship between A and B when
(i) P(A | B) = 0, (ii) P(A | B) = P(A).
(b) When a car owner needs her car serviced she phones one of three garages, A, B, or C. Of her phone calls to them, 30% are to garage A, 10% to B and 60% to C.
The percentages of occasions when the garage phoned can take the car in on the day of phoning are 20% for A, 6% for B and 9% for C.
Find the probability that the garage phoned will not be able to take the car in on the day of phoning.
Given that the car owner phones a garage and the garage can take her car in on that day, find the probability that she phoned garage B. (L)
4. A shop stocks tinned cat food of two makes, A and B, and two sizes, large and small.
Of the stock, 70% is of brand A, 30% is of brand B.
Of the tins of brand A, 30% are small size whilst of the tins of brand B, 40% are small size.
Using a tree diagram, or otherwise, find the probability that
(a) a tin chosen at random from the stock will be of small size,
(b) a small tin chosen at random from the stock will be of brand A. (L)

5. A die is known to be biased in such a way that, when it is thrown, the probability of a six showing is [illegible]. This biased die and an ordinary fair die are thrown. Find the probability that
(a) the fair die shows a six and the biased die does not show a six,
(b) at least one of the two dice shows a six,
(c) exactly one of the two dice shows a six, given that at least one of them shows a six. (C)

6. A golfer observes that, when playing a particular hole at his local course, he hits a straight drive on 80% of the occasions when the weather is not windy but only on 30% of the occasions when the weather is windy. Local records suggest that the weather is windy on 55% of all days.
(a) Show that the probability that, on a randomly chosen day, the golfer will hit a straight drive at the hole is 0.525.
(b) Given that he fails to hit a straight drive at the hole, calculate the probability that the weather is windy. (NEAB)

7. In my bookcase there are four shelves and the number of books on each shelf is as shown in the table:

           Hardback   Paperback
Shelf 1       11          9
Shelf 2        8         12
Shelf 3       16          4
Shelf 4        9          3

(a) If I choose a book at random, irrespective of its position in the bookcase, what is the probability that it is a paperback?
(b) I am equally likely to choose any shelf. I choose a shelf at random and then choose a book. (i) What is the probability that it is a hardback? (ii) If the book chosen is a hardback, what is the probability that it is from shelf 3?

8. Of a group of pupils studying at A-level in schools in a certain area, 56% are boys and 44% are girls. The probability that a boy of this group is studying Chemistry is [illegible] and the probability that a girl of this group is studying Chemistry is [illegible].
(a) Find the probability that a pupil selected at random from this group is a girl studying Chemistry.
(b) Find the probability that a pupil selected at random from this group is not studying Chemistry.
(c) Find the probability that a Chemistry pupil selected at random from this group is male.
(You may leave your answers as fractions in their lowest terms.) (O & C)

9. Explain, by suitably defining events A and B, what is meant by 'the probability of A occurring given that B has occurred'.
A local greengrocer sells conventionally grown and organically grown vegetables.
Conventionally grown vegetables constitute 80% of his sales; carrots constitute 12% of the conventional sales and 30% of the organic sales. Display this information in an appropriately and accurately labelled tree diagram.
One day a customer emerges from the shop and is questioned about her purchases. What is the probability that she bought
(a) conventionally grown carrots,
(b) carrots?
Given that she did buy carrots, what is the probability that they were organically grown?
What assumptions have you made in answering this question? (O)

10. In a simple model of the weather in October, each day is classified as either fine or rainy. The probability that a fine day is followed by a fine day is 0.8. The probability that a rainy day is followed by a fine day is 0.4. The probability that 1 October is fine is 0.75.
(a) Find the probability that 2 October is fine and the probability that 3 October is fine.
(b) Find the conditional probability that 3 October is rainy, given that 1 October is fine.
(c) Find the conditional probability that 1 October is fine, given that 3 October is rainy. (C)

11. At the ninth hole on a certain golf course there is a pond. A golfer hits a grade B ball into the pond. Including the golfer's ball there are then six grade C, ten grade B and four grade A balls in the pond. The golfer uses a fishing net and 'catches' four balls. The events X, Y and Z are defined as follows:
X: the catch consists of two grade A balls and two grade C balls
Y: the catch consists of two grade B balls and two other balls
Z: the catch includes the golfer's own ball
Assuming that the catch is a random selection from the balls in the pond, determine
(a) P(X), (b) P(Y), (c) P(Z), (d) P(Z | Y).
For each of the pairs X and Y, Y and Z, state, with a brief reason, whether the two events are (i) mutually exclusive, (ii) independent. (C)

12. [In this question, give your answers in decimal form, correct to three significant figures.]
A choir has seven sopranos, six altos, three tenors and four basses. The sopranos and altos are women and the tenors and basses are men. At a particular rehearsal, three members of the choir are chosen at random to make the tea.
(a) Find the probability that all three tenors are chosen.
(b) Find the probability that exactly one bass is chosen.
(c) Find the conditional probability that two women are chosen, given that exactly one bass is chosen.
(d) Find the probability that the chosen group contains exactly one tenor or exactly one bass (or both). (C)

13. Vehicles approaching a crossroads must go in one of three directions: left, right or straight on. Observations by traffic engineers showed that of vehicles approaching from the north, 45% turn left, 20% turn right and 35% go straight on. Assuming that the driver of each vehicle chooses direction independently, what is the probability that of the next three vehicles approaching from the north
(a) all go straight on,
(b) all go in the same direction,
(c) two turn left and one turns right,
(d) all go in different directions,
(e) exactly two turn left?
Given that three consecutive vehicles all go in the same direction, what is the probability that they all turned left? (AEB)

14. During an epidemic of a certain disease a doctor is consulted by 110 people suffering from symptoms commonly associated with the disease. Of the 110 people, 45 are female of whom 20 actually have the disease and 25 do not. Fifteen males have the disease and the rest do not.
(a) A person is selected at random. The event that this person is female is denoted by A and the event that this person is suffering from the disease is denoted by B.
Evaluate (i) P(A), (ii) P(A ∪ B), (iii) P(A ∩ B), (iv) P(A | B).
(b) If three different people are selected at random without replacement, what is the probability of (i) all three having the disease, (ii) exactly one of the three having the disease, (iii) one of the three being a female with the disease, one a male with the disease and one a female without the disease?
(c) Of people with the disease 96% react positively to a test for diagnosing the disease as do 8% of people without the disease. What is the probability of a person selected at random (i) reacting positively, (ii) having the disease given that he or she reacted positively? (AEB)

15. In an experiment two bags A and B, containing red and green marbles are used. Bag A contains four red marbles and one green marble and bag B contains two red marbles and seven green marbles. An unbiased coin is tossed. If a head turns up, a marble is drawn at random from bag A while if a tail turns up, a marble is drawn at random from bag B. Calculate the probability that a red marble is drawn in a single trial. Given that a red marble is selected, calculate the probability that when the coin was tossed a head was obtained. (L)

16. In a computer game played by a single player, the player has to find, within a fixed time, the path through a maze shown on the computer screen. On the first occasion that a particular player plays the game, the computer shows a simple maze, and the probability that the player succeeds in finding the path in the time allowed is [illegible]. On subsequent occasions, the maze shown depends on the result of the previous game. If the player succeeded on the previous occasion, the next maze is harder, and the probability that the player succeeds is one half of the probability of success on the previous occasion. If the player failed on the previous occasion, a simple maze is shown and the probability of the player succeeding is again [illegible].
The player plays three games.
(a) Show that the probability that the player succeeds in all three games is [illegible].
(b) Find the probability that the player succeeds in exactly one of the games.
(c) Find the probability that the player does not have two consecutive successes.
(d) Find the conditional probability that the player has two consecutive successes given that the player has exactly two successes. (C)

17. A sailing competition between two boats, A and B, consists of a series of independent races, the competition being won by the first boat to win three races. Every race is won by either A or B,

and their respective probabilities of winning are influenced by the weather. In rough weather the probability that A will win is 0.9; in fine weather the probability that A will win is 0.4. For each race the weather is either rough or fine, the probability of rough weather being 0.2. Show that the probability that A will win the first race is 0.5.
Given that the first race was won by A, determine the conditional probability that
(a) the weather for the first race was rough,
(b) A will win the competition. (C)

SOME USEFUL METHODS

(a) Problems involving an 'at least' situation

Example 3.29

(a) Find the probability of obtaining at least one six when five dice are thrown.
(b) Find the probability of obtaining at least one six when n dice are thrown.
(c) How many dice must be thrown so that the probability of obtaining at least one six is at least 0.99?

Solution 3.29

(a) In one throw P(6) = 1/6 and P(not 6) = 5/6.
When five dice are thrown,
P(at least one six) = 1 − P(no sixes)
= 1 − (5/6)^5
= 0.598 (3 d.p.)

(b) When n dice are thrown,
P(at least one six) = 1 − (5/6)^n

(c) You need to find n such that
1 − (5/6)^n ≥ 0.99
i.e. (5/6)^n ≤ 0.01
You could do this by trial and improvement:
(5/6)^20 = 0.026... > 0.01
(5/6)^25 = 0.0104... > 0.01
(5/6)^26 = 0.0087... < 0.01
So the least value of n is 26.
∴ 26 dice must be thrown.

NOTE: you could solve (5/6)^n ≤ 0.01 using logarithms.
Take logs to the base 10 of both sides,
n log(5/6) ≤ log(0.01)
Divide both sides by log(5/6). Since log(5/6) is negative, this will reverse the inequality sign.
n ≥ log(0.01) / log(5/6)
n ≥ 25.3...
The least value of n is 26.

(b) Problems involving the use of an infinite geometric progression (GP)

Many probability examples involve the use of GPs and the following formula is required.
If S∞ = a + ar + ar^2 + ar^3 + ... (to infinity), then
S∞ = a / (1 − r) for |r| < 1, where a is the first term and r is the common ratio.

Example 3.30

Joe and Pete play a game in which they each throw a die in turn until someone throws a six. The person who throws the six wins the game. Joe starts the game. Find the probability that he wins.

Solution 3.30

Joe will win the game if he wins on his first go, or on his second go, or on his third go, and so on.

P(Joe wins on his first go) = 1/6
P(Joe wins on his second go) = P(Joe doesn't throw a six, Pete doesn't throw a six, then Joe throws a six)
= 5/6 × 5/6 × 1/6 = (5/6)^2 × 1/6
P(Joe wins on his third go) = 5/6 × 5/6 × 5/6 × 5/6 × 1/6 = (5/6)^4 × 1/6 and so on

P(Joe wins) = 1/6 + (5/6)^2 × 1/6 + (5/6)^4 × 1/6 + ...
= 1/6 × (1 + (5/6)^2 + (5/6)^4 + ...)

Now 1 + (5/6)^2 + (5/6)^4 + ... is the sum of an infinite GP with a = 1, r = (5/6)^2 = 25/36.

S∞ = a / (1 − r) = 1 / (1 − 25/36) = 36/11

∴ P(Joe wins) = 1/6 × 36/11 = 6/11
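Both methods in this section lend themselves to a quick numerical check. The sketch below is an illustration only and not part of the original text; the function names are assumptions chosen for readability. It finds the least n for the 'at least one six' problem by trial and by logarithms, and sums the infinite GP for the game in Example 3.30 with exact fractions.

```python
import math
from fractions import Fraction


def least_n_by_trial(q, threshold):
    """Smallest n with q**n <= threshold, found by trial and improvement."""
    n = 1
    while q ** n > threshold:
        n += 1
    return n


def least_n_by_logs(q, threshold):
    """Smallest integer n with n >= log(threshold) / log(q), for 0 < q < 1.

    Dividing by log(q), which is negative, reverses the inequality.
    """
    return math.ceil(math.log10(threshold) / math.log10(q))


def infinite_gp_sum(a, r):
    """Sum to infinity of a + ar + ar^2 + ..., valid only for |r| < 1."""
    if abs(r) >= 1:
        raise ValueError("the sum to infinity needs |r| < 1")
    return a / (1 - r)


# Example 3.29 (c): P(no sixes) = (5/6)^n must be at most 0.01.
n_trial = least_n_by_trial(5 / 6, 0.01)
n_logs = least_n_by_logs(5 / 6, 0.01)
print(n_trial, n_logs)

# Example 3.30: Joe wins on his 1st, 2nd, 3rd, ... go.
p_joe = Fraction(1, 6) * infinite_gp_sum(Fraction(1), Fraction(5, 6) ** 2)
print(p_joe)
```

The trial loop and the logarithm formula should agree whenever 0 < q < 1; using `Fraction` for the GP keeps the answer as an exact fraction rather than a rounded decimal.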

Exercise 3e Useful methods

1. A coin is biased so that the probability that it falls showing tails is 0.75.
(a) Find the probability of obtaining at least one head when the coin is tossed five times.
(b) How many times must the coin be tossed so that the probability of obtaining at least one head is greater than 0.98?

2. A missile is fired at a target and the probability that the target is hit is 0.7.
(a) Find how many missiles should be fired so that the probability that the target is hit at least once is greater than 0.995.
(b) Find how many missiles should be fired so that the probability that the target is not hit is less than 0.001.

3. A die is biased so that the probability of obtaining a three is p. When the die is thrown four times the probability that there is at least one three is 0.9375. Find the value of p.
How many times should the die be thrown so that the probability that there are no threes is less than 0.03?

4. On a safe there are four alarms which are arranged so that any one will sound when someone tries to break into the safe. The probability that each alarm will function properly is 0.85; find the probability that at least one alarm will sound when someone tries to break into the safe.

5. For a certain strain of wallflower, the probability that, when sown, a seed produces a plant with yellow flowers is [illegible]. Find the minimum number of seeds that should be sown in order that the probability of obtaining at least one plant with yellow flowers is greater than 0.98. (L)

6. Two people, A and B, play a game. An ordinary die is thrown and the first person to throw a four wins. A and B take it in turns to throw the die, starting with A. Find the probability that B wins.

7. A, B, C and D throw a coin, in turn, starting with A. The first to throw a head wins. The game can continue indefinitely until a head is thrown. However, D objects because the others have their first turn before him.
Compare the probability that D wins with the probability that A wins.

8. A box contains five black balls and one white ball. Alan and Bill take turns to draw a ball from the box, starting with Alan. The first boy to draw the white ball wins the game.
Assuming that they do not replace the balls as they draw them out, find the probability that Bill wins the game.
If the game is changed, so that, in the new game, they replace each ball after it has been drawn out, find the probabilities that:
(a) Alan wins at his first attempt;
(b) Alan wins at his second attempt;
(c) Alan wins at his third attempt.
Show that these answers are terms in a Geometric Progression. Hence find the probability that Alan wins the new game.

9. Two archers A and B shoot alternately at a target until one of them hits the centre of the target and is declared the winner. Independently, A and B have probabilities of [illegible] and [illegible], respectively, of hitting the centre of the target on each occasion they shoot.
(a) Given that A shoots first, find (i) the probability that A wins on his second shot, (ii) the probability that A wins on his third shot, (iii) the probability that A wins.
(b) Given that the archers toss a fair coin to determine who shoots first, find the probability that A wins. (NEAB)

ARRANGEMENTS

In order to calculate the number of possible outcomes in a possibility space or an event, the following results are often used.

Result 1

The number of ways of arranging n unlike objects in a line is n!

NOTE: n! = n × (n − 1) × (n − 2) × ... × 3 × 2 × 1.

For example, consider the letters A, B, C, D.
The first letter can be chosen in four ways (either A or B or C or D),
the second letter can be chosen in three ways,
the third letter can be chosen in two ways,
the fourth letter can be chosen in only one way.

Therefore the number of ways of arranging the four letters is 4 × 3 × 2 × 1 = 4! = 24.
On a calculator: 4 [n!] (You may have to use the SHIFT key.)

The arrangements are
ABCD ABDC ACBD ACDB ADCB ADBC
BCDA BCAD BDAC BDCA BACD BADC
CDBA CDAB CABD CADB CBAD CBDA
DABC DACB DBCA DBAC DCAB DCBA

Example 3.31

A witness reported that a car seen speeding away from the scene of the crime had a number plate that began with V or W, the digits were 4, 7 and 8 and the end letters were A, C, E. He could not however remember the order of the digits or the end letters. How many cars would need to be checked to be sure of including the suspect car?

Solution 3.31

There are 3! ways of arranging the digits 4, 7, 8 and
3! ways of arranging the letters A, C, E.
There are two choices for the initial letter.
The total number of different plates = 2 × 3! × 3!
= 72
72 cars would need to be checked.

Result 2

The number of ways of arranging in a line n objects, of which p are alike, is n!/p!

If instead of the letters A, B, C, D you have the letters A, A, A, D then the 24 arrangements listed previously reduce to the following:
AAAD AADA ADAA DAAA

So the number of ways of arranging the four objects, of which three are alike
= 4!/3! = (4 × 3 × 2 × 1)/(3 × 2 × 1) = 4. On a calculator: 4 [n!] ÷ 3 [n!] =

The result can be extended as follows:

The number of ways of arranging in a line n objects of which p of one type are alike, q of a second type are alike, r of a third type are alike, and so on, is n!/(p! q! r! ...)
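Results 1 and 2, and the extension to several repeated types, can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than part of the original text: the helper name `arrangements` is hypothetical, and the brute-force comparison with `itertools.permutations` is included only as a check on small cases.

```python
from collections import Counter
from itertools import permutations
from math import factorial


def arrangements(items):
    """Number of distinct ways of arranging `items` in a line.

    Computes n! / (p! q! r! ...), where p, q, r, ... are the numbers of
    alike objects of each type. With no repeats this reduces to n!.
    """
    count = factorial(len(items))
    for repeats in Counter(items).values():
        count //= factorial(repeats)
    return count


print(arrangements("ABCD"))  # Result 1: four unlike letters
print(arrangements("AAAD"))  # Result 2: three of the four alike

# Brute-force check: list every ordering and discard duplicates.
print(len(set(permutations("AAAD"))))

# Example 3.31: two initial letters, then 3! digit orders and 3! letter orders.
print(2 * arrangements("478") * arrangements("ACE"))
```

Integer division is safe here because each factorial in the denominator divides the running count exactly.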

Example 3.32
(a) In how many ways can the letters of the word STATISTICS be arranged?
(b) If the letters of the word MINIMUM are arranged in a line at random, what is the probability that the arrangement begins with MMM?

Solution 3.32
(a) Consider the word STATISTICS.
There are ten letters and S occurs three times, T occurs three times, I occurs twice.
Therefore number of ways = 10!/(3! 3! 2!) = 50 400
On a calculator: 10 [x!] ÷ 3 [x!] ÷ 3 [x!] ÷ 2 [x!] =
There are 50 400 ways of arranging the letters in the word STATISTICS.

(b) Consider the word MINIMUM.
The possibility space S = {arrangements of MINIMUM}.
n(S) = 7!/(3! 2!) = 420   (3 Ms, 2 Is)
Let E be the event 'the arrangement begins with MMM'.
The letters must be arranged in the order MMMxxxx. There is only one way of arranging MMM; then the remaining four letters can be arranged in 4!/2! = 12 ways.
n(E) = 12
So P(E) = n(E)/n(S) = 12/420 = 1/35
The probability that the arrangement begins MMM is 1/35.

Example 3.33
Ten pupils are placed at random in a line. What is the probability that the two youngest pupils are separated?

Solution 3.33
Let the possibility space be S, then n(S) = 10!
Let E be the event 'the two youngest pupils are together'.
Treating these two together as one item, there are nine items to arrange.
Nine items can be arranged in 9! ways.
The two youngest can be arranged in two ways (y₁y₂ or y₂y₁).
Therefore n(E) = 2 × 9!
So P(E) = n(E)/n(S) = (2 × 9!)/10! = 0.2
E′ is the event 'the two youngest are not together'.
P(E′) = 1 − P(E) = 1 − 0.2 = 0.8
The probability that the two youngest are separated is 0.8.

Example 3.34
If a four-digit number is formed from the digits 1, 2, 3 and 5 and repetitions are not allowed, find the probability that the number is divisible by 5.

Solution 3.34
Let S be the possibility space, then n(S) = 4! = 24
Let E be the event 'the number is divisible by 5'.
If the number is divisible by 5 then it must end with the digit 5.
n(E) = number of ways of arranging the digits 1, 2, 3 = 3!
So P(E) = n(E)/n(S) = 3!/4! = 6/24 = 1/4
The probability that the number is divisible by 5 is 1/4.

Example 3.35
The letters of the word MATHEMATICS are written, one on each of 11 separate cards. The cards are laid out in a line.
(a) Calculate the number of different arrangements of these letters.
(b) Determine the probability that the vowels are all placed together. (L)

Solution 3.35
(a) Number of different arrangements = 11!/(2! × 2! × 2!) = 4 989 600   (2 Ms, 2 As, 2 Ts)
(b) To find the number of ways with the vowels together, treat M, T, H, M, T, C, S and [AEAI] as 8 items.
Then number of arrangements = 8!/(2! 2!) = 10 080
The vowels A, E, A, I, however, can be arranged in 4!/2! = 12 ways,
so total number of arrangements = 12 × 10 080 = 120 960
P(vowels together) = 120 960/4 989 600 = 0.024 (2 s.f.)

Example 3.36
The six letters of the word LONDON are each written on a card and the six cards are then shuffled and placed in a line.
(a) Calculate the number of different arrangements.
(b) Find the probability that the middle two cards both have the letter N on them.
(c) Find the probability that the two cards with the letter O are adjacent and the two cards with the letter N are also adjacent.
The cards are then shuffled again and placed in a line, face down. The first two cards in the line are turned over and reveal the letters L and O.
(d) Find the probability that when the other four cards are turned over the letters will spell LONDON. (L)

Solution 3.36
(a) Number of different arrangements of LONDON = 6!/(2! × 2!) = 180   (2 Os, 2 Ns)

(b) If the middle two letters are NN, then you need to find the number of different arrangements of LODO.
Number of arrangements = 4!/2! = 12   (2 Os)
P(middle two letters are NN) = 12/180 = 1/15

(c) If two Os and two Ns are adjacent then it is easier to think of each pair being glued together as one item. So treat them like this: (OO) and (NN).
Number of different arrangements of L, (OO), (NN), D = 4! = 24
P(two Os, two Ns are adjacent) = 24/180 = 2/15

(d) If the first two cards are L, O then you need to find the number of different arrangements of N, D, O, N.
Number of arrangements = 4!/2! = 12   (2 Ns)
It is quite easy to list these arrangements:
*NDON   DNNO
 NDNO   DNON
 NODN   DONN
 NOND   ONND
 NNDO   ONDN
 NNOD   ODNN
Of course, only one of these (marked *) will spell LONDON.
So P(L, O and four remaining letters spell LONDON) = 1/12

Result 3
The number of ways of arranging n unlike objects in a ring when clockwise and anticlockwise arrangements are different is (n − 1)!
For example, consider four people A, B, C and D, who are to be seated at a round table. The following four arrangements are the same, as A always has D on his immediate right and B on his immediate left.
[Figure: the same seating of A, B, C, D shown in four rotated positions around a round table]
To find the number of different arrangements, fix A and then consider the number of ways of arranging B, C and D.
Therefore the number of different arrangements of four people around the table is 3!
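Result 3 can be confirmed by brute force for a small table. The following is a minimal sketch, assuming that two seatings count as the same whenever one is a rotation of the other; the helper names canonical_rotation and count_round_table are invented for this illustration.

```python
from itertools import permutations
from math import factorial


def canonical_rotation(seating):
    """Pick one representative from all rotations of a seating."""
    n = len(seating)
    return min(seating[i:] + seating[:i] for i in range(n))


def count_round_table(people):
    """Distinct seatings when rotations count as the same arrangement."""
    return len({canonical_rotation(p) for p in permutations(people)})


people = ("A", "B", "C", "D")
# The brute-force count and (n - 1)! should agree.
print(count_round_table(people), factorial(len(people) - 1))
```

Choosing the smallest rotation as the representative plays the same role as fixing A in the argument above: each group of n rotated seatings is counted once.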
Result 4
The number of ways of arranging n unlike objects in a ring when clockwise and anticlockwise arrangements are the same is (n − 1)!/2
For example, if A, B, C and D are four different coloured beads which are threaded on a ring, then the following two arrangements are the same, since one is the other viewed from the other side.
[Figure: two rings of beads A, B, C, D, one the mirror image of the other]
Therefore the number of arrangements of four beads on a ring is 3!/2 = 3.

Example 3.37
Six bulbs are planted in a ring and two do not grow. What is the probability that the two that do not grow are next to each other?

Solution 3.37
Let S be the possibility space, then n(S) = 5!
Let E be the event 'the bulbs that do not grow are next to each other'.
Consider the two bulbs that do not grow as one item. They can be arranged in 2! ways.
There are now five items to be arranged in a ring and this can be done in 4! ways.
Therefore n(E) = 2! 4!
So P(E) = n(E)/n(S) = (2! 4!)/5! = 2/5
The probability that the bulbs that do not grow are next to each other is 2/5.

Example 3.38
One white, one blue, one red and two yellow beads are threaded on a ring to make a bracelet. Find the probability that the red and white beads are next to each other.

Solution 3.38
Let S be the possibility space.
If all the objects are unlike, the number of ways of arranging five beads on a ring is 4!/2, but since there are two yellows, n(S) = 4!/((2)(2!)) = 6
Let E be the event 'the red and the white beads are next to each other'.
Then n(E) = (2! 3!)/(2 × 2!)
where
2! is the number of ways in which red and white can be arranged,
3! is the number of ways of arranging four items in a ring,
the division by 2 is because clockwise and anticlockwise arrangements are the same,
the division by 2! is because there are two yellows.
So n(E) = 3
and P(E) = n(E)/n(S) = 3/6 = 1/2
The probability that the red and white beads are next to each other is 1/2.
This result can be shown diagrammatically:
[Figure: ways of arranging the beads, showing the six possible bracelets]
NOTE: as expected, in three of the six arrangements the red and white beads are next to each other.
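The bead counts in Result 4 and Example 3.38 can also be checked by listing every threading and treating two rings as the same when one can be turned or flipped into the other. This is a sketch under that assumption; canonical_bracelet, bracelets and adjacent_on_ring are hypothetical helper names, and W, B, R, Y stand for the white, blue, red and yellow beads.

```python
from itertools import permutations


def canonical_bracelet(beads):
    """Smallest form over all rotations and reflections of a ring."""
    n = len(beads)
    forms = []
    for ring in (beads, beads[::-1]):  # the ring and its mirror image
        forms.extend(ring[i:] + ring[:i] for i in range(n))
    return min(forms)


def bracelets(beads):
    """Set of distinct bracelets that can be made from the beads."""
    return {canonical_bracelet(p) for p in permutations(beads)}


def adjacent_on_ring(ring, x, y):
    """True if bead y is next to bead x (x must occur once in the ring)."""
    n = len(ring)
    i = ring.index(x)
    return y in (ring[(i - 1) % n], ring[(i + 1) % n])


rings = bracelets(("W", "B", "R", "Y", "Y"))
together = [r for r in rings if adjacent_on_ring(r, "R", "W")]
print(len(rings), len(together))
```

Calling bracelets(("A", "B", "C", "D")) gives the four-bead case of Result 4 in the same way.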
PERMUTATIONS OF r OBJECTS FROM n OBJECTS
Consider the number of ways of placing three of the letters A, B, C, D, E, F, G in three empty spaces.
The first space can be filled in seven ways. The second space can be filled in six ways. The third space can be filled in five ways. Therefore there are 7 × 6 × 5 ways of arranging three letters taken from seven letters. This is the number of permutations of three objects taken from seven and it is written ⁷P₃.
So ⁷P₃ = 7 × 6 × 5 = 210
Now 7 × 6 × 5 could be written (7 × 6 × 5 × 4 × 3 × 2 × 1)/(4 × 3 × 2 × 1)
i.e. ⁷P₃ = 7!/4! = 7!/(7 − 3)!
On a calculator this can be obtained directly: 7 [nPr] 3 =
(You may have to use the shift key.)
NOTE: the order in which the letters are arranged is important since ABC is a different permutation from ACB.
In general, the number of permutations, or ordered arrangements, of r objects taken from n unlike objects is written ⁿPᵣ where
ⁿPᵣ = n!/(n − r)!
NOTE: Using the formula, ⁿPₙ = n!/(n − n)! = n!/0!
But the number of ways of arranging n unlike objects is n!
So 0! is defined to be 1, i.e. 0! = 1
Try it on your calculator.

COMBINATIONS OF r OBJECTS FROM n OBJECTS
When considering the number of combinations of r objects from n objects, the order in which they are placed is not important.
For example, the one combination ABC gives rise to 3! permutations
ABC, ACB, BCA, BAC, CAB, CBA
Denoting the number of combinations of three letters from the seven letters A, B, C, D, E, F, G, by ⁷C₃ then
⁷C₃ × 3! = ⁷P₃
⁷C₃ = ⁷P₃/3! = 7!/(3! 4!) = 35
On the calculator, ⁷C₃ can be obtained directly: 7 [nCr] 3 =
(You may have to use the shift key.)
In general, the number of combinations of r objects from n unlike objects is ⁿCᵣ where
ⁿCᵣ = n!/(r! (n − r)!)
NOTE: ⁿCᵣ is sometimes written ₙCᵣ or as the binomial coefficient (n over r).

Example 3.39
In how many ways can a hand of four cards be dealt from an ordinary pack of 52 playing cards?

Solution 3.39
You need to consider combinations, since the order in which the cards are dealt is not important.
⁵²C₄ = 270 725
On a calculator: 52 [nCr] 4 =
The number of ways of dealing the hand of four cards is 270 725.

Example 3.40
Four letters are chosen at random from the word RANDOMLY. Find the probability that all four letters chosen are consonants.

Solution 3.40
Let S be the possibility space, then n(S) = ⁸C₄ = 70
Let E be the event 'four consonants are chosen'. Since there are six consonants,
n(E) = ⁶C₄ = 15
P(E) = n(E)/n(S) = 15/70 = 3/14
The probability that the four letters chosen are consonants is 3/14.

Example 3.41
A team of four is chosen at random from five girls and six boys.
(a) In how many ways can the team be chosen if
(i) there are no restrictions;
(ii) there must be more boys than girls?
(b) Find the probability that the team contains only one boy.
Solution 3.41
(a) (i) There are 11 people, from whom four are chosen. The order in which they are chosen is not important.
Number of ways of choosing the team = ¹¹C₄ = 330
If there are no restrictions, the team can be chosen in 330 ways.
(ii) If there are to be more boys than girls, then there must be three boys and one girl, or four boys.
Number of ways of choosing three boys and one girl = ⁶C₃ × ⁵C₁ = 100
On a calculator: 6 [nCr] 3 = × 5 [nCr] 1 =
Number of ways of choosing four boys = ⁶C₄ = 15
So number of ways of choosing three boys and a girl, or four boys = 100 + 15 = 115.
∴ Number of ways to choose the team with more boys than girls = 115.
(b) The possibility space S = {all possible teams of four}
n(S) = 330.
Let E be the event 'only one boy is chosen'. If one boy is chosen, then three girls must be chosen,
so n(E) = ⁶C₁ × ⁵C₃ = 60
P(E) = n(E)/n(S) = 60/330 = 2/11
The probability that the team contains only one boy is 2/11.

Example 3.42
If a diagonal of a polygon is defined to be a line joining any two non-adjacent vertices, how many diagonals are there in a polygon of (a) five sides, (b) six sides, (c) n sides?

Solution 3.42
(a) Number of ways to choose two points from five = ⁵C₂ = 10
Note ⁵C₂ = 5!/(2! 3!) = (5 × 4)/2 = 10   (3! cancels on the top and on the bottom)
So there are ten possible lines to draw, but as there are five sides, five of these are joining adjacent vertices.
∴ number of diagonals = ⁵C₂ − 5 = (5 × 4)/2 − 5 = 5.
The number of diagonals for a polygon with five sides is 5.
(b) Similarly, for the polygon of six sides,
the number of diagonals = ⁶C₂ − 6 = (6 × 5)/2 − 6 = 9
The number of diagonals for a polygon with six sides is 9.
(c) For a polygon with n sides,
the number of diagonals = ⁿC₂ − n
= n(n − 1)/2 − n
= (n² − n − 2n)/2
= n(n − 3)/2
The number of diagonals for a polygon with n sides is n(n − 3)/2.

Example 3.43
Three letters are chosen at random from the word BIOLOGY. Find the number of possible selections.

Solution 3.43
You need to find the total number of selections and, because there are two letters O, find the number of selections with
no letters O
one letter O
two letters O.
Number of selections without the letter O
= number of ways to choose three letters from B, I, L, G, Y
= ⁵C₃
= 10
Number of selections with one letter O (e.g. O, B, I; O, B, G; and so on)
= number of ways to choose two letters from B, I, L, G, Y
= ⁵C₂
= 10
Number of selections with two letters O (e.g. O, O, B; O, O, G; and so on)
= number of ways to choose one letter from B, I, L, G, Y
= 5
Therefore, total number of selections = 10 + 10 + 5 = 25
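The permutation and combination counts used in this section can be reproduced in Python, whose standard library provides math.perm and math.comb (Python 3.8 or later). The sketch below is illustrative only; diagonals and distinct_selections are helper names chosen for this example, not notation from the text.

```python
from itertools import combinations
from math import comb, perm


def diagonals(n):
    """Diagonals of an n-sided polygon: pairs of vertices minus the n sides."""
    return comb(n, 2) - n


def distinct_selections(word, r):
    """Distinct unordered selections of r letters, allowing for repeated letters."""
    return {tuple(sorted(c)) for c in combinations(word, r)}


# Ordered and unordered choices of three letters from seven.
print(perm(7, 3), comb(7, 3))
# Hands of four cards from a pack of 52.
print(comb(52, 4))
# Teams of four with more boys than girls, from six boys and five girls.
print(comb(6, 3) * comb(5, 1) + comb(6, 4))
# Diagonals of a pentagon and a hexagon.
print(diagonals(5), diagonals(6))
# Selections of three letters from BIOLOGY, which contains two letters O.
print(len(distinct_selections("BIOLOGY", 3)))
```

Sorting each selection before putting it in a set is what removes the duplicates caused by the two letters O, which mirrors the split into cases with no, one or two letters O.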
NOTE: it is easy to write them all out to check:
No letter O      One letter O     Two letters O
B, I, L          O, B, I          O, O, B
B, I, G          O, B, L          O, O, I
B, I, Y          O, B, G          O, O, L
B, L, G          O, B, Y          O, O, G
B, L, Y          O, I, L          O, O, Y
B, G, Y          O, I, G
I, L, G          O, I, Y
I, L, Y          O, L, G
I, G, Y          O, L, Y
L, G, Y          O, G, Y
Total: 10        Total: 10        Total: 5

Example 3.44
Seven cards, labelled A, B, C, D, E, F, G, are thoroughly shuffled and then dealt out face upwards on a table.
Find the probabilities, giving each as a fraction in its simplest form, that
(a) the first three cards to appear are the cards labelled A, B, C, in that order,
(b) the first three cards to appear are the cards labelled A, B, C, but in any order,
(c) the seven cards appear in their original order: A, B, C, D, E, F, G. (NEAB)

Solution 3.44
(a) Number of ways to arrange three letters from seven
= ⁷P₃ = 7!/4! = 7 × 6 × 5 = 210
P(first three letters are A, B, C in that order) = 1/210
(b) Number of ways to choose three letters from seven
= ⁷C₃ = 7!/(4! 3!) = 35
P(first three letters are A, B, C in any order) = 1/35
(c) Number of ways to arrange seven letters = 7! = 5040
∴ P(A, B, C, D, E, F, G) = 1/5040

Exercise 3f Arrangements, permutations, combinations

1. In how many ways can the letters of the word FACETIOUS be arranged in a line? What is the probability that an arrangement begins with F and ends with S?

2. (a) In how many ways can seven people sit at a round table?
(b) What is the probability that a husband and wife sit together?

3. On a shelf there are four mathematics books and eight English books.
(a) If the books are to be arranged so that the mathematics books are together, in how many ways can this be done?
(b) What is the probability that all the mathematics books will not be together?

4. The letters of the word PROBABILITY are arranged at random. Find the probability that the two Is are separated.

5. If the letters in the word ABSTEMIOUS are arranged at random, find the probability that the vowels and consonants appear alternately.

6. Nine children play a party game and hold hands in a circle.
(a) In how many different ways can this be done?
(b) What is the probability that Mary will be holding hands with her friends Natalie and Sarah?

7. (a) In how many different ways can the letters in the word ARRANGEMENTS be arranged?
(b) Find the probability that an arrangement chosen at random begins with the letters EE.

8. From a group of ten boys and eight girls, two pupils are chosen at random. Find the probability that they are both girls.

9. From a group of six men and eight women, five people are chosen at random. Find the probability that there are more men chosen than women.

10. From a bag containing six white counters and eight blue counters, four counters are chosen at random. Find the probability that two white counters and two blue counters are chosen.

11. From a group of ten people, four are to be chosen to serve on a committee.
(a) In how many different ways can the committee be chosen?
(b) Among the ten people there is one married couple. Find the probability that both the husband and the wife will be chosen.
(c) Find the probability that the three youngest people will be chosen.

12. In a group of six students, four are female and two are male. Determine how many committees of three members can be formed containing one male and two females. (L)

13. Four persons are chosen at random from a group of ten persons consisting of four men and six women. Three of the women are sisters. Calculate the probabilities that the four persons chosen will:
(a) consist of four women,
(b) consist of two women and two men,
(c) include the three sisters. (NEAB)

14. A touring party of 20 cricketers consists of nine batsmen, eight bowlers and three wicket keepers. A team of 11 players must have at least five batsmen, four bowlers and one wicket keeper. How many different teams can be selected, (a) if all the players are available for selection, (b) if two batsmen and one bowler are injured and cannot play?

15. Find the number of ways in which ten different books can be shared between a boy and a girl if each is to receive an even number of books.

16. Four letters are picked from the word BREAKDOWN. What is the probability that there is at least one vowel among the letters?

17. Eight people sit in a minibus: four on the sunny side and four on the shady side. If two people want to sit on opposite sides to each other and another two people want to sit on the shady side, in how many ways can this be done?

18. Disco lights are arranged in a vertical line. How many different arrangements can be made from two green, three blue and four red lights (a) if all nine lights are used, (b) if at least eight lights are used?

19. A group consisting of 10 boys and 11 girls attends a course for special games coaching.
(a) When they are introduced, each person hands a card containing his or her photograph and name and address to every other member of the group. State the total number of cards which are exchanged.
(b) 5 boys are selected for basketball and 6 girls for netball. Find the number of different possible selections for each of these.
(c) 5 particular boys and 5 particular girls are selected and placed in mixed pairs for tennis. Find the total number of different mixed pairs which can be made using these 10 children.
(d) If 4 children are chosen at random from the whole group find the probability that there is a majority of girls in the 4 selected. (L Additional)
20. A competition has a first prize, a second prize, a third prize and a fourth prize. Ten competitors enter this competition and the prizes are awarded for the first, second, third and fourth competitors in order of merit.
(a) Find the number of different ways in which these prizes could be won.
Smith and Jones are two of the ten competitors. Find the number of different ways in which the prizes could be won if
(b) neither Smith nor Jones wins a prize,
(c) each of Smith and Jones wins a prize. (C)

21. The number of applicants for a job is 15. Calculate the number of different ways in which six applicants can be selected for interview.
The six selected applicants are interviewed on a particular day. Calculate the number of ways in which the order of the six interviews can be arranged.
Of the six applicants interviewed, three have backgrounds in business, two have backgrounds in education and one has a background in recreation. Calculate the number of ways in which the order of the six interviews can be arranged, when applicants having the same background are interviewed successively. (C)

22. Each of seven children, in turn, throws a ball once at a target. Calculate the number of ways the children can be arranged in order to take the throws.
Given that three of the children are girls and four are boys, calculate the number of ways the children can be arranged in order that
(a) successive throws are made by boys and girls alternately,
(b) a girl takes the first throw and a boy takes the last throw. (C)

23. To enter a cereal competition, competitors have to choose the eight most important features of a new car, from a possible 12 features, then list the eight in order of preference. Each cereal packet entry form contains space for five entries. A correct entry wins a new car.
(a) What is the probability that a woman wins a new car if she completes the entry form from one packet?
(b) How many entry forms would she need to complete, each entry showing different arrangements, if the probability that she wins a car is to be at least 0.8?

24. Three letters are selected at random from the word SCHOOL. Find the probability that the selection (a) does not contain the letter O, (b) contains both the letters O. (L Additional)

25. How many even numbers can be formed with the digits 3, 4, 5, 6, 7 by using some or all of the numbers (repetitions are not allowed)?

26. [Figure: a row of four holes]
Different coloured pegs, each of which is painted in one and only one of the six colours red, white, black, green, blue and yellow, are to be placed in four holes, as shown in the figure, with one peg in each hole. Pegs of the same colour are indistinguishable. Calculate how many different arrangements of pegs placed in the four holes so that they are all occupied can be made from
(a) six pegs, all of different colours,
(b) two red and two white pegs,
(c) two red, one white and one black peg,
(d) twelve pegs, two of each colour. (L Additional)

27. (a) Calculate how many different numbers altogether can be formed by taking one, two, three and four digits from the digits 9, 8, 3 and 2, repetitions not being allowed.
(b) Calculate how many of the numbers in part (a) are odd and greater than 800.
(c) If one of the numbers in part (a) is chosen at random, calculate the probability that it will be greater than 300. (L Additional)

28. The positions of nine trees which are to be planted along the sides of a road, five on the north side and four on the south side, are shown in the figure.
[Figure: five positions on the north side (N) of the road and four positions on the south side (S)]
(a) Find the number of ways in which this can be done if the trees are all of different species.
(b) If the trees in (a) are planted at random, find the probability that two particular trees are next to each other on the same side of the road.
(c) If there are three cupressus, four prunus and two magnolias, find the number of different ways in which these could be planted assuming that trees of the same species are identical.
(d) If the trees in (c) are planted at random, find the probability that the two magnolias are on the opposite sides of the road.

29. A committee consisting of six persons is to be selected from five women and six men.
(a) Calculate the number of ways in which the chosen committee will contain exactly two men.
(b) Given that the committee is to contain at least two men, show that it can be selected in 456 ways.
(c) Given that these 456 ways are equally likely, calculate the probability that there will be more men than women on the committee.
(d) At a meeting the members of the chosen committee sit at a rectangular table in the fixed seats illustrated in the diagram:
[Figure: a rectangular table with six fixed seats]
(i) Given that each may sit in any of the six places, calculate the number of different ways they may be seated at the table.
(ii) Given that the committee consists of three men and three women and that the men and women must sit alternately round the table, calculate in how many different ways they may be seated. (L Additional)

30. A committee of eight members consists of one married couple together with four other men and two other women. From the committee a working party of four persons is to be formed. Find the number of different working parties which can be formed.
Find also the number if the working party
(a) may not contain both the husband and his wife,
(b) must contain two men and two women,
(c) must contain at least one man and at least one woman.
The eight committee members sit round an octagonal table, their positions being decided by drawing lots. Find the probability of
(d) the man sitting next to his wife,
(e) the man sitting opposite to his wife,
(f) the three women sitting together. (AEB)

Summary

• Experimental probability
P(A) = lim (r/n) as n → ∞, where r/n is the relative frequency of A.

• Equally likely outcomes
P(A) = n(A)/n(S), where n(A) is the number of outcomes in A and n(S) is the number of possible outcomes.

• 0 ≤ P(A) ≤ 1
If A is impossible, P(A) = 0
If A is certain, P(A) = 1.

• P(A′) = 1 − P(A), where A′ is the event 'A does not occur'.

• For events A and B
P(A or B) = P(A) + P(B) − P(A and B)
i.e. P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

• For mutually exclusive events A and B, P(A ∩ B) = 0
so P(A or B) = P(A) + P(B)   ('or' rule for exclusive events)
i.e. P(A ∪ B) = P(A) + P(B)

• For exhaustive events A and B
P(A or B) = 1, i.e. P(A ∪ B) = 1
• Conditional probability
P(A given B) = P(A and B)/P(B)
i.e. P(A | B) = P(A ∩ B)/P(B)
P(A and B) = P(A | B)P(B) = P(B | A)P(A).

• For independent events A, B
P(A | B) = P(A)
P(B | A) = P(B)
P(A and B) = P(A) × P(B)   ('and' rule for independent events)

• Tree diagrams (multiply along the branches)
P(A ∩ B) = P(A) × P(B | A)
P(A ∩ B′) = P(A) × P(B′ | A)
P(A′ ∩ B) = P(A′) × P(B | A′)
P(A′ ∩ B′) = P(A′) × P(B′ | A′)
P(B) = P(A ∩ B) + P(A′ ∩ B)

• Arrangements, permutations and combinations
- The number of ways of arranging n unlike objects in a line: n!
- The number of ways of arranging in a line n objects of which p of one type are alike, q of another type are alike, r of a third type are alike, and so on: n!/(p! q! r! ...)
- The number of ways of arranging n unlike objects in a ring when clockwise and anticlockwise arrangements are different: (n − 1)!
- The number of ways of arranging n unlike objects in a ring when clockwise and anticlockwise arrangements are the same: (n − 1)!/2
- The number of permutations of r objects taken from n unlike objects: ⁿPᵣ = n!/(n − r)!
- The number of combinations of r objects taken from n unlike objects: ⁿCᵣ = n!/(r! (n − r)!)

Miscellaneous worked examples

Example 3.45
A die is biased so that, when it is rolled, the probability of obtaining a score of 6 is 1/4. The probabilities of obtaining each of the other five scores 1, 2, 3, 4, 5 are all equal. Calculate the probability of obtaining a score of five with this biased die.
(a) The biased die and an unbiased die are now rolled together. Calculate the probability that the total score is 11 or more.
(b) The two dice are rolled again. Given that the total score is 11 or more, calculate the probability that the score on the biased die is 6. (C)

Solution 3.45
Events
6B: score 6 on biased die      6U: score 6 on unbiased die
5B: score 5 on biased die      5U: score 5 on unbiased die
For the biased die, P(6B) = 1/4
∴ P(score is 1, 2, 3, 4 or 5) = 3/4
P(5B) = 1/5 × 3/4 = 3/20
(a) P(11 or more) = P(6B 6U)* + P(6B 5U)* + P(5B 6U)
= 1/4 × 1/6 + 1/4 × 1/6 + 3/20 × 1/6
= 1/6 × [1/4 + 1/4 + 3/20]
= 1/6 × 13/20
= 13/120
(b) P(6B | score is 11 or more) = P(6B and score is 11 or more)/P(score is 11 or more)
The numerator is the sum of the terms marked * above, so
P(6B | score is 11 or more) = (1/4 × 1/6 + 1/4 × 1/6)/(13/120) = (1/12)/(13/120) = 10/13

Example 3.46
During 1996 a vet saw 125 dogs, each suspected of having a particular disease. Of the 125 dogs, 60 were female of whom 25 actually had the disease and 35 did not. Only 20 of the males had the disease, the rest did not. The case history of each dog was documented on a separate record card.
(a) A record card from 1996 is selected at random. Let A represent the event that the dog referred to on the record card was female and B represent the event that the dog referred to was suffering from the disease.
Find
(i) P(A),
(ii) P(A ∪ B),
(iii) P(A ∩ B),
(iv) P(A | B).
(b) If three different record cards are selected at random, without replacement, find the probability that
(i) all three record cards relate to dogs with the disease,
(ii) exactly one of the three record cards relates to a dog with the disease,
(iii) one record card relates to a female dog with the disease, one to a male dog with the disease and one to a female dog not suffering from the disease. (L)

Solution 3.46
Summarising the information in a table:
               Diseased (B)   Not diseased (B′)   Total
Female (A)          25               35             60
Male (A′)           20               45             65
Total               45               80            125

(a) (i) P(A) = 60/125 = 0.48
(ii) P(A ∪ B) = (25 + 35 + 20)/125 = 80/125 = 0.64
(iii) P(A ∩ B) = 25/125 = 0.2
(iv) P(A | B) = 25/45 = 5/9
(b) (i) P(BBB) = 45/125 × 44/124 × 43/123 = 0.045 (2 s.f.)
(ii) Number of ways of arranging B, B′, B′ = 3
So P(BB′B′ in any order) = 3 × 45/125 × 80/124 × 79/123 = 0.45 (2 s.f.)
(iii) Number of ways of arranging the cards = 3!
So P(female with disease, male with disease, female without disease)
= 3! × 25/125 × 20/124 × 35/123 = 0.055 (2 s.f.)

Example 3.47
A company needs to appoint three representatives, one to be based in Lancashire, one in Yorkshire and one in Cumbria. There are eight sales officers available for selection to the post of representative.
(a) Calculate the number of possible allocations of officers to representative posts.
(b) Calculate the number of different sets of three officers who could be appointed to represent the company.
(c) Two of the eight sales officers are members of the Brown family. Assuming that the three representatives are chosen at random from the eight officers, find the probability that both members of the Brown family will be chosen. Give your answer as a fraction in its simplest form. (NEAB)

Solution 3.47
(a) The first post can be allocated in 8 possible ways.
The second post can be allocated in 7 possible ways.
The third post can be allocated in 6 possible ways.
Number of allocations = 8 × 7 × 6 = 336
(b) Number of different sets of three officers = ⁸C₃ = 56
(c) If both the Browns are chosen,
number of ways to choose the third representative = 6
So P(both Browns are chosen) = 6/56 = 3/28

Example 3.48
A factory has three machines A, B, C producing large numbers of a certain item. Of the total daily production of the item, 50% are produced on A, 30% on B and 20% on C. Records show that 2% of items produced on A are defective, 3% of items produced on B are defective and 4% of items produced on C are defective. The occurrence of a defective item is independent of all other items.
One item is chosen at random from a day's total output.
(a) Show that the probability of its being defective is 0.027.
(b) Given that it is defective, find the probability that it was produced on machine A. (W)

Solution 3.48
Events are defined as follows
A: Item produced on A     P(A) = 0.5     P(D, given A) = 0.02
B: Item produced on B     P(B) = 0.3     P(D, given B) = 0.03
C: Item produced on C     P(C) = 0.2     P(D, given C) = 0.04
D: Item is defective
[Tree diagram: first branches for the machine (A, B, C), second branches for defective items (D, D′)]
P(D ∩ A) = 0.02 × 0.5 = 0.01 *
P(D ∩ B) = 0.03 × 0.3 = 0.009
P(D ∩ C) = 0.04 × 0.2 = 0.008
(a) P(D) = P(D and A) + P(D and B) + P(D and C)
         = 0.01 + 0.009 + 0.008
         = 0.027
(b) You already know that P(D given A) = 0.02, but now you need to 'reverse the conditions'
to find P(A given D).
Use P(A given D) = P(A and D) / P(D)     (P(A and D) is marked on the tree)
                 = 0.01 / 0.027           (P(D) was found in part (a))
                 = 0.370 (3 d.p.).

Example 3.49
A house is infested with mice and to combat this the householder acquired four cats, Albert,
Belinda, Khalid and Poon. The householder observes that only half of the creatures caught are
mice. A fifth are voles and the rest are birds.
20% of the catches are made by Albert, 45% by Belinda, 10% by Khalid and 25% by Poon.
(a) The probability of a catch being a mouse, a bird or a vole is independent of whether or
not it is made by Albert. What is the probability of a randomly selected catch being a
(i) mouse caught by Albert,
(ii) bird not caught by Albert?
(b) Belinda's catches are equally likely to be a mouse, a bird or a vole. What is the probability
of a randomly selected catch being a mouse caught by Belinda?
(c) The probability of a randomly selected catch being a mouse caught by Khalid is 0.05.
What is the probability that a catch made by Khalid is a mouse?
(d) Given that the probability that a randomly selected catch is a mouse caught by Poon is 0.2,
verify that the probability of a randomly selected catch being a mouse is 0.5.
(e) What is the probability that a catch which is a mouse was made by Belinda? (AEB)

Solution 3.49
Events                    Probabilities
M: a mouse is caught      P(M) = 0.5
V: a vole is caught       P(V) = 0.2
B: a bird is caught       P(B) = 1 - (0.5 + 0.2) = 0.3
A: Catch by Albert        P(A) = 0.2
L: Catch by Belinda       P(L) = 0.45
K: Catch by Khalid        P(K) = 0.1
N: Catch by Poon          P(N) = 0.25

(a) (i) P(Mouse caught by Albert) = P(M ∩ A)
                                  = P(M) × P(A)     (M and A are independent)
                                  = 0.5 × 0.2
                                  = 0.1
(ii) P(Bird not caught by Albert) = P(B ∩ A')
                                  = P(B) × P(A')    (B and A' are independent)
                                  = 0.3 × 0.8
                                  = 0.24
Before answering the next parts, it is useful to show all the given information on a tree diagram:
[Tree diagram: first branches A, L, K, N; second branches M, V, B; with P(M ∩ A) = 0.1, P(M ∩ L) = 0.15, P(M ∩ K) = 0.05, P(M ∩ N) = 0.2]

(b) P(Mouse caught by Belinda) = P(M ∩ L)
                               = 0.45 × 1/3
                               = 0.15
(c) P(Catch is mouse caught by Khalid) = P(M ∩ K) = 0.05
P(Catch by Khalid is a mouse) = P(M | K)
                              = P(M ∩ K) / P(K)
                              = 0.05 / 0.1
                              = 0.5
(d) P(Catch is mouse caught by Poon) = 0.2
P(Catch is a mouse) = P(M ∩ A) + P(M ∩ L) + P(M ∩ K) + P(M ∩ N)
                    = 0.1 + 0.15 + 0.05 + 0.2
                    = 0.5
(e) P(Catch which is a mouse was caught by Belinda) = P(L | M)
                                                    = P(L ∩ M) / P(M)
                                                    = 0.15 / 0.5     (0.15 from part (b))
                                                    = 0.3
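
The arithmetic in Solution 3.49 can be checked with a short script. This is only a sketch: the dictionary names are illustrative, and the joint probabilities are the values found or given in parts (a) to (d).

```python
# Share of all catches made by each cat (given in the example).
p_cat = {"Albert": 0.20, "Belinda": 0.45, "Khalid": 0.10, "Poon": 0.25}

# Joint probabilities P(mouse and cat), as found or given in parts (a) to (d).
p_mouse_and_cat = {
    "Albert": 0.5 * 0.20,        # independence: P(M) x P(A)
    "Belinda": 0.45 * (1 / 3),   # Belinda's catches are equally likely M, B or V
    "Khalid": 0.05,              # given in part (c)
    "Poon": 0.20,                # given in part (d)
}

# Part (d): add the four mutually exclusive ways of catching a mouse.
p_mouse = sum(p_mouse_and_cat.values())

# Part (c): P(M | K) = P(M and K) / P(K).
p_mouse_given_khalid = p_mouse_and_cat["Khalid"] / p_cat["Khalid"]

# Part (e): 'reverse the conditions', P(L | M) = P(L and M) / P(M).
p_belinda_given_mouse = p_mouse_and_cat["Belinda"] / p_mouse

print(round(p_mouse, 3), round(p_mouse_given_khalid, 3), round(p_belinda_given_mouse, 3))
```

Rounding is used only because the values are stored as floating-point numbers.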
Miscellaneous exercise 3g

1. Each time a table tennis player serves, the probability that she wins the point is 0.6,
independently of the result of any preceding serves. At the start of a particular game, she
serves for each of the first five points. Calculate the probability that, for the first two points of
this game,
(a) she wins both points,
(b) she wins exactly one of these two points.
Calculate the probability that, for the first five points of this game,
(c) she loses all five points,
(d) she wins at least one of these five points. (C)

2. A director of a company is selected at random.
C denotes the event that the director's annual salary is more than £300 000.
C' denotes the event that the director's annual salary is not more than £300 000.
D denotes the event that the director's annual salary is less than £200 000.
E denotes the event that the director's annual salary is less than £350 000.
Write down two of the events C, C', D and E which are
(a) complementary,
(b) mutually exclusive but not exhaustive,
(c) exhaustive but not mutually exclusive. (AEB)

3. Newborn babies are routinely screened for a serious disease which affects only two per 1000
babies. The result of screening can be positive or negative. A positive result suggests that the baby
has the disease, but the test is not perfect. If a baby has the disease, the probability that the
result will be negative is 0.01. If the baby does not have the disease, the probability that the
result will be positive is 0.02.
(a) Find the probability that a baby has the disease, given that the result of the test is positive.
(b) Comment on the value you obtain. (L)

4. A penalty shoot-out in a game of hockey requires each of two players to take a penalty hit to try to
score a goal. In a simple model, each player has a probability of 0.8 of scoring a goal, and
independence is assumed. Calculate the probability that exactly one goal is scored from the two hits.
In an alternative model, the probability of the second player scoring is reduced to 0.7 if the first
player does not score. Calculate the probability that the second player has scored, given that only
one goal is scored. (C)

5. Forty 17- and 18-year old students are the only people present at a party. The numbers of male
and female students of each age are given in the following table.

            17-year old    18-year old
Male             9              13
Female           7              11

In the Grand Draw, each of the forty students has an equal chance of winning one of two
prizes. The first prize is a gift token and the second prize is a box of chocolates. No student
may win more than one prize. Find the probability that
(a) the gift token will be won by an 18-year old male student,
(b) both prizes will be won by female students,
(c) the box of chocolates will be won by a 17-year old student, given that the gift token
is won by a 17-year old male student. (C)

6. Each customer at a supermarket pays by one of cash, cheque or credit card. The probability of a
randomly selected customer paying by cash is 0.54 and by cheque is 0.18.
(a) Determine the probability of a randomly selected customer paying by credit card.
Three customers are selected at random.
(b) Find the probability of
(i) all three paying by cash,
(ii) exactly one paying by cheque,
(iii) one paying by cash, one by cheque and one by credit card.
The probability that the amount payable exceeds £30 is 0.26. If the amount payable does exceed
£30, then the probability of it being paid by cheque is 0.28.
(c) Find the probability that a randomly selected customer pays more than £30 and
pays by cheque.
(d) Hence find the probability that a randomly selected customer pays more than £30, given
that the customer pays by cheque. (AEB)

7. A writer submits a poem for publication by a literary magazine. The poem will be accepted for
publication if it is approved by at least two of the three members of the editorial staff who
independently assess it. Given that the probabilities that the poem is approved by the
three members are 0.9, 0.7 and 0.6 respectively, find the probability that the poem is not
accepted.
The writer submits a different poem for each of three separate issues of the magazine. Given that
the probabilities remain the same, calculate the probability that all three of her poems are
accepted. (C)

8. At an art exhibition seven paintings are to be hung in a row along one wall. Find the number
of possible arrangements.
Given that three paintings are by the same artist, find the number of arrangements in which
(a) these three paintings are hung side by side,
(b) any one of these three paintings is hung at the beginning of the row but neither of the
other two is hung at the end of the row. (C)

9. A group of three pregnant women attend ante-natal classes together. Assuming that each
woman is equally likely to give birth on each of the seven days in a week, find the probability
that all three give birth
(a) on a Monday,
(b) on the same day of the week,
(c) on different days of the week,
(d) at a weekend (either a Saturday or Sunday).
(e) Find the probability of all three giving birth on the same day of the week given that they
all give birth at a weekend.
(f) How large would the group need to be to make the probability of all the women in the
group giving birth on different days of the week less than 0.05? (AEB)

10. The probability that for any married couple the husband has a degree is fa and the probability
that the wife has a degree is !. The probability that the husband has a degree, given that the
wife has a degree, is tf.
A married couple is chosen at random.
Find the probability that
(a) both of them have degrees,
(b) only one of them has a degree,
(c) neither of them has a degree.
Two married couples are chosen at random.
(d) Find the probability that only one of the two husbands and only one of the two wives
have a degree. (L)

11. A personal stereo system consists of a playing unit and a headphone unit. Each unit is tested for
faults. If a unit is found to be faulty, an attempt is made to correct the fault and the unit is then
retested. Any unit that is found to be faulty a second time is rejected.
(a) The probability of a randomly chosen playing unit being found to be faulty on the
first test is 0.1. If a second test is needed, the probability of a playing unit being found to
be faulty on the second test is 0.05.
(i) Calculate the probability that a randomly chosen playing unit is rejected.
(ii) Given that a playing unit is accepted, calculate the probability that a fault was
found on the first test. Give your answer correct to three significant figures.
(b) The probability of a randomly chosen headphone unit being found to be faulty on
the first test is 0.04. If a second test is needed, the probability of a headphone unit
being found to be faulty on the second test is 0.02. Calculate the probability that a
randomly chosen headphone unit is accepted. Give your answer correct to three
significant figures.
(c) A randomly chosen playing unit that has been accepted and a randomly chosen
headphone unit that has been accepted are combined to make a personal stereo system.
Calculate the probability that at least one of the two units has been retested. Give your
answer correct to three significant figures. (C)

12. A bag contains four red counters, three blue counters and three green counters. A counter is
drawn at random from the bag and not replaced. A second counter is then drawn at random from
the bag.
Assuming that at each stage each counter left in the bag has an equal chance of being drawn,
(a) find the probability, giving your answers as fractions in their lowest terms, that the
second counter will be blue given that
(i) the first counter is red,
(ii) the first counter is blue,
(iii) the first counter is green.
(b) Find the probability, giving your answer as a fraction in its lowest terms, that the first
counter will be red and the second counter will be blue.
(c) Find the probability, giving your answer as a fraction in its lowest terms, that the second
counter will be blue regardless of the colour of the first counter. (C)

13. A particular firm has six vacancies to fill from 15 applicants. Calculate the number of ways in
which these vacancies could be filled if there are no restrictions.
The firm decides that three of the six vacancies shall be filled by women and three by men. The
applicants consist of seven women and eight men. Calculate the number of ways in which the six
vacancies could be filled under these conditions.
One of the seven women is the wife of one of the eight men. Calculate the number of ways in
which three women and three men could fill the six vacancies, given that both the wife and her
husband are among those appointed.

Of all the possible selections of three women and three men, one is picked at random. Calculate
the probability that this selection includes
(a) both the wife and her husband,
(b) either the wife or her husband, but not both. (C)

14. Laura has 12 friends, seven girls and five boys, all of whom she wants to come to her birthday
party. However, she is only allowed to invite five of them. Not wishing to show any favouritism,
Laura chooses the five children to come to the party at random.
(a) How many different selections are possible?
(b) In how many selections are there exactly three boys?
(c) What is the probability that exactly three boys are invited to the party?
In fact, there are three girls at the party, including Laura, and three boys, including Liam
and John. For the party tea they sit round a circular table, equally spaced, with Laura sitting
in the position shown in the diagram.
[Diagram: circular table with six equally spaced seats, Laura's seat marked]
(d) In how many different ways can the other children fill the remaining seats?
With Laura sitting in her place, the other children take their seats at random.
(e) Find the probability that Laura sits next to Liam and John.
(f) Find the probability that boys and girls sit alternately. (MEI)

15. A draw is being made for the quarter-finals of a knock-out table tennis tournament. Eight
counters, alike in every respect except that they are numbered from one to eight inclusive, are
placed in a bag and drawn one by one, without replacement. A typical draw might produce the
numbers in the order 3, 5, 7, 2, 1, 8, 6, 4, resulting in the matches:
Match A   3 plays 5
Match B   7 plays 2
Match C   1 plays 8
Match D   6 plays 4
(a) In how many different orders can the counters be drawn from the bag?
(b) In how many ways can the counters be drawn such that
(i) players 1 and 2 play each other in match A,
(ii) players 1 and 2 play each other.
(c) Find the probability that
(i) players 1 and 2 play each other,
(ii) players 1 and 2 do not play each other.
In fact, players 1, 2, 3 and 4 are girls and the rest are boys.
(d) In how many ways can the counters be drawn such that the girls play each other in
matches A and B and the boys play each other in matches C and D?
(e) What is the probability that no girl plays a boy in the quarter-finals? (MEI)

16. In a set of 28 dominoes each domino has from 0 to 6 spots at each end. Each domino is different
from every other and the ends are indistinguishable so that, for example, the two
diagrams in figure 1 represent the same domino.
[Fig. 1: two diagrams of the same domino]
A domino which has the same number of spots at each end, or no spots at all, is called a
'double'. A domino is drawn at random from the set. Figure 2 shows a sample space diagram to
represent the complete set of outcomes, each of which is equally likely.
[Fig. 2: sample space diagram, both axes numbered 0 to 6]
[Fig. 3: the same sample space diagram with the event A marked]
Let the event A be 'the domino is a double', event B 'the total number of spots on the domino
is six' and event C be 'at least one end of the domino has five spots'.
Figure 3 shows the sample space with the event A marked.
(a) Write down the probability that event A occurs.
(b) Find the probability that either B or C or both occur.
(c) Determine whether or not events A and B are independent.
(d) Find the conditional probability P(A | C). Explain why events A and C are not
independent.
After the first domino has been drawn, a second domino is chosen at random from the remainder.
(e) Find the probability that at least one end of the first domino has the same number of
spots as at least one end of the second domino.
[HINT: Consider separately the cases where the first domino is a double and where it is
not.] (MEI)

Mixed test 3A

1. A club social committee consists of eight people, two of whom are Nicky and Sam. Two of the
eight committee members are to be chosen at random to organise the next club disco.
[Tree diagram: First Choice (Nicky, Sam, Other) followed by Second Choice]
By considering the above tree diagram, or otherwise,
(a) find the probability that both Nicky and Sam are chosen,
(b) find the probability that both Nicky and Sam are chosen, given that at least one of
Nicky and Sam is chosen. (C)

2. A bag contains ten balls, of which four are red and six are blue. An experiment consists of
drawing at random and without replacement three balls, one at a time, from the bag.
(a) Draw a tree diagram to show all the possible outcomes of the experiment.
Hence, or otherwise, find the probability that
(b) the first two balls drawn will be of different colours,
(c) the third ball will be red,
(d) the third ball will be red, given that the first two balls drawn were both blue. (L)

3. Ann, Barry and Clare are three students taking a multiple choice examination paper. For each
question a student has to select the correct answer from five that are offered. For
Question 1, Ann has no idea of the correct answer, Barry correctly identifies one answer
that is wrong and Clare correctly identifies two wrong answers. All three students decide to guess
at random from the answers they think stand a chance of being correct. Calculate the probability
that
(a) none of the three students chooses the correct answer,
(b) Clare is the only one to choose the correct answer,
(c) exactly one of the three students chooses the correct answer. (NEAB)

4. Last year the employees of a firm either received no pay rise, a small pay rise or a large pay rise.
The following table shows the number in each category, classified by whether they were weekly
paid or monthly paid.

                No pay rise    Small pay rise    Large pay rise
Weekly paid         25              85                 5
Monthly paid         4               8                23

A tax inspector decides to investigate the tax affairs of an employee selected at random.
D is the event that a weekly paid employee is selected.
E is the event that an employee who received no pay rise is selected.
D' and E' are the events "not D" and "not E" respectively.
Find
(a) P(D),
(b) P(D ∪ E),
(c) P(D' ∩ E').
F is the event that an employee is female.
(d) Given that P(F') = 0.8, find the number of female employees.
(e) Interpret P(D | F) in the context of this question.
(f) Given that P(D ∩ F) = 0.1, find P(D | F). (AEB)

5. The captain of a darts team is trying to arrange an evening match for next Monday, Tuesday,
Wednesday or Thursday. He hopes that the leading players, A, B, C and D, will all be free on
one of these evenings. In fact each of the four players has arranged an engagement for exactly
one of the four evenings.
Assuming that each player is equally likely to have chosen any one of the four evenings, and
that their choices are independent, find the probability that
(a) A and B have both chosen Monday evening,
(b) either C or D (or both) has chosen Monday evening,
(c) the four players have chosen four different evenings,
(d) there will be at least one evening when all four players are free. (NEAB)

Mixed test 3B

1. A coin is biased so that, on each toss, the probability of obtaining a head is 0.4. The coin is
tossed twice.
(a) Calculate the probability that at least one head is obtained.
(b) Calculate the conditional probability that exactly one head is obtained, given that at
least one head is obtained. (C)

2. The probabilities of events A and B are P(A) and P(B) respectively.
P(A) = f,, P(A ∩ B) = t, P(A ∪ B) = q.
Find, in terms of q,
(a) P(B),
(b) P(A | B).
Given that A and B are independent events,
(c) find the value of q. (L)

3. A questionnaire asks shareholders of a company to state whether they consider the chairman's
salary to be too high, about right, or too low. Excluding shareholders who have no opinion,
the probabilities of answers from a randomly selected shareholder are as follows:
Too high      0.95
About right   0.03
Too low       0.02
What is the probability that if three shareholders are selected at random,
(a) they will all answer 'too high',
(b) exactly two will answer 'too high',
(c) exactly two will give the same answer,
(d) exactly two will answer 'too high' given that exactly two give the same answer? (AEB)

4. A school has three photocopiers A, B and C. On any given day the independent probabilities of a
breakdown are 0.1 for A, 0.05 for B and 0.04 for C.
For a randomly chosen day, calculate the probability that
(a) at least one of the copiers breaks down,
(b) exactly one of the copiers breaks down,
(c) given exactly one copier breaks down, it is copier C. (NEAB)

5. Every year two teams, the Ramblers and the Strollers, meet each other for a quiz night. From
past results it seems that in years when the Ramblers win, the probability of them winning
the next year is 0.7 and in years when the Strollers win, the probability of them winning
the next year is 0.5. It is not possible for the quiz to result in the scores being tied.
The Ramblers won the quiz in 1996.
(a) Draw a probability tree diagram for the three years up to 1999.
(b) Find the probability that the Strollers will win in 1999.
(c) If the Strollers win in 1999, what is the probability that it will be their first win for
at least three years?
(d) Assuming that the Strollers win in 1999, find the smallest value of n such that the
probability of the Ramblers winning the quiz for n consecutive years after 1999 is
less than 5%. (MEI)

Probability distributions I - discrete variables

In this chapter you will learn
• about probability distributions for discrete random variables
• how to calculate and use E(X), the expectation (mean)
• how to calculate and use E(g(X)), the expectation of a simple function of X
• how to calculate and use Var(X), the variance of X
• about the cumulative distribution function F(x)
• about the results relating to expectation algebra for random variables X and Y

This chapter is concerned with discrete variables. When a variable is discrete, it is possible to
specify or describe all its possible numerical values, for example
• the number of females in a group of four students: the possible values are 0, 1, 2, 3, 4,
• the amount gained, in pence, in a game where the entry fee is 10p and the prizes are 50p
and £1: the possible values are -10, 40, 90,
• the number of times you throw a die until a six appears: the possible values are 1, 2, 3, 4,
5, ..., to infinity.

PROBABILITY DISTRIBUTIONS
A probability distribution gives the probability of each possible value of the variable.
Consider this situation:
By mistake, three faulty fuses are put into a box containing two good fuses. The faulty and
good fuses become mixed up and are indistinguishable by sight. You take two fuses from the
box. What is the probability that you take
(a) no faulty fuses,
(b) one faulty fuse,
(c) two faulty fuses?
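
One way to check answers to a situation like this is to list every equally likely pair of fuses and count the faulty ones in each pair. The sketch below does this; the fuse labels F1, F2, F3, G1, G2 are illustrative and not part of the original problem.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Three faulty fuses (F) and two good fuses (G); the labels are hypothetical.
fuses = ["F1", "F2", "F3", "G1", "G2"]

# Each unordered pair of fuses is equally likely to be taken from the box.
pairs = list(combinations(fuses, 2))

# Count how many pairs contain 0, 1 or 2 faulty fuses.
counts = Counter(sum(name.startswith("F") for name in pair) for pair in pairs)

# Probability of each possible number of faulty fuses, as an exact fraction.
distribution = {k: Fraction(counts[k], len(pairs)) for k in sorted(counts)}

for number_faulty, probability in distribution.items():
    print(number_faulty, probability, float(probability))
```

Exact fractions are used so that the probabilities can be seen to add up to exactly 1.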
It is possible to show the outcomes and probabilities on a tree diagram:

Probability                        Outcome
P(F, F)   = 3/5 × 2/4 = 0.3        2 faulty fuses
P(F, F')  = 3/5 × 2/4 = 0.3        1 faulty fuse
P(F', F)  = 2/5 × 3/4 = 0.3        1 faulty fuse
P(F', F') = 2/5 × 1/4 = 0.1        0 faulty fuses

(a) P(no faulty fuses) = 0.1
(b) P(one faulty fuse) = 0.3 + 0.3 = 0.6
(c) P(two faulty fuses) = 0.3
The variable being considered here is 'the number of faulty fuses' and it can be denoted by X.
The values that X can take are 0, 1 or 2.
The probability that there are no faulty fuses, i.e. the probability that the variable X takes the
value 0, can be written P(X = 0), so P(X = 0) = 0.1.
Similarly P(X = 1) = 0.6 and P(X = 2) = 0.3.
Sometimes these are written p0 = 0.1, p1 = 0.6, p2 = 0.3.
When defining variables, the variable is usually denoted by a capital letter (X, Y, R, etc) and a
particular value that the variable takes by a small letter (x, y, r, etc), so that P(X = x) means
'the probability that the variable X takes the value x'.
The probability distribution for X can be summarised in a table and illustrated in a vertical
line graph.

x           0      1      2
P(X = x)    0.1    0.6    0.3

[Vertical line graph of P(X = x) against x, for x = 0, 1, 2]

If the sum of the probabilities is 1, the variable is said to be random.
In this example P(X = 0) + P(X = 1) + P(X = 2) = 0.1 + 0.6 + 0.3 = 1, so X is a discrete random variable.

For a discrete random variable, Σ P(X = x) = 1, where the sum is taken over all x.

The function that is responsible for allocating probabilities, P(X = x), is known as the probability
density function of X, sometimes abbreviated to the p.d.f. of X. The probability density function
can either list the probabilities individually or summarise them in a formula.

Example 4.1
Two tetrahedral dice, each with faces labelled 1, 2, 3 and 4 are thrown and the score noted,
where the score is the sum of the two numbers on which the dice land. Find the probability
density function (p.d.f.) of X, where X is the random variable 'the score when two dice are
thrown'.

Solution 4.1
The score for each possible outcome is shown in the possibility space:

Second die   4 |  5  6  7  8
             3 |  4  5  6  7
             2 |  3  4  5  6
             1 |  2  3  4  5
                 -----------
                  1  2  3  4   First die

From the diagram you can see that X can take the values 2, 3, 4, 5, 6, 7, 8 only.
Since each outcome is equally likely, the probabilities can be found from the diagram.
For example, P(X = 5) = 4/16 since 4 out of the possible 16 outcomes result in a score of 5.
The probability distribution is formed:

x           2      3      4      5      6      7      8
P(X = x)    1/16   2/16   3/16   4/16   3/16   2/16   1/16

Notice the pattern for the probabilities relating to x from 2 to 5.
P(X = x) = (x - 1)/16   for x = 2, 3, 4, 5
For x from 6 to 8, there is a different pattern
P(X = x) = (9 - x)/16   for x = 6, 7, 8
These two formulae give the p.d.f. of X.
NOTE: Σ P(X = x) = 1/16 (1 + 2 + 3 + 4 + 3 + 2 + 1) = 1, summing over all x, confirming that X is a random variable.

Example 4.2
The p.d.f. of a discrete random variable Y is given by P(Y = y) = cy², for y = 0, 1, 2, 3, 4. Given
that c is a constant, find the value of c.

Solution 4.2
It helps to write out the probability distribution of Y.

y           0    1    2     3     4
P(Y = y)    0    c    4c    9c    16c

Since Y is a random variable, Σ P(Y = y) = 1, summing over all y, i.e. the sum of all the probabilities is 1.

So c + 4c + 9c + 16c = 1
              30c = 1
                c = 1/30

Example 4.3
The discrete random variable W has probability distribution as shown.

w           -3     -2      -1     0       1
P(W = w)    0.1    0.25    0.3    0.15    d

Find
(a) the value of d,
(b) P(-3 ≤ W < 0),
(c) P(W > -1),
(d) P(-1 < W < 1),
(e) the mode.

Solution 4.3
(a) Since Σ P(W = w) = 1, summing over all w,
0.1 + 0.25 + 0.3 + 0.15 + d = 1
0.8 + d = 1
d = 0.2
(b) P(-3 ≤ W < 0) = P(W = -3) + P(W = -2) + P(W = -1)
                  = 0.1 + 0.25 + 0.3
                  = 0.65
(c) P(W > -1) = P(W = 0) + P(W = 1)
              = 0.15 + 0.2
              = 0.35
(d) P(-1 < W < 1) = P(W = 0)
                  = 0.15
(e) The value of w with the highest probability is -1, so the mode = -1.

Exercise 4a Probability distributions

1. The discrete random variable X has the given probability distribution.

x           1      2       3      4    5
P(X = x)    0.2    0.25    0.4    a    0.05

(a) Find the value of a and draw a vertical line graph to illustrate the distribution.
(b) Find (i) P(1 ≤ X ≤ 3), (ii) P(X > 2), (iii) P(2 < X < 5), (iv) the mode.

2. The probability density function of a discrete random variable X is given by P(X = x) = kx for
x = 12, 13, 14.
Write out the probability distribution and find the value of k.

3. The discrete random variable X can take values 3, 5, 6, 8 and 10 only. Given that p3 = 0.1,
p5 = 0.05, p6 = 0.45 and p8 = 3p10, calculate p10.

4. X has probability distribution as shown in the table.

x           1       2       3    4      5
P(X = x)    1/10    3/10    a    1/5    1/20

(a) Find the value of a.
(b) Find P(X ≥ 4).
(c) Find P(X < 1).
(d) Find P(2 ≤ X < 4).

5. Write out the probability distribution for each of these variables.
(a) The number of heads, X, obtained when two fair coins are tossed.
(b) The number of tails, X, obtained when three fair coins are tossed.

6. A drawer contains eight brown socks and four blue socks. A sock is taken from the drawer at
random, its colour is noted and it is then replaced. This procedure is performed twice
more. X is the random variable the number of brown socks taken. Find the probability
distribution for X.

7. The discrete random variable R has p.d.f. P(R = r) = c(3 - r) for r = 0, 1, 2, 3.
(a) Find the value of the constant c.
(b) Draw a vertical line graph to illustrate the distribution.
(c) Find P(1 ≤ R < 3).

8. Write down the formula for the p.d.f. of X where X is the numerical value of a digit chosen
from a set of random number tables.

9. A game consists of throwing tennis balls into a bucket from a given distance. The probability
that William will get the tennis ball in the bucket is 0.4. A turn consists of three attempts.
(a) Construct the probability distribution for X, the number of tennis balls that land in
the bucket in a turn.
(b) William wins a prize if, at the end of his turn, there are two or more tennis balls in
the bucket. What is the probability that William does not win a prize?

10. Emma plays a game in which she throws two dice. If she gets two sixes, she wins 20p, if she
gets one six she wins 10p, otherwise she wins nothing. She has to pay 5p to enter.
Write out the probability distribution of X, the amount Emma gains in one turn.

11. A student has a fair coin and two six-sided dice, one of which is white and the other blue. The
student tosses the coin and then rolls both dice. Let X be a random variable such that if the coin
falls heads, X is the sum of the scores on the two dice, otherwise X is the score on the white die
only.
Find the probability function of X in the form of a table of possible values of X and their
associated probabilities.
Find P(3 ≤ X ≤ 7).
State the assumption you made to enable you to evaluate the probability function. (AEB)

12. X can take values 5, 6, 7, 8 and 9. The vertical line graph to illustrate the distribution of X is
incomplete. Given that P(X = 8) = 2P(X = 9), complete the line graph and describe the
distribution.
[Incomplete vertical line graph of P(X = x) against x, for x = 5, 6, 7, 8, 9; vertical scale from 0 to 0.4]

EXPECTATION OF X, E(X)

E(X) is read as 'E of X' and it gives an average or typical value of X, known as the expected
value or expectation of X. This is comparable with the mean in descriptive statistics.

Experimental approach
The frequency distribution shows the results when an unbiased die is thrown 120 times.

Score, x        1     2     3     4     5     6
Frequency, f    15    22    23    19    23    18    Total 120
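
The mean of the die frequency distribution (120 throws) can be checked numerically. The following is a minimal sketch, with illustrative variable names, that computes the mean both directly and as a sum of scores weighted by relative frequencies.

```python
# Scores and observed frequencies from the 120 throws of the die.
scores = [1, 2, 3, 4, 5, 6]
frequencies = [15, 22, 23, 19, 23, 18]

total = sum(frequencies)  # number of throws

# Mean as (sum of f times x) divided by (sum of f).
mean = sum(x * f for x, f in zip(scores, frequencies)) / total

# The same mean written as a sum of score times relative frequency.
relative_frequencies = [f / total for f in frequencies]
mean_from_relative = sum(x * r for x, r in zip(scores, relative_frequencies))

print(total, round(mean, 2), round(mean_from_relative, 2))
```

Writing the mean in terms of relative frequencies is the form that carries over to probabilities, where each relative frequency is replaced by the probability of the score.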
The mean score, x̄ = Σfx / Σf
                  = (1×15 + 2×22 + 3×23 + 4×19 + 5×23 + 6×18) / 120
                  = 3.6 (2 s.f.)

You could write this out in a different way:
x̄ = 1 × 15/120 + 2 × 22/120 + 3 × 23/120 + 4 × 19/120 + 5 × 23/120 + 6 × 18/120

The fractions 15/120, 22/120, 23/120, 19/120, 23/120, 18/120 are the relative frequencies of the scores of 1, 2, 3, 4, 5, 6
respectively.
Notice that they are close to 20/120 = 1/6.
If you throw the die a large number of times, you would expect each of these fractions to be
closer to 1/6, the limiting value of the relative frequency of a particular score on the die.

Theoretical approach
When an unbiased die is thrown, the probability of obtaining a particular value is 1/6.
The probability distribution is P(X = x) = 1/6 for x = 1, 2, 3, 4, 5, 6.

Score, x    1      2      3      4      5      6
P(X = x)    1/6    1/6    1/6    1/6    1/6    1/6

The expected mean, or expectation of X, is obtained by multiplying each score by its
probability, then summing. It is written E(X), so:
Expected mean, E(X) = 1 × 1/6 + 2 × 1/6 + 3 × 1/6 + 4 × 1/6 + 5 × 1/6 + 6 × 1/6
                    = 3.5
The expectation or expected mean can be thought of as the average value when the number of
experiments increases indefinitely.

In a statistical experiment
• a practical approach results in a frequency distribution and a mean value,
• a theoretical approach results in a probability distribution and an expected value, known
as the expectation.

The expectation of X is given by
E(X) = Σ xP(X = x), summing over all x.
This can also be written E(X) = Σ x_i p_i.
The symbol μ, pronounced 'mew', is often used for the expectation, where μ = E(X).

Example 4.4
A random variable X has probability distribution as shown. Find the expectation, E(X).

x           -2     -1     0       1      2
P(X = x)    0.3    0.1    0.15    0.4    0.05

Solution 4.4
E(X) = Σ xP(X = x), summing over all x
     = (-2) × 0.3 + (-1) × 0.1 + 0 × 0.15 + 1 × 0.4 + 2 × 0.05
     = -0.2

Example 4.5
Find the expected number of sixes when three fair dice are thrown.

Solution 4.5
X is the number of sixes and can take values 0, 1, 2, 3. Using the notation 6̄ to represent the
event 'a six is not obtained',
P(X = 0) = P(6̄, 6̄, 6̄) = (5/6)³ = 125/216
P(X = 1) = P(6, 6̄, 6̄) + P(6̄, 6, 6̄) + P(6̄, 6̄, 6)
         = 3 × 1/6 × (5/6)²
         = 75/216
P(X = 2) = 3 × P(6, 6, 6̄) = 3 × (1/6)² × 5/6 = 15/216
P(X = 3) = P(6, 6, 6) = (1/6)³ = 1/216
The probability distribution for X is

x           0          1         2         3
P(X = x)    125/216    75/216    15/216    1/216

E(X) = Σ xP(X = x)
     = 0 × 125/216 + 1 × 75/216 + 2 × 15/216 + 3 × 1/216
     = 0.5
The expected number of sixes when three dice are thrown is 0.5.
NOTE: in 50 throws you would expect 25 sixes. In practice you may not get 25 sixes. In
1000 throws though, you may get very close to 500 sixes. The expected value gives you the
long-term average value.

Symmetrical probability distributions
An important property which some distributions possess is that of symmetry, for example

(a)
x           1      2      3      4      5
P(X = x)    0.1    0.2    0.4    0.2    0.1

[Vertical line graph of P(X = x) against x, for x = 1, 2, 3, 4, 5]

It can be seen from the table or from the vertical line graph
that the distribution is symmetrical about the central value
x = 3, so E(X) = 3.
Check: E(X) = Σ xP(X = x), summing over all x
            = 1 × 0.1 + 2 × 0.2 + 3 × 0.4 + 4 × 0.2 + 5 × 0.1 = 3

(b) If X is the random variable 'the digit picked from a random number table', then the p.d.f.
of X is P(X = x) = 0.1 for x = 0, 1, ..., 9.

x           0      1      2      3      4      5      6      7      8      9
P(X = x)    0.1    0.1    0.1    0.1    0.1    0.1    0.1    0.1    0.1    0.1

[Vertical line graph of P(X = x) against x, for x = 0, 1, ..., 9]

The distribution is symmetrical about the central value mid-way between 4 and 5 so
E(X) = 4.5.
NOTE: the random variable X with p.d.f. P(X = x) = k, for all possible values of x, where k is
a constant, is said to follow a discrete uniform distribution.

Example 4.6
A fruit machine consists of three windows which operate independently. Each window shows
pictures of fruits: lemons, apples, cherries or bananas. The probability that a window shows a
particular fruit is as follows.
P(lemon) = 0.4      P(cherries) = 0.2
P(apple) = 0.1      P(banana) = 0.3

The rules for playing a game on the fruit machine are:
It costs 10p to play.
apple, apple, apple                      You win £1
cherries, cherries, cherries             You win 50p
lemon, lemon, lemon                      You win 40p
apple, apple, cherries (in any order)    You win 80p

Find the expected gain or loss if you play a game.

Solution 4.6
The variable X is 'the amount gained, in pence, in a game'.
Taking into account the cost of 10p to play, X can take the values 90, 70, 40, 30, -10.
P(X = 90) = P(3 apples) = 0.1 × 0.1 × 0.1 = 0.001
P(X = 70) = P(2 apples and one with cherries, in any order)
          = P(A, A, C) + P(A, C, A) + P(C, A, A)
          = 3 × 0.1² × 0.2
          = 0.006
P(X = 40) = P(3 cherries) = (0.2)³ = 0.008
P(X = 30) = P(3 lemons) = (0.4)³ = 0.064
P(X = -10) = P(you win none of these prizes)
           = 1 - (0.001 + 0.006 + 0.008 + 0.064) = 0.921
The probability distribution for X is

x           90       70       40       30       -10
P(X = x)    0.001    0.006    0.008    0.064    0.921

E(X) = Σ xP(X = x), summing over all x
     = 90 × 0.001 + 70 × 0.006 + 40 × 0.008 + 30 × 0.064 + (-10) × 0.921
     = -6.46
The expected loss per turn is 6.46p.
This means that, if you played the game, say 1000 times, on the average you could expect to
lose £64.60.
1
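The working of Solution 4.6 can be cross-checked by listing all 4 × 4 × 4 window outcomes. This is a sketch under assumptions: the names `winnings` and `gain_distribution` are made up for the illustration, and the prize rules are the ones used in the solution (a 10p stake, with prizes of 100p, 80p, 50p and 40p).

```python
from itertools import product

# Probability that a single window shows each fruit (from Example 4.6).
p_window = {"lemon": 0.4, "apple": 0.1, "cherries": 0.2, "banana": 0.3}
COST = 10  # pence paid to play one game


def winnings(windows):
    """Prize in pence for one outcome, given as a tuple of three fruit names."""
    if windows.count("apple") == 3:
        return 100
    if windows.count("apple") == 2 and windows.count("cherries") == 1:
        return 80
    if windows.count("cherries") == 3:
        return 50
    if windows.count("lemon") == 3:
        return 40
    return 0


def gain_distribution():
    """Map each possible net gain (prize minus cost) to its probability."""
    dist = {}
    for windows in product(p_window, repeat=3):
        prob = 1.0
        for fruit in windows:
            prob *= p_window[fruit]  # the windows operate independently
        gain = winnings(windows) - COST
        dist[gain] = dist.get(gain, 0.0) + prob
    return dist


dist = gain_distribution()
expected_gain = sum(x * p for x, p in dist.items())
print(dist)
print("E(X) =", round(expected_gain, 2), "pence")
```

Enumerating outcomes avoids having to count the orderings of 'two apples and one cherries' by hand, which is where slips are most likely.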

Example 4.7
A newsagent stocks 12 copies of a magazine each week. He has regular orders for nine copies, and the number of additional copies sold varies from week to week. The newsagent uses previous sales data to estimate the probability, for each possible total number of copies sold, as follows.

Number of copies   9     10    11    12
Probability        0.20  0.35  0.30  0.15

(a) Calculate an estimate of the mean number of copies that he sells in a week.
(b) The newsagent buys the magazines at 85p each and sells them at £1.45 each. Any copies not sold are destroyed.
    (i) Find the profit on these magazines in a week when he sells 11 copies.
    (ii) Construct a probability distribution table for the newsagent's weekly profit from the sale of these magazines. Hence, or otherwise, calculate an estimate of his mean weekly profit. (NEAB)

Solution 4.7
(a) Let X be the number of copies he sells in a week.
    E(X) = Σ xP(X = x)
         = 9 × 0.20 + 10 × 0.35 + 11 × 0.30 + 12 × 0.15
         = 10.4
    An estimate of the mean number sold in a week is 10.4.
(b) (i) When he sells 11 copies, profit = 11 × £1.45 - 12 × £0.85 = £5.75
    (ii) When he sells 9 copies, profit = 9 × £1.45 - 12 × £0.85 = £2.85
         When he sells 10 copies, profit = 10 × £1.45 - 12 × £0.85 = £4.30
         When he sells 12 copies, profit = 12 × £1.45 - 12 × £0.85 = £7.20

    Let £Y be the weekly profit. The probability distribution of Y is

    y          2.85  4.30  5.75  7.20
    P(Y = y)   0.20  0.35  0.30  0.15

    E(Y) = 2.85 × 0.20 + 4.30 × 0.35 + 5.75 × 0.30 + 7.20 × 0.15
         = 4.88
    An estimate of his mean weekly profit is £4.88.

Example 4.8
In a game, three dice are thrown. If you get a one or a six on any of the dice, you win £1, otherwise you have to pay £5. How much would you expect to win or lose when you play nine games?
You are now given the opportunity to change the rule for the amount you win when a one or a six appears.
To make the game worthwhile to yourself, what is the minimum amount that you would suggest?

Solution 4.8
When a die is thrown,
P(1 or 6) = 2/6 = 1/3, so P(neither 1 nor 6) = 1 - 1/3 = 2/3

When three dice are thrown,
P(neither 1 nor 6 on all three) = (2/3)³ = 8/27
so P(1 or 6 turns up) = 1 - 8/27 = 19/27

Let £X be the amount won in a game.
If a 1 or a 6 turns up, you win £1. Then X = 1 and P(X = 1) = 19/27.
If neither a 1 nor a 6 turns up, you pay £5. Then X = -5 and P(X = -5) = 8/27.

The probability distribution of X is

x          -5    1
P(X = x)   8/27  19/27

So E(X) = Σ xP(X = x)
        = (-5) × 8/27 + 1 × 19/27
        = -7/9

The negative amount indicates that you would expect to make a loss of £7/9 per game.
After nine games, expected loss = £7/9 × 9 = £7.

In making the game worthwhile, you would obviously want to ensure that you didn't make a loss.
Change the rule for payment to £y when you get 1 or 6 on any of the dice. The probability distribution is

x          -5    y
P(X = x)   8/27  19/27

E(X) = (-5) × 8/27 + y × 19/27
     = (-40 + 19y)/27

You want E(X) > 0
i.e. (-40 + 19y)/27 > 0
     -40 + 19y > 0
     19y > 40
     y > 2.105...

The minimum amount you should suggest that you win if you get a one or a six is £2.11. To make the game worthwhile, you would perhaps suggest rather more than this.
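The arithmetic in Solution 4.8 can be reproduced with exact fractions. The following is a minimal sketch; the helper name `expected_gain` and the rounding to whole pence are choices made for this illustration, not part of the original solution.

```python
from fractions import Fraction
from math import floor

# P(at least one die shows 1 or 6) when three fair dice are thrown.
p_lose = Fraction(2, 3) ** 3   # neither a 1 nor a 6 on any die: 8/27
p_win = 1 - p_lose             # 19/27


def expected_gain(win_amount, loss_amount=5):
    """E(X) in pounds when a win pays `win_amount` and a loss costs `loss_amount`."""
    return win_amount * p_win - loss_amount * p_lose


# Break-even payout y solves (-40 + 19y)/27 = 0.
break_even = Fraction(5) * p_lose / p_win

# Smallest whole number of pence giving a strictly positive expected gain.
min_pence = floor(break_even * 100) + 1

print("E(X) with a £1 prize:", expected_gain(1))
print("expected result over nine games:", 9 * expected_gain(1))
print("break-even payout:", break_even, "=", float(break_even))
print("minimum payout in pence:", min_pence)
```

Working in fractions keeps 19/27 and 8/27 exact, so the break-even value 40/19 is not blurred by rounding before the final step.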

Exercise 4b Expectation

1. The probability distribution for the random variable X is shown in the table:

   x          0  1  2  3  4
   P(X = x)   …

   Find E(X).

2. The random variable X has p.d.f. P(X = x) for x = 5, 6, 7, 8, 9 as defined in the table:

   x          5  6  7  8  9
   P(X = x)   …

   Find μ.

3. The probability distribution of a random variable X is as shown in the table:

   x          1    2    3  4    5
   P(X = x)   0.1  0.3  y  0.2  0.1

   Find (a) the value of y, (b) E(X).

4. Find the expected number of heads when two fair coins are tossed.

5. A bag contains five black counters and six red counters. Two counters are drawn, one at a time, and not replaced. Let X be 'the number of red counters drawn'. Find E(X).

6. An unbiased tetrahedral die has faces marked 1, 2, 3, 4. If the die lands on the face marked 1, the player has to pay 10p. If it lands on a face marked with a 2 or a 4, the player wins 5p and if it lands on a 3, the player wins 3p. Find the expected gain in one throw.

7. A discrete random variable X can take values 10 and 20 only. If E(X) = 16, write out the probability distribution of X.

8. The discrete random variable X can take values 0, 1, 2 and 3 only. Given P(X ≤ 2) = 0.9, P(X ≤ 1) = 0.5 and E(X) = 1.4, find
   (a) P(X = 1), (b) P(X = 0).

9. x          0  1  2  3
   P(X = x)   c  c²  …

   The above table shows the probability distribution for a random variable X.
   Calculate (a) c, (b) E(X). (L Additional)

10. A bag contains three red balls and one blue ball. A second bag contains one red ball and one blue ball. A ball is picked out of each bag and is then placed in the other bag. What is the expected number of red balls in the first bag?

11. In a game, a player rolls two balls down an inclined plane so that each ball finally settles in one of five slots and scores the number of points allotted to that slot as shown in the diagram below:

    [Diagram of the five slots and their points]

    It is possible for both balls to settle in one slot and it may be assumed that each slot is equally likely to accept either ball.
    The player's score is the sum of the points scored by each ball.
    Draw up a table showing all the possible scores and the probability of each.
    If the player pays 10p for each game and receives back a number of pence equal to his score, calculate the player's expected gain or loss per 50 games. (C Additional)

12. In a game a player tosses three fair coins. He wins £10 if three heads occur, £x if two heads occur, £3 if one head occurs and £2 if no heads occur.
    Express in terms of x his expected gain from each game.
    Given that he pays £4.50 to play each game, calculate
    (a) the value of x for which the game is fair,
    (b) his expected gain or loss over 100 games if x = 4.90. (C Additional)

13. In an examination a candidate is given the four answers to four questions but is not told which answer applies to which question. He is asked to write down each of the four answers next to its appropriate question.
    (a) Calculate in how many different ways he could write down the four answers.
    (b) Explain why it is impossible for him to have just three answers in the correct places and show that there are six ways of getting just two answers in the correct places.
    (c) If a candidate guesses at random where the four answers are to go and X is the number of correct guesses he makes, draw up the probability distribution for X in tabular form.
    (d) Calculate E(X). (L Additional)

14. The discrete random variable X has p.d.f. P(X = x) = kx for x = 1, 2, 3, 4, 5 where k is a constant. Find E(X).

15. A woman has three keys on a ring, just one of which opens the front door. As she approaches the front door she selects one key after another at random without replacement. Draw a tree diagram to illustrate the various selections before she finds the correct key. Use this diagram to calculate the expected number of keys that she will use before opening the door. (L Additional)

THE EXPECTATION OF ANY FUNCTION OF X, E(g(X))

The definition of expectation can be extended to any function of X, such as 10X, X², 1/X, X - 4, etc.

In general, if g(X) is any function of the discrete random variable X, then

E(g(X)) = Σ g(x)P(X = x)

For example,
E(10X) = Σ 10xP(X = x)
E(X²) = Σ x²P(X = x)
E(1/X) = Σ (1/x)P(X = x)
E(X - 4) = Σ (x - 4)P(X = x)

Example 4.9
The random variable X has p.d.f. P(X = x) for x = 1, 2, 3 as shown.

x          1    2    3
P(X = x)   0.1  0.6  0.3

Calculate
(a) E(X), (b) E(3), (c) E(5X), (d) E(5X + 3).

Solution 4.9
(a) E(X) = Σ xP(X = x)
         = 1 × 0.1 + 2 × 0.6 + 3 × 0.3
         = 2.2

(b) E(3) = Σ 3P(X = x)
         = 3 × 0.1 + 3 × 0.6 + 3 × 0.3
         = 3
    Notice that the expected value of a constant is equal to the constant.

(c) E(5X) = Σ 5xP(X = x)
          = 5 × 0.1 + 10 × 0.6 + 15 × 0.3
          = 11
    Notice that 5E(X) = 5 × 2.2 = 11
    so E(5X) = 5E(X).

(d) E(5X + 3) = Σ (5x + 3)P(X = x)
              = 8 × 0.1 + 13 × 0.6 + 18 × 0.3
              = 14
    Notice that E(5X) + E(3) = 11 + 3 = 14
    so E(5X + 3) = E(5X) + E(3)
    i.e. E(5X + 3) = 5E(X) + 3

In general, for constants a and b,

E(a) = a
E(aX) = aE(X)
E(aX + b) = aE(X) + b

Example 4.10
A six-sided die has faces marked with the numbers 1, 3, 5, 7, 9 and 11. It is biased so that the probability of obtaining the number R in a single roll of the die is proportional to R.
(a) Show that the probability distribution of R is given by
    P(R = r) = r/36,  r = 1, 3, 5, 7, 9, 11.
(b) The die is to be rolled and a rectangle drawn with sides of lengths 6 cm and R cm. Calculate the expected value of the area of the rectangle.
(c) The die is to be rolled again and a square drawn with sides of length 24R⁻¹ cm. Calculate the expected value of the perimeter of the square. (NEAB)

Solution 4.10
(a) r          1  3   5   7   9   11
    P(R = r)   k  3k  5k  7k  9k  11k

    Σ P(R = r) = 1, so k + 3k + 5k + 7k + 9k + 11k = 1
                       36k = 1
                       k = 1/36

    The distribution is

    r          1     3     5     7     9     11
    P(R = r)   1/36  3/36  5/36  7/36  9/36  11/36

    ∴ P(R = r) = r/36 for r = 1, 3, 5, 7, 9, 11

(b) A = 6R
    ∴ E(A) = E(6R)
           = 6E(R)
    E(R) = Σ rP(R = r)
         = 1 × 1/36 + 3 × 3/36 + 5 × 5/36 + 7 × 7/36 + 9 × 9/36 + 11 × 11/36
         = 7 17/18
    E(A) = 6 × 7 17/18 = 47 2/3
    The expected value of the area is 47 2/3 cm².

(c) P = 4 × 24R⁻¹ = 96/R
    ∴ E(P) = E(96/R)
           = 96E(1/R)
    E(1/R) = Σ (1/r)P(R = r)
           = 1 × 1/36 + 1/3 × 3/36 + 1/5 × 5/36 + 1/7 × 7/36 + 1/9 × 9/36 + 1/11 × 11/36
           = 1/6
    E(P) = 96 × 1/6 = 16
    The expected value of the perimeter is 16 cm.

Example 4.11
X is the number of heads obtained when two coins are tossed. Find
(a) the expected number of heads,
(b) E(X²),
(c) E(X² - X).

Solution 4.11
P(X = 0) = P(T, T) = 1/2 × 1/2 = 1/4
P(X = 1) = P(T, H) + P(H, T) = 1/2 × 1/2 + 1/2 × 1/2 = 1/2
P(X = 2) = P(H, H) = 1/2 × 1/2 = 1/4

The probability distribution for X is

x          0    1    2
P(X = x)   1/4  1/2  1/4

(a) E(X) = 1 (by symmetry)

(b) E(X²) = Σ x²P(X = x)
          = 0 × 1/4 + 1 × 1/2 + 4 × 1/4
          = 1 1/2

(c) E(X² - X) = Σ (x² - x)P(X = x)
              = 0 × 1/4 + 0 × 1/2 + 2 × 1/4
              = 1/2
    Notice that E(X²) - E(X) = 1 1/2 - 1 = 1/2 and E(X² - X) = 1/2,
    so E(X² - X) = E(X²) - E(X).

In general, for two functions of X, g(X) and h(X),

E(g(X) + h(X)) = E(g(X)) + E(h(X))

For example,
E(X² + 1/X) = E(X²) + E(1/X)

Example 4.12
The discrete random variable X has the following probability distribution.

x          0     1     2     3     4
P(X = x)   0.20  0.20  0.20  0.20  0.20

(a) Write down the name of the distribution of X.
(b) Find P(0 ≤ X < 2).
(c) Find E(X).
(d) Find E(X² + 3X). (L)

Solution 4.12
(a) It is a discrete uniform distribution.
(b) P(0 ≤ X < 2) = P(X = 0) + P(X = 1) = 0.2 + 0.2 = 0.4
(c) By symmetry, E(X) = 2
(d) E(X² + 3X) = E(X²) + 3E(X)
    E(X²) = Σ x²P(X = x)
          = 0.20 × (0² + 1² + 2² + 3² + 4²)
          = 6
    so E(X² + 3X) = 6 + 3E(X) = 6 + 6 = 12

VARIANCE OF X, Var(X) OR V(X)
Remember that variance = (standard deviation)².

Experimental approach
For a frequency distribution with mean x̄, the variance s² is given by

s² = Σ f(x - x̄)² / Σ f

This can also be written

s² = Σ fx² / Σ f - x̄²

Theoretical approach
For a discrete random variable X, with E(X) = μ, the variance is defined as follows:

Var(X) = E(X - μ)²

Alternatively, Var(X) = E(X - μ)²
                      = E(X² - 2μX + μ²)
                      = E(X²) - 2μE(X) + E(μ²)
                      = E(X²) - 2μ² + μ²
                      = E(X²) - μ²

This format is usually easier to work with.

NOTE: μ = E(X) so μ² = (E(X))².
This is very cumbersome to write, so it is often written E²(X). This is similar to the notation used in trigonometry where (cos A)² is written cos² A.

You could write Var(X) = E(X²) - E²(X).

Var(X) is sometimes written as σ² (σ is pronounced 'sigma').

Example 4.13
The random variable X has probability distribution as shown in the table:

x          1    2    3    4    5
P(X = x)   0.1  0.3  0.2  0.3  0.1

Find
(a) μ = E(X),
(b) E(X²),
(c) Var(X),
(d) σ, the standard deviation of X.

Solution 4.13
(a) By symmetry, μ = E(X) = 3
(b) E(X²) = Σ x²P(X = x)
          = 1 × 0.1 + 4 × 0.3 + 9 × 0.2 + 16 × 0.3 + 25 × 0.1
          = 10.4
(c) Var(X) = E(X²) - μ²
           = 10.4 - 9
           = 1.4
(d) σ = √1.4
      = 1.18 (2 d.p.)
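The formulas E(g(X)) = Σ g(x)P(X = x) and Var(X) = E(X²) - μ² translate directly into a few lines of code. The sketch below is illustrative only: the function names and the dictionary representation of a distribution are assumptions of this example. It reproduces the figures of Examples 4.9 and 4.13.

```python
from math import sqrt


def expectation(pmf, g=lambda x: x):
    """E(g(X)) = sum of g(x) * P(X = x); with the default g this is E(X)."""
    return sum(g(x) * p for x, p in pmf.items())


def variance(pmf):
    """Var(X) = E(X^2) - (E(X))^2."""
    mu = expectation(pmf)
    return expectation(pmf, lambda x: x ** 2) - mu ** 2


# Distribution from Example 4.9
pmf_49 = {1: 0.1, 2: 0.6, 3: 0.3}
print(expectation(pmf_49))                         # E(X)
print(expectation(pmf_49, lambda x: 5 * x + 3))    # E(5X + 3)

# Distribution from Example 4.13
pmf_413 = {1: 0.1, 2: 0.3, 3: 0.2, 4: 0.3, 5: 0.1}
print(expectation(pmf_413, lambda x: x ** 2))      # E(X^2)
print(variance(pmf_413), sqrt(variance(pmf_413)))  # Var(X) and sigma
```

Passing g as a function means the same routine covers E(10X), E(X²), E(1/X) and so on, without a separate formula for each.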

Example 4.14
Two boxes each contain three cards. The first box contains cards labelled 1, 3 and 5; the second box contains cards labelled 2, 6 and 8. In a game, a player draws one card at random from each box and his score, X, is the sum of the numbers on the two cards.
(a) Obtain the six possible values of X and find the corresponding probabilities.
(b) Calculate E(X), E(X²) and the variance of X. (C Additional)

Solution 4.14
(a) Possibility space

                    Second box
                    2    6    8
    First box   1   3    7    9
                3   5    9    11
                5   7    11   13

    Probability distribution

    x          3    5    7    9    11   13
    P(X = x)   1/9  1/9  2/9  2/9  2/9  1/9

(b) E(X) = Σ xP(X = x)
         = 3 × 1/9 + 5 × 1/9 + 7 × 2/9 + 9 × 2/9 + 11 × 2/9 + 13 × 1/9
         = 8 1/3

    E(X²) = Σ x²P(X = x)
          = 9 × 1/9 + 25 × 1/9 + 49 × 2/9 + 81 × 2/9 + 121 × 2/9 + 169 × 1/9
          = 78 1/3

    Var(X) = E(X²) - E²(X)
           = 78 1/3 - (8 1/3)²
           = 8 8/9

The following results relating to variance are useful.

Var(a) = 0
Var(aX) = a²Var(X)
Var(aX + b) = a²Var(X)

For example
Var(2X) = 2²Var(X)
        = 4 Var(X)
Var(2X + 3) = 2²Var(X)
            = 4 Var(X)
Var(5 - X) = (-1)²Var(X)
           = Var(X)

Exercise 4c Variance

1. The discrete random variable X has p.d.f. P(X = x) for x = 1, 2, 3.

   x          1    2    3
   P(X = x)   0.2  0.3  0.5

   Find (a) E(X), (b) E(X²), (c) Var(X).

2. The discrete random variable X has the probability distribution specified in the following table.

   x          -1    0     1     2
   P(X = x)   0.25  0.10  0.45  0.20

   (a) Find P(-1 ≤ X < 1).
   (b) Find E(2X + 3).

3. The discrete random variable X has p.d.f. P(X = 0) = 0.05, P(X = 1) = 0.45, P(X = 2) = 0.5. Find
   (a) μ = E(X), (b) E(X²), (c) E(5X² + 2X - 3).

4. The discrete random variable X has p.d.f. P(X = x) = k for x = 1, 2, 3, 4, 5, 6. Find
   (a) E(X), (b) E(X²), (c) E(3X + 4), (d) Var(X).

5. The random variable X takes values 2, 4, 6, 8 and its probability distribution is represented in the vertical line graph.

   [Vertical line graph of P(X = x) against x, for x = 2, 4, 6, 8]

   Find Var(X).

6. A roulette wheel is divided into six sectors of unequal area, marked with the numbers 1, 2, 3, 4, 5 and 6. The wheel is spun and X is the random variable 'the number on which the wheel stops'. The probability distribution of X is as follows:

   x          1     2     3    4    5     6
   P(X = x)   1/16  3/16  1/4  1/4  3/16  1/16

   Calculate (a) E(X), (b) E(X²), (c) E(3X - 5), (d) E(6X²), (e) Var(X).

7. The random variable X has p.d.f. P(X = x) as shown in the table:

   x          -2   -1   0    1    c
   P(X = x)   0.1  0.1  0.3  0.4  0.1

   Find the value of c (a) if E(X) = 0.3, (b) if E(X²) = 1.8.

8. The discrete random variable X has probability function given by

   p(x) = …   x = 1, 2, 3, 4, 5,
          …   x = 6,
          0   otherwise,

   where c is a constant.
   Determine the value of c and hence the mode and mean of X. (L)

9. A game consists of tossing four unbiased coins simultaneously. The total score is calculated by giving three points for each head and one point for each tail. The random variable X represents the total score.
   (a) Show that P(X = 8) = 3/8.
   (b) Copy and complete the table, given below, for the symmetrical probability distribution of X.

       x          4  6  8  10  12
       P(X = x)   …

   (c) Calculate the variance of X. (NEAB)

10. Find Var(X) for each of the following probability distributions:
    (a) x          -3   -2   0    2    3
        P(X = x)   0.3  0.3  0.2  0.1  0.1

    (b) x          1    3    5    7    9
        P(X = x)   1/6  1/6  1/4  1/4  1/6

    (c) x          0     2     5     6
        P(X = x)   0.11  0.35  0.46  0.08

11. X is the random variable 'the number on a biased die', and the p.d.f. of X is as shown,

    x          1    2    3    4  5    6
    P(X = x)   1/6  1/6  1/5  y  1/5  1/6

    Find (a) the value of y, (b) E(X), (c) E(X²), (d) Var(X), (e) Var(4X).

12. A team of three is to be chosen from four boys and five girls. If X is the random variable 'the number of girls in the team', find (a) E(X), (b) E(X²), (c) Var(X).

13. Two discs are drawn without replacement from a box containing three red and four white discs. If X is the random variable 'the number of white discs drawn', construct a probability distribution table. Find (a) E(X), (b) E(X²), (c) Var(X), (d) Var(3X - 4).

14. If X is the random variable 'the sum of the scores on two tetrahedral dice', where the score is the number on which the die lands, find (a) E(X), (b) Var(X), (c) Var(2X), (d) Var(2X + 3).

15. The discrete random variable X has probability distribution as shown in the table. Find Var(2X + 3).

    x          10   20   30
    P(X = x)   0.1  0.6  0.3

16. Two discs are drawn, without replacement, from a box containing three red discs and four white discs. The discs are drawn at random. If X is the random variable 'the number of red discs drawn', find (a) the expected number of red discs, (b) the standard deviation of X.

17. Ten identically shaped discs are in a bag; two of them are black, the rest white. Discs are drawn at random from the bag in turn and not replaced. Let X be the number of discs drawn up to and including the first black one.
    List the values of X and the associated theoretical probabilities.
    Calculate the mean value of X and its standard deviation. What is the most likely value of X?
    If, instead, each disc is replaced before the next is drawn, construct a similar list of values and point out the chief differences between the two lists.

18. The discrete random variable X has p.d.f.
    P(X = x) = k|x|
    where x takes the values -3, -2, -1, 0, 1, 2, 3.
    Find (a) the value of the constant k,
    (b) E(X),
    (c) E(X²),
    (d) the standard deviation of X.

19. The random variable X takes integer values only and has p.d.f.
    P(X = x) = kx          x = 1, 2, 3, 4, 5
    P(X = x) = k(10 - x)   x = 6, 7, 8, 9
    Find
    (a) the value of the constant k, (b) E(X),
    (c) Var(X), (d) E(2X - 3), (e) Var(2X - 3).

20. (a) In a game a player pays £5 to toss three fair coins. Depending on the number of tails he obtains he receives a sum of money as shown in the table below.

        Number of tails   3    2   1   0
        Sum received      £10  £6  £3  £1

        Calculate the player's expected gain or loss over 12 games.
    (b) A variable X has a probability distribution shown in the table below.

        Value of X    10   20   50  100
        Probability   0.5  0.3  p   q

        Given that X can only take the values 10, 20, 50 or 100, and that E(X) = 25, calculate
        (i) the value of p and of q,
        (ii) the variance of X.
        In a fairground game, a player rolls discs on to a board containing squares, each of which bears one of the numbers 10, 20, 50 or 100. If a disc falls entirely within a square, the player receives the same number of pence as the number in the square; if it does not, the player does not receive anything. The probability that a player will receive money from any given roll is …. If a player does receive money, the probabilities of receiving 10p, 20p, 50p or £1 are the same as those connected with the values of X above. How many discs should a player be allowed to roll for 50p, if the game is to be fair? (C Additional)

21. (a) A man takes part in a game in which he throws two fair dice and scores the sum of two numbers shown. The rewards for the scores are given in the following table.

        Score        12  10  7  5  other
        Reward (£)   16  6   3  5  0

        Calculate the expected reward for a throw of the two dice.
    (b) A bag contains five identical discs, two of which are marked with the letter A and three with the letter B. The discs are randomly drawn, one at a time without replacement, until both discs marked A are obtained. Show that the probability that three draws are required is 1/5.
        Given that X denotes the number of draws required to obtain both discs marked A, copy and complete the following table.

        Value of X         2  3  4  5
        Probability of X   …

        Evaluate (i) E(X), (ii) E(X²), (iii) the variance of X. (C Additional)

THE CUMULATIVE DISTRIBUTION FUNCTION, F(x)

In a frequency distribution, the cumulative frequencies are obtained by summing all the frequencies up to a particular value.

In the same way, in a probability distribution the probabilities up to a certain value are summed to give a cumulative probability. The cumulative probability function is written F(x).

Consider the following probability distribution:

x          1     2    3    4     5
P(X = x)   0.05  0.4  0.3  0.15  0.1

F(1) = P(X ≤ 1) = 0.05
F(2) = P(X ≤ 2) = P(X = 1) + P(X = 2) = 0.05 + 0.4 = 0.45
F(3) = P(X ≤ 3) = 0.75
F(4) = P(X ≤ 4) = 0.9
F(5) = P(X ≤ 5) = 1

Notice that F(5) gives the total probability.

The cumulative distribution function is

x      1     2     3     4    5
F(x)   0.05  0.45  0.75  0.9  1

In general, for the discrete random variable X, the cumulative distribution function is

F(x) = P(X ≤ x)

Sometimes F(x) can be given by a formula as in the following example.

Example 4.15
The discrete random variable X has cumulative distribution function F(x) = x/6 for x = 1, 2, ..., 6. Write out the probability distribution and suggest what X represents.

Solution 4.15
The cumulative distribution function is

x      1    2    3    4    5    6
F(x)   1/6  2/6  3/6  4/6  5/6  1

You can find the probability distribution from the table.
P(X = 1) = 1/6
P(X = 2) = F(2) - F(1) = 2/6 - 1/6 = 1/6
P(X = 3) = F(3) - F(2) = 3/6 - 2/6 = 1/6 and so on

The probability distribution is

x          1    2    3    4    5    6
P(X = x)   1/6  1/6  1/6  1/6  1/6  1/6

This is the uniform distribution, P(X = x) = 1/6, x = 1, 2, ..., 6.
X could be the score when a die is thrown.
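Moving between P(X = x) and F(x) is a running sum in one direction and a difference of consecutive values in the other. The sketch below assumes a distribution stored as a dictionary; the function names `cdf_from_pmf` and `pmf_from_cdf` are illustrative, not from the text.

```python
from fractions import Fraction
from itertools import accumulate


def cdf_from_pmf(pmf):
    """F(x) = P(X <= x): running total of the probabilities in increasing x."""
    xs = sorted(pmf)
    return dict(zip(xs, accumulate(pmf[x] for x in xs)))


def pmf_from_cdf(cdf):
    """P(X = x) = F(x) - F(previous value); the first value is F itself."""
    pmf = {}
    previous = 0
    for x in sorted(cdf):
        pmf[x] = cdf[x] - previous
        previous = cdf[x]
    return pmf


# The distribution tabulated at the start of this section, as exact fractions.
pmf = {1: Fraction(5, 100), 2: Fraction(40, 100), 3: Fraction(30, 100),
       4: Fraction(15, 100), 5: Fraction(10, 100)}
print(cdf_from_pmf(pmf))

# Example 4.15: F(x) = x/6 for x = 1, ..., 6
cdf_415 = {x: Fraction(x, 6) for x in range(1, 7)}
print(pmf_from_cdf(cdf_415))
```

Fractions are used so that the differences F(x) - F(x - 1) come out exactly as 1/6 rather than as rounded decimals.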
Example 4.16
The discrete random variable X has cumulative distribution function F(x) as shown in the table.

x      1    2     3     4    5
F(x)   0.2  0.32  0.67  0.9  1

Find (a) P(X = 3), (b) P(X > 2).

Solution 4.16
(a) From the table,
    F(3) = P(X ≤ 3) = P(X = 1) + P(X = 2) + P(X = 3) = 0.67
    F(2) = P(X ≤ 2) = P(X = 1) + P(X = 2) = 0.32
    P(X = 3) = F(3) - F(2)
             = 0.67 - 0.32 = 0.35

(b) P(X > 2) = 1 - P(X ≤ 2)
             = 1 - F(2)
             = 1 - 0.32
             = 0.68

Example 4.17
The cumulative probabilities for a random variable X are given in the following table, where X takes the values 0, 1, 2, ..., 10.

x    F(x)
0    0.0388
1    0.1756
2    0.4049
3    0.6477
4    0.8298
5    0.9327
6    0.9781
7    0.9941
8    0.9987
9    0.9998
10   1

Use the table to find
(a) P(X ≤ 5),
(b) P(X > 3),
(c) P(3 ≤ X ≤ 7),
(d) P(X = 7),
(e) P(X ≥ 8).

Solution 4.17
(a) P(X ≤ 5) = F(5) = 0.9327
(b) P(X > 3) = 1 - P(X ≤ 3) = 1 - 0.6477 = 0.3523
(c) P(3 ≤ X ≤ 7) = F(7) - F(2) = 0.9941 - 0.4049 = 0.5892
(d) P(X = 7) = P(X ≤ 7) - P(X ≤ 6) = 0.9941 - 0.9781 = 0.016
(e) P(X ≥ 8) = 1 - P(X ≤ 7) = 1 - F(7) = 1 - 0.9941 = 0.0059

Exercise 4d Cumulative distribution function

1. The probability distribution for the random variable Y is shown in the table:

   y          0.1   0.2   0.3  0.4   0.5
   P(Y = y)   0.05  0.25  0.3  0.15  0.25

   Construct the cumulative distribution table.

2. For a discrete random variable R the cumulative distribution function F(r) is as shown in the table:

   r      1     2     3     4
   F(r)   0.13  0.54  0.75  1

   Find (a) P(R = 2), (b) P(R > 1), (c) P(R ≥ 3), (d) P(R < 2), (e) E(R).

3. Construct the cumulative distribution tables for the following discrete random variables:
   (a) the number of sixes obtained when two ordinary dice are thrown,
   (b) the smaller number when two ordinary dice are thrown,
   (c) the number of heads when three fair coins are tossed.

4. For the discrete random variable X the cumulative distribution function F(x) is as shown:

   x      3     4     5     6     7
   F(x)   0.01  0.23  0.64  0.86  1

   Construct the probability distribution of X, and find Var(X).

5. For a discrete random variable X the cumulative distribution function is given by
   F(x) = x²/9 for x = 1, 2, 3.
   (a) Find F(2).
   (b) Find P(X = 2).
   (c) Write out the probability distribution of X.
   (d) Find E(2X - 3).

6. For a discrete random variable X the cumulative distribution function is given by F(x) = kx, x = 1, 2, 3. Find
   (a) the value of the constant k,
   (b) P(X < 3),
   (c) the probability distribution of X,
   (d) the standard deviation of X.

7. The discrete random variable X has distribution function F(x) where
   F(x) = 1 - (1 - x/4)ˣ for x = 1, 2, 3, 4
   (a) Show that F(3) = 63/64 and F(2) = 3/4.
   (b) Obtain the probability distribution of X.
   (c) Find E(X) and Var(X).
   (d) Find P(X > E(X)).

8. The cumulative probabilities for X are given in the following table, where X takes the values 0, 1, 2, ..., 12.

   x    F(x)
   0    0.0115
   1    0.0692
   2    0.2061
   3    0.4114
   4    0.6296
   5    0.8042
   6    0.9133
   7    0.9679
   8    0.9900
   9    0.9974
   10   0.9994
   11   0.9999
   12   1

   Use the table to find
   (a) P(X ≤ 8),
   (b) P(X = 5),
   (c) P(X ≥ 4),
   (d) P(3 < X ≤ 7),
   (e) P(1 ≤ X < 9).
TWO INDEPENDENT RANDOM VARIABLES

If X and Y are any two random variables,
E(X + Y) = E(X) + E(Y)

If X and Y are independent,
Var(X + Y) = Var(X) + Var(Y)

To illustrate this, consider two independent random variables X and Y,

x          0    1    2          y          1    2    3
P(X = x)   0.1  0.5  0.4        P(Y = y)   0.3  0.2  0.5

E(X) = Σ xP(X = x)                       E(Y) = Σ yP(Y = y)
     = 0 × 0.1 + 1 × 0.5 + 2 × 0.4            = 1 × 0.3 + 2 × 0.2 + 3 × 0.5
     = 1.3                                    = 2.2

E(X²) = Σ x²P(X = x)                     E(Y²) = Σ y²P(Y = y)
      = 0² × 0.1 + 1² × 0.5 + 2² × 0.4         = 1² × 0.3 + 2² × 0.2 + 3² × 0.5
      = 2.1                                    = 5.6

Var(X) = E(X²) - E²(X)                   Var(Y) = E(Y²) - E²(Y)
       = 2.1 - 1.3²                             = 5.6 - 2.2²
       = 0.41                                   = 0.76

Notice that
E(X) + E(Y) = 1.3 + 2.2 = 3.5 ... ①
Var(X) + Var(Y) = 0.41 + 0.76 = 1.17 ... ②

Now consider the distribution X + Y where X + Y can take the values 1, 2, 3, 4, 5.
For example,
P(X + Y = 4) = P(X = 1 and Y = 3) + P(X = 2 and Y = 2)
             = 0.5 × 0.5 + 0.4 × 0.2
             = 0.33

A tree diagram shows all the outcomes:

x   y   x + y   Probability
0   1   1       0.1 × 0.3 = 0.03
0   2   2       0.1 × 0.2 = 0.02
0   3   3       0.1 × 0.5 = 0.05
1   1   2       0.5 × 0.3 = 0.15
1   2   3       0.5 × 0.2 = 0.1
1   3   4       0.5 × 0.5 = 0.25
2   1   3       0.4 × 0.3 = 0.12
2   2   4       0.4 × 0.2 = 0.08
2   3   5       0.4 × 0.5 = 0.2

The probability distribution for X + Y is

x + y             1     2     3     4     5
P(X + Y = x + y)  0.03  0.17  0.27  0.33  0.2

E(X + Y) = 1 × 0.03 + 2 × 0.17 + 3 × 0.27 + 4 × 0.33 + 5 × 0.2
         = 3.5
         = E(X) + E(Y) from ①
So E(X + Y) = E(X) + E(Y)

To find the variance, consider first E((X + Y)²)
E((X + Y)²) = 1 × 0.03 + 4 × 0.17 + 9 × 0.27 + 16 × 0.33 + 25 × 0.2
            = 13.42
Var(X + Y) = 13.42 - 3.5²
           = 1.17
           = Var(X) + Var(Y) from ②
So Var(X + Y) = Var(X) + Var(Y)

If you perform similar calculations to find the expectation and variance of X - Y, you will find that
E(X - Y) = E(X) - E(Y) but Var(X - Y) = Var(X) + Var(Y)

The following general results are useful:

For any two random variables X and Y and constants a and b,
E(aX + bY) = aE(X) + bE(Y)
E(aX - bY) = aE(X) - bE(Y)

For independent random variables X and Y,
Var(aX + bY) = a²Var(X) + b²Var(Y)
Var(aX - bY) = a²Var(X) + b²Var(Y)     (Notice the + sign here.)

Example 4.18
X and Y are independent random variables such that
E(X) = 10, Var(X) = 2, E(Y) = 8, Var(Y) = 3.
Find (a) E(5X + 4Y), (b) Var(5X + 4Y), (c) Var(½X - Y), (d) Var(½X + Y).

Solution 4.18
(a) E(5X + 4Y) = 5E(X) + 4E(Y) = 5 × 10 + 4 × 8 = 82
(b) Var(5X + 4Y) = 5²Var(X) + 4²Var(Y) = 25 × 2 + 16 × 3 = 98
(c) Var(½X - Y) = (½)²Var(X) + Var(Y) = ¼ × 2 + 3 = 3.5
(d) Var(½X + Y) = (½)²Var(X) + Var(Y) = ¼ × 2 + 3 = 3.5
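The tree-diagram calculation for X + Y, and the companion claim for X - Y, can be checked by combining the two tables directly. This is a sketch under the stated assumption that X and Y are independent, so that joint probabilities multiply; the helper names `combine`, `mean` and `var` are illustrative.

```python
from itertools import product


def combine(pmf_x, pmf_y, op):
    """Distribution of op(X, Y) for independent X and Y (probabilities multiply)."""
    out = {}
    for (x, px), (y, py) in product(pmf_x.items(), pmf_y.items()):
        value = op(x, y)
        out[value] = out.get(value, 0.0) + px * py
    return out


def mean(pmf):
    return sum(v * p for v, p in pmf.items())


def var(pmf):
    return sum(v ** 2 * p for v, p in pmf.items()) - mean(pmf) ** 2


# The two independent distributions used in this section.
pmf_x = {0: 0.1, 1: 0.5, 2: 0.4}
pmf_y = {1: 0.3, 2: 0.2, 3: 0.5}

total = combine(pmf_x, pmf_y, lambda x, y: x + y)
diff = combine(pmf_x, pmf_y, lambda x, y: x - y)

print({k: round(p, 2) for k, p in sorted(total.items())})
print(round(mean(total), 2), round(var(total), 2))   # E(X + Y), Var(X + Y)
print(round(mean(diff), 2), round(var(diff), 2))     # E(X - Y), Var(X - Y)
```

The variance of the difference comes out equal to the variance of the sum, which is the point of the '+ sign' remark: subtracting Y still adds its spread.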
DISTRIBUTION OF X₁ + X₂ + ... + Xₙ
Consider the random variable X where E(X) = μ and Var(X) = σ².
Take two observations X₁ and X₂ from X.

E(X₁) = μ, E(X₂) = μ, Var(X₁) = σ², Var(X₂) = σ².

E(X₁ + X₂) = E(X₁) + E(X₂) = μ + μ = 2μ = 2E(X)

If the observations are independent,
Var(X₁ + X₂) = Var(X₁) + Var(X₂) = σ² + σ² = 2σ² = 2 Var(X)

This result can be extended to n observations.

E(X₁ + X₂ + ... + Xₙ) = nE(X)
If the observations are independent,
Var(X₁ + X₂ + ... + Xₙ) = n Var(X)

Example 4.19
Find the expectation and variance of the number of heads obtained when six coins are tossed.

Solution 4.19
Let X be the number of heads when a coin is tossed, where X can take the values 0, 1.
First find the expectation and variance of X. The probability distribution is

x          0    1
P(X = x)   0.5  0.5

E(X) = 0.5 (by symmetry)
E(X²) = 1 × 0.5 = 0.5
so Var(X) = E(X²) - E²(X) = 0.5 - 0.5² = 0.25

Now consider Y = X₁ + X₂ + ... + X₆ where Y is the number of heads when six coins are tossed.

E(Y) = 6E(X)          Var(Y) = 6 Var(X)
     = 6(0.5)                = 6(0.25)
     = 3                     = 1.5

The expected number of heads is 3 and the variance is 1.5.

COMPARING THE DISTRIBUTIONS OF X₁ + X₂ AND 2X
Confusion sometimes arises between the random variable X₁ + X₂, where X₁, X₂ are two independent observations of X, and the random variable 2X.
You will see from the following example that the distributions of the two random variables, X₁ + X₂ and 2X, are very different.

Example 4.20
When a tetrahedral die is thrown, the number on the face on which it lands, X, has probability distribution as shown, with E(X) = 2.5 and Var(X) = 1.25.

x          1     2     3     4
P(X = x)   0.25  0.25  0.25  0.25

(a) Find the probability distribution of S, the sum of the two numbers obtained when the die is thrown twice, where S = X₁ + X₂, and illustrate it by drawing a vertical line graph.
    Find E(S) and Var(S).
(b) Find the probability distribution of D, where D is double the number on which the die lands when it is thrown once. Illustrate by drawing a vertical line graph.
    Find E(D) and Var(D).

Solution 4.20
(a) Consider the sum when the die is thrown twice and illustrate the outcomes on a possibility space diagram.
    S can take the values 2, 3, 4, 5, 6, 7, 8 and the outcomes (all equally likely) are shown in the diagram:

    [Possibility space diagram: first throw X₁ against second throw X₂, 16 equally likely outcomes]

    The probability distribution of S is:

    s          2     3     4     5     6     7     8
    P(S = s)   1/16  2/16  3/16  4/16  3/16  2/16  1/16

    [Vertical line graph of P(S = s) against s]

    E(S) = 5 (by symmetry)

    E(S²) = Σ s²P(S = s)
          = 1/16 (4 + 18 + 48 + 100 + 108 + 98 + 64)
          = 27.5

    Var(S) = E(S²) - E²(S)
           = 27.5 - 25
           = 2.5

    As expected,
    E(S) = E(X₁ + X₂) = 2E(X) = 5
    Var(S) = Var(X₁ + X₂) = 2 Var(X) = 2.5

(b) D is double the number on which the die lands, so D = 2X.
    The probability distribution for D is

    d          2     4     6     8
    P(D = d)   0.25  0.25  0.25  0.25

    [Vertical line graph of P(D = d) against d]

    E(D) = 5 (by symmetry)

    E(D²) = Σ d²P(D = d)
          = 0.25(4 + 16 + 36 + 64)
          = 30
    Var(D) = E(D²) - E²(D)
           = 30 - 25
           = 5

    As expected,
    E(D) = E(2X) = 2E(X) = 5
    Var(D) = Var(2X) = 4 Var(X) = 5

Although the means of the two distributions are the same, the variances are not. The variable 'double the number' has the greater variance.

Summarising these results:

For two observations
E(X₁ + X₂) = 2E(X)      Var(X₁ + X₂) = 2 Var(X)
E(2X) = 2E(X)           Var(2X) = 4 Var(X)

For n observations
E(X₁ + X₂ + ... + Xₙ) = nE(X)      Var(X₁ + X₂ + ... + Xₙ) = n Var(X)
E(nX) = nE(X)                      Var(nX) = n²Var(X)

It is important that you understand whether multiples or sums are being considered. Think carefully about this point.

Exercise 4e Combinations of random variables

1. Independent random variables X and Y are such that E(X) = 4, E(Y) = 5, Var(X) = 1, Var(Y) = 2. Find
   (a) E(4X + 2Y),
   (b) E(5X - Y),
   (c) Var(3X + 2Y),
   (d) Var(5Y - 3X),
   (e) Var(3X - 5Y).

2. Independent random variables X and Y are such that E(X²) = 14, E(Y²) = 20, Var(X) = 10, Var(Y) = 11. Find
   (a) E(3X - 2Y), (b) Var(5X - 2Y).

3. Independent random variables X and Y are such that E(X) = 3, E(X²) = 12, E(Y) = 4, E(Y²) = 18. Find the value of
   (a) E(3X - 2Y),
   (b) E(2Y - 3X),
   (c) E(6X + 4Y),
   (d) Var(2X - Y),
   (e) Var(2X + Y),
   (f) Var(3Y + 2X).

4. Independent random variables X and Y have probability distributions as shown in the tables:

   x          0    1    2    3
   P(X = x)   0.3  0.2  0.4  0.1

   y          0    1    2
   P(Y = y)   0.4  0.2  0.4

   (a) Find E(X), E(Y), Var(X), Var(Y).
   (b) Construct the probability distribution for the random variable X + Y.
   (c) Verify that E(X + Y) = E(X) + E(Y).
   (d) Verify that Var(X + Y) = Var(X) + Var(Y).
   (e) Construct the probability distribution for the random variable X - Y.
   (f) Verify that E(X - Y) = E(X) - E(Y).
   (g) Verify that Var(X - Y) = Var(X) + Var(Y).

5. Rods of length 2 m or 3 m are selected at random with probabilities 0.4 and 0.6 respectively.
   (a) Find the expectation and variance of the length of a rod.
   (b) Two lengths are now selected at random. Find the expectation and variance of the sum of the two lengths.
   (c) Three lengths are now selected at random. Show that the probability distribution of Y, the sum of the three lengths, is

       y          6      7      8      9
       P(Y = y)   0.064  0.288  0.432  0.216

       and find E(Y) and Var(Y). Comment on your results.

6. Find the variance of the sum of the scores when an ordinary die is thrown ten times.

7. X has a p.d.f. given by P(X = x) = kx, x = 1, 2, 3, 4. Find
   (a) k,
   (b) E(X),
   (c) Var(X),
   (d) P(X₁ + X₂ = 5),
   (e) E(4X),
   (f) Var(X₁ + X₂ + X₃).
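The difference between a sum of two observations and a multiple of one observation, worked through in Example 4.20, can be confirmed with exact arithmetic. The sketch below is an illustration only; the names `pmf_sum` and `pmf_double` are chosen here and do not appear in the text.

```python
from fractions import Fraction
from itertools import product

# Tetrahedral die from Example 4.20: each face 1 to 4 has probability 1/4.
die = {x: Fraction(1, 4) for x in (1, 2, 3, 4)}


def mean(pmf):
    return sum(v * p for v, p in pmf.items())


def var(pmf):
    return sum(v ** 2 * p for v, p in pmf.items()) - mean(pmf) ** 2


# S = X1 + X2: two independent throws, so joint probabilities multiply.
pmf_sum = {}
for x1, x2 in product(die, repeat=2):
    pmf_sum[x1 + x2] = pmf_sum.get(x1 + x2, 0) + die[x1] * die[x2]

# D = 2X: a single throw, doubled.
pmf_double = {2 * x: p for x, p in die.items()}

print("S:", mean(pmf_sum), var(pmf_sum))        # E(S) and Var(S) = 2 Var(X)
print("D:", mean(pmf_double), var(pmf_double))  # E(D) and Var(D) = 4 Var(X)
```

Both variables have mean 5, but S can only reach the extremes 2 and 8 when both throws agree, which is why its variance is half that of D.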

Summary

For the discrete random variable X with probability density function P(X = x)

Σ P(X = x) = 1
Cumulative distribution function F(x) = P(X ≤ x)
μ = E(X) = Σ xP(X = x) where μ is the expectation of X
E(g(X)) = Σ g(x)P(X = x)
E(g(X) + h(X)) = E(g(X)) + E(h(X))
E(X²) = Σ x²P(X = x)
σ² = Var(X) = E(X - μ)² where Var(X) is the variance of X
            = E(X²) - μ² (or Var(X) = E(X²) - E²(X))
            = Σ x²P(X = x) - μ²
σ = standard deviation of X = √Var(X)

For the random variable X and constants a and b,

E(a) = a                      Var(a) = 0
E(aX) = aE(X)                 Var(aX) = a²Var(X)
E(aX + b) = aE(X) + b         Var(aX + b) = a²Var(X)

For any two random variables X and Y and constants a and b,

E(X + Y) = E(X) + E(Y)
E(X - Y) = E(X) - E(Y)
E(aX + bY) = aE(X) + bE(Y)
E(aX - bY) = aE(X) - bE(Y)

For independent random variables X and Y and constants a and b,

Var(X + Y) = Var(X) + Var(Y)
Var(X - Y) = Var(X) + Var(Y)
Var(aX + bY) = a²Var(X) + b²Var(Y)
Var(aX - bY) = a²Var(X) + b²Var(Y)

For n independent observations of X,

E(X₁ + X₂ + ... + Xₙ) = nE(X)
Var(X₁ + X₂ + ... + Xₙ) = n Var(X)

For multiples of X,

E(nX) = nE(X)
Var(nX) = n²Var(X)

Miscellaneous worked examples

Example 4.21
The discrete random variable X has probability function

P(X = x) = kx/(x² + 1),    x = 2, 3
           2kx/(x² - 1),   x = 4, 5
           0,              otherwise

(a) Show that the value of k is 20/33.
(b) Find the probability that X is less than 3 or greater than 4.
(c) Find F(3.2).
(d) Find (i) E(X), (ii) Var(X). (L)

Solution 4.21
(a) When x = 2, P(X = 2) = 2k/(2² + 1) = 2k/5
    When x = 3, P(X = 3) = 3k/10
    When x = 4, P(X = 4) = 8k/15
    When x = 5, P(X = 5) = 10k/24

    Now Σ P(X = x) = 1
    ∴ 2k/5 + 3k/10 + 8k/15 + 10k/24 = 1
      33k/20 = 1
      k = 20/33

    Substituting this value for k, the probability distribution for X is

    x          2     3     4      5
    P(X = x)   8/33  2/11  32/99  25/99

(b) P(X < 3 or X > 4) = P(X = 2) + P(X = 5)
                      = 8/33 + 25/99
                      = 49/99

(c) F(3.2) = P(X ≤ 3.2) = P(X = 2) + P(X = 3)
           = 8/33 + 2/11
           = 14/33
(d) (i) E(X) = Σ xP(X = x)
             = 2 × 8/33 + 3 × 2/11 + 4 × 32/99 + 5 × 25/99
             = 3 58/99

    (ii) E(X²) = Σ x²P(X = x)
               = 4 × 8/33 + 9 × 2/11 + 16 × 32/99 + 25 × 25/99
               = 14 1/11
         Var(X) = E(X²) − E²(X)
                = 14 1/11 − (3 58/99)²
                = 1.23 (3 s.f.)

Example 4.22
Anne plays a game in which a fair six-sided die is thrown once. If the score is 1, 2 or 3, Anne loses £10. If the score is 4 or 5, Anne wins £x. If the score is 6, Anne wins £2x.
(a) Show that the expectation of Anne's profit is £(⅔x − 5) in a single game.
(b) Calculate the value of x for which, on average, Anne's profit is zero.
(c) Given that x = 12, calculate the variance of Anne's profit in a single game. (C)

Solution 4.22
Let £X be Anne's profit.
P(score 1, 2 or 3) = 3/6 = 1/2, therefore P(X = −10) = 1/2
P(score 4 or 5) = 2/6 = 1/3, therefore P(X = x) = 1/3
P(score 6) = 1/6, therefore P(X = 2x) = 1/6

The probability distribution for X is

Value of X     −10    x      2x
Probability    1/2    1/3    1/6

(a) E(X) = Σ xP(X = x)
         = −10 × (1/2) + x × (1/3) + 2x × (1/6)
         = −5 + x/3 + 2x/6
         = ⅔x − 5
    So the expectation of Anne's profit in a single game is £(⅔x − 5).

(b) If E(X) = 0 then
    ⅔x − 5 = 0
    x = 7.5

(c) When x = 12, the probability distribution becomes

    Value of X     −10    12     24
    Probability    1/2    1/3    1/6

    E(X) = ⅔x − 5 = ⅔ × 12 − 5 = 3   from (a)
    E(X²) = Σ x²P(X = x)
          = 100 × 1/2 + 144 × 1/3 + 576 × 1/6
          = 194
    Var(X) = E(X²) − E²(X)
           = 194 − 9
           = 185
    The variance of Anne's profit in a single game is 185 (£²).

Example 4.23
Any integer may be reduced to a single digit by the method illustrated below.
51 → 5 + 1 → 6
58 → 5 + 8 → 13 → 1 + 3 → 4
The random variable D denotes the digit that results from the reduction of an integer, selected at random, from the twenty integers 50, 51, 52, ..., 69.
(a) Show that P(D = 5) = 0.15
(b) Determine the probability for each of the other possible values of D.
(c) Calculate the expected value of D.
(d) Calculate, to two decimal places, the variance of D. (NEAB)

Solution 4.23
To calculate D, consider the following

50 → 5 + 0 → (5)                  60 → 6 + 0 → 6
51 → 5 + 1 → 6                    61 → 6 + 1 → 7
52 → 5 + 2 → 7                    62 → 6 + 2 → 8
53 → 5 + 3 → 8                    63 → 6 + 3 → 9
54 → 5 + 4 → 9                    64 → 10 → 1 + 0 → 1
55 → 10 → 1 + 0 → 1               65 → 11 → 1 + 1 → 2
56 → 11 → 1 + 1 → 2               66 → 12 → 1 + 2 → 3
57 → 12 → 1 + 2 → 3               67 → 13 → 1 + 3 → 4
58 → 13 → 1 + 3 → 4               68 → 14 → 1 + 4 → (5)
59 → 14 → 1 + 4 → (5)             69 → 15 → 1 + 5 → 6

Three integers out of the twenty reduce to 5. These are marked (5) in the list above.

(a) P(D = 5) = 3/20 = 0.15

(b)
d           1      2      3      4      5      6      7      8      9
P(D = d)    2/20   2/20   2/20   2/20   3/20   3/20   2/20   2/20   2/20

(c) E(D) = Σ d P(D = d)
         = 1 × 2/20 + 2 × 2/20 + 3 × 2/20 + 4 × 2/20 + 5 × 3/20 + 6 × 3/20 + 7 × 2/20 + 8 × 2/20 + 9 × 2/20
         = (1/20)(2 + 4 + 6 + 8 + 15 + 18 + 14 + 16 + 18)
         = 5.05

(d) E(D²) = 1 × 2/20 + 4 × 2/20 + 9 × 2/20 + 16 × 2/20 + 25 × 3/20 + 36 × 3/20 + 49 × 2/20 + 64 × 2/20 + 81 × 2/20
          = (1/20)(2 + 8 + 18 + 32 + 75 + 108 + 98 + 128 + 162)
          = 31.55
    Var(D) = E(D²) − E²(D) = 31.55 − 5.05²
           = 6.05 (2 d.p.)

Miscellaneous exercise 4f

1. Fertiliser is sold in 25 kg sacks. The probability that a sack is underweight by y kg, to the nearest 0.1 kg, is given in the table below.

   y              0.1    0.2    0.3    0.4     0.5     more than 0.5
   Probability    0.5    0.3    0.1    0.075   0.025   0

   Find the expected loss in weight per sack.
   The price quoted by the manufacturers to a farmer for 1000 kg of fertiliser, packed in 25 kg sacks, is £240. Estimate by how much this price exceeds the value of the fertiliser that would actually be supplied to the farmer. (C)

2. A discrete random variable X can take only the values 0, 1, 2 or 3 and its probability distribution is given by P(X = 0) = k, P(X = 1) = 3k, P(X = 2) = 4k, P(X = 3) = 5k, where k is a constant. Find
   (a) the value of k,
   (b) the mean and variance of X. (NEAB)

3. A random variable R takes the integer value r with probability P(r) where
   P(r) = kr³, r = 1, 2, 3, 4,
   P(r) = 0, otherwise.
   Find
   (a) the value of k, and display the distribution on graph paper,
   (b) the mean and the variance of the distribution,
   (c) the mean and the variance of 5R − 3. (L)

4. A curiously shaped six-faced die produces a score, X, for which the probability distribution is given in the following table.

   x           1    2      3      4      5      6
   P(X = x)    k    k/2    k/3    k/4    k/5    k/6

   Show that the constant k is 20/49. Find the mean and variance of X.
   The die is thrown twice. Show that the probability of obtaining equal scores is approximately 1/4. (MEI)

5. A random variable R takes the integer value r with probability P(r) defined by
   P(r) = kr², r = 1, 2, 3,
   P(r) = k(7 − r)², r = 4, 5, 6,
   P(r) = 0, otherwise.
   Find the value of k and the mean and variance of the probability distribution. Exhibit this distribution by a suitable diagram.
   Determine the mean and the variance of the variable Y where Y = 4R − 2. (L)

6. A discrete random variable X has the distribution function

   x       1       2      4      5
   F(x)    1/12    1/2    5/6    1

   Write down the probability distribution of X.
   Find the probability distribution of the sum of two independent observations of X and find the mean and variance of the distribution of this sum.

7. The probability of there being X unusable matches in a full box of Surelite matches is given by P(X = 0) = 8k, P(X = 1) = 5k, P(X = 2) = P(X = 3) = k, P(X ≥ 4) = 0.
   Determine the constant k and the expectation and variance of X.
   Two full boxes of Surelite matches are chosen at random and the total number Y of unusable matches is determined. Calculate P(Y > 4), and state the values of the expectation and variance of Y. (C)

8. Two unbiased four-sided dice, having the numbers 1, 2, 3 and 4 on their faces, are thrown together. The random variable D represents the modulus of the difference between the numbers on the two hidden faces.
   (a) Show that P(D = 1) = 3/8.
   (b) Calculate the probability for each of the other possible values of D.
   (c) Calculate the expected value of D. (NEAB)

9. A and B play a series of tennis matches. The probability that A wins any single match in the series is 0.6. The winner of the series is the first player to win either two matches in succession or a total of three matches. Show that the probability
   (a) that the series lasts exactly two matches is 0.52,
   (b) that the series lasts exactly three matches is 0.24.
   Calculate the probability that the series lasts exactly four matches.
   Hence, or otherwise, show that the probability that the series lasts five matches is 0.1152.
   Calculate the expectation of n, the number of matches in the series.
   The prize-money involved depends on n and is shown in the table below.

   n              2        3        4 or 5
   Prize-money    £1000    £1240    £1510

   Tickets are sold, each of which entitles the purchaser to see the whole series of matches. Given that each ticket costs £5, calculate the number of tickets which must be sold to cover the expected value of the prize-money. (C)

10. A fair cubical die has two yellow faces and four blue faces. The die is rolled repeatedly until a yellow face appears uppermost or the die has been rolled four times. The random variable B represents the number of times a blue face appears uppermost and the random variable R represents the number of times the die is rolled.
    (a) Show that P(B = 3) = 8/81.
    (b) Find the probability distribution of B.
    (c) Find E(B).
    (d) Show that P(R = 4) = 8/27.
    (e) Find P(R = B). (L)

11. The discrete random variable X has the probability distribution given in the following table.

    x           1      2      3      4
    P(X = x)    0.4    0.3    0.1    0.2

    Two independent observations of X are made. The value of the random variable Y is found by subtracting the smaller of the two values of X from the larger. If the two values of X are equal, Y is zero. Show that P(Y = 1) = 0.34 and tabulate the complete probability distribution of Y.
    Find
    (a) E(Y),
    (b) Var(Y),
    (c) P(Y ≥ E(Y)). (C)

12. A box contains five discs, labelled 1, 2, 4, 5 and 6. In a game a player draws a disc at random, replaces it and then draws again. The player's score is the sum of the numbers on the two discs drawn.
    Construct a table showing the 11 possible scores and their probabilities. Find the expected score.
    In a social club this game is played and the prize is £1 for each point scored. The players pay £7.50 each time they play. Find the expected profit to the club after 250 games have been played. (C)

13. On a long train journey, a statistician is invited by a gambler to play a dice game. The game uses two ordinary dice which the statistician is to throw. If the total score is 12, the statistician is paid £6 by the gambler. If the total score is 8, the statistician is paid £3 by the gambler. However if both or either dice show a 1, the statistician pays the gambler £2. Let £X be the amount paid to the statistician by the gambler after the dice are thrown once.
    Determine the probability that
    (a) X = 6, (b) X = 3, (c) X = −2.
    Find the expected value of X and show that, if the statistician played the game 100 times, his expected loss would be £2.78, to the nearest penny.
    Find the amount, £a, that the £6 would have to be changed to in order to make the game unbiased.
14. The discrete random variable X can take only the values 0, 1, 2, 3, 4, 5. The probability distribution of X is given by the following, where a and b are constants.
    P(X = 0) = P(X = 1) = P(X = 2) = a
    P(X = 3) = P(X = 4) = P(X = 5) = b
    P(X ≥ 2) = 3P(X < 2)
    (a) Determine the values of a and b.
    (b) Show that the expectation of X is 23/8 and determine the variance of X.
    (c) Determine the probability that the sum of two independent observations from this distribution exceeds 7. (C)

15. A gambling machine works in the following way. The player inserts a penny into one of five slots, which are coloured Blue, Red, Orange, Yellow and Green corresponding to five coloured light bulbs. The player can choose whichever coloured slot he likes. After the penny has been inserted one of the five bulbs lights up. If the bulb lit up is the same colour as the slot selected by the player, then the player wins and receives from the machine R pennies, where
    P(R = 2) = [illegible], P(R = 4) = [illegible],
    P(R = 6) = [illegible], and
    P(R = 8) = P(R = 10) = [illegible].
    If the colour of the bulb lit up and the slot selected are not the same, the player receives nothing from the machine. In either case the player does not get back the penny that he inserted. Assuming that each of the colours is equally likely to light up, and that the machine selects the bulbs at random, determine
    (a) the probability that the player receives nothing from the machine,
    (b) the expected value of the amount gained by the player from a single try,
    (c) the variance of the amount gained by the player from a single try. (C)

16. (a) A regular customer at a small clothes shop observes that the number of customers, X, in the shop when she enters has the following probability distribution.

        Number of customers, x    0       1       2       3       4
        Probability p(x)          0.15    0.34    0.27    0.14    0.10

        Find the mean and standard deviation of X.
    (b) She also observes that the average waiting time, Y, before being served, is as follows.

        Number of customers, x             0    1    2    3    4
        Average waiting time, y minutes    0    2    6    9    12

        Find her mean waiting time. (AEB)

17. During winter a family requests four bottles of milk every day, and these are left on the doorstep. Three of the bottles have silver tops and the fourth has a gold top. A thirsty blue-tit attempts to remove the tops from these bottles. The probability distribution of X, the number of silver tops removed by the blue-tit, is the same each day and is given by
    P(X = 0) = [illegible], P(X = 1) = [illegible],
    P(X = 2) = [illegible], P(X = 3) = [illegible].
    The blue-tit finds the gold top particularly attractive, and the probability that this top is removed is [illegible], independent of the number of silver tops removed. Determine the expectation and variance of
    (a) the number of silver tops removed in a day,
    (b) the number of gold tops removed in a day,
    (c) the total number of tops (silver and gold) removed in seven days.
    Find also the probability distribution of the total number of tops (silver and gold) removed in a day. (C)

18. A player throws a die whose faces are numbered 1 to 6 inclusive. If the player obtains a six he throws the die a second time, and in this case his score is the sum of 6 and the second number; otherwise his score is the number obtained. The player has no more than two throws.
    Let X be the random variable denoting the player's score. Write down the probability distribution of X, and determine the mean of X.
    Show that the probability that the sum of two successive scores is 8 or more is 17/36.
    Determine the probability that the first of two successive scores is 7 or more, given that their sum is 8 or more. (C)

Mixed test 4A

1. A discrete random variable X has the following probability distribution and can only take the values tabulated.

   x              1      3      6    n       12
   Probability    0.1    0.3    k    0.25    0.15

   (a) Find the value of k.
   Given that E(X) = 6.0, find
   (b) the value of n,
   (c) the variance of X. (C)

2. When a certain type of cell is subjected to radiation, the cell may die, survive as a single cell or divide into two cells with probabilities 1/2, 1/3, 1/6 respectively.
   Two cells are independently subjected to radiation. The random variable X represents the total number of cells in existence after this experiment.
   (a) Show that P(X = 2) = 5/18.
   (b) Find the probability distribution for X.
   (c) Evaluate E(X).
   (d) Show that Var(X) = 10/9.
   Another two cells are submitted to radiation in a similar experiment and the random variable Y represents the total number of cells in existence after this experiment. The random variable Z is defined as Z = X − Y.
   (e) Find E(Z) and Var(Z). (L)

3. In a game two fair, cubical dice with faces numbered 1 to 6 are thrown. The score in the game is the positive difference between the numbers showing uppermost on the two dice.
   (a) Tabulate the probability distribution for the score.
   (b) Calculate the expected value of the score.
   (c) State the probability that the score is less than the expected value. (NEAB)

Mixed test 4B

1. The discrete random variable X has the probability function shown in the table below.

   x           1      2      3      4      5
   P(X = x)    0.2    0.3    0.3    0.1    0.1

   Find
   (a) P(2 < X ≤ 4),
   (b) F(3.7),
   (c) E(X),
   (d) Var(X),
   (e) E(X² + 4X − 3). (L)

2. A box contains six discs, of which two are labelled 2, three are labelled 3 and one is labelled 6. A game consists of a player drawing two discs simultaneously from the box. The sum of the numbers on the two discs is denoted by X.
   (a) Find the probability distribution of X.
   (b) Calculate E(X), E(X²) and the variance of X.
   A player pays £20 for 30 games and is paid £kX for each value of X he obtains.
   (c) Calculate the expected profit or loss for 30 games if k = 0.1.
   (d) Calculate the value of k for which the game would be fair. (C)

3. An unbiased four-sided die has faces numbered 1, 2, 3 and 6. The die and a fair coin are tossed together. The random variable R denotes the number on the hidden face of the die. If the coin shows heads, the score recorded, S, is equal to 2R, otherwise S = R.
   (a) Tabulate the probability distribution for S.
   (b) Calculate the expected value of S.
   (c) Calculate the variance of S. (NEAB)
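Tabulated distributions of the kind used throughout this chapter can be checked by enumeration. As one illustration, the sketch below rebuilds the distribution of the reduced digit D from Example 4.23 and recomputes its mean and variance with exact fractions. The function name `reduce_to_digit` is invented for this example; it is not part of the original text.

```python
from collections import Counter
from fractions import Fraction


def reduce_to_digit(n):
    """Repeatedly replace n by the sum of its digits until one digit is left."""
    while n >= 10:
        n = sum(int(c) for c in str(n))
    return n


# The twenty equally likely integers 50, 51, ..., 69
counts = Counter(reduce_to_digit(n) for n in range(50, 70))
dist = {d: Fraction(c, 20) for d, c in sorted(counts.items())}

mean = sum(d * p for d, p in dist.items())
variance = sum(d * d * p for d, p in dist.items()) - mean ** 2

for d, p in dist.items():
    print(d, p)
print("E(D)   =", float(mean))
print("Var(D) =", round(float(variance), 2))
```

Working with `Fraction` keeps every probability exact, so the only rounding happens when the variance is reported to two decimal places.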
Special discrete probability distributions

In this chapter you will learn
• about the conditions needed to model a situation for a discrete variable using
  - a uniform distribution
  - a geometric distribution, X ~ Geo(p)
  - a binomial distribution, X ~ B(n, p)
  - a Poisson distribution, X ~ Po(λ)
• how to calculate probabilities for these distributions and also the mean and variance
• about the use of the Poisson distribution as an approximation to the binomial distribution
• about the distribution of the sum of two or more independent Poisson variables

THE UNIFORM DISTRIBUTION
Throw an ordinary die. The probability distribution of X, the number on the die, is shown in the table and illustrated by the vertical line graph.

x           1      2      3      4      5      6
P(X = x)    1/6    1/6    1/6    1/6    1/6    1/6

[Vertical line graph of P(X = x) against x: six lines of height 1/6 at x = 1, 2, ..., 6]

P(X = x) = 1/6 for x = 1, 2, 3, 4, 5, 6

This is an example of a discrete uniform distribution.

Conditions for a uniform model
For a situation to be described using a discrete uniform model,
• the discrete random variable X is defined over the set of n distinct values x₁, x₂, ..., xₙ
• each value is equally likely to occur and
  P(X = xᵣ) = 1/n for r = 1, 2, ..., n

Example 5.1
The discrete variable X is such that P(X = x) = c for x = 20, 30, 45, 50. Find
(a) the probability distribution of X,
(b) μ, the expectation of X,
(c) P(X < μ),
(d) σ, the standard deviation of X.

Solution 5.1
(a)
x           20    30    45    50
P(X = x)    c     c     c     c

    Σ P(X = x) = 1
    ∴ 4c = 1
      c = 0.25
    P(X = x) = 0.25 for x = 20, 30, 45, 50

    NOTE: There are four values each of which is equally likely to occur and P(X = xᵣ) = 1/4 = 0.25 for r = 1, 2, 3, 4. The distribution is uniform.

(b) μ = E(X) = Σ xP(X = x)
      = 20 × 0.25 + 30 × 0.25 + 45 × 0.25 + 50 × 0.25
      = 36.25

(c) P(X < μ) = P(X < 36.25)
             = P(X = 20) + P(X = 30)
             = 0.25 + 0.25
             = 0.5

(d) E(X²) = Σ x²P(X = x)
          = 0.25(20² + 30² + 45² + 50²)
          = 1456.25
    Var(X) = E(X²) − μ²
           = 1456.25 − 36.25²
           = 142.1875
    σ = √142.1875 = 11.9 (3 s.f.)

THE GEOMETRIC DISTRIBUTION
Plastic models of animals are given away in packets of breakfast cereal. The probability that a packet contains a model of a rabbit is 0.1. Consider the probability distribution of X, the number of packets you open until you get a rabbit.

P(X = 1) = P(first packet contains a rabbit) = 0.1
P(X = 2) = P(first doesn't, second packet does) = 0.9 × 0.1 = 0.09
P(X = 3) = P(first doesn't, second doesn't, third packet does) = 0.9 × 0.9 × 0.1 = 0.081

Similarly
P(X = 4) = 0.9 × 0.9 × 0.9 × 0.1 = (0.9)³ × 0.1
P(X = 5) = (0.9)⁴ × 0.1
P(X = 6) = (0.9)⁵ × 0.1
and so on. A geometric model is being used in this example.

Conditions for a geometric model
For a situation to be described using a geometric model,
• independent trials are carried out,
• the outcome of each trial is deemed either a success or a failure,
• the probability, p, of a successful outcome is the same for each trial.

The discrete random variable, X, is the number of trials needed to obtain the first successful outcome.
If the above conditions are satisfied, X is said to follow a geometric distribution. This is written
X ~ Geo(p)
The probability of success, p, is all that is needed to describe the distribution completely. It is known as the parameter of the distribution.

Writing P(failure) as q, where q = 1 − p:

If X ~ Geo(p), P(X = x) = q^(x−1) p, x = 1, 2, 3, ..., where q = 1 − p,
so that P(X = 1) = p
        P(X = 2) = qp
        P(X = 3) = q²p
and so on.

NOTE:
• X cannot take the value 0,
• the number of trials could be infinite, although this is unlikely in practice!

Here are some diagrammatic illustrations of geometric distributions:

[Vertical line graphs of P(X = x) against x for X ~ Geo(0.3), X ~ Geo(0.5) and X ~ Geo(0.8)]

The mode of the geometric distribution
From the diagrams, you can see that the mode of any geometric distribution is 1. This means that for any value of p, one attempt is the most likely number of attempts to obtain the first success. This is quite a surprising result.
P(X = 1) = p
P(X = 2) = qp
Since 0 < q < 1, qp < p.
Also P(X = 3) = q²p < qp < p and so on.
For example, if X ~ Geo(0.3)
P(X = 1) = 0.3
P(X = 2) = 0.7 × 0.3 = 0.21 < 0.3
P(X = 3) = 0.7² × 0.3 = 0.147 < 0.21 < 0.3

Example 5.2
Jack is playing a board game in which he needs to throw a six with an ordinary die in order to start the game. Find the probability that
(a) exactly four attempts are needed to obtain a six,
(b) at least two attempts are needed,
(c) he is successful in throwing a six in three or fewer attempts,
(d) he needs more than three attempts to obtain a six.

Solution 5.2
X is the number of attempts up to and including the first occurrence of a six.
P(six) = 1/6, so using a geometric model with p = 1/6, q = 5/6, X ~ Geo(1/6).

(a) P(X = 4) = q³p
             = (5/6)³ × (1/6)
             = 0.096 (2 s.f.)

(b) P(X ≥ 2) = 1 − P(X = 1)
             = 1 − p
             = 1 − 1/6
             = 5/6

(c) P(X ≤ 3) = P(X = 1) + P(X = 2) + P(X = 3)
             = p + qp + q²p
             = 1/6 + 5/6 × 1/6 + (5/6)² × 1/6
             = 0.42 (2 s.f.)
    Alternatively,
    P(X ≤ 3) = P(success at some trial in the first three trials)
             = 1 − P(no success in first three trials)
             = 1 − q³
             = 1 − (5/6)³
             = 0.42 (2 s.f.)
(d) P(X > 3) = 1 − P(X ≤ 3)
             = 1 − (1 − q³)
             = q³
             = (5/6)³
             = 0.58 (2 s.f.)

These two results, illustrated above, are very useful:

If X ~ Geo(p) and q = 1 − p,
P(X ≤ x) = 1 − q^x
P(X > x) = q^x

Example 5.3
On a particular production line the probability that an item is faulty is 0.08. In a quality control test, items are selected at random from the production line. It is assumed that quality of an item is independent of that of other items.
(a) Find the probability that the first faulty item
    (i) does not occur in the first six selected,
    (ii) occurs in fewer than five selections.
(b) There is to be at least a 90% chance of picking a faulty item on or before the nth attempt. What is the smallest number n?

Solution 5.3
X is the number of items picked until a faulty item is selected.
Using a geometric model with p = 0.08, q = 0.92, X ~ Geo(0.08).

(a) (i) P(X > 6) = q⁶ = 0.92⁶ = 0.61 (2 s.f.)
    (ii) P(X < 5) = P(X ≤ 4) = 1 − q⁴ = 1 − 0.92⁴ = 0.28 (2 s.f.)

(b) You need to find n such that P(X ≤ n) ≥ 0.9
    But P(X ≤ n) = 1 − qⁿ
                 = 1 − 0.92ⁿ
    So 1 − 0.92ⁿ ≥ 0.9
       0.1 ≥ 0.92ⁿ
       0.92ⁿ ≤ 0.1
    By trial and improvement,
    0.92²⁵ = 0.124... > 0.1
    0.92²⁶ = 0.114... > 0.1
    0.92²⁷ = 0.105... > 0.1
    0.92²⁸ = 0.096... < 0.1
    The smallest value of n is 28.

    If you have studied logarithms in Pure Mathematics, you could use them to find n:
    0.92ⁿ ≤ 0.1
    n log 0.92 ≤ log 0.1        (taking logs to base 10 of both sides)
    n ≥ log 0.1 / log 0.92      (log 0.92 is negative, and dividing by a negative quantity reverses the inequality)
    i.e. n ≥ 27.6...
    The smallest value of n is 28, as before.

EXPECTATION AND VARIANCE OF THE GEOMETRIC DISTRIBUTION

If X ~ Geo(p) and q = 1 − p,
E(X) = 1/p
Var(X) = q/p²

Example 5.4
When I make a telephone call to an office, the probability of not getting through is 0.45. If I do not get through, then I try again later. Let X denote the number of attempts I have to make in order to get through. Stating any necessary assumptions, identify the probability distribution of X. Hence, calculate
(a) P(X ≥ 4),
(b) E(X) and Var(X). (C)

Solution 5.4
X is the number of attempts I have to make in order to get through. Assuming that the attempts are independent and the probability of getting through is the same for each attempt, then X follows a geometric distribution with p = 0.55, q = 0.45.

(a) P(X ≥ 4) = P(X > 3) = q³ = (0.45)³ = 0.091 (2 s.f.)

(b) E(X) = 1/p = 1/0.55 = 1.8 (2 s.f.)
    Var(X) = q/p² = 0.45/0.55² = 1.5 (2 s.f.)

Example 5.5
Identical independent trials of an experiment are carried out. The probability of a successful outcome is p. On average, five trials are required until a successful outcome occurs.
(a) Find the value of p.
(b) Find the probability that the first successful outcome occurs on the fifth trial.

Solution 5.5
X is the number of trials up to and including the first success.
X ~ Geo(p) and E(X) = 5.

(a) E(X) = 1/p
    ∴ 5 = 1/p
      p = 1/5 = 0.2

(b) X ~ Geo(0.2), i.e. p = 0.2, q = 0.8
    P(X = 5) = q⁴p
             = 0.8⁴ × 0.2
             = 0.08192

Example 5.6
X ~ Geo(p) and it is known that P(X = 2) = 0.21 and p < 0.5. Find P(X = 1).

Solution 5.6
P(X = 2) = qp where q = 1 − p
so 0.21 = (1 − p) × p
   0.21 = p − p²
   p² − p + 0.21 = 0
   (p − 0.3)(p − 0.7) = 0
   p = 0.3 or p = 0.7
Since p < 0.5, p = 0.3
P(X = 1) = p = 0.3

Exercise 5a

1. The probability distribution for the random variable X is shown in the table.

   x           6    7    8    9    10
   P(X = x)    a    a    a    a    a

   Find
   (a) the value of a,
   (b) the mean of X,
   (c) the probability that X is smaller than the mean.

2. The random variable X is Geo(0.35). Calculate
   (a) P(X = 4),
   (b) P(X > 4),
   (c) P(X ≤ 3),
   (d) E(X).

3. A coin is biased so that the probability of obtaining a head is 0.6. The random variable X is the number of tosses up to and including the first head. Find
   (a) P(X ≤ 4),
   (b) P(X > 5),
   (c) the most likely number of tosses until a head is obtained,
   (d) the expected number of tosses until a head is obtained,
   (e) the expected number of tosses until a tail is obtained.

4. A sixth former is waiting for a bus to take him to town. He passes the time by counting the number of buses, up to and including the one that he wants, that come along his side of the road.
   If 30% of the buses travelling on that side of the road go to town, what is
   (a) the most likely count he makes to the arrival of one that will take him into town,
   (b) the probability that he will count, at most, four buses? (O)

5. During January the probability that it will rain on any given day is 0.55.
   Stating a necessary assumption, find the probability that
   (a) the first rainy day in January is on 5 January,
   (b) it does not rain before 8 January.

6. A random number machine generates random digits between 0 and 9. Each of the ten digits is equally likely to be generated.
   (a) X is the value of the digit generated. Find
       (i) P(X < 6),
       (ii) P(X ≥ 7),
       (iii) E(X),
       (iv) the standard deviation of X.
   (b) X is the number of digits generated to the first occurrence of a 5. Find
       (i) the probability that the first occurrence of the digit 5 is at the seventh number generated,
       (ii) the most likely number of digits generated to obtain a 5,
       (iii) the mean number of digits generated to obtain a 5.

7. X ~ Geo(0.5). Find
   (a) the mode,
   (b) the mean of X,
   (c) the standard deviation of X.

8. A darts player practises throwing a dart at the bull's eye on a dart board. Independently for each throw, her probability of hitting the bull's eye is 0.2. Let X be the number of throws she makes, up to and including her first success.
   (a) Find the probability that she is successful for the first time on the third throw.
   (b) Write down the distribution of X and give the name of the distribution.
   (c) Write down the probability that she will have at least three failures before her first success. (L)

9. The random variable X follows the geometric distribution with probability p = 0.3.
   (a) Write down the probability P(X = 4).
   (b) Carefully explain why P(X = n) is 0.3 × 0.7^(n−1).
   (c) Describe in words a situation that has probability 0.7^(n−1). (O)

10. X ~ Geo(p) and the probability that the first success is obtained on the second attempt is 0.1275. If p > 0.5, find P(X > 2).

11. The probability that a telephone box is occupied is 0.2. Find, to two significant figures, the probability that a person wishing to make a telephone call will find a telephone box which is not occupied only at the sixth box tried. (L)

12. An unbiased coin is tossed repeatedly until a tail appears. Find the expected number of tosses.

13. In a computer game, the probability that the player hits the target is 0.4 for each attempt and the result of each attempt is independent of all others. Find
    (a) the probability that he hits the target for the first time on the fourth attempt,
    (b) the mean number of attempts needed to hit the target,
    (c) the standard deviation of the number of attempts,
    (d) the most likely number of attempts to hit the target,
    (e) the probability that he takes more than seven attempts to hit the target.

14. Alice runs a stall at a fete in which each player is guaranteed to win £10. Players pay a certain amount each time they throw a die and must keep throwing the die until a four occurs. When a four is obtained, Alice gives the player £10.
    On average Alice expects to make a profit of 50p per game. How much does she charge per throw?

15. During the winter in Glen Shee, the probability that snow will fall on any given day is 0.1. Taking 1 November as the first day of winter and assuming independence from day to day, find to two significant figures, the probability that the first snow of winter will fall in Glen Shee on the last day of November (30th).
    Given that no snow has fallen at Glen Shee during the whole of November, a teacher decides not to wait any longer to book a skiing holiday. The teacher decides to book for the earliest date for which the probability that snow will have fallen on or before that date is at least 0.9. Find the date of the booking. (L)

16. In many board games it is necessary to 'throw a six with an ordinary die' before a player can start the game. Write down, as a fraction, the probability of a player
    (a) starting on his first attempt,
    (b) not starting until his third attempt,
    (c) requiring more than three attempts before starting.
    What is
    (d) the most common number of throws required to obtain a six,
    (e) the mean number of throws required to obtain a six?
    Prove that the probability of a player requiring more than n attempts before starting is (5/6)ⁿ.
    (f) What is the smallest value of n if there is to be at least a 95% chance of starting on or before the nth attempt? (O)
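The geometric results used in the examples before this exercise, P(X = x) = q^(x−1) p, P(X ≤ x) = 1 − q^x, E(X) = 1/p and Var(X) = q/p², are easy to check numerically. The following is a minimal sketch; the function names (`geo_pmf`, `geo_cdf`, `smallest_n`) are invented for illustration, and `smallest_n` uses the logarithm method from Solution 5.3 rather than trial and improvement.

```python
import math


def geo_pmf(x, p):
    """P(X = x) for X ~ Geo(p): x - 1 failures followed by one success."""
    return (1 - p) ** (x - 1) * p


def geo_cdf(x, p):
    """P(X <= x) = 1 - q^x for X ~ Geo(p)."""
    return 1 - (1 - p) ** x


def smallest_n(p, target):
    """Smallest n with P(X <= n) >= target, i.e. q^n <= 1 - target."""
    return math.ceil(math.log(1 - target) / math.log(1 - p))


# Example 5.2: throwing a six, p = 1/6
print(round(geo_pmf(4, 1 / 6), 3))      # P(X = 4), about 0.096
print(round(geo_cdf(3, 1 / 6), 2))      # P(X <= 3), about 0.42

# Example 5.3(b): faulty items, p = 0.08, at least a 90% chance
print(smallest_n(0.08, 0.9))            # 28

# Example 5.4(b): p = 0.55
p = 0.55
print(round(1 / p, 1), round((1 - p) / p ** 2, 1))   # mean and variance
```

Dividing by log(1 − p), which is negative, is what reverses the inequality; taking the ceiling then gives the smallest whole number of trials.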

8
You should find that C2 = 28. So there are 28 different arrangements of two who have blood
THE BINOMIAL DISTRIBUTION type B and six who do not have blood type B.
l d B If three people are selected at Therefore P(exactly 2 have type B)= 28 x 0.9 6 x 0.1 2 = 0.15 (2 s.f.)
In a particular population, 10% of people havbebbloohty pte x~ctly two of them have blood type B?
random from the population, what is the probability that exactly two of them have blood type B?

Since the people are selected at random, assume that the blood type of one person is independent of that of another, so $P(\text{type B}) = P(B) = 0.1$ and $P(\text{not type B}) = P(B') = 0.9$.

To calculate the probability you could use a tree diagram.

[Tree diagram: first person, second person, third person; each branch is B (probability 0.1) or B' (probability 0.9). The outcomes marked * contain exactly two B's.]

$P(B, B, B') = 0.1 \times 0.1 \times 0.9 = 0.009$ *
$P(B, B', B) = 0.1 \times 0.9 \times 0.1 = 0.009$ *
$P(B', B, B) = 0.9 \times 0.1 \times 0.1 = 0.009$ *

$P(\text{exactly two type B}) = P(B, B, B') + P(B, B', B) + P(B', B, B)$
$= 3 \times 0.9 \times 0.1^2$
$= 0.027$

Now consider the situation when eight people are selected. What is the probability that exactly two of the eight people will have blood type B?

You could extend your tree, but it would become very complicated. It is possible to find the probability as follows, since you want two with type B and six who do not have blood type B.

$P(\text{choose 6 } B' \text{ then 2 } B) = P(B', B', B', B', B', B', B, B) = 0.9^6 \times 0.1^2$

But there are several arrangements of this outcome, for example B', B', B, B', B', B, B', B' or B, B', B', B', B, B', B', B', each with a probability of occurring of $0.9^6 \times 0.1^2$.

The number of different arrangements is given by ${}^{8}C_{2}$, sometimes written ${}_{8}C_{2}$ or $\binom{8}{2}$ (see page 215).

This can be found directly on your calculator using the nCr key (you may have to press SHIFT): 8 [nCr] 2 [=] gives 28.

Otherwise use the formula ${}^{n}C_{r} = \dfrac{n!}{r!\,(n-r)!}$, so ${}^{8}C_{2} = \dfrac{8!}{2!\,6!} = 28$.

On calculator: 8 [x!] ÷ 2 [x!] ÷ 6 [x!] [=]

So $P(\text{exactly two of the eight have type B}) = 28 \times 0.9^6 \times 0.1^2 = 0.15$ (2 s.f.)

Using a similar argument, you could find the probability that exactly two have blood type B in a randomly selected group of 12 people. In this case, ten will not have type B and

$P(\text{exactly 2 have type B}) = {}^{12}C_{2} \times 0.9^{10} \times 0.1^2 = 0.23$ (2 s.f.)

The above three situations have been described using a binomial model.

Conditions for a binomial model

For a situation to be described using a binomial model,

• a finite number, n, of trials are carried out,
• the trials are independent,
• the outcome of each trial is deemed either a success or a failure,
• the probability, p, of a successful outcome is the same for each trial.

The discrete random variable, X, is the number of successful outcomes in n trials.

If the above conditions are satisfied, X is said to follow a binomial distribution. This is written

$X \sim B(n, p)$ or $X \sim \text{Bin}(n, p)$

NOTE: The number of trials, n, and the probability of success, p, are both needed to describe the distribution completely. They are known as the parameters of the binomial distribution.

Writing P(failure) as q where $q = 1 - p$, the probability of x successes in n trials is

$P(X = x) = {}^{n}C_{x}\, q^{n-x} p^{x}$ for $x = 0, 1, 2, \ldots, n$

For the three situations described above:

When 3 people are selected, $n = 3$, $p = 0.1$, $q = 0.9$.
X is the number of successful outcomes in 3 trials, so $X \sim B(3, 0.1)$.
$P(X = 2) = {}^{3}C_{2}\, q^{1} p^{2}$
$= 3 \times 0.9 \times 0.1^2$
$= 0.027$

When 8 people are selected, $n = 8$, $p = 0.1$, $q = 0.9$.
X is the number of successful outcomes in 8 trials, so $X \sim B(8, 0.1)$.
$P(X = 2) = {}^{8}C_{2}\, q^{6} p^{2}$
$= 28 \times 0.9^6 \times 0.1^2$
$= 0.15$ (2 s.f.)
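The binomial calculations for the blood type situations can be checked numerically. The following is a minimal Python sketch, not part of the original text; the function name `binomial_pmf` is chosen here for illustration, and `math.comb` plays the role of the calculator's nCr key.

```python
from math import comb


def binomial_pmf(x, n, p):
    """Return P(X = x) for X ~ B(n, p), i.e. nCx * q^(n-x) * p^x with q = 1 - p."""
    q = 1 - p
    return comb(n, x) * q ** (n - x) * p ** x


# Probability that exactly two people have blood type B (p = 0.1)
# in groups of 3, 8 and 12 people.
for group_size in (3, 8, 12):
    probability = binomial_pmf(2, group_size, 0.1)
    print(group_size, round(probability, 3))
```

Rounded to two significant figures, the three results agree with the values 0.027, 0.15 and 0.23 obtained by hand.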
When 12 people are selected, $n = 12$, $p = 0.1$, $q = 0.9$.
X is the number of successful outcomes in 12 trials, so $X \sim B(12, 0.1)$.
$P(X = 2) = {}^{12}C_{2}\, q^{10} p^{2}$
$= 66 \times 0.9^{10} \times 0.1^2$
$= 0.23$ (2 s.f.)

Example 5.7
At Sellitall Supermarket, 60% of customers pay by credit card. Find the probability that in a randomly selected sample of ten customers,
(a) exactly two pay by credit card,
(b) more than seven pay by credit card.

Solution 5.7
X is the number of customers in a sample of ten who pay by credit card.
Consider 'paying by credit card' as success, $p = 0.6$, $q = 1 - p = 0.4$.
Assuming independence, a binomial model can be used, with $n = 10$, so $X \sim B(10, 0.6)$.

(a) $P(X = 2) = {}^{10}C_{2}\, q^{8} p^{2}$
$= 45 \times 0.4^8 \times 0.6^2$
$= 0.011$ (2 s.f.)

(b) $P(X > 7) = P(X = 8) + P(X = 9) + P(X = 10)$
$= {}^{10}C_{8}\, q^{2} p^{8} + {}^{10}C_{9}\, q^{1} p^{9} + {}^{10}C_{10}\, q^{0} p^{10}$
$= 45 \times 0.4^2 \times 0.6^8 + 10 \times 0.4 \times 0.6^9 + 0.6^{10}$
$= 0.17$ (2 s.f.)

It is useful to note that, for any binomial distribution,

$P(X = 0) = {}^{n}C_{0}\, q^{n} p^{0}$, but $p^0 = 1$ and ${}^{n}C_{0} = \dfrac{n!}{0!\,n!} = 1$, since $0! = 1$,
so $P(X = 0) = q^n$.

Also $P(X = n) = {}^{n}C_{n}\, q^{0} p^{n}$, but $q^0 = 1$ and ${}^{n}C_{n} = \dfrac{n!}{n!\,0!} = 1$,
so $P(X = n) = p^n$.

There is a link between the probabilities in the binomial distribution and the terms in the binomial expansion of $(q + p)^n$ which you may have studied in Pure Mathematics. This is illustrated in the following example.

Example 5.8
Five independent trials of an experiment are carried out. The probability of a successful outcome is p and the probability of failure is $1 - p = q$.
Write out the probability distribution of X, where X is the number of successful outcomes in five trials. Comment on your answer.

Solution 5.8
$X \sim B(5, p)$ and X takes the values 0, 1, 2, 3, 4, 5.

$P(X = 0) = {}^{5}C_{0}\, q^{5} p^{0} = q^5$
$P(X = 1) = {}^{5}C_{1}\, q^{4} p^{1} = 5q^4 p$
$P(X = 2) = {}^{5}C_{2}\, q^{3} p^{2} = 10q^3 p^2$
$P(X = 3) = {}^{5}C_{3}\, q^{2} p^{3} = 10q^2 p^3$
$P(X = 4) = {}^{5}C_{4}\, q^{1} p^{4} = 5q p^4$
$P(X = 5) = {}^{5}C_{5}\, q^{0} p^{5} = p^5$

Notice that the powers of q and p add up to 5 each time.

The terms $q^5, 5q^4 p, \ldots, p^5$ are the terms in the binomial expansion of $(q + p)^5$:

$q^5 + 5q^4 p + 10q^3 p^2 + 10q^2 p^3 + 5q p^4 + p^5 = (q + p)^5$

where the successive terms are $P(X = 0), P(X = 1), P(X = 2), P(X = 3), P(X = 4), P(X = 5)$.

But $(q + p)^5 = 1$, since $q + p = 1$,
so $P(X = 0) + P(X = 1) + \cdots + P(X = 5) = 1$.
This confirms that the total sum of the probabilities is 1.

NOTE: Some vertical line graphs illustrating the binomial distribution are given on page 289.

Example 5.9
The random variable X is distributed B(7, 0.2). Find, correct to three decimal places,
(a) $P(X = 3)$,
(b) $P(1 < X \le 4)$,
(c) $P(X > 1)$.

Solution 5.9
$p = 0.2$, $q = 1 - p = 0.8$, $n = 7$

(a) $P(X = 3) = {}^{7}C_{3}\, q^{4} p^{3}$
$= 35 \times 0.8^4 \times 0.2^3$
$= 0.115$ (3 d.p.)

(b) $P(1 < X \le 4) = P(X = 2) + P(X = 3) + P(X = 4)$
$= {}^{7}C_{2}\, q^{5} p^{2} + {}^{7}C_{3}\, q^{4} p^{3} + {}^{7}C_{4}\, q^{3} p^{4}$
$= 21 \times 0.8^5 \times 0.2^2 + 35 \times 0.8^4 \times 0.2^3 + 35 \times 0.8^3 \times 0.2^4$
$= 0.419$ (3 d.p.)
(c) $P(X > 1) = P(X = 2) + P(X = 3) + \cdots + P(X = 7)$
Rather than calculate all these terms, it is much quicker to find
$P(X > 1) = 1 - P(X \le 1)$
$= 1 - (P(X = 0) + P(X = 1))$
$= 1 - (q^7 + {}^{7}C_{1}\, q^{6} p)$
$= 1 - (0.8^7 + 7 \times 0.8^6 \times 0.2)$
$= 0.423$ (3 d.p.)

Example 5.10
A box contains a large number of pens. The probability that a pen is faulty is 0.1.
How many pens would you need to select to be more than 95% certain of picking at least one faulty one?

Solution 5.10
Let n be the number of pens you need to select. X is the number of faulty pens in n.
Assuming independence and using a binomial model, $X \sim B(n, 0.1)$ with $p = 0.1$, $q = 0.9$.

You want $P(X \ge 1) > 0.95$
But $P(X \ge 1) = 1 - P(X = 0) = 1 - 0.9^n$
So $1 - 0.9^n > 0.95$
$1 - 0.95 > 0.9^n$
$0.05 > 0.9^n$
i.e. $0.9^n < 0.05$

By trial and improvement
$0.9^{25} = 0.071\ldots$ (greater than 0.05)
$0.9^{30} = 0.042\ldots$ (less than 0.05)
So the value of n lies between 25 and 30.
$0.9^{26} = 0.0646\ldots$ (greater than 0.05)
$0.9^{27} = 0.058\ldots$ (greater than 0.05)

NOTE: On the calculator 0.9 [^] 26 [=] gives 0.0646..., 0.9 [^] 27 [=] gives 0.0581... and so on. The answers get smaller as the power increases because you are multiplying by a number between 0 and 1.

$0.9^{28} = 0.0523\ldots$ (greater than 0.05)
$0.9^{29} = 0.0471\ldots$ (less than 0.05)
You need to select at least 29 pens.

Alternatively, using logarithms:
$0.9^n < 0.05$
Taking logs to base 10 of both sides,
$n \log 0.9 < \log 0.05$
From the calculator, you find that $\log 0.9 = -0.045\ldots$, so divide both sides by $\log 0.9$ and reverse the inequality (as you are dividing by a negative quantity).
$n > \dfrac{\log 0.05}{\log 0.9}$
$n > 28.4\ldots$
The least value of n is 29, as before.

Using cumulative binomial probability tables

If you have access to these tables, you may wish to use them to calculate probabilities.
The tables are printed on page 645. They give $P(X \le r)$ for various values of n and p. Here is an extract for B(7, 0.2), the distribution used in Example 5.9.

r     P(X ≤ r)
0     0.2097
1     0.5767
2     0.8520
3     0.9667
4     0.9953
5     0.9996
6     1.0000
7     1.0000

Using the tables to work out the probabilities required in Example 5.9:

(a) $P(X = 3) = P(X \le 3) - P(X \le 2)$
$= 0.9667 - 0.8520$
$= 0.115$ (3 d.p.)

(b) $P(1 < X \le 4) = P(X = 2) + P(X = 3) + P(X = 4)$
$= P(X \le 4) - P(X \le 1)$
$= 0.9953 - 0.5767$
$= 0.419$ (3 d.p.)

(c) $P(X > 1) = 1 - P(X \le 1)$
$= 1 - 0.5767$
$= 0.423$ (3 d.p.)

Example 5.11
The random variable X is distributed B(5, 0.3). Giving your answers to three decimal places, use the extract from the cumulative binomial probability tables to find
(a) $P(X \le 4)$
(b) $P(X = 2)$
(c) $P(X < 3)$
(d) $P(X > 1)$
(e) $P(X \ge 3)$

r     P(X ≤ r)
0     0.1681
1     0.5282
2     0.8369
3     0.9692
4     0.9976
5     1.0000
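The extract of the cumulative binomial table for B(7, 0.2) used with Example 5.9 can be regenerated by summing the individual probabilities. This is a minimal Python sketch under that assumption; the function name `cumulative_binomial` is hypothetical and is not taken from the tables themselves.

```python
from math import comb


def cumulative_binomial(n, p):
    """Return the list [P(X <= 0), P(X <= 1), ..., P(X <= n)] for X ~ B(n, p)."""
    q = 1 - p
    running_total = 0.0
    table = []
    for r in range(n + 1):
        # Add P(X = r) to the running total to get P(X <= r).
        running_total += comb(n, r) * q ** (n - r) * p ** r
        table.append(running_total)
    return table


table = cumulative_binomial(7, 0.2)
for r, value in enumerate(table):
    print(r, f"{value:.4f}")

# P(X = 3) as a difference of two cumulative values, as in the tables method.
print(round(table[3] - table[2], 3))
```

Printed to four decimal places, the values match the extract, and the difference of consecutive entries gives the individual probability.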
Solution 5.11
(a) $P(X \le 4) = 0.9976 = 0.998$ (3 d.p.)
(b) $P(X = 2) = P(X \le 2) - P(X \le 1) = 0.8369 - 0.5282 = 0.3087 = 0.309$ (3 d.p.)
(c) $P(X < 3) = P(X \le 2) = 0.8369 = 0.837$ (3 d.p.)
(d) $P(X > 1) = 1 - P(X \le 1) = 1 - 0.5282 = 0.4718 = 0.472$ (3 d.p.)
(e) $P(X \ge 3) = 1 - P(X \le 2) = 1 - 0.8369 = 0.1631 = 0.163$ (3 d.p.)

Using symmetry properties to read binomial tables

In some versions of the cumulative binomial tables, values of p are given only up to $p = 0.5$.
To use the tables for values of $p > 0.5$, you need to use the symmetry properties of the binomial distribution.
This is illustrated in the sketches of the probability distributions of B(5, 0.3) and B(5, 0.7).
In both these distributions $n = 5$, and note that $0.7 = 1 - 0.3$.

[Sketches: vertical line graphs of the probability distributions of X ~ B(5, 0.3) and X ~ B(5, 0.7), for x = 0 to 5. Each is the mirror image of the other.]

You can see that
$P(X = 0 \mid p = 0.3) = 0.17 = P(X = 5 \mid p = 0.7)$
$P(X = 1 \mid p = 0.3) = 0.36 = P(X = 4 \mid p = 0.7)$
$P(X = 2 \mid p = 0.3) = 0.31 = P(X = 3 \mid p = 0.7)$
and so on.
Also $P(X \le 2 \mid p = 0.3) = 0.84 = P(X \ge 3 \mid p = 0.7)$

In general, for $X \sim B(n, p)$,
$P(X = r \mid p) = P(X = n - r \mid 1 - p)$
$P(X \le r \mid p) = P(X \ge n - r \mid 1 - p)$
$P(X \ge r \mid p) = P(X \le n - r \mid 1 - p)$

Example 5.12
The random variable X is B(8, 0.6).
Use the extract of the cumulative binomial tables for $X \sim B(8, 0.4)$ to find
(a) $P(X \ge 3)$
(b) $P(X \le 2)$
(c) $P(X = 5)$

n = 8, p = 0.4
r     P(X ≤ r)
0     0.0168
1     0.1064
2     0.3154
3     0.5941
4     0.8263
5     0.9502
6     0.9915
7     0.9993
8     1.0000

Solution 5.12
Using $n = 8$,

(a) $P(X \ge 3 \mid p = 0.6) = P(X \le 5 \mid p = 0.4) = 0.9502$

(b) $P(X \le 2 \mid p = 0.6) = P(X \ge 6 \mid p = 0.4)$
$= 1 - P(X \le 5 \mid p = 0.4)$
$= 1 - 0.9502$
$= 0.0498$

(c) $P(X = 5 \mid p = 0.6) = P(X = 3 \mid p = 0.4)$
$= P(X \le 3 \mid p = 0.4) - P(X \le 2 \mid p = 0.4)$
$= 0.5941 - 0.3154$
$= 0.2787$

NOTE:
• It is sometimes quicker to use the cumulative tables, but you should make sure that you know how to calculate the probabilities directly.
• The tables are not available for all possible values of p.
• The values given in the tables should agree with the calculated values to three decimal places.

Exercise 5b Binomial calculations

Give answers to three significant figures.

1. 30% of pupils in a school travel to school by bus. From a sample of ten pupils chosen at random, find the probability that
(a) only three travel by bus,
(b) less than half travel by bus.

2. In a survey on washing powder, it is found that the probability that a shopper chooses Soapysuds is 0.25. Find the probability that in a random sample of nine shoppers
(a) exactly three choose Soapysuds,
(b) more than seven choose Soapysuds.

3. A bag contains counters of which 40% are red and the rest yellow. A counter is taken from the bag, its colour noted and then replaced. This is performed eight times in all.
Calculate the probability that
(a) exactly three will be red,
(b) at least one will be red,
(c) more than four will be yellow.

4. The random variable X is B(6, 0.42). Find
(a) $P(X = 6)$, (b) $P(X = 4)$, (c) $P(X \le 2)$.

5. An unbiased die is thrown seven times. Find the probability of throwing at least 5 sixes.

6. The probability that it will rain on any given day in September is 0.3. Stating any assumption made, calculate the probability that in a given week in September, it will rain on
(a) exactly two days,
(b) at least two days,
(c) at most two days,
(d) exactly three days that are consecutive.

7. A fair coin is tossed six times. Find the probability of throwing at least four heads.

8. Assuming that a couple are equally likely to produce a boy or a girl, find the probability that in a family of five children there are more boys than girls.

9. X is B(4, p) and $P(X = 4) = 0.0256$. Find $P(X = 2)$.

10. Charlie finds that when she takes a cutting from a particular plant, the probability that it roots successfully is [fraction illegible in source].
(a) She takes nine cuttings. Find the probability that
(i) more than five cuttings root successfully,
(ii) at least three cuttings root successfully.
(b) Find the number of cuttings that she should take in order to be 99% certain that at least one cutting roots successfully.
11. An experiment consists of taking seven shots at a target and counting the number of hits. The probability of hitting the target with a single shot is 0.6. Using a binomial model, find the probability that in seven attempts the target is hit at most twice.
Give a reason why the binomial model may not be a good one to use in this situation.

12. In the mass production of bolts it is found that 5% are defective. Bolts are selected at random and put into packets of ten.
A packet is selected at random. Find the probability that it contains
(a) three defective bolts,
(b) less than three defective bolts.
Two packets are selected at random.
(c) Find the probability that there are no defective bolts in either packet.

13. A coin is biased so that it is twice as likely to show heads as tails. The coin is tossed five times. Calculate the probability that
(a) exactly three heads are obtained,
(b) more than three are obtained.

14. The random variable X can be modelled by a binomial distribution with $n = 6$ and $p = 0.5$. Construct the probability distribution and illustrate it graphically. Comment on the distribution.

15. The probability that a target is hit is 0.3. Find the least number of shots which should be fired if the probability that the target is hit at least once is greater than 0.95.
State any assumptions that you have made.

16. 1% of light bulbs in a box are faulty. Using a binomial model, find the largest sample size which can be taken if it is required that the probability that there are no faulty bulbs in the sample is greater than 0.5.
Comment on the use of the binomial model in this situation.

17. In a test there are ten multiple choice questions. For each question there is a choice of four answers, only one of which is correct. A student guesses each of the answers.
(a) Find the probability that he gets more than seven correct.
He needs to obtain over half marks to pass and each question carries equal weight.
(b) Find the probability that he passes the test.

18. $X \sim B(n, 0.3)$. Find the least possible value of n such that $P(X \ge 1) > 0.8$.

19. Given that $X \sim B(7, 0.85)$ use the cumulative binomial probability tables on page 646 to write out the probability distribution of X.

20. The random variable X is B(n, 0.6) and $P(X < 1) = 0.0256$. Find the value of n.

21. For each of the experiments described below, state, giving a reason, whether a binomial distribution is appropriate.
Experiment 1: A bag contains black, white and red marbles that are selected one at a time, with replacement. The colour of each marble is noted.
Experiment 2: This experiment is a repeat of experiment 1 except that the bag contains black and white marbles only.
Experiment 3: This experiment is a repeat of experiment 2 except that the marbles are not replaced after each selection. (L)

EXPECTATION AND VARIANCE OF THE BINOMIAL DISTRIBUTION

It can be shown that

if $X \sim B(n, p)$
$E(X) = np$
and $\text{Var}(X) = npq$ where $q = 1 - p$

These results can be quoted and should be learnt. They are illustrated in the following example.

Example 5.13
The random variable X is B(4, 0.8). Construct the probability distribution for X and find the expectation and variance. Verify that $E(X) = np$ and $\text{Var}(X) = npq$.

Solution 5.13
X is B(4, 0.8) so $n = 4$ and $p = 0.8$.

$P(X = 0) = 0.2^4 = 0.0016$
$P(X = 1) = 4 \times 0.2^3 \times 0.8 = 0.0256$
$P(X = 2) = {}^{4}C_{2} \times 0.2^2 \times 0.8^2 = 0.1536$
$P(X = 3) = {}^{4}C_{3} \times 0.2 \times 0.8^3 = 0.4096$
$P(X = 4) = 0.8^4 = 0.4096$

The probability distribution for X is

x           0        1        2        3        4
P(X = x)    0.0016   0.0256   0.1536   0.4096   0.4096

$E(X) = \sum x P(X = x)$
$= 0 \times 0.0016 + 1 \times 0.0256 + 2 \times 0.1536 + 3 \times 0.4096 + 4 \times 0.4096$
$= 3.2$

$E(X^2) = \sum x^2 P(X = x)$
$= 1 \times 0.0256 + 4 \times 0.1536 + 9 \times 0.4096 + 16 \times 0.4096$
$= 10.88$

$\text{Var}(X) = E(X^2) - E^2(X)$
$= 10.88 - 3.2^2$
$= 0.64$

Now $np = 4 \times 0.8 = 3.2$, so $E(X) = np$,
and $npq = 4 \times 0.8 \times 0.2 = 0.64$, so $\text{Var}(X) = npq$.

Example 5.14
The probability that it will be a fine day is 0.4. Find the expected number of fine days in a week and also the standard deviation.

Solution 5.14
Let X be the number of fine days in a week. Assuming that the weather on any particular day is independent of the weather on other days,
$X \sim B(n, p)$ with $n = 7$ and $p = 0.4$.

The expected number of fine days $= E(X)$
$= np$
$= 7 \times 0.4$
$= 2.8$

Standard deviation of X $= \sqrt{\text{Var}(X)}$
$= \sqrt{npq}$
$= \sqrt{7 \times 0.4 \times 0.6}$
$= 1.3$ days (2 s.f.)
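The verification of $E(X) = np$ and $\text{Var}(X) = npq$ for B(4, 0.8) can also be carried out numerically. The sketch below is illustrative only; the helper names `binomial_distribution` and `mean_and_variance` are assumptions made for this example.

```python
from math import comb, sqrt


def binomial_distribution(n, p):
    """Return [P(X = 0), ..., P(X = n)] for X ~ B(n, p)."""
    q = 1 - p
    return [comb(n, x) * q ** (n - x) * p ** x for x in range(n + 1)]


def mean_and_variance(probabilities):
    """Return E(X) and Var(X) = E(X^2) - E(X)^2 from a list of P(X = x)."""
    mean = sum(x * px for x, px in enumerate(probabilities))
    mean_of_squares = sum(x * x * px for x, px in enumerate(probabilities))
    return mean, mean_of_squares - mean ** 2


n, p = 4, 0.8
mean, variance = mean_and_variance(binomial_distribution(n, p))
print(mean, n * p)                 # both should be close to 3.2
print(variance, n * p * (1 - p))   # both should be close to 0.64

# Fine days in a week with n = 7, p = 0.4: mean np and standard deviation sqrt(npq).
print(7 * 0.4, sqrt(7 * 0.4 * 0.6))
```

The values computed from the definition agree with the quoted results np and npq up to floating-point rounding.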
Example 5.15
X is B(n, p) with mean 5 and standard deviation 2. Find the values of n and p.

Solution 5.15
$E(X) = np$, therefore $np = 5$ (1)
$\text{Var}(X) = npq$, therefore $npq = 2^2 = 4$ (2)
Substituting for np in equation (2)
$5q = 4$
$q = 0.8$
$p = 1 - q$
So $p = 0.2$
Substituting for p in equation (1)
$n \times 0.2 = 5$
$n = 25$

Fitting a theoretical distribution to practical data

It is sometimes useful to compare experimental results with theoretical data as illustrated in the following example.

Example 5.16
A biased coin is tossed four times and the number of heads noted. The experiment is performed 500 times in all and the results are summarised in the table:

Number of heads    0     1     2     3     4
Frequency          12    50    151   200   87

(a) From the experimental data, estimate the probability of obtaining a head when the coin is tossed.
(b) Using a binomial distribution with the same mean, calculate the theoretical probabilities of obtaining 0, 1, 2, 3 and 4 heads.

Solution 5.16
(a) For the frequency distribution,
mean $= \bar{x} = \dfrac{\sum fx}{\sum f} = \dfrac{1300}{500} = 2.6$
Let X be the number of heads in four tosses. Then $X \sim B(4, p)$.
For a distribution with the same mean, $4p = 2.6$
$p = 0.65$
An estimate of the probability that the coin shows heads is 0.65.

(b) Using $X \sim B(4, 0.65)$ calculate the probabilities of 0, 1, 2, 3 and 4 heads and multiply these by 500 to obtain the theoretical frequencies.

x    P(X = x)                                      Frequency (nearest integer)
0    $0.35^4 = 0.015\ldots$                        8
1    $4 \times 0.35^3 \times 0.65 = 0.111\ldots$   56
2    $6 \times 0.35^2 \times 0.65^2 = 0.310\ldots$ 155
3    $4 \times 0.35 \times 0.65^3 = 0.384\ldots$   192
4    $0.65^4 = 0.178\ldots$                        89
                                                   Total 500

These compare reasonably well with the original distribution.
A statistical test to compare the two sets of data, the $\chi^2$ test, is illustrated on page 571.

Diagrammatic representation of the binomial distribution

[Figures: vertical line graphs of several binomial probability distributions. In the final graph the probabilities for x from 12 to 20 are too small to illustrate.]

The mode of the binomial distribution

The mode is the value of X that is most likely to occur.
From the probability distribution sketches above, it can be seen that
• when $p = 0.5$ and n is odd, there are two modes,
• otherwise the distribution has one mode.

The mode can be found by calculating all the probabilities and finding the value of X with the highest probability. This is, however, very tedious; it is usually only necessary to consider the probabilities of values of X close to the mean np.
10. Seeds are planted in rows of six and after 14
?ays the number of seeds which have germinated (a) Calculate, to two significant figures, the
Example 5.17 111 each of the 100 rows is noted. probability th~t, in any one sample, two
The probability that a student is awarded a distinction in the Mathematics examination is 0.05. The results are shown in the table: bolts or less will be faulty.
(b) Find the expected value and the variance of
In a randomly selected group of 50 students, what is the most likely number of students Number of seeds the number of bolts in a sample which will
germinating 0 1 2 3 4 5 6 not be faulty. (L Additional)
awarded a distinction?
Number of rows 2 1 2 10 30 35 20 14. An experiment consists of taking 12 shots at a
target and counting the number of hits.
Solution 5.17 Find the theoretical frequencies of 0, 1, ... , 6 When this e;cperiment was repeated a large
X is the number of students who are awarded a distinction in 50, so X- B(50, 0.05). seeds g~rminating in a row, using the associated number of ttmes the mean number of hits was
theorettcal binomial distribution. found to be 3. Calculate
E(X) ~ np ~50 x 0.05 ~ 2.5, so calculate the probabilities for values of X near 2.5. (a) the probability of hitting the target with a
11. Each day a bakery delivers the same number of single shot,
P(X ~ 1) =50 0.95 49 0.05 ~ 0.202 ... loaves to a certain shop which sells on average
X X (b) the standard deviation of the number of hits
98% of them. Assuming that the n~mber of '
P(X ~ 2) ~ 5°C, X 0.95 48
X 0.05 2 = 10.2611 .. . loaves sold per day has a binomial distribution
in an experiment. (C Additional)
with a standard deviation of 7, find the number
P(X ~ 3) ~ 5°C3 X 0.95 47 X 0.05 3 ~ 0.219 .. . 15. In an experiment a certain number of dice are
of loaves the shop would expect to sell per day. thrown and the number of sixes obtained is
From the list, you can see that the value of X with the highest probability is 2. IC Additional) recorded. The dice are all biased and the
12. In a large batch of items from a production line probability of obtaining a six with each individual
The most likely number of students awarded a distinction in a group of 50 is two. the probability that an item is faulty is p. die is fJ. In all there were 60 experiments and the
400 samples, each of size 5, are taken and the results are shown in the table.
number of faulty items in each batch is noted.
From the frequency distribution below estimate p
Number of sixes
and work out the expected frequencies of 0 '] 2
of binomial di on 3, 4, 5 faulty items per batch for a theoreti~al' ' obtained in an
Exercise 5c Expectation, variance binomial distribution having the same mean. experiment 0 2 3 4 >4
In a large number of experiments the standard Frequency 19 26 12 2 1 0
1. 10% of the articles from a certain production Number of
deviation of the number of sixes is 1.5.
line are defective. A sample of 25 articles is faulty items 0 1 2 3 4 5
Calculate the value of p and hence determine, to
taken. Find the expected number of defective Calculate the mean and the standard deviation of
two places of decimals, the probability that
items and the standard deviation. Frequency 297 90 10 2 1 0 these data.
exactly three sixes arc recorded during a
By comparing these answers with those expected
2. The probability that an apple picked at random particular experiment. (C)
for a binomial distribution, estimate
from a sack is bad is 0.15. 7. In a certain African village, 80% of the villagers J 3. On a~era.ge 20% of the bolts produced by a
machme m a factory arc faulty. Samples of ten (a) the number of dice thrown in each
(a) Find the standard deviation of the number are known to have a particular eye disorder. experiment,
Twelve people are waiting to see the nurse. bolts are to ?c selected at random each day.
of bad apples in a sample of 15 apples. Each bolt wt!l be selected and replaced in the set (b) the value of p. (C Additional)
(b) What is the most likely number of bad (a) What is the most likely number to have the of bolts which have been produced on that day.
apples in a sample of 30 apples? eye disorder?
(b) Find the probability that fewer than half
3. The random variable X is B(n, 0.3) and have the eye disorder.
E(X} = 2.4. Find nand the standard deviation
of X. 8. In a bag there arc six red counters, eight yellow
counters and six green counters. An experiment
4. In a group of people the expected number who consists of taking a counter at random from the THE POISSON DISTRIBUTION
wear glasses is two and the variance is 1.6. bag, noting its colour and then replacing it in the
Find the probability that bag. This procedure is carried out ten times in Consider these randmn variables
all. Find
(a) a person chosen at random from the group
wears glasses, (a) the expected number of red counters drawn, * the number of emergency calls received by an ambulance control in an hour
(b) six people in the group wear glasses. (b) the most likely number of green counters e the nun1ber of vehicles approaching a motorway toll bridge in a five-minute' interval
drawn, e the number of flaws in a metre length of material, '
5. The random variable X is B(10, p) where p < 0.5. (c) the probability that no more than four
yellow counters arc drawn. e the number of white corpuscles on a slide.
The variance of X is 1.875. Find
(a) the value of p, 9. The random variable X is distributed binomially Assuming :hat each ~ccu.rs randomly, they are all examples of variables that can be modelled
(b) E(X), with mean 2 and variance '1 .6. Find usmg a Potsson distnbutiOn.
(c) P(X~2). (a) the probability that X is less than 6,
(b) the most likely value of X.
6. A die is biased and the probability, p, of
throwing a six is known to be less than t. An
experiment consists of recording the number of
sixes in 25 throws of the die.
Conditions for a Poisson model

• Events occur singly and at random in a given interval of time or space,
• $\lambda$, the mean number of occurrences in the given interval, is known and is finite.
The variable X is the number of occurrences in the given interval.

If the above conditions are satisfied, X is said to follow a Poisson distribution, written

$X \sim \text{Po}(\lambda)$, where
$P(X = x) = e^{-\lambda} \dfrac{\lambda^x}{x!}$ for $x = 0, 1, 2, 3, \ldots$

Example 5.18
A student finds that the average number of amoebas in 10 ml of pond water from a particular pond is four. Assuming that the number of amoebas follows a Poisson distribution, find the probability that in a 10 ml sample
(a) there are exactly five amoebas,
(b) there are no amoebas,
(c) there are fewer than three amoebas.

Solution 5.18
X is the number of amoebas in 10 ml of pond water, where $X \sim \text{Po}(4)$.

Using $P(X = x) = e^{-\lambda} \dfrac{\lambda^x}{x!}$ with $\lambda = 4$,

(a) $P(X = 5) = e^{-4} \dfrac{4^5}{5!}$
$= 0.156$ (3 s.f.)

(b) $P(X = 0) = e^{-4} \dfrac{4^0}{0!} = e^{-4}$
$= 0.0183$ (3 s.f.)

(c) $P(X < 3) = P(X = 0) + P(X = 1) + P(X = 2)$
$= e^{-4} \dfrac{4^0}{0!} + e^{-4} \dfrac{4^1}{1!} + e^{-4} \dfrac{4^2}{2!}$
$= e^{-4}(1 + 4 + 8)$
$= 13e^{-4}$
$= 0.238$ (3 s.f.)

NOTE:
• $P(X = 0) = e^{-4} \dfrac{4^0}{0!}$, but $4^0 = 1$ and $0! = 1$, so $P(X = 0) = e^{-4}$
• $P(X = 1) = e^{-4} \dfrac{4^1}{1!}$, but $4^1 = 4$ and $1! = 1$, so $P(X = 1) = 4e^{-4}$

These two results are useful in general:
If $X \sim \text{Po}(\lambda)$, then $P(X = 0) = e^{-\lambda}$ and $P(X = 1) = \lambda e^{-\lambda}$

Unit interval

Care must be taken to specify the interval being considered.
In Example 5.18 the mean number of amoebas in 10 ml of pond water from a particular pond is four, so the number in 10 ml is distributed Po(4).
Now suppose you want to find a probability relating to the number of amoebas in 5 ml of water from the same pond. The mean number of amoebas in 5 ml is two, so the number in 5 ml is distributed Po(2).
Similarly, the number of amoebas in 1 ml of pond water is distributed Po(0.4).

Example 5.19
On average the school photocopier breaks down eight times during the school week (Monday to Friday). Assuming that the number of breakdowns can be modelled by a Poisson distribution, find the probability that it breaks down
(a) five times in a given week,
(b) once on Monday,
(c) eight times in a fortnight.

Solution 5.19
(a) X is the number of breakdowns in a week, where $X \sim \text{Po}(8)$.
$P(X = 5) = e^{-8} \dfrac{8^5}{5!}$
$= 0.0916$ (3 s.f.)

(b) Let Y be the number of breakdowns in a day.
The mean number of breakdowns in a day is $\frac{8}{5} = 1.6$, so $Y \sim \text{Po}(1.6)$.
$P(Y = 1) = 1.6e^{-1.6}$
$= 0.323$ (3 s.f.)

(c) Let F be the number of breakdowns in a fortnight.
The mean number of breakdowns in a fortnight is $2 \times 8 = 16$, so $F \sim \text{Po}(16)$.
$P(F = 8) = e^{-16} \dfrac{16^8}{8!}$
$= 0.0120$ (3 s.f.)

Mean and variance of the Poisson distribution

The mean number of occurrences in the interval, $\lambda$, is all that is needed to define the distribution completely; $\lambda$ is the only parameter of the distribution.

In a Poisson distribution, it is obvious that the mean, $E(X) = \lambda$, but it is also the case that $\text{Var}(X) = \lambda$. The following should be learnt:

If $X \sim \text{Po}(\lambda)$, then $E(X) = \lambda$ and $\text{Var}(X) = \lambda$
Example 5.20
X follows a Poisson distribution with standard deviation 1.5. Find $P(X \ge 3)$.

Solution 5.20
If $X \sim \text{Po}(\lambda)$ then $\text{Var}(X) = \lambda$.
But $\text{Var}(X) = (\text{standard deviation})^2 = 1.5^2 = 2.25$,
so $\lambda = 2.25$ and $X \sim \text{Po}(2.25)$.

$P(X \ge 3) = 1 - P(X < 3)$
$= 1 - (P(X = 0) + P(X = 1) + P(X = 2))$
$= 1 - e^{-2.25}\left(1 + 2.25 + \dfrac{2.25^2}{2!}\right)$
$= 1 - 0.6093\ldots$
$= 0.391$ (3 s.f.)

Using cumulative Poisson probability tables

If you have access to these tables you may wish to use them to calculate probabilities. The tables are printed on page 647. As with the cumulative binomial tables, they give $P(X \le r)$ for various values of $\lambda$, where $X \sim \text{Po}(\lambda)$.
Here is an extract for Po(1.6).

r     P(X ≤ r)
0     0.2019
1     0.5249
2     0.7834
3     0.9212
4     0.9763
5     0.9940
6     0.9987
7     0.9997
8     1.0000

Example 5.21
Given that $X \sim \text{Po}(1.6)$, use cumulative Poisson probability tables to find, to three decimal places,
(a) $P(X \le 6)$,
(b) $P(X = 5)$,
(c) $P(X \ge 3)$,
(d) $P(X = 10)$.
Find also the smallest integer n such that $P(X > n) < 0.01$.

Solution 5.21
Using the table printed above,

(a) $P(X \le 6) = 0.9987 = 0.999$ (3 d.p.)

(b) $P(X = 5) = P(X \le 5) - P(X \le 4)$
$= 0.9940 - 0.9763$
$= 0.018$ (3 d.p.)

(c) $P(X \ge 3) = 1 - P(X \le 2)$
$= 1 - 0.7834$
$= 0.217$ (3 d.p.)

(d) X takes the values 0, 1, 2, ..., to infinity, but from the tables, $P(X \le 8) = 1.0000$ to four decimal places. This implies that for values of X greater than 8, the probabilities are very small, so to three decimal places, $P(X = 10) = 0.000$.
In fact, using the formula, $P(X = 10) = e^{-1.6} \times \dfrac{1.6^{10}}{10!} = 0.000\,006\,117\ldots$

If $P(X > n) < 0.01$
$1 - P(X \le n) < 0.01$
$P(X \le n) > 0.99$
From tables $P(X \le 4) = 0.9763 < 0.99$
$P(X \le 5) = 0.9940 > 0.99$
The smallest integer n is 5.

Diagrammatic representation of the Poisson distribution

Notice that for small values of $\lambda$, the distribution is very skew, but it becomes more symmetrical as $\lambda$ increases.

[Figures: vertical line graphs of the probability distributions of X ~ Po(1), X ~ Po(1.6), X ~ Po(2), X ~ Po(2.2), X ~ Po(3) and X ~ Po(3.8).]
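The cumulative values for Po(1.6) used in Example 5.21, and the search for the smallest n with $P(X > n) < 0.01$, can be reproduced with the following Python sketch. The function names `poisson_cdf` and `smallest_n` are hypothetical and are introduced only for this illustration.

```python
from math import exp, factorial


def poisson_cdf(r, mean):
    """Return P(X <= r) for X ~ Po(mean) by summing P(X = 0), ..., P(X = r)."""
    return sum(exp(-mean) * mean ** x / factorial(x) for x in range(r + 1))


def smallest_n(mean, tail):
    """Return the smallest integer n for which P(X > n) < tail."""
    n = 0
    while 1 - poisson_cdf(n, mean) >= tail:
        n += 1
    return n


# Cumulative table for Po(1.6), printed to four decimal places.
for r in range(9):
    print(r, f"{poisson_cdf(r, 1.6):.4f}")

print(smallest_n(1.6, 0.01))
```

The loop in `smallest_n` mirrors the tables method: it moves down the cumulative column until the value first exceeds 0.99.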
[Figures: vertical line graphs of the probability distributions of X ~ Po(5) and X ~ Po(10).]

The mode of the Poisson distribution

The mode is the value of X that is most likely to occur, i.e. the one with the greatest probability.

From the diagrams, you can see that
when $\lambda = 1$, there are two modes, 0 and 1,
when $\lambda = 2$, there are two modes, 1 and 2,
when $\lambda = 3$, there are two modes, 2 and 3.
In general, if $\lambda$ is an integer, there are two modes, $\lambda - 1$ and $\lambda$.
For example, if $X \sim \text{Po}(8)$, the modes are 7 and 8.

Notice also that
when $\lambda = 1.6$, the mode is 1,
when $\lambda = 2.2$, the mode is 2,
when $\lambda = 3.8$, the mode is 3.
In general, if $\lambda$ is not an integer, the mode is the integer below $\lambda$.
For example, if $X \sim \text{Po}(4.9)$, the mode is 4.

Fitting a theoretical distribution to practical data

As with the binomial distribution it is possible to fit a theoretical Poisson distribution to experimental data.

Example 5.22
I recorded the number of e-mails I received over a period of 150 days with the following results:

Number of e-mails    0     1     2     3    4
Number of days       51    54    36    6    3

(a) Find the mean number of e-mails per day.
(b) Calculate the frequencies of the Poisson distribution having the same mean.

Solution 5.22
(a) $\bar{x} = \dfrac{\sum fx}{\sum f} = \dfrac{156}{150} = 1.04$

(b) Let X be the number of e-mails received in a day. For a Poisson distribution with the same mean, use $X \sim \text{Po}(1.04)$ and calculate the probabilities of 0, 1, 2, 3, 4, ... e-mails. Multiply these by 150 to obtain the theoretical frequencies.

x     P(X = x)                                              Frequency (nearest integer)
0     $e^{-1.04} = 0.3534\ldots$                            53
1     $1.04e^{-1.04} = 0.3675\ldots$                        55
2     $e^{-1.04} \times \frac{1.04^2}{2!} = 0.1911\ldots$   29
3     $e^{-1.04} \times \frac{1.04^3}{3!} = 0.0662\ldots$   10
4     $e^{-1.04} \times \frac{1.04^4}{4!} = 0.0172\ldots$   3
>4    $1 - P(X \le 4) = 0.004\,31\ldots$                    0
                                                            Total 150

These compare reasonably well with the original distribution.
A statistical test to compare the two sets of data, the $\chi^2$ test, is illustrated on page 573.

Exercise 5d

1. An insurance company receives on average two claims per week from a particular factory. Assuming that the number of claims can be modelled by a Poisson distribution, find the probability that it receives
(a) three claims in a given week,
(b) more than four claims in a given week,
(c) four claims in a given fortnight,
(d) no claims on a given day, assuming that the factory operates on a five-day week.

2. A sales manager receives six telephone calls on average between 9.30 a.m. and 10.30 a.m. on a weekday. Find the probability that
(a) she will receive two or more calls between 9.30 a.m. and 10.30 a.m. on Tuesday,
(b) she will receive exactly two calls between 9.30 a.m. and 9.40 a.m. on Wednesday,
(c) during a five-day working week, there will be exactly three days on which she receives no calls between 10.00 a.m. and 10.10 a.m.

3. The number of bacterial colonies on a petri dish can be modelled by a Poisson distribution with average number 2.5 per cm².
Find the probability that
(a) in 1 cm² there are no bacterial colonies,
(b) in 2 cm² there are more than four bacterial colonies,
(c) in 4 cm² there are six bacterial colonies.

4. On a particular motorway bridge, breakdowns occur at a rate of 3.2 a week. Assuming that the number of breakdowns can be modelled by a Poisson distribution, find the probability that
(a) fewer than the mean number of breakdowns occur in a particular week,
(b) more than five breakdowns occur in a given fortnight,
(c) exactly three breakdowns occur in each of four successive weeks.
TI
!
5. Cars arrive at a petrol station at an average rate of 30 per hour. Assuming that the cars arrive at random, find the probability that
(a) no cars arrive during a particular five-minute interval,
(b) more than three cars arrive during a five-minute interval,
(c) more than five cars arrive in a 15-minute interval,
(d) in a half-hour period, ten cars arrive,
(e) fewer than three cars arrive in a ten-minute interval.

6. Flaws occur randomly in a roll of fabric at an average rate of 1.5 per metre length.
(a) Find the probability that in a randomly chosen one-metre length there are more than two flaws.
(b) Find the probability that in a randomly chosen two-metre length there are no flaws.
(c) What is the standard deviation of the number of flaws in a four-metre length?

7. The number of calls made to a Health Centre can be modelled by a Poisson distribution with standard deviation 2 per five-minute interval. Find the probability that in a given five-minute interval the number of calls is more than the average for a five-minute interval.

8. The average number of misprints on each page in the first draft of a novel is four. Find the probability that on a randomly selected double page
(a) there are three misprints on each page,
(b) there are six misprints in total.

9. The number of goals scored in a match by Random Rovers can be modelled using a Poisson distribution. The probability, to three decimal places, that the team scores no goals is [value illegible]. Given that the mean number of goals scored in a match is an integer, find the probability that the team scores fewer than three goals in a match.

10. The number of accidents occurring in a week in a certain factory follows a Poisson distribution with variance 3.2. Find
(a) the most likely number of accidents in a given week,
(b) the probability that exactly seven accidents happen in a given fortnight.

11. For each of the following sets of data, fit a theoretical Poisson distribution with the same mean.
(a)
x    0     1    2    3    4    5
f    110   50   20   12   7    1
(b)
x    0    1    2    3    4
f    45   44   20   8    3

12. A firm investigated the number of employees suffering injuries whilst at work. The results recorded below were obtained for a 52-week period:

Number of employees injured in a week    Number of weeks
0                                        31
1                                        17
2                                        3
3                                        1
4 or more                                0

Give reasons why one might expect this distribution to approximate to a Poisson distribution. Evaluate the mean and variance of the data and explain why this gives further evidence in favour of a Poisson distribution.
Using the calculated value of the mean, find the theoretical frequencies of a Poisson distribution for the number of weeks in which 0, 1, 2, 3, 4 or more, employees were injured. (C)

13. Along a stretch of motorway, breakdowns requiring the summoning of the breakdown services occur with a frequency of 2.4 per day, on average. Assuming that the breakdowns occur randomly and that they follow a Poisson distribution, find
(a) the probability that there will be exactly two breakdowns on a given day,
(b) the smallest integer n such that the probability of more than n breakdowns in a day is less than 0.03.

USING THE POISSON DISTRIBUTION AS AN APPROXIMATION TO THE BINOMIAL DISTRIBUTION

A binomial distribution $X \sim B(n, p)$ with $n > 50$ and $p < 0.1$ can be approximated by a Poisson distribution with the same mean, i.e. $X \sim \mathrm{Po}(np)$.
The approximation gets better as n gets larger and p gets smaller.

Example 5.23
Eggs are packed into boxes of 500. On average 0.7% of the eggs are found to be broken when the eggs are unpacked. Find, correct to two significant figures, the probability that in a box of 500 eggs,
(a) exactly three are broken,
(b) at least two are broken.

Solution 5.23
Let X be the number of broken eggs in a box of 500.
$P(\text{egg is broken}) = 0.007$, so $X \sim B(500, 0.007)$.
$E(X) = np = 500 \times 0.007 = 3.5$
Since $n > 50$ and $p < 0.1$, use a Poisson approximation, $X \sim \mathrm{Po}(3.5)$.

(a) $P(X = 3) = e^{-3.5}\,\dfrac{3.5^{3}}{3!} = 0.22$ (2 s.f.)

(b) $P(X \ge 2) = 1 - (P(X = 0) + P(X = 1))$
$= 1 - (e^{-3.5} + 3.5e^{-3.5})$
$= 0.86$ (2 s.f.)

Example 5.24
A Christmas draw aims to sell 5000 tickets, 50 of which will win a prize.
(a) A syndicate buys 200 tickets. Let X represent the number of these tickets that win a prize.
(i) Justify the use of the Poisson approximation for the distribution of X.
(ii) Calculate $P(X \le 3)$.
(b) Calculate how many tickets should be bought in order for there to be a 90% probability of winning at least one prize. (C)
Solution 5.24
$P(\text{a ticket wins a prize}) = \dfrac{50}{5000} = 0.01$

(a) Let X be the number of these tickets that win a prize.
Strictly speaking you do not have independent trials, but since the total number of tickets is very large compared with the 200 bought, X can be modelled by a binomial distribution, $X \sim B(200, 0.01)$.
$E(X) = np = 200 \times 0.01 = 2$

(i) Since $n > 50$ and $p < 0.1$, use a Poisson approximation, $X \sim \mathrm{Po}(2)$.

(ii) $P(X \le 3) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)$
$= e^{-2} + 2e^{-2} + \dfrac{2^{2}}{2!}e^{-2} + \dfrac{2^{3}}{3!}e^{-2}$
$= 0.86$ (2 s.f.)

(b) Let X be the number of tickets that win a prize in n tickets, so $X \sim B(n, 0.01)$ and $E(X) = np = 0.01n$.
Assuming $n > 50$ and $p < 0.1$, use $X \sim \mathrm{Po}(0.01n)$.
You want $P(X \ge 1) = 0.9$.
But $P(X \ge 1) = 1 - P(X = 0) = 1 - e^{-0.01n}$
So $0.9 = 1 - e^{-0.01n}$
$e^{-0.01n} = 0.1$
Taking logs to base e
$-0.01n = \ln 0.1$
$n = -\dfrac{\ln 0.1}{0.01}$
$n = 230.25\ldots$
So the least integer value of n must be 231.
Check: If $n = 230$: $np = 230 \times 0.01 = 2.3$ and $1 - e^{-2.3} = 0.8997\ldots < 0.9$
If $n = 231$: $np = 231 \times 0.01 = 2.31$ and $1 - e^{-2.31} = 0.9007\ldots > 0.9$
231 tickets should be bought.
Note that n can be found by trial and improvement methods if logarithms are not used.

Exercise: Poisson approximation to the binomial

1. The random variable X is B(100, 0.03). Find the following probabilities using
(i) the binomial distribution,
(ii) a Poisson approximation:
(a) P(X = 0), (b) P(X = 2), (c) P(X = 4).

2. The probability that a bolt is defective is 0.2%. Bolts are packed in boxes of 500.
(a) Find the probability that in a randomly chosen box,
(i) there are two defective bolts,
(ii) there are more than three defective bolts.
(b) Two boxes are picked at random from the production line. Find the probability that one has two defective bolts and the other has no defective bolts.
(c) Three boxes are selected at random. Find the probability that they contain no defective bolts.

3. On average one in 200 cars breaks down on a certain stretch of road per day. Find the probability that, on a randomly chosen day,
(a) none of a sample of 250 cars break down,
(b) more than two of a sample of 300 cars break down.

4. Two dice are thrown.
(a) What is the probability of throwing a double six?
Two dice are thrown a total of 90 times.
(b) What is the probability that at least two double sixes are thrown?

5. An aircraft has 116 seats. The airline has found, from long experience, that on average 2.5% of people who have bought tickets for a flight do not arrive for that flight. The airline sells 120 tickets for a particular flight.
(a) Calculate, using a suitable approximation, the probability that more than 116 people arrive for the flight.
(b) Calculate also the probability that there are empty seats on the flight. (C)

6. In a large town one person in 80, on average, has blood of type X. If 200 blood donors are sampled at random, find an approximation to the probability that they include at least five people with blood type X.
How many donors must be sampled in order that the probability of including at least one donor of type X is 90% or more? (AEB)

7. A lottery has a very large number of tickets, one in every 500 of which entitles the purchaser to a prize. An agent sells 1000 tickets for the lottery. Using a Poisson approximation, find, to three decimal places, the probability that the number of prize-winning tickets sold by the agent is
(a) less than three,
(b) more than five.
Calculate the minimum number of tickets the agent must sell to have a 95% chance of selling at least one prize-winning ticket. (NEAB)

8. A manufacturer has found that 3% of seeds produced do not germinate. Using a Poisson approximation, find, to two significant figures, the probability that in a pack containing 150 seeds,
(a) more than four fail to germinate,
(b) at least 145 germinate.

9. X is B(250, p). The value of p is such that it is valid to apply a Poisson approximation. When this is done, it is found that P(X = 0) = 0.0235. Find the value of p.

10. The probability that I dial a wrong number when making a telephone call is 0.015. In a typical week I will make 50 telephone calls. Using a Poisson approximation to a binomial model find, correct to two decimal places, the probability that in such a week,
(a) I dial no wrong numbers,
(b) I dial more than two wrong numbers.
Comment on the suitability of the binomial model and of the Poisson approximation. (C)

11. A newspaper reports that 8.6% of adults in the U.K. painted the outside of their houses. A sample of 55 adults in the U.K. was selected. Stating any necessary assumptions, show that the number in the sample that painted the outsides of their own houses can be approximated by a Poisson distribution.
Using this approximation, find the probability that fewer than four people in the sample painted the outsides of their own houses. (C)

THE SUM OF INDEPENDENT POISSON VARIABLES

For independent variables X and Y, if $X \sim \mathrm{Po}(m)$ and $Y \sim \mathrm{Po}(n)$, then $X + Y \sim \mathrm{Po}(m + n)$.

Example 5.25
Two identical racing cars are being tested on a circuit. For each car, the number of mechanical breakdowns can be modelled by a Poisson distribution with a mean of one breakdown in 100 laps. If a car breaks down it is attended and continues on the circuit. The first car is tested for 20 laps and the second car for 40 laps.
Find the probability that the service team is called out to attend to breakdowns
(a) once,
(b) more than twice.
Solution 5.25
Since the average number of breakdowns in 100 laps is one, the average number in 20 laps is 0.2 and the average number in 40 laps is 0.4.
Let X be the number of breakdowns of the first car, then $X \sim \mathrm{Po}(0.2)$.
Let Y be the number of breakdowns of the second car, then $Y \sim \mathrm{Po}(0.4)$.
Let T be the total number of breakdowns.
Then $T = X + Y$ and $T \sim \mathrm{Po}(0.2 + 0.4)$, i.e. $T \sim \mathrm{Po}(0.6)$.

(a) $P(T = 1) = 0.6e^{-0.6} = 0.329$ (3 d.p.)

(b) $P(T > 2) = 1 - (P(T = 0) + P(T = 1) + P(T = 2))$
$= 1 - e^{-0.6}\left(1 + 0.6 + \dfrac{0.6^{2}}{2!}\right)$
$= 0.023$ (3 d.p.)

Example 5.26
The centre pages of the Weekly Sentinel consist of a page of film and theatre reviews and a page of classified advertisements. The number of misprints in the reviews can be modelled using a Poisson distribution with mean 2.3 and the number of misprints in the classified section can be modelled by a Poisson distribution with mean 1.7.
Using cumulative Poisson probability tables, find
(a) the probability that on the centre pages there will be more than five misprints,
(b) the smallest integer n such that the probability that there are more than n misprints on the centre pages is less than 5%.

Solution 5.26
Let X be the number of misprints in the reviews, then $X \sim \mathrm{Po}(2.3)$.
Let Y be the number of misprints in the classified advertisements, then $Y \sim \mathrm{Po}(1.7)$.
Let T be the total number of misprints on the centre pages, then $T = X + Y$ and $T \sim \mathrm{Po}(2.3 + 1.7)$, i.e. $T \sim \mathrm{Po}(4)$.
The cumulative tables are printed on page 647 and the relevant extract, for $\lambda = 4.0$, is shown here:

r     P(T ≤ r)
0     0.0183
1     0.0916
2     0.2381
3     0.4335
4     0.6288
5     0.7851
6     0.8893
7     0.9489
8     0.9786
9     0.9919
10    0.9972
11    0.9991
12    0.9997
13    0.9999
14    1.0000

(a) $P(T > 5) = 1 - P(T \le 5)$
$= 1 - 0.7851$
$= 0.215$ (3 d.p.)

(b) You need the smallest value of n such that $P(T > n) < 0.05$,
i.e. $1 - P(T \le n) < 0.05$,
so $P(T \le n) > 0.95$.
From the tables, $P(T \le 7) = 0.9489 < 0.95$
and $P(T \le 8) = 0.9786 > 0.95$.
The smallest value of n is 8.

Exercise: Sums of Poisson variables

1. Telephone calls reach a secretary independently and at random, internal ones at a mean rate of two in any five-minute period, and external ones at a mean rate of one in any five-minute period. Calculate the probability that there will be more than two calls in any period of two minutes. (O & C)

2. During a weekday, heavy lorries pass a census point P on a village high street independently and at random times. The mean rate for westward travelling lorries is two in any 30-minute period, and for eastward travelling lorries is three in any 30-minute period. Find the probability
(a) that there will be no lorries passing P in a given ten-minute period,
(b) that at least one lorry from each direction will pass P in a given ten-minute period,
(c) that there will be exactly four lorries passing P in a given 20-minute period. (O & C)

3. A large number of screwdrivers from a trial production run is inspected. It is found that the cellulose acetate handles are defective on 1% and that the chrome steel blades are defective on [value illegible]% of the screwdrivers, the defects occurring independently.
(a) What is the probability that a sample of 80 contains more than two defective screwdrivers?
(b) What is the probability that a sample of 80 contains at least one screwdriver with both a defective handle and defective blade? (O & C)

4. A restaurant kitchen has two food mixers A and B. The number of times per week that A breaks down has a Poisson distribution with mean 0.4 while independently the number of times that B breaks down in a week has a Poisson distribution with mean 0.1. Find, to three decimal places, the probability that in the next three weeks
(a) A will not break down at all,
(b) each mixer will break down exactly once,
(c) there will be a total of two breakdowns. (L)
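
The answers to Examples 5.25 and 5.26 rest on adding the means of independent Poisson variables. The sketch below recomputes them directly rather than from tables; the helper names `poisson_pmf`, `poisson_cdf` and `smallest_n` are assumptions made for this illustration and do not come from the book.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(T = x) for T ~ Po(lam). Hypothetical helper."""
    return exp(-lam) * lam**x / factorial(x)


def poisson_cdf(x, lam):
    """P(T <= x), the quantity tabulated in cumulative Poisson tables."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))


def smallest_n(lam, tail):
    """Smallest integer n with P(T > n) < tail, found by stepping n upwards."""
    n = 0
    while 1 - poisson_cdf(n, lam) >= tail:
        n += 1
    return n


# Example 5.25: X ~ Po(0.2) and Y ~ Po(0.4) are independent, so T = X + Y ~ Po(0.6).
lam_cars = 0.2 + 0.4
p_once = poisson_pmf(1, lam_cars)
p_more_than_twice = 1 - poisson_cdf(2, lam_cars)

# Example 5.26: misprint means 2.3 and 1.7 add to give T ~ Po(4).
lam_misprints = 2.3 + 1.7
p_more_than_five = 1 - poisson_cdf(5, lam_misprints)
n_misprints = smallest_n(lam_misprints, 0.05)

print(f"5.25 (a) {p_once:.3f}  (b) {p_more_than_twice:.3f}")
print(f"5.26 (a) {p_more_than_five:.3f}  (b) n = {n_misprints}")
```

The stepping loop in `smallest_n` mirrors reading down the cumulative table until the probability first exceeds 0.95.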
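
The Poisson approximation to the binomial used in Examples 5.23 and 5.24 can be checked against the exact binomial value. This is a minimal sketch under stated assumptions: the function and variable names are hypothetical, and the ticket calculation uses the rearranged formula n = -ln(0.1)/0.01 from Solution 5.24 rather than trial and improvement.

```python
from math import ceil, comb, exp, factorial, log


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p). Hypothetical helper."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pmf(x, lam):
    """P(X = x) for X ~ Po(lam). Hypothetical helper."""
    return exp(-lam) * lam**x / factorial(x)


# Example 5.23: X ~ B(500, 0.007), approximated by Po(np) = Po(3.5).
n, p = 500, 0.007
lam = n * p
exact_three = binomial_pmf(3, n, p)    # exact binomial P(X = 3)
approx_three = poisson_pmf(3, lam)     # Poisson approximation to P(X = 3)
approx_at_least_two = 1 - poisson_pmf(0, lam) - poisson_pmf(1, lam)

# Example 5.24(b): least n with 1 - exp(-0.01 n) >= 0.9 under the Poisson model.
p_win = 0.01
tickets = ceil(-log(0.1) / p_win)

print(f"P(X = 3): binomial {exact_three:.4f}, Poisson {approx_three:.4f}")
print(f"P(X >= 2) by Poisson: {approx_at_least_two:.2f}")
print(f"tickets needed: {tickets}")
```

Printing the binomial and Poisson values together shows how close the approximation is when n is large and p is small.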
Summary

• Uniform distribution

• Geometric distribution $X \sim \mathrm{Geo}(p)$
p is the probability of a successful outcome.
X is the number of independent trials needed to obtain the first successful outcome.
$P(X = x) = q^{x-1} \times p$ for $x = 1, 2, 3, \ldots$, to infinity, where $q = 1 - p$.
Note that X cannot be zero.
$E(X) = \dfrac{1}{p}$, $\mathrm{Var}(X) = \dfrac{q}{p^{2}}$, mode = 1.

• Binomial distribution $X \sim B(n, p)$
p is the probability of a successful outcome.
X is the number of successful outcomes in n independent trials.
$P(X = x) = {}^{n}C_{x}\, q^{n-x} \times p^{x}$ for $x = 0, 1, 2, \ldots, n$, where $q = 1 - p$.
$E(X) = np$ and $\mathrm{Var}(X) = npq$.

• Poisson distribution $X \sim \mathrm{Po}(\lambda)$
X is the number of occurrences of an event in a given interval of time or space, when the mean number of occurrences in the given interval is $\lambda$.
$P(X = x) = e^{-\lambda}\,\dfrac{\lambda^{x}}{x!}$ for $x = 0, 1, 2, 3, \ldots$, to infinity.
$E(X) = \lambda$ and $\mathrm{Var}(X) = \lambda$.

• Poisson approximation to the binomial distribution
If $X \sim B(n, p)$ with $n > 50$ and $p < 0.1$, then $X \sim \mathrm{Po}(np)$ approximately.

• Sum of independent Poisson variables
If X and Y are independent with $X \sim \mathrm{Po}(m)$ and $Y \sim \mathrm{Po}(n)$, then $X + Y \sim \mathrm{Po}(m + n)$.

Miscellaneous worked examples

Example 5.27
Every working day Mr Driver pulls out from his drive on to a main road in such a way that there is a very small probability p that his car will be involved in a collision.
(a) Show that in a five-day week the probability that there will be no collision is $(1 - p)^{5}$.
(b) State one assumption that is made in this calculation.
(c) Using a binomial expansion, show that the probability that there will be at least one collision in a five-day week is approximately 5p.
(d) Given that p = 0.001, use a calculator to find the probability that Mr Driver will avoid a collision in 500 working days. (NEAB)

Solution 5.27
(a) X is the number of collisions in five days, $X \sim B(5, p)$.
$P(X = 0) = q^{5} = (1 - p)^{5}$ where $q = 1 - p$.

(b) The circumstances remain the same for the five days; the events are independent.

(c) $P(X \ge 1) = 1 - P(X = 0)$
$= 1 - (1 - p)^{5}$
$= 1 - (1 - 5p + 10p^{2} - \cdots)$
$\approx 1 - 1 + 5p$ (ignoring higher powers of p since p is small)
So $P(X \ge 1) \approx 5p$.

(d) X is the number of collisions in 500 days, $X \sim B(500, 0.001)$.
Using the binomial distribution,
$P(X = 0) = 0.999^{500} = 0.606$ (3 d.p.)
Alternatively, since $n > 50$ and $p < 0.1$, using a Poisson approximation with mean $= np = 0.5$, $X \sim \mathrm{Po}(0.5)$, so
$P(X = 0) = e^{-0.5} = 0.607$ (3 d.p.), which agrees with the binomial value to two decimal places.

Example 5.28
A salesman sells goods by telephone. The probability that any particular call achieves a sale is $\frac{1}{12}$, independently of all other calls. The salesman continues to make calls until one call achieves a sale.
(a) Name an appropriate distribution with which to model this situation.
(b) Calculate the probability that the call that achieves a sale
(i) is the fifth call made,
(ii) does not occur in the first five calls.
(c) Obtain the mean and variance of the number of calls the salesman makes. (C)
Solution 5.28
(a) X is the number of calls until a call achieves a sale.
X can be modelled by a geometric distribution, $X \sim \mathrm{Geo}\left(\frac{1}{12}\right)$.

(b) (i) $P(X = 5) = q^{4}p = \left(\dfrac{11}{12}\right)^{4} \times \dfrac{1}{12} = 0.059$ (2 s.f.)

(ii) $P(X > 5) = q^{5} = \left(\dfrac{11}{12}\right)^{5} = 0.65$ (2 s.f.)

(c) $E(X) = \dfrac{1}{p} = \dfrac{1}{\frac{1}{12}} = 12$
$\mathrm{Var}(X) = \dfrac{q}{p^{2}} = \dfrac{\frac{11}{12}}{\left(\frac{1}{12}\right)^{2}} = 132$
So $E(X) = 12$ and $\mathrm{Var}(X) = 132$.

Example 5.29
The number of births announced in the personal column of a local weekly newspaper may be modelled by a Poisson distribution with mean 2.4.
Find the probability that, in a particular week,
(a) three or fewer births will be announced,
(b) exactly four births will be announced. (AEB)

Solution 5.29
X is the number of birth announcements in a week, $X \sim \mathrm{Po}(2.4)$.

(a) $P(X \le 3) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)$
$= e^{-2.4} + 2.4e^{-2.4} + \dfrac{2.4^{2}}{2!}e^{-2.4} + \dfrac{2.4^{3}}{3!}e^{-2.4}$
$= e^{-2.4}\left(1 + 2.4 + \dfrac{2.4^{2}}{2!} + \dfrac{2.4^{3}}{3!}\right)$
$= 0.779$ (3 s.f.)

(b) $P(X = 4) = e^{-2.4}\,\dfrac{2.4^{4}}{4!} = 0.125$ (3 s.f.)

Example 5.30
Weak spots occur at random in the manufacture of a certain cable at an average rate of one per 100 metres.
If X represents the number of weak spots in 100 metres of cable, write down the distribution of X.
Lengths of this cable are wound on to drums. Each drum carries 50 metres of cable.
Find the probability that a drum will have three or more weak spots.
A contractor buys five such drums. Find the probability that two have just one weak spot each and the other three have none. (AEB)

Solution 5.30
X is the number of weak spots in 100 m of cable, $X \sim \mathrm{Po}(1)$.
Let Y be the number of weak spots in 50 m of cable, $Y \sim \mathrm{Po}(0.5)$.
$P(Y \ge 3) = 1 - P(Y < 3)$
$= 1 - (P(Y = 0) + P(Y = 1) + P(Y = 2))$
$= 1 - \left(e^{-0.5} + 0.5e^{-0.5} + \dfrac{0.5^{2}}{2!}e^{-0.5}\right)$
$= 1 - e^{-0.5}\left(1 + 0.5 + \dfrac{0.5^{2}}{2!}\right)$
$= 0.01438\ldots$
$= 0.014$ (2 s.f.)

$P(\text{a drum has one weak spot}) = P(Y = 1) = 0.5e^{-0.5}$
$P(\text{a drum has no weak spot}) = P(Y = 0) = e^{-0.5}$
In five drums,
$P(\text{2 have one weak spot, 3 have none}) = {}^{5}C_{2} \times (P(Y = 1))^{2} \times (P(Y = 0))^{3}$
$= 10 \times (0.5e^{-0.5})^{2} \times (e^{-0.5})^{3}$
$= 10 \times 0.25e^{-1} \times e^{-1.5}$
$= 2.5 \times e^{-2.5}$
$= 0.21$ (2 s.f.)

Miscellaneous exercise 5g

1. The random variable X has the binomial distribution B(10, 0.35). Find P(X = 4).
The random variable Y has the Poisson distribution with mean 3.5. Find P(2 < Y < 5). (L)

2. The number of white corpuscles on a slide has a Poisson distribution with mean 3.2.
(a) Find the most likely number of white corpuscles on a slide.
(b) Calculate correct to three decimal places the probability of obtaining this number.
(c) If two such slides are prepared, what is the probability, correct to three decimal places, of obtaining at least two white corpuscles in total on the two slides?

3. Copies of an advertisement for a course in practical statistics are sent to Mathematics teachers in a large city. For each teacher who receives a copy, the probability of subsequently attending the course is 0.09.
Twenty teachers receive a copy of the advertisement. What is the probability that the number who subsequently attend the course will be
(a) two or fewer,
(b) exactly four. (AEB)
4. (a) Experience shows that a charity receives replies to letters at the rate of eight per 100. Calculate, giving each answer to two decimal places, the probability that the number of replies from ten letters is
(i) 0, (ii) 1, (iii) 2, (iv) more than 2.
(b) On average, out of every 1000 items made by a certain factory worker, one item is defective. The items are inspected in batches. A large number of batches, each of n items, made by the worker is inspected. Evaluate n in each of the following cases.
(i) The mean number of defective items per batch is 0.045.
(ii) The standard deviation of the number of defective items per batch is 0.333.
The probability of at least one defective item in a batch of N items is greater than 0.02. Use this information to write down an inequality which is satisfied by N. (C)

5. A store sells word processors. The proportion which are returned as faulty has been found to be 0.035. During the Christmas period of 1995, the store sold 104 word processors. The number of these which will be returned as faulty is X.
Assuming independence, state the exact distribution of X.
Give reasons why this distribution can be approximated by a Poisson distribution.
Calculate the probability that at most three of the word processors will be returned as faulty. (C)

6. (a) The probability that a seed of a particular variety of bean will germinate when sown is 0.96. Seeds are sold in packets of 50. If a packet is selected at random, calculate the probability that the number of seeds which will germinate when sown is exactly
(i) 50, (ii) 49, (iii) 48.
If 200 packets of seeds are selected, estimate the number of packets from each of which fewer than 48 seeds will germinate.
If three packets of seeds are selected, calculate, to three decimal places, the probability that at least 149 of the 150 seeds will germinate.
(b) A self-employed worker contacts an agency every morning in an attempt to obtain work for the day. The probability that work is available on any given day is 0.9. Calculate, for a period of 100 working days, the mean and the standard deviation of the number of days on which work is available. (C)

7. A car hire firm has three cars, which it hires out on a daily basis. The number of cars demanded per day follows a Poisson distribution with mean 2.1.
(a) Find the probability that exactly two cars are hired out on any one day.
(b) Find the probability that all cars are in use on any one day.
(c) Find the probability that all cars are in use on exactly three days of a five-day week.
(d) Find the probability that exactly ten cars are demanded in a five-day week. Explain whether or not such a demand could always be met.
(e) It costs the firm £20 a day to run each car, whether it is hired out or not. The daily hire charge per car is £50. Find the expected daily profit. (MEI)

8. A factory produces a particular type of electronic component. The probability of a component being acceptable is 0.95. The components are packed in boxes of 24.
(a) Calculate the probability that a box, chosen at random, contains exactly 22 acceptable components.
All boxes are inspected and a box is rejected if it contains fewer than 22 acceptable components.
(b) Calculate the probability that a box, chosen at random, is rejected.
The factory produces 80 boxes per day over a long period of time.
(c) Estimate the mean and the standard deviation of the number of boxes rejected per day.
It is proposed to introduce an alternative policy with regard to packing and inspection, as follows:
The daily production of components is to be packed in 160 boxes, each containing 12 components, and boxes containing fewer than 11 acceptable components are to be rejected.
(d) Estimate the mean number of boxes rejected per day under this alternative policy.
(e) Explain whether or not this alternative policy would lead to a decrease in the expected number of components rejected per day. (C)

9. A large bin contains 5250 used golf balls, 1260 of which are unusable. The random variable R denotes the number of unusable balls in a random sample of ten balls, selected without replacement, from the bin.
(a) Explain why R may be approximated as a binomial random variable with parameters 10 and 0.24.
(b) Hence calculate the probability that the sample contains
(i) exactly three unusable balls,
(ii) at most three unusable balls. (NEAB)

10. The number of night calls to a fire station in a small town can be modelled by a Poisson distribution with mean 4.2 per night. Find the probability that on a particular night there will be three or more calls to the fire station.
State what needs to be assumed about the calls to the fire station in order to justify a Poisson model. (C)

11. A television repair company uses a particular spare part at a rate of four per week. Assuming that requests for this spare part occur at random, find the probability that
(a) exactly six are used in a particular week,
(b) at least ten are used in a two-week period,
(c) exactly six are used in each of three consecutive weeks.
The manager decides to replenish the stock of this spare part to a constant level n at the start of each week.
(d) Find the value of n such that, on average, the stock will be insufficient no more than once in a 52-week year. (L)

12. In the Growmore Market Garden plants are inspected for the presence of the deadly red angus leaf bug. The number of bugs per leaf is known to follow a Poisson distribution with mean one. What is the probability that any one leaf on a given plant will have been attacked (at least one bug is found on it)?
A random sample of 12 plants is taken. For each plant ten leaves are selected at random and inspected for these bugs. If more than eight leaves on any particular plant have been attacked then the plant is destroyed. What is the probability that exactly two of these 12 plants are destroyed? (AEB)

13. In Blackbury it is known that 0.4% of people have blood group AB-.
Blackbury High School has 1000 pupils, with 28 pupils in class 4T.
(a) (i) Write down a distribution that could be used to model the number of pupils in class 4T with blood group AB-.
(ii) Hence calculate the probability that there are exactly two pupils in class 4T with blood group AB-.
(b) Using an appropriate distributional approximation, calculate the probability that there are fewer than six pupils at the school with blood group AB-.
(c) State an assumption that you have made in answering this question. (NEAB)

14. The probability that a fisherman has a successful day's fishing is 0.6. Given that he fishes for six days every week, find the probability that in any week he has
(a) exactly four successful days,
(b) at least two successful days.
The fisherman fishes for six days every week for many weeks. Estimate the mean and the standard deviation of the number of successful days per week over this period. (C)

15. A large number of groups, each consisting of 12 adults, are selected at random from the population of a particular town. Given that 30% of the adults in this town are car owners, calculate
(a) the probability that a group contains not more than two car owners,
(b) the mean and the standard deviation of the number of car owners in the groups. (C)

16. In a large city one person in five is left-handed.
(a) Find the probability that in a random sample of ten people
(i) exactly three will be left-handed,
(ii) more than half will be left-handed.
(b) Find the most likely number of left-handed people in a random sample of 12 people.
(c) Find the mean and the standard deviation of the number of left-handed people in a random sample of 25 people.
(d) How large must a random sample be if the probability that it contains at least one left-handed person is to be greater than 0.95?

17. Batches of 400 shells in the First World War were classified as 'accepted' or 'rejected' by testing a small number of shells from the batch. Tested shells are either 'good' or 'bad'; the probability that a randomly selected shell is good is p.
(a) In one testing method, eight shells from a batch (of 400) are selected at random and tested. The batch is accepted if at least three of these eight shells are good. Use a binomial distribution, with p = 0.2, to find the probability that the batch is accepted.
(b) In a second testing method, each batch of 400 is subdivided into four sub-batches of 100 shells each. Two shells from each sub-batch are tested, and the sub-batch is accepted if at least one of the two shells is good. Use a binomial distribution, with p = 0.2,
(i) to show that the probability that one particular sub-batch is accepted is 0.36,
(ii) to find the probability that, out of four sub-batches, at least three are accepted.
(c) In a third testing method, four shells are selected and the batch (of 400) is accepted if all four of the shells are good. The probability that the batch is accepted is 0.01. Assuming a binomial distribution, find the value of p.
(d) State one condition which must be satisfied by the shells if a binomial model is to be valid, and give a reason why it may not be satisfied in this context. (C)
18. Define the Poisson distribution and state its mean and variance.
The number of telephone calls received at a switchboard in any time interval of length T minutes has a Poisson distribution with mean [coefficient illegible]T. The operator leaves the switchboard unattended for five minutes.
Calculate to three decimal places the probabilities that there are (a) no calls, (b) four or more calls in her absence.
Find to three significant figures the maximum length of time in seconds for which the operator could be absent with a 95% probability of not missing a call. (NEAB)

19. A shop sells a particular make of radio at a rate of four per week on average. The number sold in a week has a Poisson distribution.
(a) Find the probability that the shop sells at least two in a week.
(b) Find the smallest number that can be in stock at the beginning of a week in order to have at least a 99% chance of being able to meet all demands during that week. (L)

20. The independent Poisson random variables X and Y have means 2 and 5, respectively. Obtain the mean and variance of the random variables
(a) Y - X,
(b) 2Y + 10.
For each of these random variables give one reason why the distribution is not Poisson. (NEAB)

21. Fanfold paper for computer printers is made by putting perforations every 30 cm in a continuous roll of paper. A box of fanfold paper contains 2000 sheets. State the length of the continuous roll from which the box of paper is produced.
The manufacturers claim that faults occur at random and at an average rate of one per 240 metres of paper. State an appropriate distribution for the number of faults per box of paper. Find the probability that a box of paper has no faults and also the probability that it has more than four faults.
Two copies of a report which runs to 100 sheets per copy are printed on this sort of paper. Find the probability that there are no faults in either copy of the report and also the probability that just one copy is faulty. (MEI)

22. A randomly chosen doctor in general practice sees, on average, one case of a broken nose per year and each case is independent of other similar cases.
(a) Regarding a month as a twelfth part of a year,
(i) show that the probability that, between them, three such doctors see no cases of a broken nose in a period of one month is 0.779, correct to three significant figures,
(ii) find the variance of the number of cases seen by three such doctors in a period of six months.
(b) Find the probability that, between them, three such doctors see at least three cases in one year.
(c) Find the probability that, of three such doctors, one sees three cases and the other two see no cases in one year. (C)

23. Lemons are packed in boxes, each box containing 200. It is found that, on average, 0.45% of the lemons are bad when the boxes are opened. Use the Poisson distribution to find the probabilities of 0, 1, 2, and more than two bad lemons in a box.
A buyer who is considering buying a consignment of several hundred boxes checks the quality of the consignment by having a box opened. If the box opened contains no bad lemons he buys the consignment. If it contains more than two bad lemons he refuses to buy, and if it contains one or two bad lemons he has another box opened and buys the consignment if the second box contains fewer than two bad lemons. What is the probability that he buys the consignment?
Another buyer checks consignments on a different basis. He has one box opened; if that box contains more than one bad lemon he asks for another to be opened and does not buy if the second also contains more than one bad lemon. What is the probability that he refuses to buy the consignment?

24. A hire company has two electric lawnmowers which it hires out by the day. The number of demands per day for a lawnmower has the form of a Poisson distribution with mean 1.50. In a period of 100 working days, how many times do you expect
(a) neither of the lawnmowers to be used,
(b) some requests for the lawnmowers to have to be refused?
If each lawnmower is to be used an equal amount, on how many days in a period of 100 working days would you expect a particular lawnmower not to be in use? (MEI)

25. The number of oil tankers arriving at a port between successive high tides has a Poisson distribution with mean 2. The depth of the water is such that loaded vessels can enter the dock area only on the high tide. The port has dock space for only three tankers which are discharged and leave the dock area before the next tide. Only the first three loaded tankers waiting at any high tide go into the dock area; any others must await another high tide.
Starting from an evening high tide after which no ships remain waiting their turn, find (to three decimal places) the probabilities that after the next morning's high tide
(a) the three dock berths remain empty,
(b) the three berths are all filled.
Find (to two decimal places) the probability that no tankers are left waiting outside the dock area after the following evening's high tide. (NEAB)

26. In the manufacture of commercial carpet small faults occur at random in the carpet at an average rate of 0.95 per 20 m². Find the probability that in a randomly selected 20 m² area of this carpet
(a) there are no faults,
(b) there are at most two faults.
The ground floor of a new office block has 10 rooms. Each room has an area of 80 m² and has been carpeted using the same commercial carpet described above. For any one of these rooms determine the probability that the carpet in the room
(c) contains at least two faults,
(d) contains exactly three faults,
(e) contains at most five faults.
Find the probability that in exactly half of these ten rooms the carpets will contain exactly three faults. (AEB)

27. During each working day in a certain factory a number of accidents occur independently according to a Poisson distribution with mean 0.5.
Calculate the probability that
(a) during any one day there are two or more accidents,
(b) during two consecutive days there are exactly three accidents altogether.
Out of 50 consecutive five-day weeks how many would you expect to be accident-free?
Mixed test 5A

1. A series of n experiments is carried out and in each experiment the only possible outcomes are 'success' and 'failure'. The total number of successes is denoted by X. State two conditions which must be satisfied for the distribution of X to be modelled by a binomial distribution.
Gromit invites 11 friends to a party. For each friend, the probability that he or she will accept the invitation may be taken to be j. Use a binomial distribution to calculate the probability that
(a) exactly nine,
(b) fewer than nine,
of the friends will accept the invitation.
Give a reason why a binomial distribution might not be a good model in this situation. (C)

2. The weekly number of detached dwellings sold by an estate agent may be modelled by a Poisson distribution with mean 2.75 and, independently, the weekly number of other dwellings sold may be modelled by a Poisson distribution with mean 3.25.
Determine the probability that the estate agent sells
(a) exactly four detached dwellings in a week,
(b) between ten and 15, inclusive, detached dwellings over a four-week period,
(c) fewer than five dwellings in a week. (NEAB)

3. In one part of the country, one person in 80 has blood of Type P. A random sample of 150 blood donors is chosen from that part of the country. Let X represent the number of donors in the sample having blood of Type P.
(a) State the distribution of X. Find the parameter of the Poisson distribution which can be used as an approximation. Give a reason why a Poisson approximation is appropriate.
(b) Using the Poisson distribution, calculate the probability that in the sample of 150 donors at least two have blood of Type P.
(c) A hospital urgently requires blood of Type P. How large a random sample of donors must be taken in order that the probability of finding at least one donor of Type P should be 0.99 or more? (MEI)

4. A geography student is studying the distribution of telephone boxes in a large rural area where there is an average of 300 boxes per 500 km². A map of part of the area is divided into 50 squares, each of area 1 km², and the student wishes to model the number of telephone boxes per square.
(a) Suggest a suitable simple model the student could use and specify any parameters required.
One of the squares is picked at random.
(b) Find the probability that this square does not contain any telephone boxes.
(c) Find the probability that this square contains at least three telephone boxes.
The student suggests using this model on another map of a large city and surrounding villages.
(d) Comment, giving your reason briefly, on the suitability of the model in this situation. (L)

5. A crossword puzzle is published in The Times each day of the week, except Sunday. A woman is able to complete, on average, eight out of ten of the crossword puzzles.
(a) Find the expected value and the standard deviation of the number of completed crosswords in a given week.
(b) Show that the probability that she will complete at least five in a given week is 0.655 (to three significant figures).
(c) Given that she completes the puzzle on Monday, find, to three significant figures, the probability that she will complete at least four in the rest of the week.
(d) Find, to three significant figures, the probability that, in a period of four weeks, she completes four or less in only one of the four weeks. (C)

Mixed test 5B

1. In practising the high jump a certain athlete has five attempts at a particular height. The probability that she succeeds at any one attempt is p. Find an expression, in terms of p, for the probability that she succeeds
(a) exactly four times,
(b) exactly two times.
The probability that she succeeds exactly four times is twice the probability that she succeeds exactly two times. Find the value of p. (C)

2. Before starting to play the game 'Snakes and Ladders' each player throws an ordinary unbiased die until a six is obtained. The number of throws before a player starts is the random variable Y, where Y takes the values 1, 2, 3, ....
(a) Name the probability distribution of Y, stating a necessary assumption.
(b) Find Var(Y).
(c) Two people play Snakes and Ladders. Calculate the probability that they will each need at least five throws before starting. (C)

3. State, giving your reasons, the distribution which you would expect to be appropriate in describing
(a) the number of heads in ten throws of a penny,
(b) the number of blemishes per square metre of sheet metal.
A building has an automatic telephone exchange. The number X of wrong connections in any one day is a Poisson variable with parameter λ. Find, in terms of λ, the probability that in any one day there will be
(c) exactly three wrong connections,
(d) three or more wrong connections.
Evaluate, to three decimal places, these probabilities when λ = 0.5. Find, to three decimal places, the largest value of λ for the probability of one or more wrong connections in any day to be at most -!. (L)

4. The number of customers entering a certain branch of a bank on a Monday lunchtime may be modelled by a Poisson distribution with mean 2.4 per minute.
(a) Find the probability that, during a particular minute, four or more customers enter the branch.
The probability that a customer, who enters the branch, intends to open a new account is 0.002 and is independent of the intentions of other customers. During a particular morning 450 customers enter the bank.
(b) Use a suitable approximation to find the probability that three or fewer of these 450 customers intend to open new accounts. (AEB)

5. A process for making plate glass produces small bubbles (imperfections) scattered at random in the glass, at an average rate of four small bubbles per 10 m².
Assuming a Poisson model for the number of small bubbles, determine, to three decimal places, the probability that a piece of glass 2.2 m x 3.0 m will contain
(a) exactly two small bubbles,
(b) at least one small bubble,
(c) at most two small bubbles.
Show that the probability that five pieces of glass, each 2.5 m by 2.0 m, will all be free of small bubbles is $e^{-10}$.
Find, to three decimal places, the probability that five pieces of glass, each 2.5 m by 2.0 m, will contain a total of at least ten small bubbles. (L)
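Two of the values quoted in the mixed tests can be cross-checked with a few lines of Python. This sketch is not part of the original text. The function names are illustrative, and it assumes the crossword question is modelled as B(6, 0.8) (six puzzles a week, each completed with probability 0.8) and the plate-glass question as a Poisson count with mean 10 (five pieces of 5 m² at 0.4 bubbles per m²).

```python
import math

def binomial_pmf(n, p, r):
    """P(X = r) for X ~ B(n, p)."""
    return math.comb(n, r) * p**r * (1 - p) ** (n - r)

def binomial_at_least(n, p, r):
    """P(X >= r) for X ~ B(n, p)."""
    return sum(binomial_pmf(n, p, k) for k in range(r, n + 1))

def poisson_pmf(mean, r):
    """P(X = r) for X ~ Po(mean)."""
    return math.exp(-mean) * mean**r / math.factorial(r)

# Assumed model: six crosswords a week, success probability 0.8.
crossword = binomial_at_least(6, 0.8, 5)

# Assumed model: total bubble count over 25 square metres has mean 10.
glass_all_clear = poisson_pmf(10, 0)

print(round(crossword, 3))   # the text quotes 0.655
print(glass_all_clear)       # should agree with e^(-10)
```

The same three helpers can be reused for the other binomial and Poisson questions by changing the parameters.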
Probability distributions II
continuous variables

In this chapter you will learn

• about probability density functions for continuous random variables
• how to find probabilities by calculating areas under curves
• how to find
  - the expectation, E(X), of the continuous random variable, X
  - the expectation of any function of X
  - the variance of X
  - the mode
• about the cumulative distribution function, F(x)
• how to find the median, quartiles and other percentiles
• how to obtain the probability density function f(x) from the cumulative function F(x)
• about the rectangular (uniform) distribution

CONTINUOUS RANDOM VARIABLES
The following are examples of continuous random variables:
• the mass, in grams, of a bag of sugar packaged by a particular machine,
• the time taken, in minutes, to perform a task,
• the height, in centimetres, of a five-year-old girl,
• the lifetime, in hours, of a 100-watt light bulb.

PROBABILITY DENSITY FUNCTION (P.D.F.) f(x)
A continuous random variable X is given by its probability density function (p.d.f.), which is specified for the range of values for which x is valid. The function can be illustrated by a curve, y = f(x). Note that this function cannot be negative throughout the specified range.
Probabilities are given by the area under the curve. It is sometimes possible to find an area by geometry, for example by using formulae for the area of a triangle or a trapezium. Often, however, areas need to be calculated using integration.

Example 6.1
X is the delay, in hours, of a flight from Chicago, where
$$f(x) = 0.2 - 0.02x, \quad 0 \le x \le 10$$
Find
(a) the probability that the delay will be less than four hours,
(b) the probability that the delay will be between two and six hours.

Solution 6.1
It is useful to draw a sketch of f(x).
Note that since f(x) is valid for $0 \le x \le 10$, the delay can be between 0 and 10 hours.
[Sketch of y = f(x): a straight line from (0, 0.2) to (10, 0)]

(a) The probability that the delay will be less than four hours is given by the area under the curve between 0 and 4.
Method 1 - using geometry
In this example it is easy to calculate the area using $A = \frac{1}{2}(a + b)h$, the formula for the area of a trapezium.
$a = 0.2$, $h = 4$, $b = f(4) = 0.2 - 0.02 \times 4 = 0.12$
$$A = \tfrac{1}{2}(a + b)h = \tfrac{1}{2}(0.2 + 0.12) \times 4 = 0.64$$
Method 2 - using integration
$$P(0 \le X \le 4) = \int_0^4 (0.2 - 0.02x)\,dx = \left[0.2x - 0.01x^2\right]_0^4 = 0.8 - 0.16 = 0.64$$
The probability that the delay will be less than four hours is 0.64.

(b) The probability that the delay will be between two and six hours is given by the area under the curve between 2 and 6.
Method 1 - using geometry:
$f(2) = 0.2 - 0.02 \times 2 = 0.16$
$f(6) = 0.2 - 0.02 \times 6 = 0.08$
$$A = \tfrac{1}{2}(a + b)h = \tfrac{1}{2}(0.16 + 0.08) \times 4 = 0.48$$
$P(2 \le X \le 6) = 0.48$
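As a numerical cross-check of the two trapezium areas in Example 6.1, the short Python sketch below evaluates them directly and compares them with a simple midpoint-rule integral. It is an illustration only, not part of the original text; the helper names and the choice of 10 000 strips are assumptions made here.

```python
def f(x):
    """p.d.f. of Example 6.1: f(x) = 0.2 - 0.02x for 0 <= x <= 10, else 0."""
    return 0.2 - 0.02 * x if 0 <= x <= 10 else 0.0

def trapezium_area(lo, hi):
    """Area under the straight-line p.d.f. between lo and hi: (a + b) * h / 2."""
    return 0.5 * (f(lo) + f(hi)) * (hi - lo)

def midpoint_integral(func, lo, hi, steps=10_000):
    """Approximate the integral of func over [lo, hi] with the midpoint rule."""
    width = (hi - lo) / steps
    return sum(func(lo + (i + 0.5) * width) for i in range(steps)) * width

p_less_than_4 = trapezium_area(0, 4)      # part (a)
p_between_2_and_6 = trapezium_area(2, 6)  # part (b)
total_area = midpoint_integral(f, 0, 10)  # should be very close to 1

print(p_less_than_4, p_between_2_and_6, total_area)
```

Because the p.d.f. is a straight line, the geometric and numerical answers agree up to floating-point rounding; for curved p.d.f.s only the integral applies.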
Method 2 - using integration:
$$P(2 \le X \le 6) = \int_2^6 (0.2 - 0.02x)\,dx = \left[0.2x - 0.01x^2\right]_2^6 = 1.2 - 0.36 - (0.4 - 0.04) = 0.48$$
The probability that the delay will be between two and six hours is 0.48.

Notice that the total area under the curve gives the total probability.
In the above example it is easy to check by finding the area of the triangle.
$$\text{Area of triangle} = \tfrac{1}{2} \times \text{base} \times \text{height} = \tfrac{1}{2} \times 10 \times 0.2 = 1$$
Alternatively,
$$\int_0^{10} f(x)\,dx = \int_0^{10} (0.2 - 0.02x)\,dx = \left[0.2x - 0.01x^2\right]_0^{10} = 1$$
Note that it is not possible to find the probability that the delay is, say, exactly three hours. If you try to integrate, you get
$$P(X = 3) = \int_3^3 f(x)\,dx = 0$$
You can only find the probability that X lies within a particular range.
It is also not possible to distinguish between
$P(2 < X < 6)$, $P(2 \le X < 6)$, $P(2 < X \le 6)$, $P(2 \le X \le 6)$,
so there is no need to worry about whether the inequality is strict or not.

Example 6.2
X is the continuous variable, the mass, in kilograms, of a substance produced per minute in an industrial process, where
$$f(x) = \begin{cases} \frac{1}{36}x(6 - x) & 0 \le x \le 6 \\ 0 & \text{otherwise} \end{cases}$$
Find the probability that the mass is more than 5 kg.

Solution 6.2
Note that f(x) is a quadratic function and use this to help to draw the sketch of f(x), noting that f(x) = 0 when x = 0 and x = 6.
Since you want the probability that x is more than 5, shade the area between 5 and 6.
[Sketch of $f(x) = \frac{1}{36}x(6 - x)$ with the area between 5 and 6 shaded]
You will need to find this by integrating:
$$P(X > 5) = \int_5^6 \tfrac{1}{36}x(6 - x)\,dx = \tfrac{1}{36}\int_5^6 (6x - x^2)\,dx = \tfrac{1}{36}\left[3x^2 - \tfrac{x^3}{3}\right]_5^6 = 0.074 \text{ (3 d.p.)}$$
The probability that the mass is more than 5 kg is 0.074 (3 d.p.).

In general, for a continuous random variable with p.d.f. f(x) valid over the range $a \le x \le b$,
$$\int_a^b f(x)\,dx = 1$$
so the total area under the curve is 1, and for $a \le x_1 \le x_2 \le b$,
$$P(x_1 \le X \le x_2) = \int_{x_1}^{x_2} f(x)\,dx$$
Remember that in an experimental approach, the area under the histogram represents frequency. In a theoretical approach, the area under the curve y = f(x) represents probability.

Example 6.3
A continuous random variable has p.d.f. $f(x) = kx^2$ for $0 \le x \le 4$.
(a) Find the value of the constant k.
(b) Find $P(1 \le X \le 3)$.

Solution 6.3
(a) $$\int_{\text{all } x} f(x)\,dx = 1$$
$$\int_0^4 kx^2\,dx = 1$$
$$\left[\frac{kx^3}{3}\right]_0^4 = 1$$
$$\frac{64k}{3} = 1$$
$$k = \frac{3}{64}$$
$$f(x) = \tfrac{3}{64}x^2, \quad 0 \le x \le 4$$
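The normalising step in Solution 6.3(a), and the probability found in Solution 6.2, can be checked exactly with rational arithmetic. The following Python sketch is an illustration added here, not the book's method; `integrate_poly` is a hypothetical helper that integrates a polynomial given by its coefficients.

```python
from fractions import Fraction

def integrate_poly(coeffs, lo, hi):
    """Exact integral of sum(coeffs[n] * x**n) between lo and hi."""
    lo, hi = Fraction(lo), Fraction(hi)
    return sum(
        Fraction(c) * (hi ** (n + 1) - lo ** (n + 1)) / (n + 1)
        for n, c in enumerate(coeffs)
    )

# Example 6.3: f(x) = k x^2 on [0, 4]. The area under x^2 is 64/3, so k = 3/64.
k = 1 / integrate_poly([0, 0, 1], 0, 4)

# Example 6.2: f(x) = (6x - x^2)/36 on [0, 6]; probability that X exceeds 5.
p_more_than_5 = integrate_poly([0, Fraction(6, 36), Fraction(-1, 36)], 5, 6)

print(k, p_more_than_5, float(p_more_than_5))
```

Working in fractions avoids rounding, so the value of k can be compared directly with 3/64 and the probability with the rounded figure 0.074.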
(b) $$P(1 \le X \le 3) = \int_1^3 \tfrac{3}{64}x^2\,dx = \tfrac{3}{64}\left[\frac{x^3}{3}\right]_1^3 = 0.40625 = 0.41 \text{ (2 s.f.)}$$

Example 6.4
The continuous random variable X has p.d.f. f(x) where
$$f(x) = \begin{cases} k(x + 2)^2 & -2 \le x < 0 \\ 4k & 0 \le x \le 1\tfrac{1}{3} \\ 0 & \text{otherwise} \end{cases}$$
(a) Find the value of the constant k.
(b) Sketch y = f(x).
(c) Find $P(-1 \le X \le 1)$.
(d) Find $P(X > 1)$.

Solution 6.4
(a) To find k, you need to use the result $\int_{\text{all } x} f(x)\,dx = 1$.
f(x) has been given in two parts, so you will need to calculate two separate integrals, as follows:
$$\int_{-2}^{0} k(x + 2)^2\,dx + \int_0^{4/3} 4k\,dx = 1$$
$$\left[\frac{k(x + 2)^3}{3}\right]_{-2}^{0} + \left[4kx\right]_0^{4/3} = 1$$
$$\frac{k}{3}(8) + 4k\left(\frac{4}{3}\right) = 1$$
$$8k = 1$$
$$k = \frac{1}{8}$$

(b) The p.d.f. of X is
$$f(x) = \begin{cases} \tfrac{1}{8}(x + 2)^2 & -2 \le x < 0 \\ \tfrac{1}{2} & 0 \le x \le 1\tfrac{1}{3} \\ 0 & \text{otherwise} \end{cases}$$
[Sketch of y = f(x): the curve $y = \frac{1}{8}(x + 2)^2$ from x = -2 to x = 0, then a horizontal line at $y = \frac{1}{2}$ from x = 0 to $x = 1\frac{1}{3}$]

(c) $P(-1 \le X \le 1)$ is given by the shaded area.
It must be found in two stages:
$$P(-1 \le X \le 0) = \int_{-1}^{0} \tfrac{1}{8}(x + 2)^2\,dx = \tfrac{1}{24}\left[(x + 2)^3\right]_{-1}^{0} = \tfrac{1}{24}(8 - 1) = \tfrac{7}{24}$$
and $P(0 \le X \le 1)$ = area of rectangle = $\frac{1}{2}$
$$\therefore P(-1 \le X \le 1) = \tfrac{7}{24} + \tfrac{1}{2} = \tfrac{19}{24}$$

(d) From the diagram,
$$P(X > 1) = \text{area of shaded rectangle} = \tfrac{1}{3} \times \tfrac{1}{2}$$
$$\therefore P(X > 1) = \tfrac{1}{6}$$

Exercise 6a

1. The continuous random variable X has a p.d.f. f(x) where $f(x) = kx^2$, $0 \le x \le 2$.
(a) Find the value of the constant k.
(b) Find P(X # 1).
(c) Find P(0.5 ≤ X 0.5).

2. The continuous random variable X has p.d.f. f(x) where $f(x) = k$, $-2 \le x \le 3$.
(a) Sketch y = f(x).
(b) Find the value of the constant k.
(c) Find $P(-1.6 \le X \le 2.1)$.

3. The continuous random variable X has p.d.f. f(x) where $f(x) = k(4 - x)$, $1 \le x \le 3$.
(a) Find the value of the constant k.
(b) Sketch y = f(x).
(c) Find $P(1.2 \le X \le 2.4)$.

4. The continuous random variable X has p.d.f. f(x) where $f(x) = k(x + 2)^2$, $0 \le x \le 2$.
(a) Find the value of the constant k.
(b) Find $P(0 \le X \le 1)$ and hence find $P(X > 1)$.
A CONCISE COURSE IN A-LEVEL STATISTICS

5. The continuous random variable X has p.d.f. f(x) where $f(x) = kx^3$, $0 \le x \le c$ and $P(X \le \frac{1}{2}) = \frac{1}{16}$.
Find the values of the constants c and k.

6. A continuous random variable has p.d.f. f(x) where $f(x) = kx$, $0 \le x \le 4$.
(a) Find the value of the constant k.
(b) Sketch y = f(x).
(c) Find $P(1 \le X \le 2.5)$.

7. The continuous random variable X has p.d.f. f(x) where
f(x) =                 $0 \le x < 2$
f(x) = $k(2x - 3)$     $2 \le x \le 3$
f(x) = 0               otherwise
(a) Find the value of the constant k.
(b) Sketch y = f(x).
(c) Find $P(X \le 1)$.
(d) Find $P(X > 2.5)$.
(e) Find $P(1 \le X \le 2.3)$.

EXPECTATION OF X, E(X)
For a continuous random variable with p.d.f. f(x),
$$E(X) = \int_{\text{all } x} x f(x)\,dx$$
E(X) is referred to as the mean or expectation of X and is often denoted by $\mu$.

Example 6.5
The sketch shows the p.d.f. of X where $f(x) = \frac{1}{9}x^2$, $0 \le x \le 3$.
(a) Find $\mu$, the mean of X.
(b) Find $P(X < \mu)$.
[Sketch of y = f(x) for x from 0 to 3]

Solution 6.5
(a) $$\mu = E(X) = \int_{\text{all } x} x f(x)\,dx = \int_0^3 \tfrac{1}{9}x^3\,dx = \left[\frac{x^4}{36}\right]_0^3 = 2.25$$
(b) $$P(X < 2.25) = \int_0^{2.25} \tfrac{1}{9}x^2\,dx = \left[\frac{x^3}{27}\right]_0^{2.25} = 0.42 \text{ (2 s.f.)}$$

If f(x) has a line of symmetry in the specified range, then E(X) can be found directly as in the following example.

Example 6.6
A continuous random variable X has p.d.f. f(x) where
$$f(x) = \begin{cases} 0.25x & 0 \le x \le 2 \\ 1 - 0.25x & 2 \le x \le 4 \\ 0 & \text{otherwise} \end{cases}$$
Sketch y = f(x) and find E(X).

Solution 6.6
[Sketch of y = f(x): a triangle rising along f(x) = 0.25x from x = 0 to x = 2, then falling to zero at x = 4]
From the sketch, you can see that there is symmetry about x = 2, so E(X) = 2.
Check by integration:
$$E(X) = \int_{\text{all } x} x f(x)\,dx = \int_0^2 x \times 0.25x\,dx + \int_2^4 x \times (1 - 0.25x)\,dx$$
$$= 0.25\int_0^2 x^2\,dx + \int_2^4 (x - 0.25x^2)\,dx$$
$$= 0.25\left[\frac{x^3}{3}\right]_0^2 + \left[\frac{x^2}{2} - 0.25\,\frac{x^3}{3}\right]_2^4$$
$$= \tfrac{2}{3} + \left(8 - \tfrac{16}{3} - \left(2 - \tfrac{2}{3}\right)\right)$$
$$= 2$$

Example 6.7
A teacher of young children is thinking of asking her class to guess her height in metres. The teacher considers that the height guessed by a randomly selected child can be modelled by the random variable H with probability density function
$$f(h) = \begin{cases} \tfrac{3}{16}(4h - h^2) & 0 \le h \le 2 \\ 0 & \text{otherwise} \end{cases}$$
Using this model,
(a) find $P(H < 1)$,
(b) show that $E(H) = 1.25$.
A friend of the teacher suggests that the random variable X with probability density function
$$g(x) = \begin{cases} kx^3 & 0 \le x \le 2 \\ 0 & \text{otherwise} \end{cases}$$
where k is a constant, might be a more suitable model.
(c) Show that $k = \frac{1}{4}$.
(d) Find $P(X < 1)$.
(e) Find E(X).
(f) Using your calculations in (a), (b), (d) and (e), state, giving reasons, which of the random variables H or X is likely to be the more appropriate model in this instance. (L)

Solution 6.7
(a) $$P(H < 1) = \int_0^1 f(h)\,dh = \int_0^1 \tfrac{3}{16}(4h - h^2)\,dh = \tfrac{3}{16}\left[2h^2 - \frac{h^3}{3}\right]_0^1 = \tfrac{5}{16} = 0.3125$$
(b) $$E(H) = \int_0^2 h f(h)\,dh = \tfrac{3}{16}\int_0^2 (4h^2 - h^3)\,dh = \tfrac{3}{16}\left[\frac{4h^3}{3} - \frac{h^4}{4}\right]_0^2 = 1.25$$
(c) $$\int_{\text{all } x} g(x)\,dx = 1$$
$$\int_0^2 kx^3\,dx = 1$$
$$k\left[\frac{x^4}{4}\right]_0^2 = 1$$
$$4k = 1$$
$$k = \tfrac{1}{4}$$
(d) $$P(X < 1) = \int_0^1 \tfrac{1}{4}x^3\,dx = \tfrac{1}{4}\left[\frac{x^4}{4}\right]_0^1 = \tfrac{1}{16} = 0.0625$$
(e) $$E(X) = \int_0^2 x g(x)\,dx = \tfrac{1}{4}\int_0^2 x^4\,dx = \tfrac{1}{4}\left[\frac{x^5}{5}\right]_0^2 = 1.6$$
(f) For H, $P(H < 1) = 0.3125$, so 31% of children guess the teacher's height to be less than 1 m (i.e. 3 ft 3 in).
$E(H) = 1.25$, so the average guess for the height of the teacher is 1.25 m (i.e. 4 ft 1 in).
For X, $P(X < 1) = 0.0625$, so only 6% of children guess the height to be less than 3 ft 3 in.
$E(X) = 1.6$, so the average guess for the height of the teacher is 1.6 m (i.e. 5 ft 2 in).
X is the more appropriate model.

Exercise 6b Expectation

1. Find E(X) for each of the following continuous random variables.
(a) f(x) = i(x² + 1), 0 ≤ x 0.
(b) $f(x) = \frac{3}{4}x(2 - x)$, $0 \le x \le 2$.
(c) $f(x) = \frac{1}{18}(6 - x)$, $0 \le x \le 6$.
(d) $f(x) = kx^3$, $0 \le x \le 2$.
f(x) = i ~ ≤ x < 2; 16 2 < x ≤ 4; otherwise

2. The continuous random variable X has p.d.f. f(x) where
$$f(x) = \begin{cases} kx & 0 \le x < 1 \\ k & 1 \le x < 3 \\ k(4 - x) & 3 \le x \le 4 \\ 0 & \text{otherwise} \end{cases}$$
(a) Draw a sketch of y = f(x).
(b) Find k.
(c) Find E(X).

3. X is a continuous random variable with p.d.f. $f(x) = kx^2$, $0 \le x \le 4$.
Find E(X).

4. In a game a wooden block is propelled with a stick across a flat deck. On each attempt the distance, x metres, reached by the block lies between 0 and 10 m, and the variation is modelled by the probability density function $f(x) = 0.0012x^2(10 - x)$.
Calculate the mean distance reached by the block. (SMP)
5. The continuous random variable X has the probability density function f given by $f(x) = kx$, $5 \le x \le 10$, $f(x) = 0$ otherwise.
(a) Find the value of k.
(b) Find the expected value of X.
(c) Find the probability that X > 8.
The annual income from money invested in a Unit Trust Fund is X per cent of the amount invested, where X has the above distribution. Suppose that you have a sum of money to invest and that you are prepared to leave the money invested over a period of several years. State, with your reasons, whether you would invest in the Unit Trust Fund or in a Money Bond offering a guaranteed annual income of 8% on the money invested. (NEAB)

6. The lifetime X in tens of hours of a torch battery is a random variable with probability density function
$$f(x) = \begin{cases} \tfrac{3}{4}\left(1 - (x - 2)^2\right) & 1 \le x \le 3 \\ 0 & \text{otherwise} \end{cases}$$
Calculate the mean of X.
A torch runs on two batteries, both of which have to be working for the torch to function. If two new batteries are put in the torch, what is the probability that the torch will function for at least 22 hours, on the assumption that the lifetimes of the batteries are independent? (O & C)

7. A random variable X has a probability density function f given by
$$f(x) = \begin{cases} cx(5 - x) & 0 \le x \le 5 \\ 0 & \text{otherwise} \end{cases}$$
Show that $c = \frac{6}{125}$ and find the mean of X.
The lifetime X in years of an electric light bulb has this distribution. Given that a lamp standard is fitted with two such new bulbs and that their failures are independent, find the probability that neither bulb fails in the first year and the probability that exactly one bulb fails within two years. (MEI)

8. The mass, X kg, of a particular substance produced per hour in a chemical process is a continuous random variable whose probability density function is given by
$$f(x) = \begin{cases} \dfrac{3x^2}{32} & 0 \le x < 2 \\ \dfrac{3(6 - x)}{32} & 2 \le x \le 6 \\ 0 & \text{otherwise} \end{cases}$$
(a) Find the mean mass produced per hour.
(b) The substance produced is sold at £2 per kilogram and the total running cost of the process is £1 per hour. Find the expected profit per hour and the probability that in an hour the profit will exceed £7. (NEAB)

9. A continuous random variable X has the probability density function f defined by
3 < x < 4
otherwise
where c is a positive constant. Find
(a) the value of c,
(b) the mean of X,
(c) the value, a, for there to be a probability of 0.85 that a randomly observed value of X will exceed a. (NEAB)

THE EXPECTATION OF ANY FUNCTION OF X
If g(X) is any function of the continuous random variable X with p.d.f. f(x), then
$$E(g(X)) = \int_{\text{all } x} g(x) f(x)\,dx$$
In particular
$$E(X^2) = \int_{\text{all } x} x^2 f(x)\,dx$$
As in the case of the discrete random variable (see pages 246 and 248), the following results also hold when X is continuous; a and b are constants:
1. $E(a) = a$
2. $E(aX) = aE(X)$
3. $E(aX + b) = aE(X) + b$
4. $E(g(X) + h(X)) = E(g(X)) + E(h(X))$

Example 6.8
The continuous random variable X has p.d.f. f(x) where $f(x) = \frac{1}{20}(x + 3)$, $0 \le x \le 4$.
(a) Find E(X).
(b) Find E(2X + 5).
(c) Find $E(X^2)$.
(d) Find $E(X^2 + 2X - 3)$.

Solution 6.8
[Sketch of y = f(x) for x from 0 to 4]
Note from the sketch that there is no symmetry.
(a) $$E(X) = \int_{\text{all } x} x f(x)\,dx = \tfrac{1}{20}\int_0^4 x(x + 3)\,dx = \tfrac{1}{20}\int_0^4 (x^2 + 3x)\,dx = \tfrac{1}{20}\left[\frac{x^3}{3} + \frac{3x^2}{2}\right]_0^4 = 2.266\ldots = 2.3 \text{ (2 s.f.)}$$
(b) $E(2X + 5) = E(2X) + 5$ (Result 3)
$= 2E(X) + 5$ (Result 2)
$= 2(2.266\ldots) + 5$
$= 9.533\ldots$
$= 9.5$ (2 s.f.)
(c) $$E(X^2) = \int_{\text{all } x} x^2 f(x)\,dx = \tfrac{1}{20}\int_0^4 x^2(x + 3)\,dx = \tfrac{1}{20}\int_0^4 (x^3 + 3x^2)\,dx = \tfrac{1}{20}\left[\frac{x^4}{4} + x^3\right]_0^4 = 6.4$$
(d) $E(X^2 + 2X - 3) = E(X^2) + E(2X) - E(3)$ (Result 4)
$= E(X^2) + 2E(X) - 3$ (Results 1, 2)
$= 6.4 + 2(2.266\ldots) - 3$
$= 7.933\ldots$
$= 7.9$ (2 s.f.)
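The four expectations in Example 6.8 can be reproduced exactly by treating both f and g as polynomials and integrating their product. The Python sketch below is an added illustration under that assumption; `integrate_poly` and `expectation` are hypothetical helper names, not anything defined in the text.

```python
from fractions import Fraction

def integrate_poly(coeffs, lo, hi):
    """Exact integral of sum(coeffs[n] * x**n) between lo and hi."""
    lo, hi = Fraction(lo), Fraction(hi)
    return sum(
        Fraction(c) * (hi ** (n + 1) - lo ** (n + 1)) / (n + 1)
        for n, c in enumerate(coeffs)
    )

def expectation(g_coeffs, f_coeffs, lo, hi):
    """E(g(X)) = integral of g(x) f(x), for polynomial g and polynomial p.d.f. f."""
    product = [Fraction(0)] * (len(g_coeffs) + len(f_coeffs) - 1)
    for i, g in enumerate(g_coeffs):
        for j, f in enumerate(f_coeffs):
            product[i + j] += Fraction(g) * Fraction(f)
    return integrate_poly(product, lo, hi)

# p.d.f. of Example 6.8: f(x) = (x + 3)/20 on [0, 4], as coefficients of 1 and x.
pdf = [Fraction(3, 20), Fraction(1, 20)]

e_x = expectation([0, 1], pdf, 0, 4)          # E(X)
e_linear = expectation([5, 2], pdf, 0, 4)     # E(2X + 5)
e_x2 = expectation([0, 0, 1], pdf, 0, 4)      # E(X^2)
e_quad = expectation([-3, 2, 1], pdf, 0, 4)   # E(X^2 + 2X - 3)

print(float(e_x), float(e_linear), float(e_x2), float(e_quad))
```

Comparing `e_linear` with `2 * e_x + 5` and `e_quad` with `e_x2 + 2 * e_x - 3` confirms Results 1 to 4 for this p.d.f.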
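Example 6.9
The mass, X kg, of a particular substance produced in one hour in a chemical process is modelled by a continuous random variable with probability density function given by
$f(x) = \frac{3}{32}x^2$, $0 \le x < 2$,
$f(x) = \frac{3}{32}(6 - x)$, $2 \le x \le 6$,
$f(x) = 0$, otherwise.
(a) Sketch the graph of f.
(b) Find $P(X < 4)$.
(c) Find the mean mass produced per hour.
(d) The substance is sold at £100 per kilogram and the running cost of the process is £20 per hour. Taking £Y as the profit made in each hour, express Y in terms of X.
(e) Find the expected value of Y. (NEAB)

Solution 6.9
(a) [Sketch of y = f(x) for x from 0 to 6, with its peak at x = 2]
(b) $$P(X < 4) = \int_0^2 \tfrac{3}{32}x^2\,dx + \int_2^4 \tfrac{3}{32}(6 - x)\,dx = \tfrac{1}{32}\left[x^3\right]_0^2 + \tfrac{3}{32}\left[6x - \frac{x^2}{2}\right]_2^4$$
$$= \tfrac{8}{32} + \tfrac{3}{32}\left(24 - 8 - (12 - 2)\right) = \tfrac{13}{16}$$
(c) $$E(X) = \int_0^2 \tfrac{3}{32}x^3\,dx + \int_2^6 \tfrac{3}{32}(6x - x^2)\,dx = \tfrac{3}{32}\left[\frac{x^4}{4}\right]_0^2 + \tfrac{3}{32}\left[3x^2 - \frac{x^3}{3}\right]_2^6$$
$$= \tfrac{3}{8} + \tfrac{3}{32}\left(108 - 72 - \left(12 - \tfrac{8}{3}\right)\right) = 2\tfrac{7}{8}$$
(d) $Y = 100X - 20$
(e) $E(Y) = E(100X - 20) = 100E(X) - 20 = 267\tfrac{1}{2}$
So the expected profit is £267.50.

Example 6.10
The continuous random variable X has p.d.f. f(x) where
$$f(x) = \begin{cases} \tfrac{6}{7}x & 0 \le x \le 1 \\ \tfrac{6}{7}x(2 - x) & 1 \le x \le 2 \\ 0 & \text{otherwise} \end{cases}$$
Find $E(X^2)$.

Solution 6.10
$$E(X^2) = \int_{\text{all } x} x^2 f(x)\,dx = \int_0^1 \tfrac{6}{7}x^3\,dx + \int_1^2 \tfrac{6}{7}x^3(2 - x)\,dx$$
$$= \tfrac{6}{7}\int_0^1 x^3\,dx + \tfrac{6}{7}\int_1^2 (2x^3 - x^4)\,dx$$
$$= \tfrac{6}{7}\left[\frac{x^4}{4}\right]_0^1 + \tfrac{6}{7}\left[\frac{x^4}{2} - \frac{x^5}{5}\right]_1^2$$
$$= 1.328\ldots = 1.3 \text{ (2 s.f.)}$$
NOTE: $E(X^2)$ is an important value which is needed when calculating the variance of X.

VARIANCE OF X, Var(X)
For a random variable X,
$\text{Var}(X) = E(X - \mu)^2$ where $\mu = E(X)$.
As in the discrete case (see page 249) the formula can be written:
$$\text{Var}(X) = E(X^2) - E^2(X) = E(X^2) - \mu^2$$
If X is a continuous random variable with p.d.f. f(x), then
$$\text{Var}(X) = \int_{\text{all } x} x^2 f(x)\,dx - \mu^2 \quad \text{where} \quad \mu = E(X) = \int_{\text{all } x} x f(x)\,dx$$
The standard deviation of X is often written as $\sigma$, so $\sigma = \sqrt{\text{Var}(X)}$.
As in the case of the discrete random variable (see page 250) the following results also hold when X is continuous, where a and b are constants:
1. $\text{Var}(a) = 0$
2. $\text{Var}(aX) = a^2\,\text{Var}(X)$
3. $\text{Var}(aX + b) = a^2\,\text{Var}(X)$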
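To show how $E(X^2)$ feeds into the variance, the sketch below applies $\text{Var}(X) = E(X^2) - E^2(X)$ to the two-part p.d.f. of Example 6.10. It is an added illustration, not the book's working: the p.d.f. is assumed to be stored as polynomial pieces, and `integrate_poly` and `moment` are hypothetical helper names.

```python
import math
from fractions import Fraction

def integrate_poly(coeffs, lo, hi):
    """Exact integral of sum(coeffs[n] * x**n) between lo and hi."""
    lo, hi = Fraction(lo), Fraction(hi)
    return sum(
        Fraction(c) * (hi ** (n + 1) - lo ** (n + 1)) / (n + 1)
        for n, c in enumerate(coeffs)
    )

def moment(pieces, power):
    """E(X**power) for a p.d.f. given as (coefficients, lo, hi) polynomial pieces."""
    total = Fraction(0)
    for coeffs, lo, hi in pieces:
        # Multiplying by x**power shifts every coefficient up by `power` places.
        shifted = [Fraction(0)] * power + [Fraction(c) for c in coeffs]
        total += integrate_poly(shifted, lo, hi)
    return total

# Example 6.10: f(x) = 6x/7 on [0, 1] and 6x(2 - x)/7 on [1, 2].
pieces = [
    ([0, Fraction(6, 7)], 0, 1),
    ([0, Fraction(12, 7), Fraction(-6, 7)], 1, 2),
]

total_probability = moment(pieces, 0)   # should equal 1
mean = moment(pieces, 1)
e_x2 = moment(pieces, 2)                # the text gives 1.328...
variance = e_x2 - mean ** 2
sd = math.sqrt(variance)

print(total_probability, float(mean), float(e_x2), float(variance), sd)
```

Checking that the zeroth moment equals 1 is a quick way to catch a mistyped coefficient before trusting the mean or variance.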
Example 6.11
The continuous random variable X has p.d.f. f(x) where $f(x) = \frac{1}{8}x$, $0 \le x \le 4$. Find
(a) E(X),
(b) $E(X^2)$,
(c) Var(X),
(d) $\sigma$, the standard deviation of X,
(e) Var(3X + 2).

Solution 6.11
[Sketch of y = f(x): a straight line rising from x = 0 to x = 4]
(a) $$E(X) = \int_{\text{all } x} x f(x)\,dx = \int_0^4 \tfrac{1}{8}x^2\,dx = \tfrac{1}{8}\left[\frac{x^3}{3}\right]_0^4 = 2.666\ldots = 2.7 \text{ (2 s.f.)}$$
(b) $$E(X^2) = \int_{\text{all } x} x^2 f(x)\,dx = \int_0^4 \tfrac{1}{8}x^3\,dx = \tfrac{1}{8}\left[\frac{x^4}{4}\right]_0^4 = \tfrac{1}{8}(64) = 8$$
(c) $\text{Var}(X) = E(X^2) - E^2(X) = 8 - (2.666\ldots)^2 = 0.888\ldots = 0.89$ (2 s.f.)
(d) $\sigma = \sqrt{\text{Var}(X)} = \sqrt{0.888\ldots} = 0.9428\ldots = 0.94$ (2 s.f.)
(e) $\text{Var}(3X + 2) = 9\,\text{Var}(X)$ (using variance result 3)
$= 9(0.888\ldots)$
$= 8$

Example 6.12
As an experiment a temporary roundabout is installed at the crossroads. The time, X minutes, which vehicles have to wait before entering the roundabout has probability density function
$$f(x) = \begin{cases} 0.8 - 0.32x & 0 \le x \le 2.5 \\ 0 & \text{otherwise} \end{cases}$$
Find the mean and the standard deviation of X. (AEB)

Solution 6.12
$$E(X) = \int_{\text{all } x} x f(x)\,dx = \int_0^{2.5} (0.8x - 0.32x^2)\,dx = \left[0.8\,\frac{x^2}{2} - 0.32\,\frac{x^3}{3}\right]_0^{2.5} = 0.833\ldots \text{ minutes} = 50 \text{ seconds}$$
The mean time is 50 seconds.
$$E(X^2) = \int_{\text{all } x} x^2 f(x)\,dx = \int_0^{2.5} (0.8x^2 - 0.32x^3)\,dx = \left[0.8\,\frac{x^3}{3} - 0.32\,\frac{x^4}{4}\right]_0^{2.5} = 1.041\ldots$$
$\text{Var}(X) = E(X^2) - E^2(X) = 1.041\ldots - (0.833\ldots)^2 = 0.347\ldots$
s.d. of X $= \sqrt{0.347\ldots} = 0.589\ldots$ minutes $= 35$ seconds (2 s.f.)

THE MODE
The mode is the value of X for which f(x) is greatest in the given range of X.
To locate the mode it is a good idea to draw a sketch. Sometimes the mode can be deduced immediately.
[Three sketches of p.d.f.s on the range 0 to 4: one increasing throughout (mode is 4), one with its peak at x = 2 (mode is 2), one decreasing throughout (mode is 0)]
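A sketch is the recommended way to spot the mode, but the same idea can be mimicked numerically by evaluating f(x) on a fine grid and picking the largest value. The Python sketch below is an added illustration. The three p.d.f.s are chosen here to match the three shapes described (the first is the p.d.f. of Example 6.11, the second that of Example 6.6, the third is a made-up decreasing line), and `mode_on_grid` is a hypothetical helper.

```python
def mode_on_grid(pdf, lo, hi, steps=4000):
    """Return the grid point in [lo, hi] at which pdf takes its largest value."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(xs, key=pdf)

def increasing(x):
    """f(x) = x/8 on [0, 4] (Example 6.11)."""
    return x / 8 if 0 <= x <= 4 else 0.0

def triangular(x):
    """f(x) = 0.25x on [0, 2], 1 - 0.25x on [2, 4] (Example 6.6)."""
    if 0 <= x <= 2:
        return 0.25 * x
    if 2 < x <= 4:
        return 1 - 0.25 * x
    return 0.0

def decreasing(x):
    """Illustrative p.d.f. f(x) = 0.5 - x/8 on [0, 4]; not taken from the text."""
    return 0.5 - x / 8 if 0 <= x <= 4 else 0.0

modes = [mode_on_grid(pdf, 0, 4) for pdf in (increasing, triangular, decreasing)]
print(modes)
```

A grid search only locates the mode to within the grid spacing, so for smooth curves the calculus approach with f'(x) = 0 remains the exact method.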
For some probability density functions you will need to determine the maximum point on the curve y = f(x) using the fact that, at a maximum point, $f'(x) = 0$, where $f'(x) = \frac{d}{dx}f(x)$.
Note that a maximum point is confirmed if $f''(x) < 0$, where $f''(x) = \frac{d}{dx}f'(x)$.

Example 6.13
X has p.d.f. defined by $f(x) = \frac{3}{80}(2 + x)(4 - x)$, for $0 \le x \le 4$ and is illustrated in the diagram.
Find the mode.
[Sketch of $y = \frac{3}{80}(2 + x)(4 - x)$ for x from 0 to 4, with the mode marked]

Solution 6.13
$$f(x) = \tfrac{3}{80}(2 + x)(4 - x) = \tfrac{3}{80}(8 + 2x - x^2)$$
The mode is the value of x at the maximum point.
Differentiate to find $f'(x)$.
$$f'(x) = \tfrac{3}{80}(2 - 2x)$$
$\therefore f'(x) = 0$ when $x = 1$
Differentiate again to find $f''(x)$.
$$f''(x) = \tfrac{3}{80}(-2) = -\tfrac{3}{40},$$
so $f''(x) < 0$ for all values of x, indicating that there is a maximum point when x = 1.
The mode = 1

Example 6.14
A random variable X has a probability density function
$f(x) = Ax(6 - x)^2$, $0 \le x \le 6$
$f(x) = 0$ elsewhere.
(a) Find the value of the constant A.
(b) Calculate
(i) the mean,
(ii) the mode,
(iii) the variance,
(iv) the standard deviation of X. (AEB)

Solution 6.14
(a) Since X is a random variable, $\int_{\text{all } x} f(x)\,dx = 1$
$$\therefore 1 = \int_0^6 Ax(6 - x)^2\,dx = A\int_0^6 (36x - 12x^2 + x^3)\,dx = A\left[18x^2 - 4x^3 + \frac{x^4}{4}\right]_0^6 = 108A$$
$$A = \frac{1}{108}$$
[Sketch of $y = \frac{1}{108}x(6 - x)^2$ for x from 0 to 6]

(b) $f(x) = \frac{1}{108}x(6 - x)^2$, $0 \le x \le 6$
(i) The mean is E(X) where $E(X) = \int_{\text{all } x} x f(x)\,dx$
$$E(X) = \tfrac{1}{108}\int_0^6 x^2(6 - x)^2\,dx = \tfrac{1}{108}\int_0^6 (36x^2 - 12x^3 + x^4)\,dx = \tfrac{1}{108}\left[12x^3 - 3x^4 + \frac{x^5}{5}\right]_0^6 = 2.4$$
(ii) To find the mode, find the value of x for which f(x) is a maximum, $0 \le x \le 6$.
$$f(x) = \tfrac{1}{108}(36x - 12x^2 + x^3)$$
Differentiating
$$f'(x) = \tfrac{1}{108}(36 - 24x + 3x^2) = \tfrac{1}{36}(6 - x)(2 - x)$$
$\therefore f'(x) = 0$ when $x = 2$ and when $x = 6$
$$f''(x) = \tfrac{1}{108}(6x - 24)$$
To check maximum or minimum, consider $f''(x)$.
When $x = 2$, $f''(x) < 0$ and when $x = 6$, $f''(x) > 0$.
f(x) is maximum when x = 2, so the mode is 2.
(iii) To find the variance of X, first find $E(X^2)$.
$$E(X^2) = \int_{\text{all } x} x^2 f(x)\,dx = \tfrac{1}{108}\int_0^6 (36x^3 - 12x^4 + x^5)\,dx = \tfrac{1}{108}\left[9x^4 - \frac{12x^5}{5} + \frac{x^6}{6}\right]_0^6 = 7.2$$
$\text{Var}(X) = E(X^2) - E^2(X) = 7.2 - (2.4)^2 = 1.44$
(iv) Standard deviation of X $= \sqrt{\text{Var}(X)} = \sqrt{1.44} = 1.2$

Example 6.15
The time taken to perform a particular task, t hours, has the probability density function
$$f(t) = \begin{cases} 10ct^2 & 0 \le t < 0.6 \\ 9c(1 - t) & 0.6 \le t \le 1.0 \\ 0 & \text{otherwise,} \end{cases}$$
where c is a constant.
(a) Find the value of c and sketch the graph of this distribution.
(b) Write down the most likely time.
(c) Find the expected time.
(d) Determine the probability that the time will be
(i) more than 48 minutes,
(ii) between 24 and 48 minutes.

Solution 6.15
(a) $$1 = \int_{\text{all } t} f(t)\,dt = 10c\int_0^{0.6} t^2\,dt + 9c\int_{0.6}^{1.0} (1 - t)\,dt = \frac{10c}{3}\left[t^3\right]_0^{0.6} + 9c\left[t - \frac{t^2}{2}\right]_{0.6}^{1.0}$$
$$= 0.72c + 0.72c = 1.44c$$
$$c = \frac{1}{1.44} = \frac{100}{144} = \frac{25}{36}$$
The probability density function is
$$f(t) = \begin{cases} \tfrac{125}{18}t^2 & 0 \le t < 0.6 \\ \tfrac{25}{4}(1 - t) & 0.6 \le t \le 1.0 \\ 0 & \text{otherwise} \end{cases}$$
[Sketch of y = f(t): rising to a peak of 2.5 at t = 0.6, then falling in a straight line to zero at t = 1]
(b) From the sketch, t = 0.6 gives the maximum value of f(t).
Therefore the mode is 0.6 hours = 36 minutes.
The most likely time is 36 minutes.
(c) $$E(T) = \int_{\text{all } t} t f(t)\,dt = 10c\int_0^{0.6} t^3\,dt + 9c\int_{0.6}^{1.0} (t - t^2)\,dt = 10c\left[\frac{t^4}{4}\right]_0^{0.6} + 9c\left[\frac{t^2}{2} - \frac{t^3}{3}\right]_{0.6}^{1.0}$$
$$= 0.225 + 0.366\ldots = 0.591\ldots \text{ hours} = 35.5 \text{ minutes}$$
The expected time is 35.5 minutes.
(d) (i) 48 minutes = 0.8 hours
$$P(T > 0.8) = 9c\int_{0.8}^{1.0} (1 - t)\,dt = 9c\left[t - \frac{t^2}{2}\right]_{0.8}^{1.0} = 0.125$$
The probability that the time will be more than 48 minutes is 0.125.
(ii) 24 minutes = 0.4 hours
$$P(0.4 < T < 0.8) = 1 - P(T > 0.8) - P(T < 0.4)$$
$$P(T < 0.4) = 10c\int_0^{0.4} t^2\,dt = \frac{10c}{3}\left[t^3\right]_0^{0.4} = 0.1481\ldots$$
$$P(0.4 < T < 0.8) = 1 - 0.125 - 0.1481\ldots = 0.727 \text{ (3 s.f.)}$$
The probability that the time will be between 24 and 48 minutes is 0.727.

Exercise 6c Standard deviation and variance

In Questions 1-7 find
(a) E(X), (b) $E(X^2)$, (c) Var(X), (d) the standard deviation of X.
It is assumed that the value of the function is zero outside the range(s) stated. Do not forget to look for symmetry when considering E(X).
NOTE: some of these functions were given in Exercise 6a and you may wish to refer to your previous sketches.
1. $f(x) = \frac{3}{8}x^2$, $0 \le x \le 2$
2. $f(x) = \frac{1}{5}$, $-2 \le x \le 3$
3. $f(x) = \frac{1}{4}(4 - x)$, $1 \le x \le 3$
4. $f(x) = \frac{3}{56}(x + 2)^2$, $0 \le x \le 2$
5. $f(x) = 4x^3$, $0 \le x \le 1$
6. f(x) =                        $0 \le x < 2$
   f(x) = $\frac{1}{4}(2x - 3)$   $2 \le x \le 3$
7. f(x) = $\frac{1}{8}(x + 2)^2$   $-2 \le x \le 0$
   f(x) =                        $0 \le x \le 1\frac{1}{3}$
8. A continuous random variable X has p.d.f. $f(x) = kx^2$, $0 \le x \le 4$.
(a) Find the value of k, and sketch y = f(x).
(b) Find E(X) and Var(X).
(c) Find $P(1 < X < 2)$.

9. A continuous random variable X has p.d.f. f(x) where
$$f(x) = \begin{cases} kx & 0 \le x < 1 \\ k(2 - x) & 1 \le x \le 2 \\ 0 & \text{otherwise} \end{cases}$$
Find
(a) the value of the constant k,
(b) E(X),
(c) Var(X),
(d) P(1 ≤ X ≤ 1j),
(e) the mode.

10. The continuous random variable X has p.d.f. given by f(x) where
$$f(x) = \begin{cases} \tfrac{1}{27}x^2 & 0 \le x < 3 \\ \tfrac{1}{3} & 3 \le x \le 5 \\ 0 & \text{otherwise} \end{cases}$$
(a) Sketch y = f(x).
(b) Find E(X).
(c) Find $E(X^2)$.
(d) Find the standard deviation $\sigma$ of X.

11. A continuous random variable X has a probability density function f given by
$$f(x) = \frac{k}{x(4 - x)}, \quad 1 \le x \le 3$$
$f(x) = 0$ otherwise
(a) Show that $k = \dfrac{2}{\ln 3}$.
(b) Calculate the mean and the variance of X. (NEAB)

12. The probability density function of X is given by
$$f(x) = \begin{cases} k(ax - x^2) & 0 \le x \le 2 \\ 0 & x < 0,\ x > 2 \end{cases}$$
where k and a are positive constants.
Show that $a \ge 2$ and that $k = \dfrac{3}{6a - 8}$.
Given that the mean value of X is 1, calculate the values of a and k.
For these values of a and k sketch the graph of the probability density function and find the variance of X. (NEAB)

13. A continuous random variable X has probability density function f(x) defined by
$$f(x) = \begin{cases} 12(x^2 - x^3) & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$$
Find the mean and standard deviation of X. (O & C)

THE CUMULATIVE DISTRIBUTION FUNCTION, F(x)
In Chapter 4 (page 253) you met the idea of a cumulative distribution function, F(x), for a discrete random variable and in Chapter 5 (pages 283 and 294) you used cumulative probability tables giving $F(r) = P(X \le r)$ for binomial and Poisson distributions.
In the same way, if X is a continuous random variable with p.d.f. f(x), you can find the cumulative distribution function F(x).
For a particular value, t, in the range of the function,
$$F(t) = P(X \le t) = \int_{-\infty}^{t} f(x)\,dx.$$
The lower limit is given as $-\infty$, but in practice it is the smallest possible value of X in the range for which x is valid.
So if f(x) is valid in the range $a \le x \le b$, then
$$F(t) = \int_a^t f(x)\,dx$$
Remember that F(t) gives the area under the curve f(x) up to a particular value t.
[Sketch of y = f(x) between a and b, with the area up to t shaded]
Notice that
$$F(b) = P(X \le b) = \int_a^b f(x)\,dx = 1$$
This is as expected, since the total area under the curve is 1.

Using F(x) to find $P(x_1 \le X \le x_2)$
Since $P(X \le x_1) = F(x_1)$ and $P(X \le x_2) = F(x_2)$,
$$P(x_1 \le X \le x_2) = F(x_2) - F(x_1)$$

Finding the median, quartiles and other percentiles
The median is the value 50% of the way through the distribution. It splits the area under the curve y = f(x) into two halves. If m is the median, then for f(x) defined for $a \le x \le b$,
$$\int_a^m f(x)\,dx = 0.5, \quad \text{i.e. } F(m) = 0.5$$
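When the equation F(m) = 0.5 cannot be solved by hand, it can be solved numerically, because F never decreases. The Python sketch below is an added illustration using bisection. It assumes the p.d.f. $f(x) = \frac{1}{8}x$ on $0 \le x \le 4$, for which $F(x) = x^2/16$, and `percentile` is a hypothetical helper name.

```python
def cdf(t):
    """F(t) for f(x) = x/8 on 0 <= x <= 4, i.e. F(t) = t**2 / 16 inside the range."""
    if t <= 0:
        return 0.0
    if t >= 4:
        return 1.0
    return t * t / 16

def percentile(p, lo=0.0, hi=4.0, tol=1e-10):
    """Solve F(x) = p by bisection, relying on F being non-decreasing on [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid   # the solution lies to the right of mid
        else:
            hi = mid   # the solution lies at or to the left of mid
    return (lo + hi) / 2

median = percentile(0.5)
print(median)
```

The same function gives any other percentile by changing p, for example `percentile(0.1)` for the 10th percentile.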
Note that if f(x) is symmetrical in the given range, the mean and median will coincide.
The lower quartile, $q_1$, is the value 25% of the way through the distribution, so
$$\int_a^{q_1} f(x)\,dx = 0.25, \quad \text{i.e. } F(q_1) = 0.25$$
The upper quartile, $q_3$, is the value 75% of the way through the distribution, so
$$\int_a^{q_3} f(x)\,dx = 0.75, \quad \text{i.e. } F(q_3) = 0.75$$
Similarly for other percentiles, for example
F(10th percentile) = 0.1 and F(35th percentile) = 0.35.
In general
$$F(n\text{th percentile}) = \frac{n}{100}$$

Example 6.16
X is a continuous random variable with p.d.f. as shown.
$f(x) = \frac{1}{8}x$, $0 \le x \le 4$
[Sketch of $f(x) = \frac{1}{8}x$ for x from 0 to 4]
Find
(a) the cumulative distribution function F(x) and sketch y = F(x),
(b) $P(0.3 \le X \le 1.8)$,
(c) the median, m,
(d) the interquartile range.

Solution 6.16
(a) For values of t between 0 and 4,
$$F(t) = \int_0^t \tfrac{1}{8}x\,dx = \left[\frac{x^2}{16}\right]_0^t = \frac{t^2}{16}$$
The cumulative distribution function can now be written in terms of x as follows:
$$F(x) = \begin{cases} 0 & x \le 0 \\ \dfrac{x^2}{16} & 0 \le x \le 4 \\ 1 & x \ge 4 \end{cases}$$
[Sketch of y = F(x): the curve $F(x) = \frac{x^2}{16}$ rising from 0 at x = 0 to 1 at x = 4, then the line F(x) = 1]
(b) $P(0.3 \le X \le 1.8) = F(1.8) - F(0.3)$
$F(1.8) = \dfrac{1.8^2}{16} = 0.2025$
$F(0.3) = \dfrac{0.3^2}{16} = 0.005625$
$P(0.3 \le X \le 1.8) = 0.2025 - 0.005625 = 0.197$ (3 d.p.)
(c) For the median m,
$F(m) = 0.5$
i.e. $\dfrac{m^2}{16} = 0.5$
$m^2 = 8$
$m = 2.828\ldots$
The median m = 2.83 (2 d.p.).
NOTE: take the positive square root, since $0 \le m \le 4$.
(d) $F(q_1) = 0.25 \quad \therefore \dfrac{q_1^2}{16} = 0.25$
$q_1^2 = 4$
$q_1 = 2$
$F(q_3) = 0.75 \quad \therefore \dfrac{q_3^2}{16} = 0.75$
$q_3^2 = 12$
$q_3 = \sqrt{12} = 3.464\ldots$
Interquartile range $= q_3 - q_1 = 3.464\ldots - 2 = 1.5$ (2 s.f.).

Example 6.17
X is a continuous random variable with p.d.f. f(x) where
$$f(x) = \begin{cases} \dfrac{x}{3} & 0 \le x \le 2 \\ -\dfrac{2x}{3} + 2 & 2 \le x \le 3 \\ 0 & \text{otherwise} \end{cases}$$
[Sketch of y = f(x): rising along $y = \frac{x}{3}$ to x = 2, then falling along $y = -\frac{2x}{3} + 2$ to zero at x = 3]
(a) Find the cumulative distribution function F(x) and sketch it.
(b) Find $P(1 \le X \le 2.5)$.
(c) Find the median, m.
I
(b) P(l<X<2.5)=F(2.5)-F(l)
Solution 6.17 To find F(2.5), use F(x) in the interval2 < x < 3.
x2
(a) F(t) = J~ f(x)dx F(x) = --+ 2x- 2
3
Since f(x) is given in two parts, F(x) must be found in two stages. (2.5) 2
3
F(2.5)= -~-+2(2.5) -2
First consider t where 0 < t,;::; 2.
y 11
F(t) = J' "_3 dx
0
12
To find F(l), use F(x) in the interval 0 < x < 2.
=[x:I F(x)=
x2

tl 0 I t 2 3 ' 6
6
x2 y
F(1) = ~
So, for 0 < x < 2, F(x) =6 P(1 <X< 2.5) = F(2.5)- F(l)
11 1
NOTE: F(2)=t=i
12 6
0 2 = 0.75

(c) F(m) = 0.5, where m is the median.


Now consider t such that 2 < t < 3.
Since F(2) = 1, the median must be less than 2, so consider F(x) in the range 0,;;; x,;; 2.
2
F(t)=F(2)+ [ (- ; +2)dx m2
Therefore

'1
F(m)=G

=F(2)+[- x; +2xI So
m2
6=0.5
0

=~+{- ~ +2t-(-i+4)1
m 2 =3
i;' ~, m = 1.73 (2 d.p.)
tl --;'
' h
=--+2t-2
3 ··~ Cl
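As a numerical cross-check of the two-stage integration, the following minimal sketch takes the p.d.f. of Example 6.17 (f(x) = x/3 for 0 ≤ x ≤ 2 and f(x) = -2x/3 + 2 for 2 ≤ x ≤ 3), approximates F(t) with the midpoint rule and solves F(x) = p by bisection. The function names `pdf`, `cdf` and `percentile`, the step count and the tolerance are illustrative choices, not part of the text.

```python
def pdf(x):
    """p.d.f. of Example 6.17 (two linear pieces, zero elsewhere)."""
    if 0 <= x <= 2:
        return x / 3
    if 2 < x <= 3:
        return -2 * x / 3 + 2
    return 0.0


def cdf(t, steps=20000):
    """Approximate F(t) = integral of f from 0 to t using the midpoint rule."""
    if t <= 0:
        return 0.0
    t = min(t, 3.0)  # F(t) = 1 beyond the upper limit
    h = t / steps
    return sum(pdf((i + 0.5) * h) for i in range(steps)) * h


def percentile(p, lo=0.0, hi=3.0, tol=1e-6):
    """Solve F(x) = p by bisection; valid because F never decreases."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


prob = cdf(2.5) - cdf(1)   # P(1 <= X <= 2.5)
median = percentile(0.5)   # value m with F(m) = 0.5
print(round(prob, 3), round(median, 2))
```

The same `percentile` helper gives quartiles or any other percentile by changing `p`, which mirrors the rule F(nth percentile) = n/100.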
:::::,\___, Cumulative clistri on
Writing the answer as a general formula in terms of x,
Sketch of y = F(x) 1. The random variable X has probability density 3. The random variable X has probability density
x2 function function
0 <x< 2 y
y =
x' 2x-2
-3+
6 =I f(x)=ix'. O<:x<:2 f(x)=k, 1 <:x<; 6
x2 Find (a) Find k.
F(x)= --+2x-2 2<x<3
3 (b) Find the cumulative distribution function
'3 (a) the cumulative distribution function, F(x)
F(x).
x>3 x' and draw a sketch of y = F(x),
1 1
y=6 (b) the median, m. (c) Find the 20th percentile.
3 (d) Find the interquartile range.
2. The random variable X has probability density )
0 -~
function 4. The random variable X has probability density
0 2 3 ' function
f(x)=h4-x), 1 <:x<; 3
0 <x< 2
Find f(x)=j;(2x-3) 2 <x< 3
(a) the cumulative distribution function F(x),
(b) P(l.S <X< 2). Find
(a) the cumulative distribution function F(x),
(b) the median m.
I

15. A continuous random variable, X, has 16. The continuous random variable X has
5. The random variable X has cumulative 11. A factory is supplied with flour at the beginning probability density function given by
of each week. The weekly demand, X thousand probability density function given by
distribution function tonnes for flour from this factory is a

~ ~~
f(x) ~ax- bx' for Oo;;;:xo;;;:2
0 X< 0 contin~ous random variable having the forl<xo;;;:9,
~o elsewhere
probability density function f(x)
F(x)~ x 4 0 <:x<: 1 Observations on X indicate that the mean is 1. otherwise
{1 X~ 1 f(x)~k(1-x)', 0 "x" 1 (a) Obtain two simultaneous equations for a
where k is a constant. Giving your answers
f(x) ~ o, elsewhere and b, show that a= 1.5 and find the value
Find correct to three significant figures where
of b.
Find appropriate, find
(a) P(0.3 <X< 0.6), (b) Find the variance of X.
(b) the median m, (a) the value of k, (c) If F(x) is the probability that X~ x find F(x) (a) the value of k, and also the median value
(c) the value of a such that P(X >a)~ 0.4. (b) the mean value of X, . and verify that F(2) ~ 1. of X,
(c) the variance of X, to three dec1mal places. (d) If two independent observations are made (b) the mean and variance of X,
6. The continuous random variable X has p.d.f. on X what is the probability that at least (c) the cumulative distribution function, F, of
Sketch the probability density functi?n. X, and sketch the graph of y ~ F(x). (C)
f(x) ~ j, 0 <; x <: 3. Find Find, to the nearest tonne, the quantity of flour
one of them is less than!?
(a) E(X), (b) Var(X), that the factory should have in stock at ~he
(c) F(x) and sketchy~ F(x), (d) P(X ~ 1.8), beginning of a week in order that there IS a
(e) P(1.1<:X"1.7). probability of 0.98 that the demand in that week OBTAINING THE P.D.F., f(x), FROM THE CUMULATIVE DISTRIBUTION
will be met. (L)
7. X is the continuous random variable with pk.d.fd. FUNCTION f(x)
f(x) = kx 2 , 1.;;;: x.;;;: 2. Find (a) the co?-st~nt an 12. A continuous random variable X has probability
sketchy= f(x), (b) the standard dev1.at10n a, density function, f, defined by Since F can be obtained by integrating f, it follows that f can be obtained by differentiating F.
(c) the cumulative distribution funct10n F(x),
(d) the median, m. f(x)~~. 0 "x" 1 d
x' 1 o;;;:xo;;;: 2 dx
f<xJ~
8. The continuous random variable X has
probability density function f given by
5.
f(x) ~ 0, otherwise
NOTE: the gradient of the F(x) curve gives the value of f(x).
k(4-x 2) forO"x"2
Obtain the distribution function and hence, or
f(x) ~ ( 0 otherwise
otherwise, find, to three decimal places, the
median and the interquartile range of the Example 6.18
where k is a constant. Show that k = -f6 and find
the values of E(X) and Var(X). distribution (L)
The continuous random variable X has cumulative distribution function F(x) where
Find the cumulative distribution function of X,
and verify by calculation that the median value 13. The continuous random variable X has 0 x~O

J
of X is between 0.69 and 0. 70 probability density function f given by
x' _)~~~~~-~-
Find also P(0.69 <X< 0.70), giving your answer -3o;;;:xo;;;:3 F(x)~ 0 ~X~ 3
k(x + 3),
correct to one significant figure. (C) f(x) ~ 27
otherwise
\ 0'
9. The continuous random variable X has 1 x)3
where k is a constant.
continUous p.d.f. f(x) where 0 3 '
(a) Showthatk=fs.
X 2 (b) Find E(X) and Var(X). (a) State the range of values for which the probability density function f(x) is valid.
2"x"3 (c) Find the lower quartile of X, i.e. the value q
3 3 (b) Find f(x) and illustrate it in a sketch. · ·
3 ,;;;xo;;;: 5 such that P(X" q) ~ ~·
f(x) ~ a (d) Let Y =aX+ b, where a and bare constants
2 -f3x 5 o;;;:x,;;; 6 with a > 0. Find the values of a and b for Solution 6.18
\
otherwise which E(Y) ~ 0 and Var(Y) ~ 1. (C)
0
(a) Since F(x) is unchanging in the regions x ( 0 and x ;> 3 it follows that f(x) must be zero for
Find (a) a and j3, (b) F(x) and sketchy~ F(x), 14. The continuous random variable, X, has x < 0 and x) 3.
(c) P(2 <;X" 3.5), (d) P(X ~ 5.5). probability density function defined by
So f(x) is valid for 0 ( x ( 3 and f(x) ~ 0 otherwise.
10. The continuous random variable X has ~kx, 0 ,;;;xo;;;: 8
d
probability density function f(x) ~ l~k, 8 <xo;;;: 9 (b) f(x) ~ dx F(x)
p+x otherwise

~ ~ (~;)
1o;;;:xo;;;:3
f(x)~\o 6 where k is a constant.
otherwise
(a) Sketch the graph of f(x).
(a) Sketch the probability density function of X. (b) Show that k ~ 0.025. 3x 2
(c) Determine, for all x, the distribution
(b) Calculate the mean of X. 27
(c) Specify fully the cumulative distribution function F(x). d x2
(d) Calculate the probability that an obse;;~AB)
function of X.
(d) FindmsuchthatP(X"m)~1. (L) value of X exceeds 6. ( 9
The p.d.f. for X is f(x) where

\[ f(x) = \begin{cases} \dfrac{x^2}{9} & 0 \le x \le 3 \\ 0 & \text{otherwise} \end{cases} \]

[Figure: sketch of y = f(x) for 0 ≤ x ≤ 3.]

Example 6.19

The continuous random variable X has cumulative distribution function F(x) as shown in the sketch.

\[ F(x) = \begin{cases} 0 & x < -2 \\ \tfrac{1}{12}(2 + x) & -2 \le x < 0 \\ \tfrac{1}{6}(1 + x) & 0 \le x < 4 \\ \tfrac{1}{12}(6 + x) & 4 \le x < 6 \\ 1 & x \ge 6 \end{cases} \]

[Figure: sketch of y = F(x), with x-axis marked at -2, 0, 2, 4 and 6.]

(a) Find the p.d.f. of X, f(x), and sketch y = f(x).
(b) Find E(X).

Solution 6.19

(a) Since F(x) is unchanging for $x < -2$ and $x \ge 6$, it follows that f(x) must be zero for $x < -2$ and $x \ge 6$.

Since $f(x) = \dfrac{d}{dx}F(x)$,

for $-2 \le x < 0$, $f(x) = \dfrac{d}{dx}\,\tfrac{1}{12}(2 + x) = \tfrac{1}{12}$

for $0 \le x < 4$, $f(x) = \dfrac{d}{dx}\,\tfrac{1}{6}(1 + x) = \tfrac{1}{6}$

for $4 \le x < 6$, $f(x) = \dfrac{d}{dx}\,\tfrac{1}{12}(6 + x) = \tfrac{1}{12}$

The sketch of y = f(x) is shown:

[Figure: sketch of y = f(x), with height 1/12 for -2 ≤ x < 0, height 1/6 for 0 ≤ x < 4 and height 1/12 for 4 ≤ x < 6.]

(b) Since f(x) is symmetrical about x = 2, E(X) = 2.

Example 6.20

The continuous random variable X has cumulative distribution function given by

\[ F(x) = \begin{cases} 0 & x < 0 \\ 2x - x^2 & 0 \le x \le 1 \\ 1 & x > 1 \end{cases} \]

(a) Show that $P(X < \tfrac{1}{2}) = \tfrac{3}{4}$.
(b) Find the interquartile range of X. (C)

Solution 6.20

(a) $P(X < \tfrac{1}{2}) = F(\tfrac{1}{2}) = 2 \times \tfrac{1}{2} - (\tfrac{1}{2})^2 = 0.75$

(b) To find the interquartile range, you need to find the upper quartile and lower quartile.

Upper quartile $q_3$ is such that $F(q_3) = 0.75$.

From (a), $F(\tfrac{1}{2}) = 0.75$, so $q_3 = \tfrac{1}{2}$.

Lower quartile $q_1$ is such that $F(q_1) = 0.25$.

\[ F(q_1) = 2q_1 - q_1^2 \]
\[ 2q_1 - q_1^2 = 0.25 \]
\[ q_1^2 - 2q_1 + 0.25 = 0 \]
\[ (q_1 - 1)^2 - 1 + 0.25 = 0 \]
\[ (q_1 - 1)^2 = 0.75 \]
\[ q_1 - 1 = \pm\sqrt{0.75} \]

So $q_1 = 1 + \sqrt{0.75}$ or $q_1 = 1 - \sqrt{0.75}$.

Since F(x) is unchanging for x > 1, f(x) = 0 for x > 1.

So $1 + \sqrt{0.75}$ is outside the range of f(x), and

\[ q_1 = 1 - \sqrt{0.75} = 0.1339\ldots \]

Interquartile range $= q_3 - q_1 = 0.5 - 0.1339\ldots = 0.37$ (2 s.f.)

1. The cumulative distribution function of X is given by

(a) Find the probability density function f(x).
(b) Sketch y = f(x).
(c) Find E(X).
(d) Find the interquartile range.

2. The cumulative distribution function of X is given by

\[ F(x) = \begin{cases} 0 & x \le 0 \\ x^3 & 0 \le x \le 1 \\ 1 & x \ge 1 \end{cases} \]

Find
(a) the median,
(b) the mean.

3. The cumulative distribution function of X is given by

\[ F(x) = \begin{cases} 0 & x < 0 \\ kx' & 0 \le x \le 2 \\ 1 & x \ge 2 \end{cases} \]

Find
(a) the value of k,
(b) the probability density function f(x),
(c) the median of X,
(d) the variance of X.

4. The continuous random variable X has cumulative distribution function F(x) where

\[ F(x) = \begin{cases} 0 & x < 0 \\ \dfrac{2x}{3} & 0 \le x \le 1 \\ \dfrac{x}{3} + k & 1 \le x \le 2 \\ 1 & x \ge 2 \end{cases} \]

Find
(a) the value of k,
(b) the p.d.f. f(x) and sketch it,
(c) the mean $\mu$,
(d) the standard deviation $\sigma$.

5. The continuous random variable X has cumulative distribution function F(x) where

\[ F(x) = \begin{cases} 0 & x < 1 \\ \dfrac{(x - 1)^2}{12} & 1 \le x \le 3 \\ \dfrac{14x - x^2 - 25}{24} & 3 \le x \le 7 \\ 1 & x \ge 7 \end{cases} \]

Find
(a) the p.d.f. f(x) and sketch it,
(b) E(X),
(c) Var(X),
(d) the median of X,
(e) $P(2.8 \le X \le 5.2)$.

6. The continuous random variable X has (cumulative) distribution function given by

\[ F(x) = \begin{cases} \dfrac{1 + x}{8} & -1 \le x \le 0 \\ \dfrac{1 + 3x}{8} & 0 \le x \le 2 \\ \dfrac{5 + x}{8} & 2 \le x \le 3 \end{cases} \]

where F(x) = 0 for x < -1, and F(x) = 1 for x > 3.
(a) Sketch the graph of the probability density function f(x).
(b) Determine the expectation of X and the variance of X.
(c) Determine $P(3 \le 2X \le 5)$. (C)

7. A continuous random variable X takes values in the interval 0 to 3. It is given that $P(X > x) = a + bx^3$, $0 \le x \le 3$.
(a) Find the values of the constants a and b.
(b) Find the cumulative distribution function F(x).
(c) Find the probability density function f(x).
(d) Show that E(X) = 2.25.

8. The length X of an offcut of wooden planking is a random variable which can take any value up to 0.5 m. It is known that the probability of the length being not more than x metres ($0 \le x \le 0.5$) is equal to kx. Determine
(a) the value of k,
(b) the probability density function of X,
(c) the expected value of X,
(d) the standard deviation of X (correct to three significant figures).

THE CONTINUOUS UNIFORM (OR RECTANGULAR) DISTRIBUTION

Consider the continuous random variable X with probability density function

f(x) = k for $1 \le x \le 6$.

Since the total area under the curve is 1,

5k = 1
k = 0.2

f(x) = 0.2, $1 \le x \le 6$

[Figure: sketch of f(x), a rectangle of height k between x = 1 and x = 6.]

X is said to follow a continuous uniform, or rectangular, distribution between 1 and 6.

This can be written X ~ R(1, 6).

In general:

The probability density function of X, distributed uniformly in the range $a \le x \le b$, is

\[ f(x) = \frac{1}{b - a} \]

This is written X ~ R(a, b).

a and b are known as the parameters of the distribution.

[Figure: sketch of y = f(x), a rectangle of height 1/(b - a) between x = a and x = b.]

NOTE: It is easy to see from the diagram that the total area is 1.

\[ \text{Area} = (b - a) \times \frac{1}{b - a} = 1 \]

Example 6.21

X is distributed uniformly, where $6 \le x \le 9$.

Find $P(7.2 \le X \le 8.4)$.

Solution 6.21

\[ f(x) = \frac{1}{b - a} = \frac{1}{9 - 6} = \frac{1}{3} \]

[Figure: sketch of f(x), height 1/3 between x = 6 and x = 9, with the region from 7.2 to 8.4 shaded.]

\[ P(7.2 \le X \le 8.4) = \tfrac{1}{3}(8.4 - 7.2) = 0.4 \]
Example 6.22

The lengths of metal rods are measured to the nearest 5 mm. What is the distribution of the random variable E, the rounding error made when measuring? Give its probability density function f(e).

Solution 6.22

The error is the difference between the true length and the recorded length after rounding to the nearest 5 mm.

Suppose you have recorded a length to be 75 mm, to the nearest 5 mm. The true length, l, could have been any length in the interval

72.5 mm ≤ l < 77.5 mm

So the error, E, could be anywhere in the interval $-2.5 \le E < 2.5$.

All points in this interval are equally likely 'stopping places' for E, so E is uniformly distributed in the interval, i.e.

E ~ R(-2.5, 2.5)

\[ f(e) = \frac{1}{2.5 - (-2.5)} = \frac{1}{5}, \quad -2.5 \le e \le 2.5 \]

[Figure: sketch of f(e), height 1/5 between e = -2.5 and e = 2.5.]

Example 6.23

Rosie spins a 'Spinning Jenny' at a fair. When the wheel stops, the shorter distance of an arrow measured along the circumference from Rosie is denoted by C. What is the distribution of C?

Solution 6.23

All the points on the circumference are equally likely stopping places for the arrow, so C is uniformly distributed between 0 (when the arrow is next to Rosie) and $\pi r$ (when the arrow is diametrically opposite Rosie).

So C ~ R(0, $\pi r$)

\[ f(c) = \frac{1}{\pi r - 0} = \frac{1}{\pi r}, \quad 0 \le c \le \pi r \]

[Figure: sketch of f(c), height 1/(πr) between c = 0 and c = πr.]

Example 6.24

The error, in grams, made by a greengrocer's scales may be modelled by the random variable X, with probability density function

\[ f(x) = \begin{cases} 0.1 & -3 \le x \le 7 \\ 0 & \text{otherwise.} \end{cases} \]

Find the probability that
(a) an error is positive,
(b) the magnitude of an error exceeds 2 grams (i.e. |X| > 2),
(c) the magnitude of an error is less than 4 grams (i.e. |X| < 4). (AEB)

Solution 6.24

(a) P(X > 0) = 7 × 0.1 = 0.7

(b) P(|X| > 2) = 1 - P(|X| < 2)
= 1 - P(-2 < X < 2)
= 1 - 4 × 0.1
= 0.6

(c) P(|X| < 4) = P(-4 < X < 4)

Since f(x) = 0 when x < -3, find P(-3 < X < 4).

P(-3 < X < 4) = 7 × 0.1 = 0.7

So P(|X| < 4) = 0.7

EXPECTATION AND VARIANCE OF THE UNIFORM DISTRIBUTION

Example 6.25

The continuous random variable Y has a rectangular distribution

\[ f(y) = \begin{cases} \dfrac{1}{\pi} & -\dfrac{\pi}{2} \le y \le \dfrac{\pi}{2} \\ 0 & \text{otherwise} \end{cases} \]

(a) Find the mean of Y.
(b) Find the variance of Y. (L)

Solution 6.25

Sketch f(y).

[Figure: sketch of f(y), height 1/π between y = -π/2 and y = π/2.]

(a) By symmetry E(Y) = 0.

The mean of Y is 0.

(b) To find Var(Y), find $E(Y^2)$ first.

\[ E(Y^2) = \int_{\text{all } y} y^2 f(y)\,dy = \frac{1}{\pi}\int_{-\pi/2}^{\pi/2} y^2\,dy = \frac{1}{\pi}\left[\frac{y^3}{3}\right]_{-\pi/2}^{\pi/2} = \frac{\pi^2}{12} \]

\[ \mathrm{Var}(Y) = E(Y^2) - E^2(Y) = \frac{\pi^2}{12} - 0 = \frac{\pi^2}{12} \]

The variance of Y is $\dfrac{\pi^2}{12}$.

It is possible to write the mean and the variance of a uniform distribution in general formulae.

If the continuous variable X is uniformly distributed over the interval (a, b), then

X ~ R(a, b), with $f(x) = \dfrac{1}{b - a}$.

By symmetry

\[ E(X) = \tfrac{1}{2}(a + b) \]

It can also be shown that

\[ \mathrm{Var}(X) = \tfrac{1}{12}(b - a)^2 \]

THE CUMULATIVE DISTRIBUTION FUNCTION, F(x), FOR A UNIFORM DISTRIBUTION

Example 6.26

X has probability density function $f(x) = \tfrac{1}{4}$, $5 \le x \le 9$. Find F(x).

Solution 6.26

By integration:

If $5 \le t \le 9$,

\[ F(t) = \int_{5}^{t} f(x)\,dx = \int_{5}^{t} \tfrac{1}{4}\,dx = \left[\tfrac{1}{4}x\right]_{5}^{t} = \frac{t}{4} - \frac{5}{4} = \frac{t - 5}{4} \]

Diagrammatically:

[Figure: sketch of f(x) = 1/4 between x = 5 and x = 9, with the rectangle from 5 to t shaded.]

\[ F(t) = (t - 5) \times \tfrac{1}{4} = \frac{t - 5}{4} \]

So

\[ F(x) = \begin{cases} 0 & x \le 5 \\ \dfrac{x - 5}{4} & 5 \le x \le 9 \\ 1 & x \ge 9 \end{cases} \]

In general, for X distributed uniformly, $a \le x \le b$,

\[ F(x) = \begin{cases} 0 & x \le a \\ \dfrac{x - a}{b - a} & a \le x \le b \\ 1 & x \ge b \end{cases} \]

F(x) can be illustrated diagrammatically.

[Figure: sketch of y = F(x), rising in a straight line from 0 at x = a to 1 at x = b.]

Exercise: Uniform distribution

1. X follows a uniform distribution with probability density function f(x) = k, $3 \le x \le 6$.
Find
(a) k,
(b) E(X),
(c) Var(X),
(d) P(X > 5).

2. X is distributed uniformly, $-5 \le x \le -2$.
Find
(a) $P(-4.3 \le X \le -2.8)$,
(b) E(X),
(c) the standard deviation of X.

6. X has cumulative distribution function


3. The continuous random variable X has p.d.f.
f(x) as shown in the diagram: x-2
F(x)~-, 2 <x< 7 Summary
y
5
Find ® For a continuous random variable X, with p.d.f. f(x) for a.;; x .;; b
l
y= f(x)
(a) E(X),
4 (b) Var(X). f(x)dx= 1
o+-~--------~~~,

Find
(a)
0

the value of k,
k
7. The continuous random variable Xb is uniformly
distributed in the interval a< x < . .
The lower quartile is 5 and the upper quarttle
is 9.
Find
"
Iallx

P(c .;; X.;; d)= rf(x)dx where a.;; c <d.;; b.

(b) P(2.1 <X <3.4), " Expectation E(X) =I xf(x)dx


(a) the values of a and b, allx
(c) E(X),
(d) Var(X).
(b) "6<X<n, .
(c) the cumulative distribution functton F(x).
oo Variance, Var(X) =I x 2 f(x)dx-E 2 (X)
4. The random variable X has p.d.f. f(x) as shown allx
8. X has cumulative distribution function F(x)
in the diagram.
" The cumulative distribution function F(x)

r
illustrated as follows
y
F0<l
F(t) = f(x )dx for a< t <b.
To obtain f(x) from F(x), differentiate F(x)
o+-~--------~~~, d
0 0.5 3 f(x) = dx F(x) = F'(x).
-2 3
If two independent observations of X are made,
find the probability that one is less than 1.5 and (a) Find the probability density function f(x). Median, quartiles and other percentiles
the other is greater than the mean. (b) Find the standard deviation of X. Medianm:
(c) Find the interquartile range.
F(m) =0.5
5. The random variable Y has probability density (d) Find the 20th percentile Lower quartile q 1: F(q 1 ) = 0.25
function given by
Upper quartile q 3 : F(q 3 ) = 0.75
f0.2 32<y<37
f(y) ~ )0 otherwise n
nth percentile F(nth percentile)=-
Find the probability that Y lies within one 100
standard deviation of the mean. Interquartile range = q 3 - q 1
® The continuous uniform (rectangular) distribution
1 f~)
If f(x)=-b- a.;;x.;;b,thenX-R(a,b)
-a I
f(x) = (b-a)
E(X) =!(a+ b)
Var(X) = fzeb- ajl
o+-~----4-~
0 x.o;;;a 0 a b x
x-a
F(x)= - - a .;;x.;; b
b-a

11 X;> 1
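The uniform-distribution results in the summary, E(X) = (a + b)/2, Var(X) = (b - a)²/12 and F(x) = (x - a)/(b - a), can be checked against direct numerical integration of x f(x) and x² f(x). The sketch below does this for R(5, 9); the function names, the choice of a and b, and the step count are illustrative assumptions.

```python
def uniform_mean_var(a, b):
    """Closed-form mean and variance of X ~ R(a, b)."""
    return (a + b) / 2, (b - a) ** 2 / 12


def uniform_cdf(x, a, b):
    """F(x) for X ~ R(a, b)."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def mean_var_by_integration(a, b, steps=100000):
    """Midpoint-rule estimates of E(X) and Var(X) = E(X^2) - E(X)^2."""
    h = (b - a) / steps
    f = 1 / (b - a)                    # constant p.d.f. on [a, b]
    ex = ex2 = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * h
        ex += x * f * h
        ex2 += x * x * f * h
    return ex, ex2 - ex ** 2


mean, var = uniform_mean_var(5, 9)
num_mean, num_var = mean_var_by_integration(5, 9)
print(mean, round(var, 4), round(num_mean, 4), round(num_var, 4))
```

Agreement between the two pairs of values is a check on the formulas for one choice of a and b, not a proof of the general result.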

Miscellaneous worked examples

Example 6.27

The random variable X has probability density function

\[ f(x) = \begin{cases} 3x^k & 0 \le x \le 1, \\ 0 & \text{otherwise,} \end{cases} \]

where k is a positive integer.

Find
(a) the value of k,
(b) the mean of X,
(c) the value, x, such that $P(X \le x) = 0.5$.

Solution 6.27

(a) Since X is a random variable, $\int_{\text{all } x} f(x)\,dx = 1$.

Therefore

\[ \int_{0}^{1} 3x^k\,dx = 1 \]
\[ 3\left[\frac{x^{k+1}}{k+1}\right]_{0}^{1} = 1 \]
\[ \frac{3}{k+1} = 1 \]
\[ k + 1 = 3 \]
\[ k = 2 \]

\[ f(x) = \begin{cases} 3x^2 & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases} \]

(b) \[ E(X) = \int_{0}^{1} x f(x)\,dx = \int_{0}^{1} 3x^3\,dx = \tfrac{3}{4}\left[x^4\right]_{0}^{1} = 0.75 \]

The mean of X is 0.75.

(c) Let $P(X \le x_1) = 0.5$.

Therefore

\[ \int_{0}^{x_1} 3x^2\,dx = 0.5 \]
\[ \left[x^3\right]_{0}^{x_1} = 0.5 \]
\[ x_1^3 = 0.5 \]
\[ x_1 = (0.5)^{1/3} = 0.794 \text{ (3 d.p.)} \]

So x = 0.794 (3 d.p.).

Example 6.28

The lengths of blades of grass mown from a lawn are modelled by a uniform distribution between 1 cm and 5 cm.

(a) Find the standard deviation of this distribution.
(b) Find the percentage of blades of grass whose lengths lie within one standard deviation of the mean length.
(c) A better model may be a triangular distribution as shown.

[Figure: sketch of f(x) for 1 ≤ x ≤ 5, with maximum height C.]

Find the value of C. (NEAB)

Solution 6.28

(a) X is the length, in centimetres, of a blade of grass.

$f(x) = \tfrac{1}{4}$, $1 \le x \le 5$. (see page 345)

$\mathrm{Var}(X) = \tfrac{1}{12}(5 - 1)^2 = \tfrac{16}{12} = \tfrac{4}{3}$ (see page 348)

Standard deviation of X $= \sqrt{\tfrac{4}{3}} = 1.15$ (2 d.p.)

(b) $E(X) = \dfrac{1 + 5}{2} = 3$

\[ P\left(3 - \sqrt{\tfrac{4}{3}} \le X \le 3 + \sqrt{\tfrac{4}{3}}\right) = 2\sqrt{\tfrac{4}{3}} \times \tfrac{1}{4} = 0.577\ldots \]

So approximately 58% of blades of grass have length within one standard deviation of the mean.

(c) Total area = 1

Area of rectangle $= 4 \times \tfrac{1}{8} = \tfrac{1}{2}$

Area of triangle $= \tfrac{1}{2} \times 4 \times h = 2h$

\[ \tfrac{1}{2} + 2h = 1 \]
\[ h = \tfrac{1}{4} \]
\[ C = h + \tfrac{1}{8} = \tfrac{1}{4} + \tfrac{1}{8} = \tfrac{3}{8} \]
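The proportion of a uniform distribution lying within one standard deviation of its mean can be computed directly from the formulas E(X) = (a + b)/2 and Var(X) = (b - a)²/12. The sketch below uses the uniform model on 1 cm to 5 cm from Example 6.28; the function name `within_one_sd` is an illustrative choice.

```python
import math


def within_one_sd(a, b):
    """P(mean - sd <= X <= mean + sd) for X ~ R(a, b)."""
    mean = (a + b) / 2
    sd = math.sqrt((b - a) ** 2 / 12)
    lo, hi = max(a, mean - sd), min(b, mean + sd)   # stay inside [a, b]
    return (hi - lo) / (b - a)


proportion = within_one_sd(1, 5)
print(round(100 * proportion), "percent")
```

Because the standard deviation is proportional to b - a, the interval width 2 × sd divided by b - a is always 2/√12 = 1/√3, so the result is about 58% for every uniform distribution, not only for R(1, 5).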
Example 6.29
Miscellaneous exercise 6g
On any day, the amount of time, measured in hours, that Mr Goggle spends watching 1. A continuous random variable X has a 5. The continuous random variable X has
television is a continuous random variable T, with cumulative distribution function g1ven by probability density function, f, defined by probability density function f(x) defined by

f(x)~jx O<x<2, e
(x < -1)
t ~ 0
~ ~~- k(15- t)
x4

!
f(x) = 0 otherwise.
f(x)~ e~2-x )
2 2
F(t) 0 q,;, 15 (-1 <x~ 1)
Find the expected value of
t;;, 15 (x > 1)
(a) X, x4
(b) 2X+4 (NEAB)
where k is a constant. (a) Show that c = l·
(b) Sketch the graph of f(x).
(a) Show that k ~ 2h and find P(5,;, T,;, 10). 2. (a) A continuous variable X is distributed at
(c) Determine the cumulative distribution
(h) Show that, for 0,;, t,;, 15, the probability density function ofT is given by random between the values 2 and 3 and has
6 function F(x).
a probability density function of 2 . (d) Determine the expected value of X and the
f(t) ~ fs- zis t. X variance of X. (C)
(C) Find the median value of X.
(c) Find the median ofT. 6. A continuous variable X is distributed at random
(b) A continuous random variable X takes
values between 0 and 1, with a probability between the values x = 0 and x = 2, and has a
density function of Ax(1- x) 3• Find the probability density function of ax 1 + bx. The
Solution 6.29 value of A, and the mean and standard mean is 1.25.
(a) When t ~ 0, F(t) ~ 0 deviation of X. (a) Show that b =~'and find the value of a.
(b) Find the variance of X.
Using F(t) ~ 1- k(15- t) 2 , 3. A continuous random variable X has probability (c) Verify that the median value of X is
when t~ 0 density function f(x) given by f(x) = 0 for x < 0 approximately 1.3.
0 ~ 1- k X 15 2 and x > 3 and between x = 0 and x = 3 its form is (d) Find the mode.
as shown in the graph.
7. The continuous random variable X has
f(X)
P(5,;, y,;, 10) ~ F(10)- F(5) probability density function given by
~ 1- 2) 5 (15 -10)"- (1- 1, (15- 5) 2) ex 2 0 .;;;;x.;;;; 2

l
2

~ 1- A's - (1 -l~~) f(x)~ ~e(4-x) 2 .;;;;x.;;;; 4


-,
_1 otherwise

(b) Fort,;, 0 or t;;, 15,f(t) ~ 0. where cis a constant.


0 2 3
For 0,;, t,;, 15, f(t) ~ F'(t) (a) Show that c ~ 0.15.
(a) Find the value of A. (b) Find the mean of X.
f(t) ~-zis (15- t) X ( -1)
(b) Express f(x) algebraically and obtain the
mean and variance of X.
(c)
(d)
Find the lower quartile of X.
Find the probability that a single
~ zis (15- t) (c) Find the median value of X. observation of X lies between the lower
2 2 A sample X 1, X 2 and X 3 is obtained. What is the quartile and the mean.
=15-225t
probability that at least one is greater than the (e) Three'independeht observations of X are
(c) Let the median ofT be m so F(m) ~ 0.5 median value? taken. Find the probability that one of the
observations is greater than the mean and
1- 2) 5 (15 -m) 2 ~ 0.5 4. The number of kilograms of metal extracted the other rw-o are less than the median value
from 10 kg of ore from a certain mine is a of X. (C)
(15- m) ~ 112.52
continuous random variable X with probability
15-m~ ±'1/112.5 density function f(x), where f(x) = cx(2- x) 2 if 8. The total number of radio taxi calls received at a
0.;;;; x.;;;; 2 and f(x) = 0 otherwise, where cis a control centre in a month is modelled by a
m ~ 15- '1/112.5 or m ~ 15 + '1/112.5 constant. random variable X (in tens of thousands of calls)
Show that c=0.75, and find the mean and having the probability density function
Since f(t) is valid only for 0 < t,;, 15, variance of X.

l
ex, 0 <x< 1
The cost of extracting the metal from 10 kg of
m ~ 15- '1/112.5 ore is £10x. Find the expected cost of extracting f(x)~ c(2-x) 1 .;;;;x< 2
~4.393... the metal from 10 kg of ore. (MEl) 0, otherwise

Median~ 4.4 (2 s.f.)
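The steps of Solution 6.29 can be checked numerically. The sketch below takes F(t) = 1 - k(15 - t)² on 0 ≤ t ≤ 15 with k = 1/225 (the value forced by F(0) = 0), compares a numerical derivative of F with f(t) = 2/15 - 2t/225, and confirms that the stated median satisfies F(m) = 0.5. The names `F`, `f_exact`, `f_numeric` and the step size `h` are illustrative assumptions.

```python
import math

K = 1 / 225  # from F(0) = 0, i.e. 0 = 1 - K * 15**2


def F(t):
    """Cumulative distribution function of T in Example 6.29."""
    if t <= 0:
        return 0.0
    if t >= 15:
        return 1.0
    return 1 - K * (15 - t) ** 2


def f_exact(t):
    """p.d.f. obtained by differentiating F, valid for 0 <= t <= 15."""
    return 2 / 15 - 2 * t / 225


def f_numeric(t, h=1e-5):
    """Central-difference estimate of F'(t): the gradient of the F curve."""
    return (F(t + h) - F(t - h)) / (2 * h)


prob = F(10) - F(5)                  # P(5 <= T <= 10)
median = 15 - math.sqrt(112.5)       # root of F(m) = 0.5 inside [0, 15]
print(round(prob, 4), round(median, 1), round(f_numeric(6), 4), round(f_exact(6), 4))
```

The other root, 15 + √112.5, is rejected in the code for the same reason as in the solution: it lies outside the range 0 ≤ t ≤ 15 on which f(t) is valid.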


..............--------------------
12. A horticultural firm is studying the number of 14. An ironmonger is supplied with paraffin once a
f(x)
10. The times, in excess of two hours, taken to hours that daffodils will last in a vase of water week. The weekly demand, X hundred litres, has
complete a marathon road race are modelled by
with a new additive. The random variable X (in the probability density function f, where
the continuous random variable T hours, where
hundreds of hours) with probability density
c
T has the probability density function
function
f(x) ~ c(1- x) 7 0 "X" 1
4 2 2
f(x) ~ 0 otherwise
f(t) ~ 27 t (3 - t) O"t"3 f(x)~~k(4-x ) 0 <x<2
0 otherwise where cis a constant. Find the value of c.
f(t) ~ 0 otherwise Find the mean value of X, and, to the nearest
is proposed as a model. litre, the minimum capacity of his paraffin tank
o-IL--~--+"'0 The diagram shows a sketch of the probability if the probability that it will be exhausted in a
0 2 X (a) Show that the value of the constant k is k
density function. given week is not to exceed 0.02. (L)
(b) Find the mean number of hours that a
(a) Show that the value of cis 1.. daffodil will last, according to this model.
f(t) 15. The probability that a randomly chosen flight
(b) Write down the probability that X< 1. (c) Use this model to find the probability that a
from Stanston Airport is delayed by more than
(c) Show that the cumulative distribution daffodil will last for more than 100 hours.
x hours is 160 (x-10) 2, for x E R, 0" x < 10. No
function of X is The new additive is tested on carnations and it is flights leave early, and none is delayed for more
Q X< 0 found that several of these last for more than than ten hours. The delay, in hours, for a
250 hours. randomly chosen flight is denoted by X.
lx 2 O<x<1
F(x)= ; x-ix 2 -1 l.;;;;x< 2 (d) Explain why the random variable X, with (a) Find the median, m, of X, correct to three
0 3 probability density function f(x) as defined significant figures.
1 x)'2 above, would not be a suitable model in this (b) Find the cumulative distribution function, F,
\ (a) Find the mean and variance of the times
{d) Find the probability that the control centre taken to complete the race. case. of X and sketch the graph of F.
receives between 8000 and 12 000 calls in a (b) Find the modal time taken to complete the {e) Suggest how the probability density function (c) Find the probability density function,(, of X
race.
could be changed to model the time and sketch the graph of f.
month.
A colleague criticises the model on the grounds (c) What proportion of competitors complete carnations will last. (L) (d) Show that E(X) ~ l,".
that the number of radio calls must be discrete, the race in less than the modal time? A random sample of two flights is taken. Find
while the model used for X is continuous. (d) Show that the median time to complete the 13. Each batch of a chemical used in drug the probability that both flights are delayed by
race lies between the mean and the mode. manufacture is tested for impurities. The
(e) State briefly whether you consider that it more than m hours, where m is the median of X.
was reasonable to use this model for X. (MEI) percentage of impurity is X, where X is a
(C)
(f) Give two reasons why the probability random variable with probability density
function given by
density function in the diagram might be 11. A tennis player hits a ball against a wall, aiming 16. The continuous random variable X has
unsuitable as a model. at a fixed horizontal line on the walL The kx 0 <x< 1 probability density function given by
(g) Sketch the shape of a more suitable vertical distance from the horizontal line to the
probability density function. (L) point where the ball strikes the ~all is recorde.d f(x)~ ~k(4-x) 1<x"4 kx O"x"1,

9. The lifetime, in tens of hours, of a certain delicate


as positive for points above the lme and negative
for points below the line.
l otherwise f(x) ~ ~x
2
1"X" 2,
electrical component is modelled by the random It is assumed that the distribution of this vertical
where k is a constant.
(a) Sketch the graph of f(x).
l otherwise
variable X with probability density function distance, X metres, may be modelled by the (a) Show that k = f,.
probability density function (b) Show that k ~ ). (b) Find the cumulative distribution function
k(9- x) O"x"9 (c) Determine, for all x, the distribution
f(x) ~
l 0
where k is a positive constant.
otherwise
f(x)""' 0
\
1.5(1- 4x 2) -0.5 <x < 0.5
otherwise
function F(x).
In order to purify the chemical it is subjected to
one of four possible purification processes, the
of X.
(c) Find, correct to two decimal places, the
median, m, of X.
{d) Find, correct to tWo decimal places,
2 (a) State the probability that the ball strikes the
percentage impurity in the batch determining the P(l X- m I< 0.75). (C)
(a) Show that k ~ . wall precisely 0.25 m above the line.
81 (b) Determine the probability that the ball
actual process used. The process used and its
(b) Find the mean lifetime of a component. cost, for each level of percentage impurity, is 17. Determine A such that
strikes the wall more than 0.25 m above the
(c) Show that the standard deviation of shown in the table.
line. 0 x<O
lifetimes is 21.2 hours. (c) (i) Give a reason why the mean value of
(d) Find the probability that a component lasts Percentage Process Batch A/2 O"x<1
X is zero.
at most 50 hours. (ii) Calculate the variance of X. Impurity x used cost(£) 0 1 <x< 2
f(x)~
A particular device requires. two bf these (d) Give one reason why the above probability 3.l 3.l(x- 3)2
components and it will not operate if one or model may not be appropriate. . 0<x<1 A 200 2<x"4
2 4
more of the components fail. The device has just (e) Suggest one likely effect of repeated practice 1<x<2 B 250
0 x>4
been fitted with two new components. The on the above probability model. (NEAB) 2<x(3 c 350
lifetimes of components are independent. is a probability density function of the
3<x<4 D 500
(e) Find the probability that the device will distribution of a random variable X.
work for more than 50 hours. (d) Determine the expected cost per batch of Sketch the density function and find E(X) and
(f) Give a reason why the above distribution removing the impurities. P(X <3.5). (MEI)
may not be realistic as a model for the (e) Determine the probability that the cost of
distribution of lifetimes of these electrical purifying a batch exceeds the expected cost.
components. {L) (NEAB)
H.Lf:_li rr·

18. A random variable X has cumulative 19. (a) A discrete random variable R takes integer Mixed test 6B
values between 0 and 4 inclusive with
(distribution) function F(x) where
probabilities given by 1. The continuous random variable X has 3. A firm has a large number of employees. The
0 X< -1 probability density function given by distance in miles they have to travel each day
r+ 1
ax+a -1 <x< 0 (r ~ 0, 1, 2) from home to work can be modelled by a
F(x)~
2ax+a 0 <;;;;x< 1 P(R ~ r) ~
10 f(x)~('otx' 0<;x~2 continuous random variable X whose cumulative
9-2r otherwise. distribution function is given by
1~x (r~3,4)
\ 3a Sketch the graph of f.
\ 10 (a) F(l) ~ 0
Determine (b) Calculate the mean of X.
Find the expectation and variance of R.
(a) the value of a, (c) Calculate the standard deviation of X. F(xH(1-M 1 <;x~b
(b) A continuous random variable X takes
{d) Show why the median value of X must be
(b) the frequency function f(x) of X, values in the interval x > 0. The probability
(c) the expected value ,a of X, greater than the mean. (NEAB) F(b) ~ 1
density function of X is defined by
(d) the standard deviation a of X,
2. The random variable X has probability density where b represents the farthest distance anyone
~ {k:
(e) the probability that l X -Jl l exceeds~· (C) if 0 <;;;;X<;;;; 1
lives from work.
function
f(x) The diagram shows a sketch of this cumulative
if X> 1
x' f(x) ~ ~~(x- x') O<;x~l distribution function.
otherwise
Prove that k = ~ and find the expectation F(x)

and variance of X. (C) (a) Show that a~ 4.


(b) Find E(X)
(c) Find the mode of the distribution of X. (L)

Mixed test 6A 0.5

1. A survey of 491 households, in part of the 2. The random variable X has a probability density
Midlands, gave the following results for gross function given by o+-4-------~--+
weekly income, £y. 0 <;;;;x< 1 0 b

elsewhere A survey suggests that b = 5. Use this parameter


Income (y) No. of households for parts (a) to (d).
k being a constant. Find the value of k and find
0 <y< 80 68 also the mean and variance of this distribution. (a) Show that k ~ 1.25.
38 Find the median of the distribution. (0 & C) (b) Write down and solve an equation to find
80 ~y< 130
the median distance travelled to work.
130 ~y < 170 46
3. The amount of vegetables eaten by a family in a (c) Find the probability that an employee lives
170~y<220 40 week is a random variable W kg. The probability within half a mile of the median.
220<;y<270 50 density function is given by (d) Derive the probability density function for X
45 and illustrate it with a sketch.
270~y<320
20 (e) Show that, for any value of b greater than 1,
320<:y<400 60 w 3 (5- w)
f(w)~ ss the median distance travelled does not
400~y<800 144 {0 exceed 2. (MEI)
otherwise

(a) Draw a histogram on graph paper to (a) Find the cumulative distribution function
illustrate these data. Label your scales and of W.
axes clearly. (b) Find, to three decimal places, the probability
that the family eats between 2 kg and 4 kg
A statistician suggests that a suitable model for of vegetables in one week.
the gross weekly income in £100 units is the (c) Given that the mean of the distribution is
continuous random variable X with probability 3l, find, to three decimal places, the
density function
variance of W.
3k O~x<4 (d) Find the mode of the distribution.
(e) Verify that the amount, m, of vegetables
f(x)~k 4~x<:8
such that the family is equally likely to eat
{0 otherwise more or less than m in any week is about
where k is a constant. 3.431 kg.
(f) Use the information above to comment on
(b) Find the value of k.
the skewness of the distribution. (L)
(c) Use this model to estimate how many of
these 491 households have a gross weekly
income in the range £0-£130.
(d) Comment on your findings. (L)
The normal distribution

In this chapter you will learn how to
• standardise a normal variable and use standard normal tables
• use the normal distribution as a model to solve problems
• use the normal distribution as an approximation to the binomial distribution and to the Poisson distribution

The normal distribution is one of the most important distributions in statistics. Many measured quantities in the natural sciences follow a normal distribution and under certain circumstances it is also a useful approximation to the binomial distribution and to the Poisson distribution.

The normal variable X is continuous. Its probability density function f(x) depends on its mean μ and standard deviation σ, where

f(x) = (1 / (σ√(2π))) e^(-(x - μ)² / (2σ²)),   -∞ < x < ∞.

This is very complicated and has been included just for reference. You would not be expected to remember it!

To describe the distribution, write

X ~ N(μ, σ²)

Notice that the description gives the variance σ², rather than the standard deviation, σ.

The normal distribution curve has the following features:
• It is bell-shaped
• It is symmetrical about μ
• It extends from -∞ to +∞
• The maximum value of f(x) is 1 / (σ√(2π))
• The total area under the curve is 1

Notice also that
• approximately 95% of the distribution lies within two standard deviations of the mean
• approximately 99.7% (very nearly all) of the distribution lies within three standard deviations of the mean

The spread of the distribution depends on σ. Here are some normal curves, each drawn to the same scale:

[Figure: three normal curves: X ~ N(0, 1) with μ = 0, σ = 1; X ~ N(4, 1/4) with μ = 4, σ = 1/2; X ~ N(50, 4) with μ = 50, σ = 2]

FINDING PROBABILITIES

The probability that X lies between a and b is written P(a < X < b). To find this probability, you need to find the area under the normal curve between a and b.
One way of finding areas is to integrate, but since the normal function is complicated and very difficult to integrate, tables are used instead.

THE STANDARD NORMAL VARIABLE, Z

In order to use the same set of tables for all possible values of μ and σ², the variable X is standardised so that the mean is 0 and the standard deviation is 1. Notice that since the variance is the square of the standard deviation, the variance is also 1. This standardised normal variable is called Z and Z ~ N(0, 1).

To illustrate how the variable X is standardised to the variable Z, consider X distributed normally with a mean 50 and a variance 4,
i.e. X ~ N(50, 4). μ = 50 and σ² = 4, so σ = 2.
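The features of the curve listed earlier can be checked numerically for the example X ~ N(50, 4). The following is only a sketch: the function names `normal_pdf` and `area_under_curve` are made up for this illustration, and the trapezium rule is used simply because it is easy to follow, not because the book uses it.

```python
import math

def normal_pdf(x, mu, sigma):
    """Probability density f(x) of a normal variable with mean mu and s.d. sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def area_under_curve(mu, sigma, lower, upper, strips=20000):
    """Trapezium-rule estimate of the area under f(x) between lower and upper."""
    width = (upper - lower) / strips
    total = 0.5 * (normal_pdf(lower, mu, sigma) + normal_pdf(upper, mu, sigma))
    for i in range(1, strips):
        total += normal_pdf(lower + i * width, mu, sigma)
    return total * width

mu, sigma = 50.0, 2.0  # X ~ N(50, 4)

peak = normal_pdf(mu, mu, sigma)                       # height at the mean
formula_peak = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
total_area = area_under_curve(mu, sigma, mu - 10 * sigma, mu + 10 * sigma)
within_two_sd = area_under_curve(mu, sigma, mu - 2 * sigma, mu + 2 * sigma)
within_three_sd = area_under_curve(mu, sigma, mu - 3 * sigma, mu + 3 * sigma)

print("peak height:", round(peak, 4))
print("total area:", round(total_area, 4))
print("within 2 s.d.:", round(within_two_sd, 4))
print("within 3 s.d.:", round(within_three_sd, 4))
```

The total area is taken over ten standard deviations either side of the mean, which is an arbitrary cut-off chosen here because the curve is negligibly small beyond it.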

The maximum value of f(x) is 1 / (2√(2π)) ≈ 0.2 and the curve is shown in the right-hand section of the diagram below.

Now translate the curve 50 units to the left so that the mean is 0. This is shown on the left-hand section of the diagram. The standard deviation σ is still 2, so the maximum value is again approximately 0.2.

[Figure: the curve of X ~ N(50, 4) translated 50 units to the left to give X - 50 ~ N(0, 4)]

Now 'squash' the curve towards the vertical axis so that the standard deviation is 1. This is done by dividing by the standard deviation (σ = 2).

[Figure: the curve of X - 50 ~ N(0, 4) squashed in to give (X - 50)/2 ~ N(0, 1)]

You write Z = (X - 50)/2 so that Z ~ N(0, 1).

In general, to standardise X, where X ~ N(μ, σ²),
• subtract the mean μ
• then divide by the standard deviation σ
to obtain

Z = (X - μ)/σ   where Z ~ N(0, 1)

USING STANDARD NORMAL TABLES

The standard normal tables give the area under the curve as far as a particular value z. This is written Φ(z).
This area gives the probability that Z is less than a particular value z, so P(Z < z) = Φ(z).
Note that Φ is a Greek letter, pronounced phi.

The tables are printed on page 649. Here is an extract from the first section. The values in square brackets are referred to in the following text. Notice that the values of Φ(z) are given to four decimal places in the tables.

z     0       1       2       3       4       5       6       7       8       9       ADD: 1  2  3  4  5    6  7  8  9
0.0   0.5000  0.5040  0.5080  0.5120  0.5160  0.5199  0.5239  0.5279  0.5319  0.5359       4  8  12 16 20   24 28 32 36
0.1   0.5398  0.5438  0.5478  0.5517  0.5557  0.5596 [0.5636] 0.5675  0.5714  0.5753       4  8  12 16 20   24 28 32 36
0.2   0.5793  0.5832  0.5871  0.5910  0.5948  0.5987  0.6026  0.6064  0.6103  0.6141       4  8  12 15 19   23 27 31 35
0.3   0.6179  0.6217  0.6255  0.6293 [0.6331] 0.6368  0.6406  0.6443  0.6480  0.6517       4  7  11 15 [19] 22 26 30 34
0.4   0.6554  0.6591 [0.6628] 0.6664  0.6700  0.6736  0.6772  0.6808  0.6844  0.6879       4  7  11 14 18   22 25 29 [32]
0.5   0.6915  0.6950  0.6985  0.7019  0.7054  0.7088  0.7123  0.7157  0.7190  0.7224       3  7  10 14 17   20 24 27 31
0.6   0.7257  0.7291  0.7324  0.7357  0.7389  0.7422  0.7454  0.7486  0.7517  0.7549       3  7  10 13 16   19 23 26 29
0.7   0.7580  0.7611  0.7642  0.7673  0.7704  0.7734  0.7764  0.7794  0.7823  0.7852       3  6  9  12 15   18 21 24 27
0.8   0.7881  0.7910  0.7939  0.7967  0.7995  0.8023  0.8051  0.8078  0.8106  0.8133       3  5  8  11 14   16 19 22 25
0.9   0.8159  0.8186  0.8212  0.8238  0.8264  0.8289  0.8315  0.8340  0.8365  0.8389       3  5  8  10 13   15 18 20 23

(a) To find P(Z < 0.16), read off the value of Φ(0.16):
• find row 0.1 and go across to column 6. This gives 0.5636.
P(Z < 0.16) = 0.5636

(b) To find P(Z < 0.345), read off the value of Φ(0.345):
• Find the value when z = 0.34 from row 0.3, column 4. This is 0.6331.
• Now go to the right-hand section and read the number along that row in column 5. This is 19.
Note the instruction to ADD. This means that 19 is added to the digits 6331: 6331 + 19 = 6350.
P(Z < 0.345) = 0.6350

(c) To find P(Z < 0.429), read off Φ(0.429):
• Find row 0.4, column 2, right-hand section 9: 6628 + 32 = 6660.
P(Z < 0.429) = 0.6660

Example 7.1
Using the standard normal tables printed on page 649, find
(a) P(Z < 0.85)   (b) P(Z > 0.85)

Solution 7.1
(a) P(Z < 0.85) = Φ(0.85)
               = 0.8023
(b) P(Z > 0.85) = 1 - Φ(0.85)
               = 1 - 0.8023
               = 0.1977
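The table look-ups above can be cross-checked by computer. This is a minimal sketch, assuming Python's standard-library `statistics.NormalDist` as a stand-in for the printed tables; the helper name `phi` is chosen here to mirror the notation Φ(z) and is not part of the book.

```python
from statistics import NormalDist

# Standard normal variable Z ~ N(0, 1)
Z = NormalDist(mu=0.0, sigma=1.0)

def phi(z):
    """Area under the standard normal curve to the left of z, i.e. P(Z < z)."""
    return Z.cdf(z)

# The readings discussed in the text, rounded to four decimal places like the tables
lookups = {z: round(phi(z), 4) for z in (0.16, 0.345, 0.429, 0.85)}

# Example 7.1(b): upper tail P(Z > 0.85) = 1 - phi(0.85)
upper_tail = round(1 - phi(0.85), 4)

for z, value in lookups.items():
    print(f"phi({z}) = {value}")
print("P(Z > 0.85) =", upper_tail)
```

Computed values may differ from table readings in the fourth decimal place, because the ADD columns are themselves rounded.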
In general
P(Z > a) = 1 - Φ(a)

Finding probabilities involving negative values of z

The standard normal tables start at z = 0. You can however find probabilities relating to negative values of z by using the symmetrical properties of the curve. Look at these diagrams:

To find P(Z < -a), where a > 0
P(Z < -a) = Φ(-a)
          = 1 - Φ(a)

To find P(Z > -a), where a > 0
P(Z > -a) = Φ(a)

From the diagrams, it is obvious that
P(Z < -a) = P(Z > a)   and   P(Z > -a) = P(Z < a)

Example 7.2
Z ~ N(0, 1)
(a) Using the standard normal tables on page 649, find P(Z < 1.377),
(b) Drawing sketches to illustrate your answers, find
    (i) P(Z > -1.377)
    (ii) P(Z > 1.377)
    (iii) P(Z < -1.377)
(Give your answers correct to two significant figures.)

Solution 7.2
(a) P(Z < 1.377) = Φ(1.377)
                = 0.9158
                = 0.92 (2 s.f.)

(b) (i) Using P(Z > -a) = P(Z < a) = Φ(a)
        P(Z > -1.377) = Φ(1.377)
                      = 0.9158
                      = 0.92 (2 s.f.)
    (ii) Using P(Z > a) = 1 - Φ(a)
        P(Z > 1.377) = 1 - Φ(1.377)
                     = 1 - 0.9158
                     = 0.0842
                     = 0.084 (2 s.f.)
    (iii) Using P(Z < -a) = P(Z > a) = 1 - Φ(a)
        P(Z < -1.377) = 1 - Φ(1.377)
                      = 1 - 0.9158
                      = 0.0842
                      = 0.084 (2 s.f.)

Important results - these are worth learning.
In the following, a > 0, b > 0 and a < b.

(a) P(a < Z < b) = Φ(b) - Φ(a)

Example:
P(0.345 < Z < 1.751) = Φ(1.751) - Φ(0.345)
                     = 0.9600 - 0.6350
                     = 0.3250

(b) P(-a < Z < b) = Φ(b) - Φ(-a)
                  = Φ(b) - (1 - Φ(a))
                  = Φ(b) - 1 + Φ(a)
                  = Φ(a) + Φ(b) - 1

Example:
P(-2.696 < Z < 1.865) = Φ(1.865) - Φ(-2.696)
                      = Φ(1.865) - (1 - Φ(2.696))
                      = Φ(1.865) + Φ(2.696) - 1
                      = 0.9690 + 0.9965 - 1
                      = 0.9655
In practice, you can just write
P(-2.696 < Z < 1.865) = Φ(2.696) + Φ(1.865) - 1
and go on from there.
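The symmetry rules and results (a) and (b) can be tried out numerically. The sketch below assumes the standard-library `statistics.NormalDist` in place of the tables; the helper `prob_between` is a name introduced only for this illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal, mean 0 and standard deviation 1

def prob_between(a, b):
    """P(a < Z < b) = phi(b) - phi(a); valid for negative limits too."""
    return Z.cdf(b) - Z.cdf(a)

a = 1.377
left_of_minus_a = Z.cdf(-a)       # P(Z < -a)
right_of_a = 1 - Z.cdf(a)         # P(Z > a), equal by symmetry
right_of_minus_a = 1 - Z.cdf(-a)  # P(Z > -a), equal to phi(a)

result_a = prob_between(0.345, 1.751)    # result (a)
result_b = prob_between(-2.696, 1.865)   # result (b)
result_b_formula = Z.cdf(2.696) + Z.cdf(1.865) - 1

print(round(left_of_minus_a, 4), round(right_of_a, 4), round(right_of_minus_a, 4))
print(round(result_a, 4), round(result_b, 4))
```

Because the computer can evaluate Φ at negative values directly, the symmetry step is not needed in code; it is shown here only to confirm that both routes agree.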
\
(c) P(-b < Z < -a) = Φ(-a) - Φ(-b)
                   = 1 - Φ(a) - (1 - Φ(b))
                   = Φ(b) - Φ(a)

Example:
P(-1.4 < Z < -0.6) = Φ(1.4) - Φ(0.6)
                   = 0.9192 - 0.7257
                   = 0.1935

(d) P(|Z| < a) = P(-a < Z < a)
               = Φ(a) + Φ(a) - 1   (using result (b))
               = 2Φ(a) - 1

Example:
P(|Z| < 1.433) = P(-1.433 < Z < 1.433)
               = 2Φ(1.433) - 1
               = 2 × 0.9240 - 1
               = 0.8480

(e) P(|Z| > a) = P(Z < -a) + P(Z > a)
               = 1 - Φ(a) + 1 - Φ(a)
               = 2(1 - Φ(a))

Example:
P(|Z| > 1.433) = P(Z < -1.433) + P(Z > 1.433)
               = 2(1 - Φ(1.433))
               = 2(1 - 0.9240)
               = 0.1520

It is also useful to remember that
P(|Z| > a) = 1 - P(|Z| < a)

Example 7.3
Z ~ N(0, 1). Show that
(a) P(-1.96 < Z < 1.96) = 0.95
(b) P(-2.575 < Z < 2.575) = 0.99

Solution 7.3
(a) P(-1.96 < Z < 1.96) = 2Φ(1.96) - 1
                        = 2(0.975) - 1
                        = 0.95
∴ P(-1.96 < Z < 1.96) = 0.95.
[Figure: central 95% between -1.96 and 1.96, with 2.5% in each tail]

(b) P(-2.575 < Z < 2.575) = 2Φ(2.575) - 1
                          = 2(0.995) - 1
                          = 0.99
∴ P(-2.575 < Z < 2.575) = 0.99.
The central 99% of the distribution lies between ±2.575.
[Figure: central 99% between -2.575 and 2.575, with 0.5% in each tail]

NOTE: These are important results which will be used in later work.

Exercise 7a Probabilities using Z ~ N(0, 1)

Draw sketches to illustrate your answer and consider whether your answer is sensible.

1. If Z ~ N(0, 1), find
   (a) P(Z < 0.874), (b) P(Z > -0.874),
   (c) P(Z > 0.874), (d) P(Z < -0.874).

2. If Z ~ N(0, 1), find
   (a) P(Z > 1.8), (b) P(Z < -0.65),
   (c) P(Z > -2.46), (d) P(Z < 1.36),
   (e) P(Z > 2.58), (f) P(Z > -2.37),
   (g) P(Z < 1.86), (h) P(Z < -0.725),
   (i) P(Z > 1.863), (j) P(Z < 1.63),
   (k) P(Z > -2.061), (l) P(Z < -2.875).

3. If Z ~ N(0, 1), find
   (a) P(Z > 1.645), (b) P(Z < -1.645),
   (c) P(Z > 1.282), (d) P(Z > 1.96),
   (e) P(Z > 2.575), (f) P(Z > 2.326),
   (g) P(Z > 2.808), (h) P(Z < 1.96).

4. Z ~ N(0, 1)
   Find the probabilities represented by the shaded areas in the diagrams.
   [Figure: three shaded standard normal curves with axis labels (a) 0, 1.5, 2; (b) -1, 0, 2; (c) -1.5, 0, 1.5]

5. If Z ~ N(0, 1), find
   (a) P(0.829 < Z < 1.834),
   (b) P(-2.56 < Z < 0.134),
   (c) P(-1.762 < Z < -0.246),
   (d) P(0 < Z < 1.73),
   (e) P(-2.05 < Z < 0),
   (f) P(-2.08 < Z < 2.08),
   (g) P(1.764 < Z < 2.567),
   (h) P(-1.65 < Z < 1.725),
   (i) P(-0.98 < Z < -0.16),
   (j) P(Z < -1.97 or Z > 2.5),
   (k) P(|Z| < 1.78),
   (l) P(|Z| > 0.754),
   (m) P(-1.645 < Z < 1.645),
   (n) P(|Z| > 2.326).

6. Z ~ N(0, 1)
   Complete this statement:
   The central ... % of the distribution lies between ±0.674.

7. Z ~ N(0, 1) and P(Z < a) = 0.3, P(a < Z < b) = 0.6.
   Find
   (a) P(Z < b),
   (b) P(Z > a).

8. Z ~ N(0, 1) and P(Z < a) = 0.7, P(Z > b) = 0.45.
   Find
   (a) Φ(b),
   (b) P(b < Z < a).

9. Z ~ N(0, 1) and P(|Z| < a) = 0.8.
   Find
   (a) P(Z < a),
   (b) P(Z > a).
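Results (d) and (e), and the central 95% and 99% limits of Example 7.3, can be confirmed with a few lines of code. This is a sketch only: `central_prob` and `two_tail_prob` are hypothetical helper names, and `statistics.NormalDist` is assumed as a substitute for the printed tables.

```python
from statistics import NormalDist

Z = NormalDist()  # Z ~ N(0, 1)

def central_prob(a):
    """Result (d): P(|Z| < a) = 2*phi(a) - 1, for a > 0."""
    return 2 * Z.cdf(a) - 1

def two_tail_prob(a):
    """Result (e): P(|Z| > a) = 2*(1 - phi(a)), for a > 0."""
    return 2 * (1 - Z.cdf(a))

central_95 = central_prob(1.96)    # Example 7.3(a)
central_99 = central_prob(2.575)   # Example 7.3(b)
inside = central_prob(1.433)
outside = two_tail_prob(1.433)

print(round(central_95, 4), round(central_99, 4))
print(round(inside, 4), round(outside, 4), round(inside + outside, 4))
```

The last printed value illustrates the reminder that P(|Z| > a) = 1 - P(|Z| < a): the two probabilities always sum to 1.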
USING STANDARD NORMAL TABLES FOR ANY NORMAL VARIABLE X

Remember that to standardise X, where X ~ N(μ, σ²),
• subtract the mean μ
• then divide by the standard deviation σ
to give

Z = (X - μ)/σ   where Z ~ N(0, 1)

The procedure is illustrated in the following example:

Example 7.4
Lengths of metal strips produced by a machine are normally distributed with mean length of 150 cm and a standard deviation of 10 cm.
Find the probability that the length of a randomly selected strip is
(a) shorter than 165 cm,
(b) within 5 cm of the mean.

Solution 7.4
X is the length, in centimetres, of a metal strip.
Since μ = 150 and σ = 10, X ~ N(150, 10²)

(a) You need to find the probability that the length is shorter than 165 cm, i.e. P(X < 165).
To be able to use the standard normal tables, standardise the X variable by subtracting the mean, 150, then dividing by the standard deviation, 10. Apply this to both sides of the inequality X < 165.

X becomes (X - 150)/10 = Z,
165 becomes (165 - 150)/10 = 1.5,
so P(X < 165) becomes P(Z < 1.5)

P(X < 165) = P(Z < 1.5)
           = Φ(1.5)
           = 0.9332
           = 0.93 (2 s.f.)
The probability that the length is shorter than 165 cm is 0.93.
[Figure: sketch with X axis labelled 150, 165 and Z axis labelled 0, 1.5]

NOTE: Although the X and Z distributions have different spreads, in practice it is convenient to show the values for both distributions on one sketch.

(b) To find the probability that the length is within 5 cm of the mean you need to find P(|X - 150| < 5).
Dividing by the standard deviation gives P(|X - 150|/10 < 5/10), i.e. P(|Z| < 0.5)

P(|Z| < 0.5) = 2Φ(0.5) - 1   (using result (d), page 366)
             = 2 × 0.6915 - 1
             = 0.383
             = 0.38 (2 s.f.)
The probability that the length is within 5 cm of the mean is 0.38.
[Figure: sketch with X axis labelled 145, 150, 155 and Z axis labelled -0.5, 0, 0.5]

Example 7.5
The time taken by the milkman to deliver to the High Street is normally distributed with a mean of 12 minutes and a standard deviation of 2 minutes. He delivers milk every day.
Estimate the number of days during the year when he takes
(a) longer than 17 minutes,
(b) less than ten minutes,
(c) between nine and 13 minutes.

Solution 7.5
X is the time, in minutes, taken to deliver milk to the High Street.
X ~ N(12, 2²)
Standardise X using Z = (X - μ)/σ, i.e. Z = (X - 12)/2.

(a) P(X > 17) = P(Z > (17 - 12)/2)
              = P(Z > 2.5)
              = 1 - Φ(2.5)
              = 1 - 0.9938
              = 0.0062
To find the number of days, multiply by 365.
365 × 0.0062 = 2.263 ≈ 2
On two days in the year he takes longer than 17 minutes.

(b) P(X < 10) = P(Z < (10 - 12)/2)
              = P(Z < -1)
              = 1 - Φ(1)
              = 1 - 0.8413
              = 0.1587
Now 365 × 0.1587 = 57.93 ≈ 58
On 58 days in the year he takes less than ten minutes.
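The standardising step used in Example 7.4 and in parts (a) and (b) above can be written out as a short program. This is a sketch under stated assumptions: the function name `standardise` is invented for the illustration, and `statistics.NormalDist` stands in for the tables.

```python
from statistics import NormalDist

Z = NormalDist()  # Z ~ N(0, 1)

def standardise(x, mu, sigma):
    """Subtract the mean, then divide by the standard deviation."""
    return (x - mu) / sigma

# Example 7.4(a): X ~ N(150, 10^2), P(X < 165)
z_165 = standardise(165, 150, 10)
p_shorter = Z.cdf(z_165)

# The same probability without standardising by hand
p_shorter_direct = NormalDist(mu=150, sigma=10).cdf(165)

# Example 7.5: X ~ N(12, 2^2), expected number of days out of 365
days_over_17 = 365 * (1 - Z.cdf(standardise(17, 12, 2)))
days_under_10 = 365 * Z.cdf(standardise(10, 12, 2))

print(z_165, round(p_shorter, 4), round(p_shorter_direct, 4))
print(round(days_over_17, 2), round(days_under_10, 2))
```

The day counts can differ slightly from the worked solution because the tables round each probability to four decimal places before multiplying by 365.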


(c) P(9 < X < 13) = P((9 - 12)/2 < Z < (13 - 12)/2)
                  = P(-1.5 < Z < 0.5)
                  = Φ(0.5) + Φ(1.5) - 1
                  = 0.6915 + 0.9332 - 1
                  = 0.6247
Now 365 × 0.6247 = 228.01 ≈ 228
On 228 days in the year he takes between nine and 13 minutes.

NOTE: Since X is a continuous variable, the following are indistinguishable:
9 < X < 13,
9 ≤ X < 13,
9 < X ≤ 13,
9 ≤ X ≤ 13.

Exercise 7b Probabilities using X ~ N(μ, σ²)

1. The masses of packages from a particular machine are normally distributed with a mean of 200 g and a standard deviation of 2 g. Find the probability that a randomly selected package from the machine weighs
   (a) less than 197 g,
   (b) more than 200.5 g,
   (c) between 198.5 g and 199.5 g.

2. The heights of boys at a particular age follow a normal distribution with mean 150.3 cm and variance 25 cm².
   Find the probability that a boy picked at random from this age group has height
   (a) less than 153 cm,
   (b) more than 158 cm,
   (c) between 150 cm and 158 cm,
   (d) more than 10 cm difference from the mean height.

3. X ~ N(300, 25)
   Find the probabilities represented by the shaded areas in the diagrams:
   [Figure: three shaded normal curves (a), (b), (c); recoverable axis labels are 300, 308 and 290, 300, 310]

4. The random variable X is distributed normally such that X ~ N(50, 20). Find
   (a) P(X > 60.3),
   (b) P(X < 59.8).

5. X ~ N(-8, 12). Find
   (a) P(X < -9.8),
   (b) P(X > -8.2),
   (c) P(-7 < X < 0.5).

6. The masses of a certain type of cabbage are normally distributed with a mean of 1000 g and a standard deviation of 0.15 kg.
   In a batch of 800 cabbages, estimate how many have a mass between 750 g and 1290 g.

7. The number of hours of life of a torch battery is normally distributed with a mean of 150 hours and standard deviation of 12 hours. In a quality control test two batteries are chosen at random from a batch. If both batteries have a life less than 120 hours the batch is rejected.
   Find the probability that the batch is rejected.

8. Cartons of milk from a particular supermarket are advertised as containing 1 litre, but in fact the volume of the contents is normally distributed with a mean of 1012 ml and a standard deviation of 5 ml.
   (a) Find the probability that a randomly chosen carton contains more than 1010 ml.
   (b) In a batch of 1000 cartons, estimate the number of cartons that contain less than the advertised volume of milk.

9. A random variable X is such that X ~ N(-5, 9).
   (a) Find the probability that a randomly chosen item from the population will have a positive value.
   (b) Find the probability that out of ten items chosen at random, exactly four will have a positive value.

10. X ~ N(100, 81). Find
   (a) P(|X - 100| < 18),
   (b) P(|X - 100| > 5),
   (c) P(12 < X - 100 < 15).

11. The life of a certain make of electric light bulb is known to be normally distributed with a mean life of 2000 hours and a standard deviation of 120 hours. Estimate the probability that the life of such a bulb will be
   (a) greater than 2150 hours,
   (b) greater than 1910 hours,
   (c) within the range 1850 hours to 2090 hours. (C)

12. The weights of vegetable marrows supplied to retailers by a wholesaler have a normal distribution with mean 1.5 kg and standard deviation 0.6 kg. The wholesaler supplies three sizes of marrow:
   Size 1, under 0.9 kg,
   Size 2, from 0.9 kg to 2.4 kg,
   Size 3, over 2.4 kg.
   Find, to three decimal places, the proportions of marrows in the three sizes.
   The prices of the marrows are 16p for Size 1, 40p for Size 2 and 60p for Size 3. Calculate the expected total cost of 100 marrows chosen at random from those supplied. (L)

13. The random variable Y is such that Y ~ N(8, 25).
   Show that, correct to three decimal places, P(|Y - 8| < 6.2) = 0.785.
   Three random observations of Y are made. Find the probability that exactly two observations will lie in the interval defined by |Y - 8| < 6.2. (C)

14. The manufacturers of a new model of car state that, when travelling at 56 miles per hour, the petrol consumption has a mean value of 32.4 miles per gallon with standard deviation 1.4 miles per gallon.
   Assuming a normal distribution, calculate the probability that a randomly chosen car of that model will have a petrol consumption greater than 30 miles per gallon when travelling at 56 miles per hour. (C)

USING THE STANDARD NORMAL TABLES IN REVERSE TO FIND z WHEN Φ(z) IS KNOWN

The procedure is illustrated below using the extract taken from the standard normal tables.
The values in square brackets are referred to in the examples.

z     0       1       2       3       4       5       6       7       8       9       ADD: 1  2  3  4   5   6   7   8  9
1.5   0.9332  0.9345  0.9357  0.9370  0.9382  0.9394 [0.9406] 0.9418  0.9429  0.9441       1  2  4  5   6   7   8   10 11
1.6   0.9452  0.9463  0.9474  0.9484  0.9495  0.9505  0.9515  0.9525  0.9535  0.9545       1  2  3  4   5   6   7   8  9
1.7   0.9554  0.9564 [0.9573] 0.9582  0.9591  0.9599  0.9608  0.9616  0.9625  0.9633       1  2  3  4   4   5   [6] 7  8
1.8   0.9641  0.9649  0.9656  0.9664  0.9671  0.9678  0.9686  0.9693  0.9699  0.9706       1  1  2  3   4   4   5   6  6
1.9   0.9713  0.9719  0.9726  0.9732  0.9738  0.9744  0.9750  0.9756  0.9761  0.9767       1  1  2  2   3   4   4   5  5
2.0   0.9772  0.9778  0.9783  0.9788  0.9793  0.9798  0.9803  0.9808  0.9812  0.9817       0  1  1  2   2   3   3   4  4
2.1   0.9821  0.9826 [0.9830] 0.9834  0.9838  0.9842  0.9846  0.9850  0.9854  0.9857       0  1  1  [2] [2] [2] 3   3  4
2.2   0.9861  0.9864  0.9868  0.9871  0.9875  0.9878  0.9881  0.9884  0.9887  0.9890       0  1  1  1   2   2   2   3  3

(a) If you are given that Φ(z) = 0.9406, to find z, look for 0.9406 in the main body of the table.
This occurs when z = 1.56,
so if Φ(z) = 0.9406, then z = 1.56.
Using notation similar to that used in trigonometry where, for example, if sin θ = 0.82, then θ = sin⁻¹ 0.82, you could write
Φ⁻¹(0.9406) = 1.56
This means that the value of z such that Φ(z) = 0.9406 is 1.56.
(b) Find z if P(Z < z) = 0.9579
    Φ(z) = 0.9579
so  z = Φ⁻¹(0.9579)
Look for 0.9579 in the main body of the table. It does not appear, so look for the number below it. This is 0.9573 and it occurs when z = 1.72.
To get the digits 9579 you need to add 6 to 9573. Look at the far right-hand section and find 6. It is in column 7. This means that the z value required is 1.727.
So z = Φ⁻¹(0.9579) = 1.727.

(c) Find z if P(Z < z) = 0.9832
    Φ(z) = 0.9832
so  z = Φ⁻¹(0.9832)
Look for 0.9832 in the main body of the table; note that Φ(2.12) = 0.9830.
Refer to the end column and you find that
Φ(2.124) = 0.9832
Φ(2.125) = 0.9832
Φ(2.126) = 0.9832
The probabilities have been given to four decimal places and it is not possible to distinguish between the z values, so just decide on one of them, say 2.124.
So z = Φ⁻¹(0.9832) = 2.124.

NOTE: If you cannot find the value for the probability in the table, choose the value that is closest to the required probability.
Often final answers are given to two or three significant figures, so these discrepancies are not important.

Example 7.6
If Z ~ N(0, 1), find the value of a if
(a) P(Z < a) = 0.9693
(b) P(Z > a) = 0.3802
(c) P(Z > a) = 0.7367
(d) P(Z < a) = 0.0793

Solution 7.6
(a) P(Z < a) = 0.9693
i.e. Φ(a) = 0.9693
     a = Φ⁻¹(0.9693)
       = 1.87

(b) P(Z > a) = 0.3802
i.e. 1 - Φ(a) = 0.3802
so   Φ(a) = 1 - 0.3802
          = 0.6198
     a = Φ⁻¹(0.6198)
       = 0.305

(c) P(Z > a) = 0.7367
Since the probability is greater than 0.5, a must be negative, and therefore -a is positive.
Using symmetry, Φ(-a) = 0.7367
     -a = Φ⁻¹(0.7367)
        = 0.633
      a = -0.633

(d) P(Z < a) = 0.0793
Since the probability is less than 0.5, a must be negative.
Using symmetry,
     Φ(-a) = 1 - 0.0793
           = 0.9207
     -a = Φ⁻¹(0.9207)
        = 1.41
      a = -1.41

Example 7.7
If Z ~ N(0, 1) find a such that P(|Z| < a) = 0.9.

Solution 7.7
P(|Z| < a) = 0.9,
i.e. P(-a < Z < a) = 0.9.
From symmetry, using result (d) on page 366,
2Φ(a) - 1 = 0.9
    2Φ(a) = 1.9
     Φ(a) = 0.95
        a = Φ⁻¹(0.95)
          = 1.645
This means that the central 90% of the standard normal distribution lies between ±1.645.

Alternatively,
If P(-a < Z < a) = 0.9
then the value of a corresponds to an upper tail probability of 0.05, and a lower tail probability of 0.95.
     Φ(a) = 0.95
        a = Φ⁻¹(0.95)
          = 1.645
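Using the tables in reverse corresponds to evaluating the inverse function Φ⁻¹. The sketch below assumes the standard-library method `statistics.NormalDist.inv_cdf` as that inverse; the helper names `phi_inverse` and `central_limit` are introduced only for this illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # Z ~ N(0, 1)

def phi_inverse(p):
    """Value of z such that P(Z < z) = p, for 0 < p < 1."""
    return Z.inv_cdf(p)

def central_limit(p):
    """Value of a such that P(|Z| < a) = p, using 2*phi(a) - 1 = p."""
    return phi_inverse((1 + p) / 2)

z_a = phi_inverse(0.9406)      # reverse look-up (a)
a_upper = phi_inverse(1 - 0.3802)   # Example 7.6(b): P(Z > a) = 0.3802
a_negative = phi_inverse(1 - 0.7367)  # Example 7.6(c): P(Z > a) = 0.7367
a_central = central_limit(0.9)  # Example 7.7

print(round(z_a, 3), round(a_upper, 3), round(a_negative, 3), round(a_central, 3))
```

In code there is no need to treat probabilities below 0.5 separately, since the inverse function returns negative values of z directly; the symmetry argument in the worked solutions is only needed because the printed tables start at z = 0.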
USING THE TABLES IN REVERSE FOR ANY NORMAL VARIABLE X

Example 7.8
The heights of female students at a particular college are normally distributed with a mean of 169 cm and a standard deviation of 9 cm.
(a) Given that 80% of these female students have a height less than h cm, find the value of h.
(b) Given that 60% of these female students have a height greater than s cm, find the value of s.

Solution 7.8
X is the height, in centimetres, of a female student.
X ~ N(169, 9²)

(a) Given P(X < h) = 0.8
Standardising,
P(Z < (h - 169)/9) = 0.8
Let z = (h - 169)/9
P(Z < z) = 0.8
       z = Φ⁻¹(0.8)
         = 0.842
(h - 169)/9 = 0.842
          h = 169 + 9 × 0.842
            = 176.578
            = 176.6 (1 d.p.)

(b) Given P(X > s) = 0.6
Standardising,
P(Z > (s - 169)/9) = 0.6
Let z = (s - 169)/9
P(Z > z) = 0.6
z must be negative
and Φ(-z) = 0.6
       -z = Φ⁻¹(0.6)
          = 0.253
        z = -0.253
(s - 169)/9 = -0.253
          s = 169 - 9 × 0.253
            = 166.723
            = 166.7 (1 d.p.)

Example 7.9
The marks of 500 candidates in an examination are normally distributed with a mean of 45 marks and a standard deviation of 20 marks.
(a) Given that the pass mark is 41, estimate the number of candidates who passed the examination.
(b) If 5% of the candidates obtain a distinction by scoring x marks or more, estimate the value of x.
(c) Estimate the interquartile range of the distribution. (L Additional)

Solution 7.9
X is the examination mark.
X ~ N(45, 20²)

(a) P(X > 41) = P(Z > (41 - 45)/20)
              = P(Z > -0.2)
              = Φ(0.2)
              = 0.5793
∴ P(Pass) = 0.5793
Since there are 500 candidates, to find the number of candidates who pass, multiply the probability by 500.
500 × 0.5793 = 289.65
290 candidates passed.

(b) P(X > x) = 0.05
Writing z for the standardised value of x,
P(Z > z) = 0.05 where z = (x - 45)/20
    Φ(z) = 1 - 0.05
         = 0.95
       z = Φ⁻¹(0.95)
         = 1.645
(x - 45)/20 = 1.645
          x = 45 + 1.645 × 20
            = 78 (2 s.f.)
A distinction is awarded for a mark of 78 or more.

(c) The interquartile range encloses the central 50% of the distribution between the lower quartile, q₁, and upper quartile, q₃.
If P(-z < Z < z) = 50% then z corresponds to an upper tail probability of 25%.
So Φ(z) = 0.75
      z = Φ⁻¹(0.75)
        = 0.674
Now z is the standardised value of the upper quartile, q₃,
so (q₃ - 45)/20 = 0.674
             q₃ = 45 + 20 × 0.674
                = 58.48
Lower quartile q₁ is such that (q₁ - 45)/20 = -0.674
             q₁ = 45 - 20 × 0.674
                = 31.52
Interquartile range = q₃ - q₁
                    = 58.48 - 31.52
                    = 27 (2 s.f.)

Exercise 7c Using the standard normal tables in reverse

1. In the following, find the value of z, where Z ~ N(0, 1).
   (a) P(Z < z) = 0.506
   (b) P(Z < z) = 0.787
   (c) P(Z < z) = 0.0296
   (d) P(Z < z) = 0.325
   (e) P(Z > z) = 0.713
   (f) P(Z > z) = 0.154
   (g) P(|Z| < z) = 0.6

2. Z ~ N(0, 1).
   Find the value of a, where
   (a) P(Z < a) = 0.9738
   (b) P(Z < a) = 0.2435
   (c) P(Z > a) = 0.82
   (d) P(Z > a) = 0.2351

3. Z ~ N(0, 1).
   Find a if
   (a) P(|Z| < a) = 0.6372,
   (b) P(|Z| > a) = 0.097,
   (c) P(|Z| < a) = 0.5,
   (d) P(|Z| > a) = 0.0404.

4. If Z ~ N(0, 1), find the upper quartile and the lower quartile of the distribution. Find also the 70th percentile.

5. Find x in each of the following.
   [Figure: four shaded normal curves; recoverable labels are (a) X ~ N(60, 25); (b) mean 5, P(X < x) = 0.3; (c) X ~ N(200, 36); (d) X ~ N(0, 4)]

6. Bags of flour packed by a particular machine have masses which are normally distributed with mean 500 g and standard deviation 20 g.
   2% of the bags are rejected for being underweight and 1% of the bags are rejected for being overweight. Between what range of values should the mass of a bag of flour lie if it is to be accepted?

7. The masses of cos lettuces sold at a market are normally distributed with mean mass 600 g and standard deviation 20 g.
   (a) If a lettuce is chosen at random, find the probability that its mass lies between 570 g and 610 g.
   (b) Find the mass exceeded by 7% of the lettuces.
   (c) In one day, 1000 lettuces are sold. Estimate how many weigh less than 545 g.

8. A sample of 100 apples is taken from a load. The apples have the following distribution of sizes

   Diameter to nearest cm   6    7    8    9    10
   Frequency                11   21   38   17   13

   Determine the mean and standard deviation of these diameters.
   Assuming that the distribution is approximately normal with this mean and this standard deviation find the range of size of apples for packing, if 5% are to be rejected as too small and 5% are to be rejected as too large. (O & C)

9. X ~ N(400, 64).
   (a) Find the limits within which the central 95% of the distribution lies.
   (b) Find the interquartile range of the distribution.

10. The lengths of metal strips are normally distributed with a mean of 120 cm and a standard deviation of 10 cm. Find the probability that a strip selected at random has a length
   (a) greater than 105 cm,
   (b) within 5 cm of the mean.
   Strips that are shorter than L cm are rejected. Estimate the value of L, correct to one decimal place, if 9% of all strips are rejected.
   In a sample of 500 strips, estimate the number having a length over 126 cm. (C)

11. The numbers of shirts sold in a week by the world's largest menswear store are normally distributed with a mean of 2080 and a standard deviation of 50. Estimate
   (a) the probability that in a given week fewer than 2000 shirts are sold,
   (b) the number of weeks in a year that between 2060 and 2130 shirts are sold,
   (c) the interquartile range of the distribution,
   (d) the least number n of shirts such that the probability that more than n are sold in a given week is less than 0.02. (C)

12. Batteries for a transistor radio have a mean life under normal usage of 160 hours, with a standard deviation of 30 hours. Assuming the battery life follows a normal distribution,
   (a) calculate the percentage of batteries which have a life between 150 hours and 180 hours,
   (b) calculate the range, symmetrical about the mean, within which 75% of the battery lives lie.
   If a radio takes four of these batteries and requires all of them to be working, calculate
   (c) the probability that the radio will run for at least 135 hours. (O & C)
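For any normal variable, the reverse calculation amounts to finding z from Φ⁻¹ and then un-standardising with x = μ + σz. The sketch below applies this to Examples 7.8 and 7.9. The function name `value_below` is hypothetical, and `statistics.NormalDist.inv_cdf` is assumed in place of the tables.

```python
from statistics import NormalDist

Z = NormalDist()  # Z ~ N(0, 1)

def value_below(p, mu, sigma):
    """Value x such that P(X < x) = p when X ~ N(mu, sigma^2): x = mu + sigma*z."""
    z = Z.inv_cdf(p)
    return mu + sigma * z

# Example 7.8: heights X ~ N(169, 9^2)
h = value_below(0.8, 169, 9)        # 80% are shorter than h
s = value_below(1 - 0.6, 169, 9)    # 60% are taller than s

# Example 7.9: marks X ~ N(45, 20^2)
passes = 500 * (1 - NormalDist(45, 20).cdf(41))
distinction = value_below(0.95, 45, 20)
q1 = value_below(0.25, 45, 20)
q3 = value_below(0.75, 45, 20)
iqr = q3 - q1

print(round(h, 1), round(s, 1))
print(round(passes), round(distinction), round(iqr))
```

The same values could be obtained from `NormalDist(mu, sigma).inv_cdf(p)` directly; the two-step version is used here because it mirrors the standardise-then-solve layout of the worked solutions.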
FINDING THE VALUE OF μ OR σ OR BOTH

Example 7.10
The lengths of certain items follow a normal distribution with mean μ cm and standard deviation 6 cm. It is known that 4.78% of the items have length greater than 82 cm. Find the value of the mean μ.

Solution 7.10
X is the length, in centimetres, of an item.
X ~ N(μ, 6²) and P(X > 82) = 0.0478
Since P(X > 82) is less than 0.5, 82 must be greater than μ.
P(X > 82) = P(Z > z) where z = (82 - μ)/6
so P(Z > z) = 0.0478
       Φ(z) = 1 - 0.0478
            = 0.9522
          z = Φ⁻¹(0.9522)
            = 1.667
∴ (82 - μ)/6 = 1.667
      82 - μ = 1.667 × 6
           μ = 82 - 1.667 × 6 = 71.998
The mean, μ = 72 cm (2 s.f.)

Example 7.11
X ~ N(100, σ²) and P(X < 106) = 0.8849.
Find the value of the standard deviation σ.

Solution 7.11
P(X < 106) = 0.8849
so P(Z < z) = 0.8849 where z = (106 - 100)/σ = 6/σ
       Φ(z) = 0.8849
          z = Φ⁻¹(0.8849)
            = 1.2
∴ 6/σ = 1.2
    σ = 6/1.2 = 5
The standard deviation, σ = 5

Example 7.12
The masses of boxes of oranges are normally distributed such that 30% of them are greater than 4.00 kg and 20% are greater than 4.53 kg. Estimate the mean and standard deviation of the masses. (C)

Solution 7.12
X is the mass, in kilograms, of a box of oranges.
X ~ N(μ, σ²) where μ and σ are to be found.

P(X > 4.00) = 0.3
   P(Z > z) = 0.3 where z = (4.00 - μ)/σ
       Φ(z) = 1 - 0.3
            = 0.7
          z = Φ⁻¹(0.7)
            = 0.524
(4.00 - μ)/σ = 0.524
    4.00 - μ = 0.524σ
        4.00 = μ + 0.524σ ...... ①

P(X > 4.53) = 0.2
   P(Z > z) = 0.2 where z = (4.53 - μ)/σ
       Φ(z) = 1 - 0.2
            = 0.8
          z = Φ⁻¹(0.8)
            = 0.842
(4.53 - μ)/σ = 0.842
    4.53 - μ = 0.842σ
        4.53 = μ + 0.842σ ...... ②

Equation ② - equation ① gives
0.53 = 0.318σ
   σ = 1.666... = 1.67 (3 s.f.)
Substituting in equation ①
4.00 = μ + 0.524 × 1.666...
   μ = 3.126... = 3.13 (3 s.f.)
∴ μ = 3.13 kg and σ = 1.67 kg
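The method of Example 7.12 reduces to two linear equations of the form x = μ + zσ, which can be solved by subtraction. The following sketch does this in code; `solve_mean_and_sd` is a hypothetical helper written for this illustration, and `statistics.NormalDist.inv_cdf` is assumed in place of the reverse table look-up.

```python
from statistics import NormalDist

Z = NormalDist()  # Z ~ N(0, 1)

def solve_mean_and_sd(x1, upper_tail1, x2, upper_tail2):
    """Solve x1 = mu + z1*sigma and x2 = mu + z2*sigma,
    where P(X > x1) = upper_tail1 and P(X > x2) = upper_tail2."""
    z1 = Z.inv_cdf(1 - upper_tail1)
    z2 = Z.inv_cdf(1 - upper_tail2)
    sigma = (x2 - x1) / (z2 - z1)   # subtracting the two equations
    mu = x1 - z1 * sigma            # substituting back
    return mu, sigma

# Example 7.12: 30% of boxes exceed 4.00 kg and 20% exceed 4.53 kg
mu, sigma = solve_mean_and_sd(4.00, 0.3, 4.53, 0.2)

# Check: the fitted distribution should reproduce the given percentages
fitted = NormalDist(mu, sigma)
check_30 = 1 - fitted.cdf(4.00)
check_20 = 1 - fitted.cdf(4.53)

print(round(mu, 2), round(sigma, 2))
print(round(check_30, 3), round(check_20, 3))
```

The computed mean can differ from the worked answer in the last figure, because the tables give z to only three decimal places whereas the code uses unrounded values.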


Example 7.13
The speeds of cars passing a certain point on a motorway can be taken to be normally distributed. Observations show that of cars passing the point, 95% are travelling at less than 85 m.p.h. and 10% are travelling at less than 55 m.p.h.
(a) Find the average speed of the cars passing the point.
(b) Find the proportion of cars that travel at more than 70 m.p.h. (L)

Solution 7.13
X is the speed, in m.p.h., of a car passing a certain point.
X ~ N(μ, σ²).

(a) P(X < 85) = 0.95
i.e. P(Z < z₁) = 0.95 where z₁ = (85 − μ)/σ
z₁ = Φ⁻¹(0.95)
   = 1.645
(85 − μ)/σ = 1.645
85 = μ + 1.645σ ... (1)

P(X < 55) = 0.1
i.e. P(Z < z₂) = 0.1 where z₂ = (55 − μ)/σ
Φ(−z₂) = 0.9
−z₂ = Φ⁻¹(0.9)
    = 1.282
∴ z₂ = −1.282
(55 − μ)/σ = −1.282
55 = μ − 1.282σ ... (2)

(1) − (2) gives 30 = 2.927σ
σ = 10.249...
Substituting in (1) 85 = μ + 1.645 × 10.249...
μ = 68.139...
The average speed is 68 m.p.h. (2 s.f.).

(b) P(X > 70) = P(Z > (70 − 68.13...)/10.24...)
= P(Z > 0.1815)
= 1 − Φ(0.1815)
= 1 − 0.5718
= 0.4282
The proportion of cars travelling at more than 70 m.p.h. is 0.43 (2 s.f.).

Exercise 7d Finding μ or σ or both, where X ~ N(μ, σ²)
You are advised to draw sketches to illustrate your answers.

1. The random variable X is distributed N(μ, σ²), with σ = 25. If P(X < 27.5) = 0.3085, find the value of μ.

2. The random variable X is normally distributed with a mean of 45. The probability that X is greater than 51 is 0.288. Find the standard deviation of the distribution.

3. The volumes of drinks in cans are normally distributed with a mean of 333 ml. Given that 20% of the cans contain more than 340 ml, find the standard deviation of the volume of drink in a can. Find also the percentage of cans that contain less than 330 ml.

4. The random variable X is distributed N(μ, 12) and it is known that P(X > 32) = 0.8438. Find the value of μ.

5. The heights, measured in metres, of 500 people are normally distributed with a standard deviation of 0.080 m. Given that the heights of 129 of these people are greater than the mean height, but less than 1.806 m, estimate the mean height. (C)

6. The random variable X is distributed N(μ, σ²). P(X > 80) = 0.0113 and P(X < 30) = 0.0287. Find μ and σ.

7. The masses of boxes of apples are normally distributed such that 20% of them are greater than 5.08 kg and 15% are greater than 5.62 kg. Estimate the mean and standard deviation of the masses.

8. Metal rods produced by a machine have lengths that are normally distributed. 2% of the rods are rejected as being too short and 5% are rejected as being too long.
(a) Given that the least and greatest acceptable lengths of the rods are 6.32 cm and 7.52 cm, calculate the mean and variance of the lengths of the rods.
(b) If ten rods are chosen at random from a batch produced by the machine, find the probability that exactly three of them are rejected as being too long.

9. The random variable X is distributed N(μ, σ²). P(X < 35) = 0.2 and P(35 < X < 45) = 0.65. Find μ and σ.

10. The marks in an examination were found to be normally distributed. 10% of the candidates were awarded a distinction for obtaining over 75. 20% of the candidates failed the examination with a mark of under 40. Find the mean and standard deviation of the distribution of marks.

11. A farmer cuts hazel twigs to make into bean poles to sell at the market. He says that a stick is 240 cm long. In fact the lengths of the sticks are normally distributed and 55% are over 240 cm long. 10% are over 250 cm long. Find the probability that a randomly selected stick is shorter than 235 cm.

12. The diameters of bolts produced by a particular machine follow a normal distribution with mean 1.34 cm and standard deviation 0.04 cm. A bolt is rejected if its diameter is less than 1.24 cm or more than 1.40 cm. Find the percentage of bolts which are accepted.
The setting of the machine is altered so that the mean diameter changes but the standard deviation remains the same. With the new setting, 3% of the bolts are rejected because they are too large in diameter. Find the new mean diameter of bolts produced by the machine. Find also the percentage of bolts that are now rejected because they are too small in diameter.

13. Tea is sold in packages marked 750 g. The masses of the packages are normally distributed with a mean of 760 g. It is known that less than 1% of packages are underweight. What is the maximum value of the standard deviation of the distribution?

14. The random variable X is normally distributed. The probability that X is less than 53 is 0.04 and the probability that X is less than 65 is 0.97. Find the interquartile range of the distribution.

15. A certain make of car tyre can be safely used for 25 000 km on average before it is replaced. The makers guarantee to pay compensation to anyone whose tyre does not last for 22 000 km. They expect 7.5% of all tyres sold to qualify for compensation. Assuming that the distance X travelled before a tyre is replaced has a normal probability distribution, draw a diagram illustrating the facts given above.
Calculate, to three significant figures, the standard deviation of X.
Estimate the number of tyres per 1000 which will not have been replaced when they have covered 26 500 km. (L Additional)

16. Two firms, Goodline and Megadelay, produce delay lines for use in communications. The delay time for a delay line is measured in nanoseconds (ns).
(a) The delay times for the output of Goodline may be modelled by a normal distribution with mean 283 ns and standard deviation 8 ns. What is the probability that the delay time of one line selected at random from Goodline's output is between 275 ns and 286 ns?
(b) It is found that, in the output of Megadelay, 10% of the delay times are less than 274.6 ns and 7.5% are more than 288.2 ns. Again assuming a normal distribution, calculate the mean and standard deviation of the delay times for Megadelay. Give your answers correct to three significant figures. (C)

17. Machine components are mass-produced at a factory. A customer requires that the components should be 5.2 cm long but they will be acceptable if they are within limits 5.195 cm to 5.205 cm. The customer tests the components and finds that 10.75% of those supplied are over-size and 4.95% are under-size. Find the mean and standard deviation of the lengths of the components supplied, assuming that they are normally distributed.
If three of the components are selected at random what is the probability that one is under-size, one is over-size and one satisfactory?

18. A machine dispenses peanuts into bags so that the weight of peanuts in a bag is normally distributed.
(a) Initially the mean weight of peanuts in a bag is 128.5 g and the standard deviation is 1.5 g. Find the probability that the weight of peanuts in a randomly chosen bag exceeds 130 g.
(b) The machine is given a minor overhaul that changes the mean weight, μ, of peanuts in a bag without affecting the standard deviation. Following the overhaul, 14% of bags contain more than 130 g of peanuts. Find, to four significant figures, the new value for μ.
(c) Later the machine requires a major repair, following which the mean weight of peanuts in a bag is 128.3 g, and 4% of bags contain less than 126 g. Find, to three significant figures, the standard deviation of the weight of peanuts in a bag after this major repair. (NEAB)

19. A machine is used to fill cans of soup with a nominal volume of 0.450 litres. Suppose that the machine delivers a quantity of soup which is normally distributed with mean μ litres and standard deviation σ litres. Given that μ = 0.457 and σ = 0.004, find the probability that a randomly chosen can will contain less than the nominal volume.
It is required by law that no more than 1% of cans contain less than the nominal volume. Find
(a) the least value of μ which will comply with the law if σ = 0.004,
(b) the greatest value of σ which will comply with the law if μ = 0.457. (MEI)

20. The masses of packets of sugar are normally distributed. In a large consignment of packets of sugar, it is found that 5% of them have a mass greater than 510 g and 2% have a mass greater than 515 g. Estimate the mean and the standard deviation of this distribution. (C)

21. On a particular day, 50% of the employees in a large company had arrived at work by 8.30 a.m., and 10% had not arrived by 8.55 a.m.
(a) Assuming a normal model, find the standard deviation of the arrival times, in minutes.
(b) It is given that only 5% of the employees had arrived by 8.05 a.m. Without further calculation, explain why this might suggest that a normal model is not appropriate.
(c) Eighty employees are selected at random. Find the expectation of the number of these employees that arrived between 8.30 and 8.55 a.m. (C)

THE NORMAL APPROXIMATION TO THE BINOMIAL DISTRIBUTION
Under certain circumstances the normal distribution can be used as an approximation to the binomial distribution. One practical advantage is that the calculations for finding probabilities are much less tedious to perform.
The diagrams opposite illustrate the distribution B(n, p) for p = 0.2 and p = 0.5, for various values of n. In each case a vertical line graph has been drawn, and to make comparison easier, a curve has been superimposed on each.

[Diagrams: vertical line graphs of X ~ B(n, 0.2) and X ~ B(n, 0.5) for (a) n = 5, (b) n = 12, (c) n = 20, each with a curve superimposed.]

Notice
• when p = 0.5, the distributions are symmetric and, for larger values of n, the distribution takes on the characteristic normal shape,
• when p = 0.2, the distribution is positively skewed for small values of n, but when n = 20, the distribution is almost symmetrical and bell-shaped.

For the discrete random variable X, distributed binomially, where X ~ B(n, p), the mean μ = E(X) = np and the variance σ² = Var(X) = npq (see page 286).
The normal distribution used as an approximation has the same mean np and variance npq.

A rule that can be used is as follows:
If X ~ B(n, p) and n and p are such that np > 5 and nq > 5, where q = 1 − p, then X ~ N(np, npq) approximately.

CONTINUITY CORRECTIONS
The following example compares probabilities obtained using a binomial distribution and a normal approximation. It also illustrates the use of a continuity correction, needed when using a continuous distribution (the normal) as an approximation for a discrete distribution (the binomial).

Example 7.14
Find the probability of obtaining 4, 5, 6 or 7 heads when a fair coin is tossed 12 times
(a) using the binomial distribution,
(b) using a normal approximation to the binomial distribution.

Solution 7.14
X is the number of heads in 12 tosses.
Since the coin is fair, P(head) = 0.5, so X ~ B(12, 0.5).

(a) Using the binomial distribution,
P(X = 4) = ¹²C₄(0.5)⁸ × (0.5)⁴ = ¹²C₄(0.5)¹² = 0.1208...
P(X = 5) = ¹²C₅(0.5)¹² = 0.1933...
P(X = 6) = ¹²C₆(0.5)¹² = 0.2255...
P(X = 7) = ¹²C₇(0.5)¹² = 0.1933...
P(4 ≤ X ≤ 7) = 0.733 (3 d.p.)

(b) The diagram below shows the probability distribution for X ~ B(12, 0.5). Note that the vertical lines have been replaced by rectangles to help illustrate the intention to use a continuous distribution as an approximation for a discrete one. The required binomial probability is represented by the sum of the areas of the shaded rectangles.

First check the conditions for a normal approximation:
np = 12 × 0.5 = 6, so np > 5
nq = 12 × 0.5 = 6, so nq > 5
Since np > 5 and nq > 5, use the normal approximation
X ~ N(np, npq) with np = 6, npq = 12 × 0.5 × 0.5 = 3
So X ~ N(6, 3).

Superimposing the curve which is approximately N(6, 3), the probability of obtaining 4, 5, 6 or 7 heads is found by considering the area under this normal curve from x = 3.5 to x = 7.5.

[Diagram: rectangles for the number of heads with the N(6, 3) curve superimposed; the area from 3.5 to 7.5 is shaded.]

P(4 ≤ X ≤ 7) transforms to P(3.5 < X < 7.5) using a continuity correction.

Writing the symbol → to represent 'transforms to',
X ~ N(6, 3)
P(4 ≤ X ≤ 7) → P(3.5 < X < 7.5)
= P((3.5 − 6)/√3 < Z < (7.5 − 6)/√3)
= P(−1.443 < Z < 0.866)
= Φ(1.443) + Φ(0.866) − 1
= 0.732 (3 d.p.)

Note that the probabilities found by the two different methods compare well and the working for part (b) is quicker to perform. The approximation is good because, although n is not very large, p = 0.5.

More about continuity corrections
Continuity corrections sometimes cause difficulties, so these are considered in more detail, using the diagram for the distribution of the number of heads when a coin is tossed 12 times.

If you want the probability that there are three heads or fewer, i.e. P(X ≤ 3), then consider P(X < 3.5). The rectangle for 3 is included.

If you want the probability that there are fewer than three heads, i.e. P(X < 3), then consider P(X < 2.5). The rectangle for 3 is not included.

If you want the probability that there are exactly three heads, i.e. P(X = 3), then consider P(2.5 < X < 3.5).

Consider these further examples.

P(5 ≤ X ≤ 8) → P(4.5 < X < 8.5)   (5, 6, 7 or 8 heads)
P(5 < X ≤ 8) → P(5.5 < X < 8.5)   (6, 7 or 8 heads)
P(5 ≤ X < 8) → P(4.5 < X < 7.5)   (5, 6 or 7 heads)
P(5 < X < 8) → P(5.5 < X < 7.5)   (6 or 7 heads)
P(X < 4) → P(X < 3.5)   (0, 1, 2, 3 heads)
P(X ≤ 4) → P(X < 4.5)   (0, 1, 2, 3 or 4 heads)
P(X ≥ 4) → P(X > 3.5)   (4, 5, 6, ..., 12 heads)
P(X > 4) → P(X > 4.5)   (5, 6, 7, ..., 12 heads)
P(X = 9) → P(8.5 < X < 9.5)   (9 heads)
P(X = 7) → P(6.5 < X < 7.5)   (7 heads)
P(X ≥ 0) → P(X > −0.5)   (0, 1, 2, ..., 12 heads)
P(X > 0) → P(X > 0.5)   (1, 2, 3, ..., 12 heads)
P(X = 0) → P(−0.5 < X < 0.5)   (0 heads)

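The comparison made in Example 7.14 can be repeated numerically. The following is a minimal Python sketch, assuming the standard library's `math.comb` and `statistics.NormalDist` in place of printed tables. The function names are illustrative and do not come from the text.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_exact(n, p, lo, hi):
    """P(lo <= X <= hi) for X ~ B(n, p), summed term by term."""
    q = 1 - p
    return sum(comb(n, k) * p**k * q**(n - k) for k in range(lo, hi + 1))


def binomial_normal_approx(n, p, lo, hi):
    """P(lo <= X <= hi) using N(np, npq) with a continuity correction."""
    q = 1 - p
    approx = NormalDist(mu=n * p, sigma=sqrt(n * p * q))
    # P(lo <= X <= hi) transforms to P(lo - 0.5 < X < hi + 0.5)
    return approx.cdf(hi + 0.5) - approx.cdf(lo - 0.5)


# 4, 5, 6 or 7 heads in 12 tosses of a fair coin
exact = binomial_exact(12, 0.5, 4, 7)
approx = binomial_normal_approx(12, 0.5, 4, 7)
print(f"binomial: {exact:.3f}, normal approximation: {approx:.3f}")
```

Changing `lo` and `hi` gives a quick way to check the other transformations listed above, for example `lo = hi = 3` for P(X = 3).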
Exercise 7e Continuity corrections

Write down the transformations for each of the following, when a normal distribution is to be used as an approximation for a binomial distribution.

1. P(3<XO)
2. P(3 < X < 9)
3. P(10<X~24)
4. P(2 ≤ X < 8)
5. P(X > 54)
6. P(X ≥ 76)
7. P(45 < X < 67)
8. P(X < 109)
9. P(X~45)
10. P(X ~56)
11. P(400 < X < 560)
12. P(X~67)
13. P(X > 59)
14. P(X~100)
15. P(34 < X < 43)
16. P(X ~ 7)
17. P(X ≥ 509)
18. P(X < 7)
19. P(27 ~X< 29)
20. P(X~53)

Example 7.15
In a sack of mixed grass seeds, the probability that a seed is ryegrass is 0.35.
Find the probability that in a random sample of 400 seeds from the sack,
(a) less than 120 are ryegrass seeds,
(b) between 120 and 150 (inclusive) are ryegrass,
(c) more than 160 are ryegrass seeds.

Solution 7.15
X is the number of ryegrass seeds in a sample of 400 seeds.
n = 400, p = 0.35, q = 0.65, so X ~ B(400, 0.35)
To see whether a normal approximation is suitable, check the value of np and nq:
np = 400 × 0.35 = 140 and nq = 400 × 0.65 = 260.
Since np > 5 and nq > 5, use the normal approximation
X ~ N(np, npq) with np = 140, npq = 400 × 0.35 × 0.65 = 91
So X ~ N(140, 91)

(a) P(X < 120) → P(X < 119.5) (continuity correction)
P(X < 119.5) = P(Z < (119.5 − 140)/√91)
= P(Z < −2.149)
= 1 − Φ(2.149)
= 0.0158
The probability that there are less than 120 ryegrass seeds is 0.016 (2 s.f.).

(b) P(120 ≤ X ≤ 150) → P(119.5 < X < 150.5) (continuity correction)
P(119.5 < X < 150.5) = P((119.5 − 140)/√91 < Z < (150.5 − 140)/√91)
= P(−2.149 < Z < 1.101)
= Φ(2.149) + Φ(1.101) − 1
= 0.8487
The probability that there are between 120 and 150 ryegrass seeds is 0.85 (2 s.f.).

(c) P(X > 160) → P(X > 160.5) (continuity correction)
P(X > 160.5) = P(Z > (160.5 − 140)/√91)
= P(Z > 2.149)
= 1 − Φ(2.149)
= 0.0158
The probability that there are more than 160 ryegrass seeds is 0.016 (2 s.f.).

NOTE: You should define X as binomial, then check that the conditions are suitable before defining the approximate normal distribution.

Example 7.16
It is given that 40% of the population support the Gamboge Party. One hundred and fifty members of the population are selected at random. Use a suitable approximation to find the probability that more than 55 out of the 150 support the Gamboge Party. (C)

Solution 7.16
X is the number in 150 who support the Gamboge Party.
n = 150, p = 0.4, q = 0.6
so X ~ B(n, p) with n = 150, p = 0.4, q = 0.6
Check np and nq:
np = 150 × 0.4 = 60, nq = 150 × 0.6 = 90
Since np > 5 and nq > 5, use the normal approximation
X ~ N(np, npq) with np = 60, npq = 150 × 0.4 × 0.6 = 36
So X ~ N(60, 36)
P(X > 55) → P(X > 55.5) (continuity correction)
= P(Z > (55.5 − 60)/6)
= P(Z > −0.75)
= Φ(0.75)
= 0.7734
= 0.77 (2 s.f.)

DECIDING WHEN TO USE A NORMAL APPROXIMATION AND WHEN TO USE A POISSON APPROXIMATION FOR A BINOMIAL DISTRIBUTION
For X ~ B(n, p)
• a Poisson approximation can be used when n is large (n > 50) and p is small (p < 0.1).
  Then X ~ Po(np) approximately.
• a normal approximation can be used when n and p are such that np > 5 and nq > 5.
  Then X ~ N(np, npq) approximately.

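The two rules of thumb above can be written as a small check. This is only an illustrative Python sketch: the function name is hypothetical, and because the text does not say which approximation to prefer when both conditions hold, the sketch simply reports every approximation whose condition is met.

```python
def suitable_approximations(n, p):
    """List the approximations to B(n, p) allowed by the rules of thumb."""
    q = 1 - p
    suitable = []
    if n > 50 and p < 0.1:
        # n large and p small: X ~ Po(np) approximately
        suitable.append(("Poisson", {"mean": n * p}))
    if n * p > 5 and n * q > 5:
        # np > 5 and nq > 5: X ~ N(np, npq) approximately
        suitable.append(("normal", {"mean": n * p, "variance": n * p * q}))
    return suitable


for n, p in [(400, 0.35), (150, 0.4), (100, 0.01), (12, 0.1)]:
    names = [name for name, _ in suitable_approximations(n, p)]
    print(f"B({n}, {p}): {names}")
```

An empty list means that neither rule applies, in which case the binomial probabilities would have to be calculated directly.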
Example 7.17
A number of different types of fungi are distributed at random in a field. Eighty per cent of these fungi are mushrooms, and the remainder are toadstools. Five per cent of the toadstools are poisonous. A man, who cannot distinguish between mushrooms and toadstools, wanders across the field and picks a total of 100 fungi. Determine, correct to two significant figures, using appropriate approximations, the probability that the man has picked
(a) at least 20 toadstools,
(b) exactly two poisonous toadstools. (C)

Solution 7.17
P(mushroom) = 0.8, P(toadstool) = 0.2, P(poisonous toadstool) = 0.05 × 0.2 = 0.01

(a) X is the number of toadstools picked in a sample of 100.
X ~ B(100, 0.2).
np = 100 × 0.2 = 20, nq = 100 × 0.8 = 80
Since np > 5, nq > 5, use a normal approximation
with mean = np = 20,
variance = npq = 100 × 0.2 × 0.8 = 16.
X ~ N(20, 16).
P(X ≥ 20) → P(X > 19.5)
= P(Z > (19.5 − 20)/4)
= P(Z > −0.125)
= Φ(0.125)
= 0.5498
= 0.55 (2 s.f.)

(b) X is the number of poisonous toadstools in 100 fungi.
X ~ B(100, 0.01)
np = 100 × 0.01 = 1
Use a Poisson distribution, since n > 50, p < 0.1.
X ~ Po(1)
P(X = 2) = e⁻¹ × 1²/2!
= 0.1839...
= 0.18 (2 s.f.)

Exercise 7f Normal approximation

1. An ordinary unbiased die is thrown 120 times. Using a suitable approximation, find the probability of obtaining at least 24 sixes.

2. State conditions under which the distribution B(n, p) can be approximated by a normal distribution.
The random variable X has the distribution B(25, 0.38).
(a) Verify that the distribution can be approximated by a normal distribution.
(b) Use the normal approximation to calculate the probability that X takes the values 15, 16, 17, 18 or 19.
(c) Use the normal approximation to calculate the probability that X takes the value 12. (C)

3. 10% of the chocolates produced in a factory are mis-shapes. A random sample of 1000 chocolates is taken. Find the probability that
(a) less than 80 are mis-shapes,
(b) between 90 and 115 (inclusive) are mis-shapes,
(c) 120 or more are mis-shapes.

4. When I try to send a fax, the probability that I can successfully send it is 0.85.
(a) I try to send eight faxes. Use a binomial model to find the probability that I can successfully send at least seven of the faxes.
(b) I try to send 50 faxes. Use a normal approximation to the binomial model to find the probability that I can successfully send at least 45 faxes. (C)

5. At a particular hospital, records show that each day, on average, only 80% of people keep their appointment at the outpatients' clinic.
Find the probability that on a day when 200 appointments have been booked,
(a) more than 170 patients keep their appointments,
(b) at least 155 patients keep their appointments.

6. The random variable X is distributed B(200, 0.7).
Use the normal approximation to the binomial distribution to find
(a) P(X ≥ 130),
(b) P(136 ≤ X < 148),
(c) P(X < 142),
(d) P(X = 152).

7. One-fifth of a given population has a minor eye defect. Use the normal distribution as an approximation to the binomial distribution to estimate the probability that the number of people with the defect is
(a) more than 20 in a random sample of 100 people,
(b) exactly 20 in a random sample of 100 people,
(c) more than 200 in a random sample of 1000 people.

8. It is estimated that one-fifth of the population of England watched last year's Cup Final on television. If random samples of 100 people are interviewed, calculate the mean and variance of the number of people from these samples who watched the Cup Final on television.
Estimate, to two significant figures, the probability that in a random sample of 100 people, more than 30 watched the Cup Final on television. (L Additional)

9. In a series of n independent trials, the probability of a success at each trial is p. If R is the random variable denoting the total number of successes, state the probability that R = r. State also the mean and variance of R.
A certain variety of flower seed is sold in packets containing 1000 seeds. It is claimed on the packet that 40% will bloom white and 60% will bloom red. This may be assumed to be accurate. Five seeds are planted. Find the probability that
(a) exactly three will bloom white,
(b) at least one will bloom white.
100 seeds are planted. Use the normal approximation to estimate the probability of obtaining between 30 and 45 (inclusive) white flowers.

10. A certain tribe is distinguished by the fact that 45% of the males have six toes on their right foot. Find the probability that, in a group of 200 males from the tribe, more than 97 have six toes on their right foot.

11. A lorry load of potatoes has, on average, one rotten potato in six. A greengrocer decides to refuse the consignment if she finds more than 18 rotten potatoes in a random sample of 100. Find the probability that she accepts the consignment.

12. State conditions under which a binomial probability model can be well approximated by a normal model.
X is a random variable with the distribution B(12, 0.42).
(a) Anne uses the binomial distribution to calculate the probability that X < 4 and gives four significant figures in her answer. What answer should she get?
(b) Ben uses a normal distribution to calculate an approximation for the probability that X < 4 and gives four significant figures in his answer. What answer should he get?
(c) Given that Ben's working is correct, calculate the percentage error in his answer. (C)
THE NORMAL APPROXIMATION TO THE POISSON DISTRIBUTION

If X follows a Poisson distribution with parameter λ, i.e. X ~ Po(λ), then E(X) = λ and Var(X) = λ.
If λ > 15 the normal distribution can be used as an approximation, where X ~ N(λ, λ).

As with the normal approximation to the binomial distribution, a continuity correction is needed, since you are using a continuous distribution as an approximation to a discrete one.

Example 7.18
A radioactive disintegration gives counts that follow a Poisson distribution with a mean count of 25 per second.
Find the probability that in a one-second interval the count is between 23 and 27 inclusive.

Solution 7.18
X is the radioactive count in a one-second interval.
X ~ Po(25)
E(X) = 25, Var(X) = 25
Using a normal approximation,
X ~ N(25, 25)
P(23 ≤ X ≤ 27) → P(22.5 < X < 27.5) (continuity correction)
= P((22.5 − 25)/5 < Z < (27.5 − 25)/5)
= P(−0.5 < Z < 0.5)
= 2Φ(0.5) − 1
= 0.383 (3 d.p.)
NOTE: this compares very well with the value given using the Poisson distribution. Check it for yourself.

Exercise 7g The normal approximation to the Poisson distribution

1. If X ~ Po(24), use the normal approximation to find
(a) P(X ≤ 25),
(b) P(22<;X06),
(c) P(X > 23).

2. If X ~ Po(35), use the normal approximation to find
(a) P(X ≤ 33),
(b) P(33 < X < 37),
(c) P(X > 37),
(d) P(X = 37).

3. If X ~ Po(60), use the normal approximation to find
(a) P(50 <X 08),
(b) P(57 ≤ X < 68),
(c) P(X > 52),
(d) P(X ≥ 70).

4. The number of calls per hour received by an office switchboard follows a Poisson distribution with parameter 30. Using the normal approximation to the Poisson distribution, find the probability that, in one hour,
(a) there are more than 33 calls,
(b) there are between 25 and 28 calls (inclusive),
(c) there are 34 calls.

5. In a certain factory the number of accidents occurring in a month follows a Poisson distribution with mean 4. Find the probability that there will be at least 40 accidents during one year.

6. The number of bacteria on a plate viewed under a microscope follows a Poisson distribution with a parameter 60. Find the probability that there are between 55 and 75 bacteria on a plate.
A plate is rejected if less than 38 bacteria are found. If 2000 such plates are viewed, how many will be rejected?

7. In an experiment with a radioactive substance the number of particles reaching a counter over a given period of time follows a Poisson distribution with mean 22. Find the probability that the number of particles reaching the counter over the given period of time is
(a) less than 22,
(b) between 25 and 30,
(c) 18 or more.

8. The number of accidents on a certain railway line occur at an average rate of one every two months. Find the probability that
(a) there are 25 or more accidents in four years,
(b) there are 30 or fewer accidents in five years.

9. The number of eggs laid by an insect follows a Poisson distribution with parameter 200.
(a) Find the probability that
(i) more than 170 eggs are laid,
(ii) more than 205 eggs are laid,
(iii) between 180 and 240 eggs (inclusive) are laid.
(b) If the probability that an egg develops is 0.1, show that the number of survivors follows a Poisson distribution with parameter 20 and find the probability that there are more than 30 survivors.

10. When a trainee typist types a document the number of mistakes made on any one page is a Poisson variable with mean 3, independently of the number of mistakes made on any other page. Use tables, or otherwise, to find, to three significant figures,
(a) the probability that the number of mistakes on the first page is less than two,
(b) the probability that the number of mistakes on the first page is more than four.
When the typist types a 48-page document the total number of mistakes made by the typist is a Poisson variable with mean 144.
Use a suitable approximate method to find, to three decimal places, the probability that this total number of mistakes is greater than 130. (NEAB)

11. Tomatoes from a particular nursery are packed in boxes and sent to a market.
Assuming that the number of bad tomatoes in a box has a Poisson distribution with mean 0.44, find, to three significant figures, the probability of there being
(a) fewer than two,
(b) more than two bad tomatoes in a box when it is opened.
Use a normal approximation to find, to three decimal places, the probability that in 50 randomly chosen boxes there will be fewer than 20 bad tomatoes in total. (L)

12. A large silo is filled with grain harvested by a farmer. The grain is contaminated with insect pests called weevils. The farmer finds that there are on average three weevils per litre of grain.
(a) State two conditions which are necessary for the Poisson distribution to be a suitable model for the number of weevils which would be found in a given volume of grain.
Assume that the Poisson distribution can be used in this case.
(b) Find, to three decimal places,
(i) the probability that 1 litre of grain contains at least one weevil,
(ii) the probability that 4 litres of grain contain exactly ten weevils.
(c) Use an appropriate distributional approximation to estimate the probability that 10 litres of grain contain fewer than 25 weevils, giving your answer to three decimal places. (NEAB)

13. A biologist gathers leaves of a certain plant in order to collect insects of a particular type. From past experience she knows that the distribution of the number of insects on n leaves may be modelled by a Poisson distribution with mean 0.8n.
(a) Calculate, to three decimal places, the probability that the number of insects on the next leaf to be examined will be fewer than three.
(b) Determine, to three decimal places, the probability that the total number of insects on the next ten leaves to be examined will lie between six and 12 (both inclusive).
(c) Use a distributional approximation to find, to two decimal places, the probability that the total number of insects on the next 50 leaves to be examined will exceed 45. (NEAB)

14. (a) Give two conditions which must apply when modelling a random variable by a Poisson distribution.
A particular make of kettle is sold by a shop at an average rate of five per week. The random variable X represents the number of kettles sold in any one week and X is modelled by a Poisson distribution.
The shop manager notices that at the beginning of a particular week there are seven kettles in stock.
(b) Find the probability that the shop will not be able to meet all the demands for kettles that week, assuming that it is not possible to restock during the week.
In order to increase sales performance, the manager decides to have in stock at the beginning of each week sufficient kettles to have at least a 99% chance of being able to meet all demands during that week.
(c) Find the smallest number of kettles that should be in stock at the beginning of each week.
(d) Using a suitable approximation find the probability that the shop sells at least 18 kettles in a four-week period, subject to stock always being available to meet demand. (L)
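
The note to Example 7.18 invites a comparison between the normal approximation and the Poisson distribution itself. The following is a minimal Python sketch of that check, assuming the standard library's `statistics.NormalDist` in place of printed tables. The function names are illustrative and do not come from the text.

```python
from math import exp, factorial, sqrt
from statistics import NormalDist


def poisson_exact(lam, lo, hi):
    """P(lo <= X <= hi) for X ~ Po(lam), summed term by term."""
    return sum(exp(-lam) * lam**k / factorial(k) for k in range(lo, hi + 1))


def poisson_normal_approx(lam, lo, hi):
    """P(lo <= X <= hi) using N(lam, lam) with a continuity correction."""
    approx = NormalDist(mu=lam, sigma=sqrt(lam))
    # P(lo <= X <= hi) transforms to P(lo - 0.5 < X < hi + 0.5)
    return approx.cdf(hi + 0.5) - approx.cdf(lo - 0.5)


# Count between 23 and 27 inclusive when X ~ Po(25)
exact_count = poisson_exact(25, 23, 27)
approx_count = poisson_normal_approx(25, 23, 27)
print(f"Poisson: {exact_count:.3f}, normal approximation: {approx_count:.3f}")
```

The same two functions can be used to check answers to Exercise 7g wherever the question reduces to an inclusive range of counts.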
Summary

• Normal variable X
  X ~ N(μ, σ²)   E(X) = μ, Var(X) = σ².

• Standard normal variable Z
  Z ~ N(0, 1)   E(Z) = 0, Var(Z) = 1.
  To standardise X, use Z = (X − μ)/σ.

• Using the standard normal tables:
  P(Z < a) = Φ(a)
  P(Z > −a) = Φ(a)
  P(Z < −a) = 1 − Φ(a)
  P(a < Z < b) = Φ(b) − Φ(a)
  P(−a < Z < b) = Φ(a) + Φ(b) − 1
  P(|Z| < a) = 2Φ(a) − 1

• Using the tables in reverse:
  If Φ(a) = k, i.e. P(Z < a) = k, then a = Φ⁻¹(k)

• The normal approximation to the binomial distribution
  If X ~ B(n, p) and np > 5, nq > 5 then X ~ N(np, npq).

• The normal approximation to the Poisson distribution
  If X ~ Po(λ) and λ > 15 then X ~ N(λ, λ).

• A Poisson approximation to the binomial distribution
  If X ~ B(n, p) and n is large (n > 50) and p is small (p < 0.1) then X ~ Po(np).

• Continuity corrections
  These must be used when using a continuous distribution (e.g. normal) as an approximation to a discrete distribution (e.g. binomial, Poisson).

Miscellaneous worked examples

Example 7.19
A product is sold in packets whose masses are normally distributed with a mean of 1.42 kg and a standard deviation of 0.025 kg.
(a) Find the probability that the mass of a packet, selected at random, lies between 1.37 kg and 1.45 kg.
(b) Estimate the number of packets, in an output of 5000, whose mass is less than 1.35 kg. (C)

Solution 7.19
X is the mass, in kilograms, of a packet.
X ~ N(1.42, 0.025²)

(a) P(1.37 < X < 1.45)
Standardising, using Z = (X − 1.42)/0.025
P((1.37 − 1.42)/0.025 < Z < (1.45 − 1.42)/0.025)
= P(−2 < Z < 1.2)
= Φ(1.2) + Φ(2) − 1
= 0.8849 + 0.9772 − 1
= 0.8621
The probability that the mass lies between 1.37 kg and 1.45 kg is 0.86 (2 s.f.).

(b) P(X < 1.35) = P(Z < (1.35 − 1.42)/0.025)
= P(Z < −2.8)
= 1 − Φ(2.8)
= 1 − 0.9974
= 0.0026
Since there are 5000 packets, multiply the probability by 5000.
5000 × 0.0026 = 13
13 packets have a mass less than 1.35 kg.

Example 7.20
In a certain cross country running competition the times that each of the 136 runners took to complete the course were recorded to the nearest minute. The winner completed the course in 23 minutes and the final runner came in with a time of 78 minutes. The full results are summarised in the table below.

Recorded time   20-29   30-39   40-49   50-59   60-69   70-79
Frequency           7      21      42      37      20       9
(a) Use linear interpolation to estimate the median time.

The upper and lower quartiles of the time taken are 58.1 and 40.9 respectively.

(b) On graph paper, draw a box and whisker plot for the results from this competition. You
    should mark the end points, the median and the quartiles clearly on your diagram.
(c) Comment on the skewness.

Assume that the time taken by the runners to complete the course follows a normal
distribution with the values for the quartiles as given above.

(d) Calculate the mean of this normal distribution.
(e) Calculate the standard deviation of this normal distribution.   (L)

Solution 7.20

Recorded time          <29.5   <39.5   <49.5   <59.5   <69.5   <79.5
Cumulative frequency       7      28      70     107     127     136

(a) For grouped data, the median, Q₂, is the ½n th value, i.e. Q₂ is the 68th value. This lies
    in the interval 39.5 - 49.5.
    This interval has 42 items in it and is of width 10. The cumulative frequency is 28 at 39.5
    and 70 at 49.5, so the median is 68 - 28 = 40 items into the interval.

    Q₂ = 39.5 + (40/42) × 10 = 49.0 (1 d.p.)

    The median time is 49.0 minutes.

(b) [Figure: box and whisker plot on a Time (min) axis running from 20 to 80]

(c) The distribution appears to be symmetrical.
    Q₃ - Q₂ = 58.1 - 49.0 = 9.1
    Q₂ - Q₁ = 49.0 - 40.9 = 8.1
    So Q₃ - Q₂ > Q₂ - Q₁, but only just.
    The distribution has a slight positive skew.

(d) Assuming Q₁ = 40.9 and Q₃ = 58.1,
    μ = ½(40.9 + 58.1)
      = 49.5

(e) If X is the recorded time, in minutes,
    X ~ N(49.5, σ²) and P(X < 58.1) = 0.75

    [Sketch: normal curve with 50% between the quartiles and 25% in each tail; X: 40.9, 49.5, 58.1; Z: 0, z]

    The central 50% of the distribution is enclosed between Q₁ and Q₃, so the z value for Q₃
    corresponds to an upper tail probability of 0.25, i.e. a lower tail probability of 0.75.

    P(Z < z) = 0.75 where z = (58.1 - 49.5)/σ = 8.6/σ
    Φ(z) = 0.75
    z = Φ⁻¹(0.75)
      = 0.674
    ∴ 8.6/σ = 0.674
    σ = 8.6/0.674
      = 12.75...

    The standard deviation is 12.8 minutes (1 d.p.).

Example 7.21

Machine A, used for filling bags with ground coffee, can be set to dispense any required mean
weight of coffee per bag. At any setting the weight of coffee in a bag can be modelled by a
normal distribution with a standard deviation of 1.95 g.
(a) If the machine is set to dispense a mean weight of 128 g of coffee per bag, calculate the
    percentage of bags that contain less than 125 g.
(b) To meet an official regulation the setting on a machine must be adjusted so that no more
    than 1% of bags contain less than 125 g.
    (i) Calculate the smallest mean weight to which machine A should be set to meet the
        regulation.
    (ii) Machine B will only just meet the regulation when it is set to dispense a mean weight
         of 128.5 g. Assuming that the weight of coffee in a bag filled by Machine B can be
         modelled by a normal distribution, calculate the standard deviation of this
         distribution.   (NEAB)

Solution 7.21

X is the weight, in grams, of coffee in a bag from Machine A.
X ~ N(μ, 1.95²).

(a) μ = 128, so X ~ N(128, 1.95²).

    P(X < 125) = P(Z < (125 - 128)/1.95)
               = P(Z < -1.538)
               = 1 - Φ(1.538)
               = 1 - 0.9380
               = 0.062
               = 6.2%

    [Sketch: normal curve with X: 125, 128; Z: -1.538, 0]

    6.2% of bags contain less than 125 g.
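
The two standardising calculations above (finding σ from a quartile in Solution 7.20, and finding a lower tail probability in Solution 7.21(a)) can be checked numerically. The following is only a sketch, assuming Python 3.8 or later and its standard-library statistics.NormalDist class; the variable names are illustrative and not part of the text.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution: mean 0, standard deviation 1

# Solution 7.20(e): Q3 = 58.1 lies z standard deviations above the mean 49.5,
# where z is the inverse of Phi at 0.75, so sigma = (58.1 - 49.5) / z.
z_upper_quartile = Z.inv_cdf(0.75)
sigma_times = (58.1 - 49.5) / z_upper_quartile

# Solution 7.21(a): P(X < 125) when X ~ N(128, 1.95^2).
p_underweight = NormalDist(mu=128, sigma=1.95).cdf(125)

print(f"z = {z_upper_quartile:.3f}")
print(f"sigma = {sigma_times:.2f} minutes")
print(f"P(X < 125) = {p_underweight:.3f}")
```

Because the code does not round z to three decimal places as the tables do, its results may differ slightly in the last figure from the working shown above.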


(b) X ~ N(μ, 1.95²) and P(X < 125) ≤ 0.01

    (i) Standardising, you need to find z such that
        P(Z < z) = 0.01
        i.e. Φ(z) = 0.01
        so Φ(-z) = 0.99
        -z = Φ⁻¹(0.99)
           = 2.326
        z = -2.326

        [Sketch: normal curve with X: 125, μ; Z: z, 0]

        (125 - μ)/1.95 ≤ -2.326
        125 - μ ≤ -2.326 × 1.95
        μ ≥ 125 + 2.326 × 1.95
        μ ≥ 129.53...

        The smallest mean weight is 129.5 g (1 d.p.).

    (ii) Y is the weight, in grams, of coffee in a bag from machine B.
         Y ~ N(128.5, σ²) and P(Y < 125) = 0.01

         Standardising:
         P(Z < z) = 0.01 where z = (125 - 128.5)/σ = -3.5/σ

         [Sketch: normal curve with Y: 125, 128.5; Z: -2.326, 0]

         From part (i) z = -2.326
         ∴ -2.326 = -3.5/σ
         σ = 3.5/2.326
           = 1.504...

         The standard deviation is 1.5 g (1 d.p.).

Example 7.22

It is estimated that, on average, one match in five in the Football League is drawn, and that
one match in two is a home win.

(a) Twelve matches are selected at random. Calculate the probability that the number of
    drawn matches is
    (i) exactly three,
    (ii) at least four.
(b) Ninety matches are selected at random. Use a suitable approximation to calculate the
    probability that between 13 and 20 (inclusive) of the matches are drawn.
(c) Twenty matches are selected at random. The random variables D and H are the numbers
    of drawn matches and home wins, respectively, in these matches. State, with a reason,
    which of D and H can be better approximated by a normal variable.   (C)

Solution 7.22

X is the number of drawn matches in 12.
X ~ B(12, 0.2) since P(draw) = 1/5 = 0.2

(a) (i) P(X = 3) = ¹²C₃ (0.8)⁹ (0.2)³
                 = 0.24 (2 s.f.)

    (ii) P(X ≥ 4) = 1 - P(X < 4)
                  = 1 - ((0.8)¹² + 12(0.8)¹¹(0.2) + ¹²C₂ (0.8)¹⁰ (0.2)² + ¹²C₃ (0.8)⁹ (0.2)³)
                  = 1 - 0.794...
                  = 0.21 (2 s.f.)

(b) X is the number of drawn matches in 90.
    Then X ~ B(n, p) with n = 90, p = 0.2, q = 0.8
    Now np = 90 × 0.2 = 18, nq = 90 × 0.8 = 72
    Since np > 5, nq > 5, use a normal approximation
    X ~ N(np, npq) with np = 18, and npq = 90 × 0.2 × 0.8 = 14.4,
    so X ~ N(18, 14.4).

    P(13 ≤ X ≤ 20) → P(12.5 < X < 20.5)          (continuity correction)
                   = P((12.5 - 18)/√14.4 < Z < (20.5 - 18)/√14.4)
                   = P(-1.449 < Z < 0.659)
                   = Φ(1.449) + Φ(0.659) - 1
                   = 0.9264 + 0.7451 - 1
                   = 0.67 (2 s.f.)

    [Sketch: normal curve with X: 12.5, 18, 20.5; Z: -1.449, 0, 0.659]

(c) D is the number of drawn matches.
    D ~ B(20, 0.2)   np = 20 × 0.2 = 4, so np < 5.
    H is the number of home wins.
    H ~ B(20, 0.5)   np = 20 × 0.5 = 10 > 5, nq = 10 > 5.
    For H, np > 5 and nq > 5, so H can be better approximated by a normal variable.
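
The binomial probabilities and the normal approximation with a continuity correction used for Example 7.22 can be checked numerically. This is a sketch only, assuming Python 3.8 or later; binomial_pmf is a hypothetical helper written for this illustration, and the exact binomial value in part (b) is included purely for comparison with the approximation.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Return P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Part (a): X ~ B(12, 0.2)
p_exactly_three = binomial_pmf(3, 12, 0.2)
p_at_least_four = 1 - sum(binomial_pmf(k, 12, 0.2) for k in range(4))

# Part (b): X ~ B(90, 0.2), approximated by N(np, npq) = N(18, 14.4)
n, p = 90, 0.2
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
# Continuity correction: P(13 <= X <= 20) becomes P(12.5 < X < 20.5)
p_normal_approx = approx.cdf(20.5) - approx.cdf(12.5)
# Exact binomial probability of the same event, for comparison only
p_exact = sum(binomial_pmf(k, n, p) for k in range(13, 21))

print(f"P(X = 3)  = {p_exactly_three:.4f}")
print(f"P(X >= 4) = {p_at_least_four:.4f}")
print(f"P(13 <= X <= 20), normal approximation = {p_normal_approx:.4f}")
print(f"P(13 <= X <= 20), exact binomial       = {p_exact:.4f}")
```

Replacing 12.5 and 20.5 by 13 and 20 in the sketch shows how much the answer changes when the continuity correction is left out.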
Miscellaneous exercise 7h

1. Squash balls, dropped onto a concrete floor from a given point, rebound to heights which
   can be modelled by a normal distribution with mean 0.8 m and standard deviation 0.2 m.
   The balls are classified by height of rebound, in order of decreasing height, into these
   categories: Fast, Medium, Slow, Super-Slow and Rejected.
   (a) Balls which rebound to heights between 0.65 m and 0.9 m are classified as Slow.
       Calculate the percentage of balls classified as Slow.
   (b) Given that 9% of balls are classified as Rejected, calculate the maximum height of
       rebound of these balls.
   (c) The percentage of balls classified as Fast and as Medium are equal. Calculate the
       minimum height of rebound of a ball classified as Fast, giving your answer correct
       to two decimal places.   (C)

2. The mass of grapes sold per day in a supermarket can be modelled by a normal
   distribution. It is found that, over a long period, the mean mass sold per day is 35.0 kg,
   and that, on average, less than 15.0 kg are sold on one day in twenty.
   (a) Show that the standard deviation of the mass of grapes sold per day is 12.2 kg,
       correct to three significant figures.
   (b) Calculate the probability that, on a day chosen at random, more than 53.0 kg are sold.
   (c) Ten days are chosen at random. Assuming independence, find the probability that less
       than 15.0 kg will be sold on exactly two of these days.   (C)

3. (a) Give two reasons why the normal distribution is important in statistics.
   (b) An airline has a regular flight from one airport to another. The airline models the
       duration of a flight as a normally distributed random variable with a mean of 246
       minutes and a standard deviation of five minutes. Use this model to calculate, to one
       decimal place, the percentage of these flights that are completed in less than four
       hours.   (NEAB)

4. The random variable X is normally distributed with mean μ and variance σ².
   Given that P(X > 58.37) = 0.02
   and P(X < 40.85) = 0.01
   find μ and σ.   (L)

5. Alan is a member of an athletics club. In long jump competitions, his jumps are normally
   distributed with a mean of 7.6 m and a standard deviation of 0.16 m.
   (a) Calculate the probability of him jumping
       (i) more than 8.0 m,
       (ii) between 7.50 m and 7.75 m.
   (b) Determine the distance exceeded by 75% of his jumps.
   Brian also belongs to the athletics club. In long jump competitions, his jumps are normally
   distributed with a mean of 7.45 m and 95.2% of them exceed 7.0 m.
   (c) Calculate, correct to two decimal places, the standard deviation of Brian's jumps.
   The athletics club has to select either Alan or Brian to be its long jump competitor at a
   major athletics meeting. In order to qualify for the final rounds of jumps at the meeting,
   it is necessary to achieve a jump of at least 8.0 m in the preliminary rounds.
   (d) State, with justification, which of the two athletes should be selected.   (NEAB)

6. The time required to complete a certain car journey has been found from experience to have
   mean 2 hours 20 minutes and standard deviation 15 minutes.
   (a) Use a normal model to calculate the probability that, on one day chosen at random,
       the journey requires between 1 hour 50 minutes and 2 hours 40 minutes.
   (b) It is known that delays occur rarely on this journey, but that when they do occur they
       are lengthy. Give a reason why this information suggests that a normal distribution
       might not be a good model.   (C)

7. A machine is producing a type of circular gasket. The specifications for the use of these
   gaskets in the manufacture of a certain make of engine are that the thickness should lie
   between 5.45 mm and 5.55 mm, and the diameter should lie between 8.45 mm and 8.54 mm.
   The machine is producing the gaskets so that their thicknesses are N(5.5, 0.0004), that is,
   normally distributed with mean 5.5 mm and variance 0.0004 mm², and their diameters are
   independently distributed N(8.54, 0.0025).
   Calculate, to one decimal place, the percentage of gaskets produced which will not meet
   (a) the specified thickness limits,
   (b) the specified diameter limits,
   (c) the specifications.
   Find, to three decimal places, the probability that, if six gaskets made by the machine are
   chosen at random, exactly five of them will meet the specifications.   (L)

8. A machine is used to fill tubes, of nominal content 100 ml, with toothpaste. The amount of
   toothpaste delivered by the machine is normally distributed and may be set to any required
   mean value. Immediately after the machine has been overhauled, the standard deviation of
   the amount delivered is 2 ml. As time passes this standard deviation increases until the
   machine is again overhauled. The following three conditions are necessary for a batch of
   tubes of toothpaste to comply with current legislation:
   I   the average content of the tubes must be at least 100 ml,
   II  not more than 2.5% of the tubes may contain less than 95.5 ml,
   III not more than 0.1% of the tubes may contain less than 91 ml.
   (Φ⁻¹(0.999) = 3.09)
   (a) For a batch of tubes with mean content 98.8 ml and standard deviation 2 ml, find
       the proportion of tubes which contain
       (i) less than 95.5 ml,
       (ii) less than 91 ml.
       Hence state which, if any, of the three conditions above are not satisfied.
   (b) If the standard deviation is 5 ml, find the mean in each of the following cases:
       (i) exactly 2.5% of tubes contain less than 95.5 ml,
       (ii) exactly 0.1% of tubes contain less than 91 ml.
       Hence state the smallest value of the mean which would enable all three conditions to
       be met when the standard deviation is 5 ml.
   (c) Currently exactly 0.1% of tubes contain less than 91 ml and exactly 2.5% contain less
       than 95.5 ml.
       (i) Find the current values of the mean and the standard deviation.
       (ii) State, giving a reason, whether you would recommend that the machine is
            overhauled immediately.   (AEB)

9. A wholesaler buys cauliflowers from a farmer for distribution to retail greengrocers.
   The wholesaler classifies the lightest 15% of cauliflowers as small, the heaviest 25% as
   large and the rest as medium.
   (a) Given that the wholesaler makes a profit of 2 pence on each small cauliflower, 12 pence
       on each medium one and 27 pence on each large one, calculate the wholesaler's mean
       profit per cauliflower.
   The weights of the cauliflowers can be modelled by a normal distribution with a mean of
   628 g and a standard deviation of 160 g.
   (b) Calculate the weight that a cauliflower must exceed to be classified as large.
   (c) Calculate the weight that a cauliflower must fall below to be classified as small.   (NEAB)

10. In 1994 an insurance company received claims from 20% of the motorists it had insured.
    (a) For a random sample of 14 motorists insured with the company in 1994, find the
        probability that
        (i) exactly three claimed on their insurance,
        (ii) between two and five inclusive claimed on their insurance,
        (iii) a majority claimed on their insurance.
    (b) For a random sample of 90 motorists insured with the company, use an appropriate
        approximating distribution to determine the probability that at least 25 claimed on
        their insurance in 1994.   (NEAB)

11. A horticulturalist knows from experience that when taking cuttings from bay trees only
    15 in every 100 successfully take root.
    (a) In a batch of ten randomly selected cuttings find the probability that
        (i) none of the cuttings take root,
        (ii) fewer than three of the cuttings take root.
    (b) Let n be the smallest number of cuttings which need to be examined before there is at
        least a 95% chance that one or more of them will have taken root.
        (i) Show that n satisfies (0.85)ⁿ ≤ 0.05.
        (ii) Given that (0.85)¹⁷ = 0.0631, find the value of n.
    (c) Using a suitable approximation estimate the probability that fewer than six in a batch
        of 50 cuttings take root.   (L)

12. A large bag of seeds contains three varieties in the ratios 4 : 2 : 1 and their germination
    rates are 50%, 60% and 80% respectively.
    Show that the probability that a seed chosen at random from the bag will germinate is 4/7.
    Find, to three decimal places, the probability that of four seeds chosen at random from the
    bag, exactly two of them will germinate.
    Given that 150 seeds are chosen at random from the bag, estimate, to three decimal places,
    the probability that fewer than 90 of them will germinate.   (L)

13. A building society announces its intention to convert to a bank. During the first day
    following the announcement, the number of calls per minute answered by the society's
    hotline may be modelled satisfactorily by a Poisson distribution with mean 12.
    (a) Calculate the probability that the hotline answers more than ten calls in a one-minute
        period.
    (b) Estimate the probability that the hotline answers fewer than 700 calls in one hour.
        (NEAB)
14. (a) A trade union asked 300 of its members whether they were full-time workers or
        part-time workers, and the number of hours they worked in a particular week. The table
        below shows an analysis of this survey.

                      Number of    Mean number of    Standard deviation of
                      workers      hours worked      hours worked
        Full-time       100             40                  4.5
        Part-time       200             20                  6.9

        The hours, both for the full-time workers and for the part-time workers, are normally
        distributed.
        (i) Calculate the total number of workers who worked more than 32 hours.
        (ii) Given that only 6% of the full-time workers worked for less than T₁ hours,
             calculate T₁.
        (iii) Given that only 3% of the part-time workers worked for more than T₂ hours,
              calculate T₂.
    (b) A set of numbers is normally distributed; 1.5% of the numbers exceed 1434 and
        16.6% of the numbers exceed 1194.
        Calculate the mean and the standard deviation of the distribution.   (C)

15. During an advertising campaign, the manufacturer of Wolfitt (a dog food) claimed that 60%
    of dog owners preferred to buy Wolfitt. Assuming that the manufacturer's claim is correct
    for the population of dog owners, calculate
    (a) using the binomial distribution, and
    (b) using a normal approximation to the binomial;
    the probability that at least six of a random sample of eight dog owners prefer to buy
    Wolfitt.
    Comment on the agreement, or disagreement, between your two values. Would the agreement
    be better or worse if the proportion had been 80% instead of 60%?
    Continuing to assume that the manufacturer's figure of 60% is correct, use the normal
    approximation to the binomial to estimate the probability that, of a random sample of 100
    dog owners, the number preferring Wolfitt is between 60 and 70 inclusive.   (MEI)

16. Six hundred rounds are fired from a gun at a horizontal target 50 m long which extends from
    950 m to 1000 m in range from the gun.
    The trajectories of the rounds all lie in the vertical plane through the gun and the target.
    It is found that 27 rounds fall short of the target and 69 rounds fall beyond it.
    Assuming that the range of rounds is normally distributed, find the mean and standard
    deviation of the range.
    Estimate the number of rounds falling within 5 m of the centre of the target.   (C)

17. A traffic survey is being undertaken on a main road to determine whether or not a
    pedestrian crossing should be installed. On five successive days, from Monday to Friday,
    the hour between 8 a.m. and 9 a.m. was split up into 30-second intervals, and the number of
    vehicles passing a certain point in each of these intervals was recorded.
    The random variable X represents the number of cars travelling from the town centre per
    30-second interval. For the 600 observations the mean and variance were 3.1 and 3.27
    respectively.
    (a) Explain why X might be modelled by a Poisson distribution.
    (b) Using the sample mean as an estimate for the Poisson parameter, calculate the
        probability of recording exactly three vehicles travelling from the town centre in a
        30-second interval.
    (c) Calculate the probability of recording at least six vehicles travelling from the town
        centre in a 60-second interval.
    The mean number of vehicles per 30-second interval passing the survey point travelling
    towards the town centre during the same survey period was 7.9.
    (d) Show that there is roughly a 12% chance that the total number of vehicles passing per
        30-second interval is ten.
    (e) Using a suitable approximation, estimate the probability of between 16 and 24 vehicles
        (inclusive) passing the survey point in a 60-second interval.   (MEI)

18. [In this question give three places of decimals in each answer.]
    When a telephone call is made in the country of Japonica, the probability of getting the
    intended number is 0.95.
    (a) Ten independent calls are made. Find the probability of getting eight or more of the
        intended numbers. Find also the conditional probability of getting all ten intended
        numbers given that at least eight of the intended numbers are obtained.
    (b) Three hundred independent calls are made. Find the probability of failing to get the
        intended number on at least ten but not more than twenty of the calls.
    (c) Four hundred independent calls are made. For each call the probability of getting
        'number unobtainable' is 0.004. Find the probability of getting 'number unobtainable'
        fewer than three times.   (C)

19. An old car is never garaged at night. On the morning following a wet night, the probability
    that the car does not start is [fraction illegible]. On the morning following a dry night,
    this probability is [fraction illegible]. The starting performance of the car each morning
    is independent of its performance on previous mornings.
    (a) There are six consecutive wet nights. Determine the probability that the car does
        not start on at least two of the six mornings.
    (b) During a wet autumn there are 32 wet nights. Using a suitable approximation,
        determine the probability that the car does not start on fewer than 16 of the 32
        mornings.
    (c) During a long summer drought there are 100 dry nights. Using a Poisson
        approximation, determine the probability that the car does not start on five or more of
        the 100 mornings.
    (Give three decimal places in your answers.)   (C)

20. The life, in years, of a randomly chosen Flashpan car battery is normally distributed with
    mean 2 and standard deviation 0.4.
    Show that the probability that a randomly chosen Flashpan battery has a life less than one
    year is 0.00621, correct to five places of decimals.
    (a) A farmer buys two randomly chosen Flashpan batteries. Find the probability that the
        batteries each have a life more than one year.
    (b) A wholesaler buys 500 randomly chosen Flashpan batteries. Using a suitable
        approximation, find the probability that at most three have lives each less than one
        year.
    (c) A retailer buys ten randomly chosen Flashpan batteries. Find the probability that
        at least four have lives each exceeding two years.   (C)

21. Describe, briefly, the conditions under which the binomial distribution Bin(n, p) may be
    approximated by
    (a) a normal distribution,
    (b) a Poisson distribution,
    giving the parameters of each of the approximate distributions.
    Among the blood cells of a certain animal species, the proportion of cells which are of
    type A is 0.37 and the proportion of cells which are of type B is 0.004. Find, to three
    decimal places, the probability that in a random sample of eight blood cells at least two
    will be of type A.
    Find, to three decimal places, an approximate value for the probability that
    (c) in a random sample of 200 blood cells the combined number of type A and type B cells
        is 81 or more,
    (d) there will be four or more cells of type B in a random sample of 300 blood cells.   (L)

Mixed test 7A

1. A smoker's blood nicotine level, measured in ng/ml, may be modelled by a normal random
   variable with mean 310 and standard deviation 110.
   (a) What proportion of smokers have blood nicotine levels lower than 250?
   (b) What blood nicotine level is exceeded by 20% of smokers?   (AEB)

2. The number of hours of sunshine at a resort has been recorded for each month for many
   years. One year is selected at random and H is the number of hours of sunshine in August of
   that year. H can be modelled by a normal variable with mean 130.
   (a) Given that P(H < 179) = 0.975, calculate the standard deviation of H.
   (b) Calculate P(100 < H < 150).   (C)

3. In a large university 90% of the students are right-handed.
   (a) Show that the probability that in a random sample of eight students exactly six will be
       right-handed is approximately 0.149.
   (b) Find the probability that in a random sample of 20 students fewer than 15 will be
       right-handed.
   (c) Determine, to two decimal places, an approximate value for the probability that
       in a random sample of 200 students at most 184 will be right-handed.   (NEAB)

4. The random variable X represents the weight, in grams, of chocolate chips in packets sold
   by a supermarket. It is suggested that X can be modelled by a normal distribution with
   X ~ N(100, 25).
   (a) Find P(X > 108).
   (b) Show that P(|X - 100| < 6.8) = 0.8262.
   Three packets are selected at random from the packets of chocolate chips on the supermarket
   shelf.
   (c) Find the probability that exactly two of them will have weights in the range
       |X - 100| < 6.8.
   (d) Comment on the suitability of the normal distribution as a model for X.   (L)
Mixed test 7B

1. The area that can be painted using one litre of Luxibrite paint is normally distributed with
   mean 13.2 m² and standard deviation 0.197 m². The corresponding figures for one litre of
   Maxigloss paint are 13.4 m² and 0.34[final digit illegible] m². It is required to paint an
   area of 12.9 m². Find which paint gives the greater probability that one litre will be
   sufficient, and obtain this probability.   (C)

2. Soup is sold in tins which are filled by a machine. The actual weight of soup delivered to a
   tin by the filling machine is always normally distributed about the mean weight with a
   standard deviation of 8 g. The machine is set originally to deliver a mean weight of 810 g.
   (a) Determine the probability that the weight of soup in a tin, selected at random, is less
       than 800 g.
   (b) Determine the probability that the weight of soup in a tin, selected at random, is
       between 795 g and 820 g.
   Proposed legislation requires that not more than 2.5% of tins may contain less than the
   nominal net weight of 800 g.
   (c) Assuming that the value of the standard deviation remains unchanged, determine the
       minimum mean weight that the machine should be set to deliver in order to comply
       with this requirement.   (NEAB)

3. Consultants employed by a large library reported that the time spent in the library by a
   user could be modelled by a normal distribution with mean 65 minutes and standard
   deviation 20 minutes.
   (a) Assuming that this model is adequate, what is the probability that a user spends
       (i) less than 90 minutes in the library,
       (ii) between 60 and 90 minutes in the library?
   The library closes at 9.00 p.m.
   (b) Explain why the model above could not apply to a user who entered the library at
       8.00 p.m.
   (c) Estimate an approximate latest time of entry for which the model above could still be
       plausible.   (AEB)

4. Frugal Bakeries claim that packs of ten of their buns contain on average 75 raisins. A
   Poisson distribution is used to model the number of raisins in a randomly selected bun.
   (a) Specify the value of the parameter.
   (b) State any assumption required about the distribution of raisins in the production
       process for this model to be valid.
   (c) Show that the probability that a randomly selected bun contains more than eight raisins
       is 0.338.
   (d) Find the probability that in a pack of ten buns at least two buns contain more than
       eight raisins.
   (e) Using a suitable approximation, find the probability that in a pack of ten buns there
       are more than 80 raisins.   (L)

5. An engineering firm sets an aptitude test when applicants first apply for training. The
   times taken to complete the test are normally distributed with mean 40.5 minutes and
   standard deviation 7.5 minutes. Applicants who complete the test in less than 30 minutes are
   immediately accepted for training. Those who take between 30 and 36 minutes are required
   to take a further test. All other applicants are rejected.
   (a) For a randomly chosen applicant calculate the probability of
       (i) immediate acceptance for training,
       (ii) requirement to take a further test.
   (b) Given that a randomly chosen applicant was not rejected after this first test, calculate,
       to three decimal places, the probability that the applicant was immediately accepted for
       training.
   (c) On a certain occasion there were 100 applicants. Use a suitable distributional
       approximation to calculate the probability that more than 25 applicants were required
       to take a further test.   (NEAB)


Linear combinations of normal variables

In this chapter you will learn about the distributions for
• the sum of independent normal variables
• the difference of independent normal variables
• multiples of independent normal variables

You will need the following results, first introduced on pages 256 and 257.

If X and Y are any two random variables, discrete or continuous, and a and b are any two
constants,

Sums                                          Differences
E(X + Y) = E(X) + E(Y)          ... (1)       E(X - Y) = E(X) - E(Y)          ... (2)
E(aX + bY) = aE(X) + bE(Y)      ... (3)       E(aX - bY) = aE(X) - bE(Y)      ... (4)

Also, if X and Y are independent, then

Var(X + Y) = Var(X) + Var(Y)              ... (5)       Var(X - Y) = Var(X) + Var(Y)              ... (6)
Var(aX + bY) = a²Var(X) + b²Var(Y)        ... (7)       Var(aX - bY) = a²Var(X) + b²Var(Y)        ... (8)


THE SUM OF INDEPENDENT NORMAL VARIABLES

Consider this example which involves the sum of independent normal variables.
Example 8.1

A coffee machine is installed in a students' common room. It dispenses white coffee by first
releasing a quantity of black coffee, normally distributed with mean 122.5 ml and standard
deviation 7.5 ml, and then adding a quantity of milk, normally distributed with mean 30 ml
and standard deviation 5 ml.
Each cup is marked to a level of 137.5 ml and if this level is not attained the customer receives
the drink free of charge.
What percentage of cups of white coffee will be given free of charge?
Solution 8.1

B is the amount, in millilitres, of black coffee, where B ~ N(122.5, 7.5²).
M is the amount, in millilitres, of milk, where M ~ N(30, 5²).
B and M are independent normal variables.

Consider W, the amount, in millilitres, of white coffee, made by combining the black coffee
and milk, so W = B + M and

E(W) = E(B) + E(M) = 122.5 + 30 = 152.5                 (using Result 1 above)
Var(W) = Var(B) + Var(M) = 7.5² + 5² = 81.25            (using Result 5 above)

So W = B + M has a mean of 152.5 and a variance of 81.25.

For independent normal variables, it is true that the sum of these variables is also normally
distributed, so
B + M ~ N(152.5, 81.25)
i.e. W ~ N(152.5, 81.25)

The drink is free of charge if W < 137.5

P(W < 137.5) = P(Z < (137.5 - 152.5)/√81.25)
             = P(Z < -1.664)
             = 1 - Φ(1.664)
             = 1 - 0.9519
             = 0.0481
             = 4.81%

[Sketch: normal curve with W: 137.5, 152.5; Z: -1.664, 0]

So approximately 5% of the cups of white coffee will be given free of charge.

In general

If X ~ N(μ₁, σ₁²) and Y ~ N(μ₂, σ₂²), where X and Y are independent,
then X + Y ~ N(μ₁ + μ₂, σ₁² + σ₂²)

This result can be extended to any set of independent normal variables X₁, X₂, ..., Xₙ.

Example 8.2

Four runners, Andy, Bob, Chris and Dai, train to take part in a 1600 m relay race in which
Andy is to run 100 m, Bob 200 m, Chris 500 m and Dai 800 m.
During training their individual times, recorded in seconds, follow normal distributions.
With obvious notation, these are:
A ~ N(10.8, 0.2²), B ~ N(23.7, 0.3²), C ~ N(62.8, 0.9²) and D ~ N(121.2, 2.1²).
Find the probability that they run the relay race in less than 3 minutes 35 seconds.

Solution 8.2

Let T be the total time, in seconds, for the relay race.
Then T = A + B + C + D

E(T) = E(A) + E(B) + E(C) + E(D)
     = 10.8 + 23.7 + 62.8 + 121.2
     = 218.5
Var(T) = Var(A) + Var(B) + Var(C) + Var(D)
       = 0.2² + 0.3² + 0.9² + 2.1²
       = 5.35
∴ T ~ N(218.5, 5.35)

To find the probability that the total time is less than 3 minutes 35 seconds, i.e. 215 seconds,
find

P(T < 215) = P(Z < (215 - 218.5)/√5.35)
           = P(Z < -1.513)
           = 1 - Φ(1.513)
           = 1 - 0.9349
           = 0.0651

[Sketch: normal curve with T: 215, 218.5; Z: -1.513, 0]

The probability that the runners take less than 3 minutes 35 seconds is 0.065 (2 s.f.).

Consider now the special case when X₁, X₂, ..., Xₙ are n independent observations from the
same normal distribution,
so X₁ ~ N(μ, σ²), X₂ ~ N(μ, σ²), ..., Xₙ ~ N(μ, σ²)

then E(X₁ + X₂ + ... + Xₙ) = E(X₁) + E(X₂) + ... + E(Xₙ)
                           = μ + μ + ... + μ
                           = nμ
     Var(X₁ + X₂ + ... + Xₙ) = Var(X₁) + Var(X₂) + ... + Var(Xₙ)
                             = σ² + σ² + ... + σ²
                             = nσ²

So X₁ + X₂ + ... + Xₙ ~ N(nμ, nσ²)

Example 8.3

Masses of a particular article are normally distributed with mean 20 g and standard deviation
2 g. A random sample of 12 such articles is chosen. Find the probability that the total mass is
greater than 230 g.

Solution 8.3

X is the mass, in grams, of an article.
X ~ N(20, 2²).

So X₁ ~ N(20, 4)      E(X₁) = 20,     Var(X₁) = 4
   X₂ ~ N(20, 4)      E(X₂) = 20,     Var(X₂) = 4
   ...
   X₁₂ ~ N(20, 4)     E(X₁₂) = 20,    Var(X₁₂) = 4
Let T = X₁ + X₂ + ... + X₁₂
then E(T) = E(X₁) + E(X₂) + ... + E(X₁₂)
          = 12E(X)
          = 240
     Var(T) = Var(X₁) + Var(X₂) + ... + Var(X₁₂)
            = 12Var(X)
            = 48
So T ~ N(240, 48).

P(T > 230) = P(Z > (230 - 240)/√48)
           = P(Z > -1.443)
           = Φ(1.443)
           = 0.9255

[Sketch: normal curve with T: 230, 240; Z: -1.443, 0]

The probability that the total mass is greater than 230 g is 0.93 (2 s.f.).

Example 8.4

The maximum load a lift can carry is 450 kg. The weights of men are normally distributed
with mean 60 kg and standard deviation 10 kg. The weights of women are normally
distributed with mean 55 kg and standard deviation 5 kg. Find the probability that the lift will
be overloaded by five men and two women, if their weights are independent.   (L)

Solution 8.4

Let M be the weight, in kilograms, of a man. Then M ~ N(60, 10²).
Let W be the weight, in kilograms, of a woman. Then W ~ N(55, 5²).

The lift will be overloaded if
M₁ + M₂ + M₃ + M₄ + M₅ + W₁ + W₂ > 450.

Let T = M₁ + M₂ + ... + M₅ + W₁ + W₂

E(T) = 5E(M) + 2E(W)
     = 300 + 110
     = 410
Var(T) = 5Var(M) + 2Var(W)
       = 500 + 50
       = 550

Since M and W are normally distributed, T is also normal.
So T ~ N(410, 550).

P(lift is overloaded) = P(T > 450)
                      = P(Z > (450 - 410)/√550)
                      = P(Z > 1.706)
                      = 1 - Φ(1.706)
                      = 0.0441

[Sketch: normal curve with T: 410, 450; Z: 0, 1.706]

The probability that the lift will be overloaded by five men and two women is 0.044 (2 s.f.).


THE DIFFERENCE OF INDEPENDENT NORMAL VARIABLES

For two independent variables X and Y, where X ~ N(μ₁, σ₁²) and Y ~ N(μ₂, σ₂²)

E(X - Y) = E(X) - E(Y) = μ₁ - μ₂                  (Result 2, page 403)
Var(X - Y) = Var(X) + Var(Y) = σ₁² + σ₂²          (Result 6, page 403)

X - Y is normally distributed, so
X - Y ~ N(μ₁ - μ₂, σ₁² + σ₂²)

Example 8.5

A machine produces rubber balls whose diameters are normally distributed with mean
5.50 cm and standard deviation 0.08 cm.
The balls are packed in cylindrical tubes whose internal diameters are normally distributed
with mean 5.70 cm and standard deviation 0.12 cm.
If a ball, selected at random, is placed in a tube, selected at random, what is the distribution of
the clearance? (The clearance is the internal diameter of the tube minus the diameter of the ball.)
What is the probability that the clearance is between 0.05 cm and 0.25 cm?

Solution 8.5

Let B be the diameter, in centimetres, of a rubber ball. Then B ~ N(5.50, 0.08²)
Let T be the internal diameter, in centimetres, of a cylindrical tube. Then T ~ N(5.70, 0.12²)
Let C be the clearance, in centimetres, so C = T - B

E(C) = E(T) - E(B) = 5.70 - 5.50 = 0.2
Var(C) = Var(T) + Var(B) = 0.12² + 0.08² = 0.0208
so C ~ N(0.2, 0.0208)

To find the probability that the clearance is between 0.05 cm and 0.25 cm, find

P(0.05 < C < 0.25) = P((0.05 - 0.2)/√0.0208 < Z < (0.25 - 0.2)/√0.0208)
                   = P(-1.040 < Z < 0.347)
                   = Φ(1.040) + Φ(0.347) - 1
                   = 0.8508 + 0.6357 - 1
                   = 0.4865

[Sketch: normal curve with C: 0.05, 0.2, 0.25; Z: -1.040, 0, 0.347]

The probability that the clearance is between 0.05 cm and 0.25 cm is 0.49 (2 s.f.).

Example 8.6

A certain liquid drug is marketed in bottles containing a nominal 20 ml of drug. Tests
on a large number of bottles indicate that the volume of liquid in each bottle is
distributed normally with mean 20.42 ml and standard deviation 0.429 ml.
If the capacity of the bottles is normally distributed with mean 21.77 ml and standard
deviation 0.210 ml, estimate what percentage of bottles will overflow during filling.
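
The sum and difference results used in Solutions 8.1, 8.4 and 8.5 can be checked numerically. The following is a sketch only, assuming Python 3.8 or later: the standard-library statistics.NormalDist class supports + and - between two instances, treating them as independent normal variables, so the variances are added in both cases. The variable names are illustrative and not part of the text.

```python
from statistics import NormalDist

# Example 8.1: white coffee W = B + M
black = NormalDist(mu=122.5, sigma=7.5)
milk = NormalDist(mu=30, sigma=5)
white = black + milk                    # N(152.5, 81.25)
p_free = white.cdf(137.5)               # P(W < 137.5)

# Example 8.4: five men and two women. Each person is a separate independent
# variable, so the terms are added one at a time. Writing 5 * man would
# instead describe a multiple of one man's weight, with variance 25 * 100.
man = NormalDist(mu=60, sigma=10)
woman = NormalDist(mu=55, sigma=5)
load = man
for person in [man] * 4 + [woman] * 2:
    load = load + person                # ends as N(410, 550)
p_overloaded = 1 - load.cdf(450)        # P(T > 450)

# Example 8.5: clearance C = T - B; the variances are still added
tube = NormalDist(mu=5.70, sigma=0.12)
ball = NormalDist(mu=5.50, sigma=0.08)
clearance = tube - ball                 # N(0.2, 0.0208)
p_clearance = clearance.cdf(0.25) - clearance.cdf(0.05)

print(f"W ~ N({white.mean:.1f}, {white.variance:.2f}), P(W < 137.5) = {p_free:.4f}")
print(f"T ~ N({load.mean:.0f}, {load.variance:.0f}), P(T > 450) = {p_overloaded:.4f}")
print(f"C ~ N({clearance.mean:.1f}, {clearance.variance:.4f}), "
      f"P(0.05 < C < 0.25) = {p_clearance:.4f}")
```

The same pattern, a difference of two independent normal variables followed by a tail probability, applies to the overflow question in Example 8.6.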

Solution 8.6
X is the volume, in millilitres, of liquid and X ~ N(20.42, 0.429²).
Y is the capacity, in millilitres, of a bottle and Y ~ N(21.77, 0.210²).
The bottle will overflow if the quantity of liquid is greater than the capacity of the bottle,
i.e. if X > Y so X - Y > 0
Let D = X - Y
    E(D) = E(X) - E(Y) = 20.42 - 21.77 = -1.35
    Var(D) = Var(X) + Var(Y) = 0.429² + 0.210² = 0.2281
    D ~ N(-1.35, 0.2281)
    P(D > 0) = P(Z > (0 - (-1.35))/√0.2281)
             = P(Z > 2.827)
             = 1 - Φ(2.827)
             = 1 - 0.9976
             = 0.0024
0.24% of bottles will overflow during filling.

Example 8.7
In a cafeteria, baked beans are served either in ordinary portions or in children's portions. The quantity given for an ordinary portion is a normal variable with mean 90 g and standard deviation 3 g and the quantity given for a child's portion is a normal variable with mean 43 g and standard deviation 2 g.
What is the probability that Tom, who has two children's portions, is given more than his father, who has an ordinary portion?

Solution 8.7
C is the quantity, in grams, in a child's portion. Then C ~ N(43, 4).
A is the quantity, in grams, in an ordinary portion. Then A ~ N(90, 9).
You need to find P(C1 + C2 > A), i.e. P(C1 + C2 - A > 0)
Let W = C1 + C2 - A
    E(W) = E(C1) + E(C2) - E(A)
         = 2E(C) - E(A)
         = 86 - 90
         = -4
    Var(W) = Var(C1) + Var(C2) + Var(A)
           = 2 Var(C) + Var(A)
           = 8 + 9
           = 17
So W ~ N(-4, 17)
    P(W > 0) = P(Z > (0 - (-4))/√17)
             = P(Z > 0.970)
             = 1 - Φ(0.970)
             = 0.166
The probability that Tom is given more than his father is 0.17 (2 s.f.).

1. X and Y are independent normal variables with X ~ N(100, 49) and Y ~ N(110, 576).
   (a) Find the mean and the standard deviation of the distribution X + Y.
   (b) Describe the distribution of X + Y.
   (c) Find P(X + Y > 200).
   (d) Find P(180 < X + Y < 240).

2. Each weekday Mr Harper goes to the local library to read the newspapers. The time he spends travelling is a normal variable with mean 15 minutes and standard deviation 2 minutes. The time he spends in the library is normally distributed with mean 25 minutes and standard deviation 4 minutes.
   Find the probability that, on a particular day, Mr Harper
   (a) is away from the house for more than 45 minutes,
   (b) spends more time travelling than in the library.

3. Bolts are manufactured which are to fit in holes in steel plates. The diameter of the bolts is normally distributed with mean 2.60 cm and standard deviation 0.03 cm. The diameter of the holes is normally distributed with mean of 2.71 cm and standard deviation of 0.04 cm.
   (a) Verify that, if a bolt and a hole are selected at random, the probability that the bolt is too large to enter the hole is 0.0139.
   (b) The random selection of a bolt and hole is carried out five times. Find the probability that in every case the bolt will be able to enter the hole. (C)

4. The mass of a particular article follows a normal distribution with mean 20 g and variance 4 g². A random sample of 12 items is tested. Find the probability that the total mass is less than 230 g.

5. Fiona, Carly, Jenny and Vicky swim in the 4 x 100 m freestyle relay team, with each one swimming 100 m. The times in seconds taken by each of the girls to swim 100 m are independent normal variables, distributed as follows:
   F ~ N(52.5, 0.3²), C ~ N(52.0, 0.6²), J ~ N(53.5, 1.2²), V ~ N(51.5, 0.6²).
   Calculate the probability that in a particular race,
   (a) Fiona will swim her leg in less than 52.5 seconds,
   (b) the relay team will take longer than 3 minutes 31.3 seconds to swim the race,
   (c) Carly will swim her leg faster than Vicky.

6. The mass, in grams, of a Chocolate Delight cake is normally distributed with mean 20 g and standard deviation 2 g. The cakes are sold in packets of six and the mass of the packing material is normally distributed with a mean of 30 g and a standard deviation of 4 g.
   (a) Find the probability that the mass of six cakes is less than 110 g.
   (b) Find the probability that the total mass of a packet containing six cakes is
       (i) more than 162 g,
       (ii) less than 137 g,
       (iii) between 140 g and 153 g.

7. In a certain village, the heights of the women are normally distributed with a mean of 164 cm and a standard deviation of 5 cm. The heights of the men are normally distributed with a mean of 173 cm and a standard deviation of 6 cm. A man and a woman are picked at random from the people in the village.
   Find the probability that
   (a) the woman is taller than the man,
   (b) the man is more than 5 cm taller than the woman.

8. The mass of a certain grade of apple is normally distributed with mean mass 120 g and standard deviation 10 g.
   (a) An apple of this grade is selected at random. Find the probability that its mass lies between 100.5 g and 124 g.
   (b) Four apples of this grade are selected at random. Find the probability that their total mass exceeds 505 g. (C)

9. Rods are produced in two lengths, called 'short' and 'long'.
   S is the length, in centimetres, of a short rod, where S ~ N(5, 0.25).
   L is the length, in centimetres, of a long rod, where L ~ N(10, 1).
   Rods are joined to give longer lengths. Find the probability that a length consisting of
   (a) two short rods and four long rods is longer than 52 cm,
   (b) three short rods and two long rods is between 33 cm and 36 cm long,
   (c) six short rods is longer than a length consisting of three long rods.

10. Mr Smith has five dogs, two of which are male and three are female. The masses of food they eat in any given week are normally distributed as follows:

              Mean (kg)    Standard deviation (kg)
    Male      3.5          0.4
    Female    2.5          0.3

    Find the probability that the two males eat more than the three females in a particular week.

11. The time taken to carry out a standard service on a car of type A is known, to a good approximation, to be a normal variable with mean 1 hour and standard deviation 10 minutes. Assuming that only one car is serviced at a time, find the probability that it will take more than 6.5 hours to service six cars.
    The time taken to carry out a standard service on a car of type B is a normal variable with mean 1.5 hours and standard deviation 15 minutes. Find the probability that five cars of type B can be serviced more quickly than eight cars of type A. (C)

12. The process of painting the body-work of a mass-produced lorry consists of giving it one coat of paint A, three coats of paint B and two coats of paint C. A record of the quantity of each type of paint used for each coat is kept for each lorry produced over a long period. The following table gives the means and standard deviations of these quantities measured in litres:

                             Mean    Standard deviation
    The coat of paint A      3.7     0.42
    Each coat of paint B     1.3     0.15
    Each coat of paint C     1.0     0.12

    Assuming independence of the distribution for each coat, calculate the mean and standard deviation for the total quantity of paint used on each lorry.
    Assuming that the quantities of paint used for each coat are normally distributed, calculate
    (a) the percentage of lorries receiving less than 8.5 litres of paint,
    (b) the percentage of lorries receiving more than 10.0 litres of paint. (C)

13. The values of two types of resistors are normally distributed as follows:
    Type A: mean: 100 ohms; standard deviation: 2 ohms
    Type B: mean: 50 ohms; standard deviation: 1.3 ohms
    (a) What tolerances would be permitted for type A if only 0.5% were rejected?
    (b) 300-ohm resistors are made by connecting together three of the type A resistors, drawn from the total production. What percentage of the 300-ohm resistors may be expected to have resistances greater than 295 ohms?
    (c) Pairs of resistors, one of 100 ohms and one of 50 ohms, drawn from the total production for types A and B respectively, are connected together to make 150-ohm resistors. What percentage of the resulting resistors may be expected to have resistances in the range 150 ohms to 151.4 ohms? (AEB)

14. The time of departure of my train from Temple Meads Station is distributed normally about the scheduled time of 08:25 with a standard deviation of 1 minute. I arrive at Temple Meads Station on another train whose time of arrival is normally distributed about the scheduled time of 08:20 with standard deviation of 1 minute. It takes me three minutes to change platforms. If I miss the train from Temple Meads, I am late for work.
    (a) Find the probability that I am late for work.
    (b) Find the probability that I miss the train from Temple Meads Station every day from Monday to Friday in a given week.

Now consider two independent normal variables X and Y where X ~ N(μ1, σ1²), Y ~ N(μ2, σ2²).
For any constants a, b, using the results on page 403
    E(aX + bY) = aE(X) + bE(Y) = aμ1 + bμ2
    E(aX - bY) = aE(X) - bE(Y) = aμ1 - bμ2
    Var(aX + bY) = a² Var(X) + b² Var(Y) = a²σ1² + b²σ2²
    Var(aX - bY) = a² Var(X) + b² Var(Y) = a²σ1² + b²σ2²
aX + bY and aX - bY are also normally distributed, so
    aX + bY ~ N(aμ1 + bμ2, a²σ1² + b²σ2²)
    aX - bY ~ N(aμ1 - bμ2, a²σ1² + b²σ2²)

Example 8.8
X and Y are independent random variables and X ~ N(100, 8), Y ~ N(55, 10). Find the probability that an observation from the population of X is more than twice the value of an observation from the population of Y.

Solution 8.8
You need to find P(X > 2Y), i.e. P(X - 2Y > 0).
Let D = X - 2Y
    E(D) = E(X) - 2E(Y) = 100 - 110 = -10
    Var(D) = Var(X) + 2² Var(Y) = 8 + 4 x 10 = 48
So D ~ N(-10, 48)
    P(D > 0) = P(Z > (0 - (-10))/√48)
             = P(Z > 1.443)
             = 1 - Φ(1.443)
             = 1 - 0.9255
             = 0.0745
The probability that an observation from the population of X is more than twice the value of an observation from the population of Y is 0.0745.
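The working in Solution 8.8 can be checked numerically. The short Python sketch below is not part of the original text: the function names normal_cdf and linear_combination are illustrative choices, and Φ is computed from the error function instead of being read from tables, so the final decimal place may differ slightly from the table value.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal cumulative distribution function, Phi(z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def linear_combination(a, mean_x, var_x, b, mean_y, var_y):
    """Mean and variance of aX + bY for independent X and Y.

    Pass a negative b for a difference; the variances still add.
    """
    mean = a * mean_x + b * mean_y
    variance = a ** 2 * var_x + b ** 2 * var_y
    return mean, variance


# Example 8.8: D = X - 2Y with X ~ N(100, 8) and Y ~ N(55, 10)
mean_d, var_d = linear_combination(1, 100, 8, -2, 55, 10)
z = (0 - mean_d) / sqrt(var_d)   # standardise the boundary value 0
prob = 1 - normal_cdf(z)         # P(D > 0)

print(mean_d, var_d)   # -10 48
print(round(z, 3))     # 1.443
print(prob)            # close to the table-based answer 0.0745
```

The same two functions cover sums and differences as special cases (a = 1, b = 1 or b = -1), which is one reason for writing the combination in the general form aX + bY.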
MULTIPLES OF INDEPENDENT NORMAL VARIABLES
Remember that, for any constant a,
    E(aX) = aE(X) (page 246) and Var(aX) = a² Var(X) (page 250)
If X is a normal variable such that X ~ N(μ, σ²)
then E(aX) = aE(X) = aμ
     Var(aX) = a² Var(X) = a²σ²
It can be shown that aX is also normally distributed
so aX ~ N(aμ, a²σ²)

Care must be taken in distinguishing between a sum of random variables and a multiple of a random variable.
For example, if X is the weight of a small loaf, then the sum X1 + X2 + X3 is the total weight of three small loaves.
If X ~ N(μ, σ²) then X1 + X2 + X3 ~ N(3μ, 3σ²).
But if there is a large economy-size loaf which is three times the weight of a small loaf, then the weight of an economy loaf is 3X (a multiple)
and 3X ~ N(3μ, 9σ²).
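The difference between a sum and a multiple can be seen by computing both sets of parameters side by side. The sketch below is illustrative only: the function names are invented for this example, and the loaf weight N(400, 25) is a hypothetical figure chosen for convenience, not a value from the text.

```python
def sum_of_copies(n, mean, variance):
    """Parameters of X1 + X2 + ... + Xn for n independent observations of X."""
    return n * mean, n * variance


def multiple(a, mean, variance):
    """Parameters of aX, a single observation of X scaled by the constant a."""
    return a * mean, a ** 2 * variance


# Hypothetical small loaf: X ~ N(400, 25), weights in grams (assumed values)
print(sum_of_copies(3, 400, 25))   # (1200, 75)   three separate small loaves
print(multiple(3, 400, 25))        # (1200, 225)  one loaf three times as heavy
```

Both results have mean 1200, but the variance of the multiple is three times that of the sum, because the constant is squared when it is taken outside the variance.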

[Figure: sketches comparing the distribution of the sum with that of the multiple nX]

Notice that the means are the same but the variances are not.
The distribution for the multiple is more spread out.

Look carefully at the following example.

Example 8.9
A soft drinks manufacturer sells bottles of drinks in two sizes. The amount in each bottle, in millilitres, is normally distributed as shown in the table:

             Mean (ml)    Variance (ml²)
    Small    252          4
    Large    1012         25

(a) A bottle of each size is selected at random. Find the probability that the large bottle contains less than four times the amount in the small bottle.
(b) One large and four small bottles are selected at random. Find the probability that the amount in the large bottle is less than the total amount in the four small bottles.

Solution 8.9
Let S be the amount, in millilitres, in a small bottle. Then S ~ N(252, 4).
Let L be the amount, in millilitres, in a large bottle. Then L ~ N(1012, 25).
(a) To find the probability that the large bottle contains less than four times the amount in a small bottle, you need P(L < 4S)
    i.e. P(L - 4S < 0).
    Now E(L - 4S) = E(L) - E(4S)
                  = E(L) - 4E(S)
                  = 1012 - 1008
                  = 4
    Var(L - 4S) = Var(L) + Var(4S)        (remember the + sign)
                = Var(L) + 16 Var(S)
                = 25 + 64
                = 89
    So L - 4S ~ N(4, 89)
    P(L - 4S < 0) = P(Z < (0 - 4)/√89)
                  = P(Z < -0.424)
                  = 1 - Φ(0.424)
                  = 1 - 0.6642
                  = 0.3358
    The probability that a large bottle contains less than four times the amount in a small bottle is 0.34 (2 s.f.).

(b) To find the probability that the amount in a large bottle is less than the total amount in four small bottles you need
    P(L < S1 + S2 + S3 + S4) = P(L - (S1 + S2 + S3 + S4) < 0)
    E(L - (S1 + ... + S4)) = E(L) - E(S1 + ... + S4)
                           = E(L) - 4E(S)
                           = 1012 - 1008
                           = 4
    Var(L - (S1 + ... + S4)) = Var(L) + Var(S1 + ... + S4)        (remember the + sign)
                             = Var(L) + 4 Var(S)
                             = 25 + 16
                             = 41
    Therefore L - (S1 + ... + S4) ~ N(4, 41)
    P(L - (S1 + ... + S4) < 0) = P(Z < (0 - 4)/√41)
                               = P(Z < -0.625)
                               = 1 - Φ(0.625)
                               = 0.266
    The probability that a large bottle contains less than four small bottles is 0.27 (2 s.f.).

It is very important to distinguish between
    the multiple of S in part (a) and
    the sum of S1, S2, S3, S4 in part (b).
Note that
    E(L - 4S) = 4
    E(L - (S1 + S2 + S3 + S4)) = 4          The means are the same.
    Var(L - 4S) = 89
    Var(L - (S1 + S2 + S3 + S4)) = 41       The variances are different.

Exercise 8b Multiples of normal variables

1. X and Y are independent normal variables such that X ~ N(40, 12) and Y ~ N(60, 15). Find
   (a) P(2X + Y > 130)
   (b) P(3X - 2Y < 20)

2. The time taken by Simon to do his Mathematics homework can be modelled by a normal distribution with mean 50 minutes and standard deviation 10 minutes. The time taken by Belinda is N(30, 25).
   (a) Find the probability that, for a particular homework, Simon takes more than twice as long as Belinda.
   (b) Find the probability that Belinda spends less time in total on Monday's homework and Thursday's homework than Simon spends on Monday's homework.

3. The thickness, P cm, of a randomly chosen paperback book may be regarded as an observation from a normal distribution with mean 2.0 and variance 0.730.
   The thickness, H cm, of a randomly chosen hardback book may be regarded as an observation from a normal distribution with mean 4.9 and variance 1.920.
   (a) Determine the probability that the combined thickness of four randomly chosen paperbacks is greater than the combined thickness of two randomly chosen hardbacks.
   (b) By considering X = 2P - H, or otherwise, determine the probability that a randomly chosen paperback is less than half as thick as a randomly chosen hardback.
   (c) Determine the probability that a randomly chosen collection of 16 paperbacks and 8 hardbacks will have a combined thickness of less than 70 cm.
   (Give three decimal places in your answers.) (C)

4. The random variable X is distributed normally with mean μ and variance 6, and the random variable Y is normally distributed with mean 8 and variance σ².
   2X - 3Y is distributed normally with mean -12 and variance 42. Find
   (a) the value of μ and the value of σ,
   (b) P(X > 8),
   (c) P(Y < 9),
   (d) P(-4 < 3X - 2Y < 7).

5. A single observation is taken from each of the distributions
   A ~ N(82, 1.5²), B ~ N(42, 0.3²) and C ~ N(85, 0.7²)
   Find the probability that the mean of these observations, (A + B + C)/3, is greater than 70.

6. Next May, an ornithologist intends to trap one male cuckoo and one female cuckoo. The mass M of the male cuckoo may be regarded as being a normal random variable with mean 116 g and standard deviation 16 g. The mass F of the female cuckoo may be regarded as being independent of M and as being a normal random variable with mean 106 g and standard deviation 12 g. Determine
   (a) the probability that the mass of the two birds together will be more than 230 g,
   (b) the probability that the mass of the male will be more than the mass of the female.
   By considering X = 9M - 16F, or otherwise, determine the probability that the mass of the female will be less than nine-sixteenths of that of the male.
   Suppose that one of the two trapped birds escapes. Assuming that the remaining bird will be equally likely to be the male or the female, determine the probability that its mass will be more than 118 g. (C)

Summary

• For two independent normal variables such that X ~ N(μ1, σ1²) and Y ~ N(μ2, σ2²)
      X + Y ~ N(μ1 + μ2, σ1² + σ2²)
      X - Y ~ N(μ1 - μ2, σ1² + σ2²)
• For n independent normal variables such that Xi ~ N(μi, σi²)
      X1 + X2 + ... + Xn ~ N(μ1 + μ2 + ... + μn, σ1² + σ2² + ... + σn²)
• For n independent observations of the random variable X where X ~ N(μ, σ²),
      X1 + X2 + ... + Xn ~ N(nμ, nσ²)
• For the normal variable such that X ~ N(μ, σ²), and for any constant a
      aX ~ N(aμ, a²σ²)
• For two independent normal variables such that X ~ N(μ1, σ1²) and Y ~ N(μ2, σ2²) and for any constants a and b
      aX + bY ~ N(aμ1 + bμ2, a²σ1² + b²σ2²)
      aX - bY ~ N(aμ1 - bμ2, a²σ1² + b²σ2²)

Miscellaneous worked examples

Example 8.10
The distribution of the masses of adult husky dogs may be modelled by the normal distribution with mean 37 kg and standard deviation 5 kg.
(a) Calculate the probability that an adult husky has a mass greater than 30 kg.
(b) Calculate the probability that a randomly chosen team of six huskies has a total mass lying between 198 kg and 240 kg, giving your answer to three decimal places. (NEAB)

Solution 8.10
H is the mass, in kilograms, of a husky dog. Then H ~ N(37, 5²).
(a) P(H > 30) = P(Z > (30 - 37)/5)
              = P(Z > -1.4)
              = Φ(1.4)
              = 0.9192
    The probability that an adult husky dog has a mass greater than 30 kg is 0.919 (3 d.p.).
(b) Let T = H1 + H2 + ... + H6
    E(T) = 6E(H) = 6 x 37 = 222 and Var(T) = 6 Var(H) = 6 x 25 = 150
    ∴ T ~ N(222, 150)
    P(198 < T < 240) = P((198 - 222)/√150 < Z < (240 - 222)/√150)
                     = P(-1.960 < Z < 1.470)
                     = Φ(1.960) + Φ(1.470) - 1
                     = 0.9750 + 0.9292 - 1
                     = 0.9042
    The probability that six huskies have a total mass lying between 198 kg and 240 kg is 0.904 (3 d.p.).
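Both parts of Example 8.10 can be reproduced in a few lines of code. The Python sketch below is an illustration, not part of the original text; the names normal_cdf and total_between are invented here, and probabilities are computed from the error function, so they agree with the table-based working only after rounding.

```python
from math import erf, sqrt


def normal_cdf(x, mean=0.0, variance=1.0):
    """P(X < x) for X ~ N(mean, variance)."""
    return 0.5 * (1.0 + erf((x - mean) / sqrt(2.0 * variance)))


def total_between(n, mean, variance, low, high):
    """P(low < X1 + ... + Xn < high) for n independent observations of X."""
    total_mean = n * mean          # E(T) = n E(X)
    total_var = n * variance       # Var(T) = n Var(X), which relies on independence
    return (normal_cdf(high, total_mean, total_var)
            - normal_cdf(low, total_mean, total_var))


# Example 8.10 with H ~ N(37, 5^2), masses in kilograms
p_single = 1 - normal_cdf(30, 37, 25)             # (a) P(H > 30)
p_team = total_between(6, 37, 25, 198, 240)       # (b) P(198 < T < 240)

print(round(p_single, 3))   # 0.919
print(round(p_team, 3))     # 0.904
```

Note that the variance, not the standard deviation, is multiplied by n in total_between; passing the standard deviation by mistake is the most common slip in this type of question.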
Example 8.11
The lifetimes of Econ light bulbs are normally distributed with mean 1000 h and standard deviation 25 h.
(a) Find, to three decimal places, the probability that an Econ light bulb will have a lifetime between 975 h and 1020 h.
(b) Calculate, to three decimal places, the probability that the sum of the lifetimes of eight Econ light bulbs will exceed 7930 h. Indicate clearly the stage in your calculation when an assumption concerning independence is essential.
The lifetimes of Enersaver light bulbs are normally distributed with mean 7900 h and standard deviation 50 h.
(c) Calculate, to three decimal places, the probability that an Enersaver light bulb will last at least eight times as long as an Econ light bulb. (NEAB)

Solution 8.11
X is the lifetime, in hours, of an Econ light bulb. Then X ~ N(1000, 25²).
(a) P(975 < X < 1020) = P((975 - 1000)/25 < Z < (1020 - 1000)/25)
                      = P(-1 < Z < 0.8)
                      = Φ(1) + Φ(0.8) - 1
                      = 0.8413 + 0.7881 - 1
                      = 0.6294
    The probability that an Econ light bulb has a lifetime between 975 h and 1020 h is 0.629 (3 d.p.).
(b) S is the sum of the lifetimes of eight Econ light bulbs, so S = X1 + X2 + ... + X8
    E(S) = 8E(X) = 8000
    Var(S) = 8 Var(X) = 8 x 25² = 5000 (assuming the lifetimes are independent)
    ∴ S ~ N(8000, 5000)
    P(S > 7930) = P(Z > (7930 - 8000)/√5000)
                = P(Z > -0.990)
                = Φ(0.990)
                = 0.8389
    The probability that the sum of the lifetimes of eight Econ light bulbs exceeds 7930 h is 0.839 (3 d.p.).
(c) Y is the lifetime of an Enersaver light bulb and Y ~ N(7900, 50²).
    P(Y > 8X) is needed, i.e. P(Y - 8X > 0).
    E(Y - 8X) = E(Y) - 8E(X) = 7900 - 8000 = -100
    Var(Y - 8X) = Var(Y) + 8² Var(X) = 50² + 64 x 25² = 42 500 (assuming independence)
    Y - 8X ~ N(-100, 42 500)
    P(Y - 8X > 0) = P(Z > (0 - (-100))/√42 500)
                  = P(Z > 0.485)
                  = 1 - Φ(0.485)
                  = 1 - 0.6862
                  = 0.3138
    The probability that an Enersaver light bulb lasts at least eight times as long as an Econ light bulb is 0.314 (3 d.p.).

Miscellaneous exercise 8c

1. The weights of grade A oranges are normally distributed with mean 200 g and standard deviation 12 g. Determine, correct to two significant figures, the probability that
   (a) a grade A orange weighs more than 190 g but less than 210 g,
   (b) a sample of 4 grade A oranges weighs more than 820 g.
   The weights of grade B oranges are normally distributed with mean 175 g and standard deviation 9 g. Determine, correct to two significant figures, the probability that
   (c) a grade B orange weighs less than a grade A orange,
   (d) a sample of 8 grade B oranges weighs more than a sample of seven grade A oranges. (C)

2. Prints from two types of film C and D have developing times which can be modelled by normal variables, C with mean 16.18 s and standard deviation 0.11 s and D with mean 15.88 s and standard deviation 0.10 s.
   (a) What is the probability that a type C print will take less than 16 s to develop?
   (b) A type C print is developed and immediately afterwards a type D print is developed. What is the probability that the total time is greater than 32.5 s?
   (c) What is the probability of a type C print taking longer to develop than a type D print?

3. In testing the length of life of electric light bulbs of a particular type, it is found that 12.3% of the bulbs tested fail within 800 hours and that 28.1% are still operating 1100 hours after the start of the test.
   Assuming that the distribution of the length of life is normal, calculate, to the nearest hour in each case, the mean, μ, and the standard deviation, σ, of the distribution.
   A light fitting takes a single bulb of this type. A packet of three bulbs is bought, to be used one after the other in this fitting. State the mean and variance of the total life of the 3 bulbs in the packet in terms of μ and σ and calculate, to two decimal places, the probability that the total life is more than 3300 hours.
   Calculate the probability that all 3 bulbs have lives in excess of 1100 hours, so that again the total life is more than 3300 hours. Explain why this answer should be different from the previous one. (NEAB)

4. The weight of a large loaf of bread is a normal variable with mean 420 g and standard deviation 30 g. The weight of a small loaf of bread is a normal variable with mean 220 g and standard deviation 10 g.
   (a) Find the probability that 5 large loaves weigh more than 10 small loaves.
   (b) Find the probability that the total weight of 5 large loaves and 10 small loaves lies between 4.25 kg and 4.4 kg. (C)

5. The tensile strengths, measured in newtons (N), of a large number of ropes of equal length are independently and normally distributed such that 5% are under 706 N and 5% over 1294 N.
   Four such ropes are randomly selected and joined end-to-end to form a single rope; the strength of the combined rope is equal to the strength of the weakest of the 4 selected ropes. Derive the probabilities that this combined rope will not break under tensions of 1000 N and 900 N, respectively.
   A further 4 ropes are randomly selected and attached between two rings, the strength of the arrangement being the sum of the strengths of the 4 separate ropes. Derive the probabilities that this arrangement will break under tensions of 4000 N and 4200 N, respectively. (NEAB)

6. X and Y are independent normally distributed random variables such that X has mean 32 and variance 25, and Y has mean 43 and variance 96. Find
   (a) P(X > 43),
   (b) P(X - Y > 0),
   (c) P(2X - Y > 0). (NEAB)

7. The times taken by two runners A and B to run 400 m races are independent and normally distributed with means 45.0 s and 45.2 s, and standard deviations 0.5 s and 0.8 s respectively. The two runners are to compete in a 400 m race for which there is a track record of 44.5 s.
   (a) Calculate, to three decimal places, the probability of runner A breaking the track record.
   (b) Show that the probability of runner B breaking the track record is greater than that of runner A.
   (c) Calculate, to three decimal places, the probability of runner A beating runner B. (NEAB)

8. In a packaging factory, the empty containers for a certain product have a mean weight of 400 g with a standard deviation of 10 g. The mean weight of the contents of a full container is 800 g with a standard deviation of 15 g. Find the expected total weight of 10 full containers and the standard deviation of this weight, assuming that the weights of containers and contents are independent.
   Assuming further that these weights are normally distributed random variables, find the proportion of batches of 10 full containers which weigh more than 12.1 kg. (O&C)

9. Jam is packed into tins of advertised weight 1 kg. The weight of a randomly selected tin of jam is normally distributed about a target weight with a standard deviation of 12 g.
   (a) If the target weight is 1 kg, find the probability that a randomly chosen tin weighs
       (i) less than 985 g,
       (ii) between 970 g and 1015 g.
   (b) If not more than one tin in 100 is to weigh less than the advertised weight, find the minimum target weight required to meet this condition.
   (c) The target weight is fixed at 1 kg. The resulting tins are packed in boxes of six and the weight of the box is normally distributed with mean weight 250 g and standard deviation 10 g. Find the probability that a randomly chosen box of 6 tins will weigh less than 6.2 kg. (L)

10. (a) The lifetime in hours of an electrical component has a normal distribution with mean 150 hours and standard deviation 8 hours.
        Find the probability that
        (i) a new component lasts at least 160 hours,
        (ii) a component which has already operated for 145 hours will last at least another 15 hours.
    (b) The weight of these components is normally distributed with mean 250 g and standard deviation 10 g. Each component is in its own box, the weight of which is also normally distributed with mean 50 g and standard deviation 5 g. There are 10 boxed components to a carton and the weight of the carton is normally distributed with mean 75 g and standard deviation 7 g.
        Find the probability that a carton of 10 boxed components weighs less than 3 kg. (L)

11. Jim Longlegs is an athlete whose specialist event is the triple jump. This is made up of a hop, a step and a jump. Over a season the lengths of the hop, step and jump sections, denoted by H, S and J respectively, are measured, from which the following models are proposed:
    H ~ N(5.5, 0.5²), S ~ N(5.1, 0.6²), J ~ N(6.2, 0.8²)
    where all distances are in metres. Assume that H, S and J are independent.
    (a) In what proportion of his triple jumps will Jim's total distance exceed 18 m?
    (b) In 6 successive independent attempts, what is the probability that at least one total distance will exceed 18 m?
    (c) What total distance will Jim exceed 95% of the time?
    (d) Find the probability that, in Jim's next triple jump, his step will be greater than his hop. (MEI)

12. [In this question give three places of decimals in each answer.]
    The mass of tea in 'Supacuppa' tea bags has a normal distribution with mean 4.1 g and standard deviation 0.12 g. The mass of tea in 'Bumpacuppa' tea bags has a normal distribution with mean 5.2 g and standard deviation 0.15 g.
    (a) Find the probability that a randomly chosen Supacuppa teabag contains more than 4.0 g of tea.
    (b) Find the probability that, of 2 randomly chosen Supacuppa teabags, one contains more than 4.0 g of tea and one contains less than 4.0 g of tea.
    (c) Find the probability that 5 randomly chosen Supacuppa teabags contain a total of more than 20.8 g of tea.
    (d) Find the probability that the total mass of tea in 5 randomly chosen Supacuppa tea bags is more than the total mass of tea in 4 randomly chosen Bumpacuppa tea bags. (C)

13. A small bank has two cashiers dealing with customers wanting to withdraw or deposit cash. For each cashier, the time taken to deal with a customer is a random variable having a normal distribution with mean 150 s and standard deviation 45 s.
    (a) Find the probability that the time taken for a randomly chosen customer to be dealt with by a cashier is more than 180 s.
    (b) One of the cashiers deals with two customers, one straight after the other. Assuming that the times for the customers are independent of each other, find the probability that the total time taken by the cashier is less than 200 s.
    (c) At a certain time, one cashier has a queue of 4 customers and the other cashier has a queue of 3 customers, and the cashiers begin to deal with the customers at the front of their queues. Assuming that the cashiers work independently, find the probability that the 4 customers in the first queue will all be dealt with before the 3 customers in the second queue are all dealt with. (C)

Mixed test 8A

1. A country baker makes biscuits whose masses are normally distributed with mean 30 g and standard deviation 2.3 g. She packs them by hand into either a small carton (containing 20 biscuits) or a large carton (containing 30 biscuits).
   (a) State the distribution of the total mass, S, of biscuits in a small carton and find the probability that S is greater than 615 g.
   (b) Six small and four large cartons are placed in a box. Find the probability that the total mass of biscuits in the 10 cartons lies between 7150 g and 7250 g.
   (c) Find the probability that 3 small cartons contain at least 25 g more than 2 large ones.
   The label on a large carton of biscuits reads 'Net mass 900 g'. A trading standards officer insists that 90% of such cartons should contain biscuits with a total mass of at least 900 g.
   (d) Assuming the standard deviation remains unchanged, find the least value of the mean mass of a biscuit consistent with this requirement. (MEI)

2. Foster's Fancy Cakes are sold in packets of six. The mass of each cake is a normally distributed random variable having mean 25 g and standard deviation 0.4 g. The mass of the packaging is a normally distributed random variable having mean 20 g and standard deviation 1 g. Find, to three decimal places, the probabilities that
   (a) the mass of a randomly chosen cake is between 24.7 g and 25.7 g,
   (b) the total mass of a randomly chosen packet is less than 173 g.
   State one assumption that you have made in answering (b). (NEAB)

3. Manto sherry is sold in bottles of two sizes: standard and large. For each size, the content, in litres, of a randomly chosen bottle is normally distributed with mean and standard deviation as given in the table.

                       Mean     Standard deviation
    Standard bottle    0.760    0.008
    Large bottle       1.010    0.009

   (a) Show that the probability that a randomly chosen standard bottle contains less than 0.750 litres is 0.1056, correct to four places of decimals.
   (b) Find the probability that a box of 10 randomly chosen standard bottles contains at least 3 bottles whose contents are each less than 0.750 litres. Give three significant figures in your answer.
   (c) Find the probability that there is more sherry in 4 randomly chosen standard bottles than in 3 randomly chosen large bottles. (C)
Mixed test 8B

1. The continuous random variables X and Y represent the masses of male and female students who attend my local College.
   Both X and Y are normally distributed such that X ~ N(75, 6²) and Y ~ N(65, 5²), where all masses are given in kilograms.
   (a) Find the probability that, if a male student and a female student are chosen at random, they both have a mass exceeding 70 kg.
   (b) State carefully the distribution of the combined mass of a random sample of m male and f female students.
       A lift in the college has a notice

           MAXIMUM 8 PEOPLE or 650 kg

       Find the probability that the combined mass of a random sample of 8 students will exceed the mass restriction if it consists of
       (i) 8 males,
       (ii) 5 males and 3 females.
   (c) What is the probability that a randomly selected female student has a greater mass than a randomly selected male student? (MEI)

2. The mass of a cheese biscuit has a normal distribution with mean 6 g and standard deviation 0.2 g. Determine the probability that
   (a) a collection of twenty-five cheese biscuits has a mass of more than 149 g,
   (b) a collection of 30 cheese biscuits has a mass of less than 180 g,
   (c) twenty-five times the mass of a cheese biscuit is less than 149 g.
   The mass of a ginger biscuit has a normal distribution with mean 10 g and standard deviation 0.3 g. Determine the probability that a collection of 7 cheese biscuits has a mass greater than a collection of 4 ginger biscuits.
   (It may be assumed that all the biscuits were sampled at random from their respective populations.) (C)

3. Certain components for a revolutionary new sewing machine are assembled by inserting a part of one type (sprotsil) into a part of another type (weavil). Sprotsils have external dimensions which are normally distributed with mean 2.50 cm and standard deviation 0.018 cm. Weavils have internal dimensions which are normally distributed with mean 2.54 cm and standard deviation 0.024 cm. Under suitable pressure, the two types fit together satisfactorily if the dimensions differ by not more than ±0.035 cm. Show that, if pairs of parts are chosen at random, the difference
       D = internal dimension of a weavil - external dimension of a sprotsil
   is distributed with mean 0.04 cm and standard deviation 0.030 cm. Hence show that approximately 42.8% of randomly selected pairs will fit together satisfactorily. Now, if it is known that the internal dimension of a given weavil is 2.517 cm, what is the probability that a randomly chosen sprotsil will fit this weavil satisfactorily? (AEB)

Sampling and estimation

In this chapter you will learn about
• sampling methods including random and non-random sampling
• how to simulate a random sample from a given distribution
• the expectation and variance of the sample mean
• the distribution of the sample mean
• the use of the central limit theorem
• the distribution of the sample proportion
• estimates of population parameters:
    - mean
    - variance
    - proportion
• confidence intervals for:
    - a population mean, involving the z-distribution
    - a population mean, involving the t-distribution
    - a population proportion

SAMPLING

Population
In a statistical enquiry you often need information about a particular group. This group is
known as the population or the target population, and it could be small, large or even infinite.
Note that the word 'population' does not necessarily mean 'people'.
Here are some examples of populations:
pupils in a class,
people in England in full time employment,
hospitals in Wales,
cans of soft drink produced in a factory,
ferns in a wood,
rational numbers between 0 and 10.
SURVEYS

Information is collected by means of a survey. There are two types:
(a) a census,
(b) a sample survey.

(a) Census
In a census every member of the population is surveyed.
When the population is small, this could be a straightforward exercise. For example, it would be easy to find out how each pupil in a class travelled to school on a particular morning. When populations are large, taking a census can be very time consuming and difficult to do with accuracy. Each year the government carries out a census in schools on the third Thursday in January. This requests the number of boys and girls in each age group on the roll of every school in the country. Its accuracy, though, relates only to that day. Even more difficult to carry out accurately is the population census taken every ten years. This attempts to provide details of different age groups for every area in Britain. When populations are very large, or infinite, it is not possible to survey every member.
On occasions it would not be sensible to survey every member. For example, if you performed a census to establish the length of life of a particular brand of light bulb, you would test each bulb until it failed and so you would destroy the population!

(b) Sample survey
When a survey covers less than 100% of the population, it is known as a sample survey. In many circumstances, taking a sample is preferable to carrying out a census. Sample data can be obtained relatively cheaply and quickly and, if the sample is representative of the population, a sample survey can give an accurate indication of the population characteristic being studied.
The size of the sample does not depend on the size of the population. It often depends on the time and money available to collect information. Note that large samples are more likely to give more reliable information than small ones. The next time that you read the results of a public opinion poll in the newspaper, look at the size of the sample - it is usually over 1000.

Sample design
Once the purpose of a survey has been stated precisely, the target population must be defined, for example
all the primary schools in England,
all the oak trees in Hampshire,
all the people admitted to the General Hospital in January suffering from a heart attack.
The sampling units must be defined clearly. These are the people or items to be sampled, for example
the primary school,
the oak tree,
the person suffering from a heart attack.
Once the sampling units within a population are individually named or numbered to form a list, then this list of sampling units is called a sampling frame. It could take various forms (e.g. a list, a map, a set of maps), and should be as accurate as possible.
Ideally the sampling frame should be the same as the target population. For example, if the target population is all the first year students in a college, then the sampling frame and the target population should be the same, provided that the register is up-to-date and accurate. A sampling frame for people in Britain eligible to vote, however, is more difficult to form. The electoral register attempts to list all those who are eligible to vote throughout all the areas in the country, but it is never completely accurate, since many changes occur during the time that the information is being processed. Some people do not return the forms, people move in and out of the area, people die etc.
In some instances it is not possible to enumerate all the population, for example, the fish in a lake.

Example 9.1
(a) Explain briefly what you understand by
(i) a population,
(ii) a sampling frame.
(b) A market research organisation wants to take a sample of
(i) owners of diesel motor cars in the UK,
(ii) persons living in Oxford who suffered from injuries to the back during July 1996.
Suggest a suitable sampling frame in each case. (L)

Solution 9.1
(a) (i) A population is a particular group of individuals or items.
(ii) Once the individual members of a population have been numbered to form a list, this list is called a sampling frame.
(b) (i) The list of registered owners as kept by DVLA in Swansea.
(ii) A list made from information supplied by Health Clinics in Oxford during July 1996.

Bias
The purpose of sampling is to gain information about the whole population by selecting a sample from that population. You want the sample to be representative of the population so you must give every member of the population an equal chance of being included in the sample. This should eliminate any bias in the selection of the sample.

Sources of bias include
(a) the lack of a good sampling frame:
- using the telephone directory misses all those who do not have a telephone or whose number is ex-directory,
- using the electoral register in a city area misses the more mobile section of the population.
(b) the wrong choice of sampling unit:
- choosing an individual rather than a particular group such as 'household'.
(c) non-response by some of the chosen units:
- it might be difficult to locate the particular unit,
- the cooperation of the respondent might not have been obtained,
- the enquiry might not have been understood, for example, a questionnaire might have been badly designed. Questionnaires should be clear, specific, unambiguous and easily understood. Questions should be worded neutrally, especially in opinion surveys, to avoid bias caused by pointing towards a particular response.
(d) bias introduced by the person conducting the survey:
- the interviewer might not question someone who appears uncooperative,
- the style of questioning may influence the response.

It should be noted that a sample can only be representative of the population from which it is selected. If you select a sample of teachers from one school, the sample is representative of the teachers in that school, not of all teachers in all schools.


SAMPLING METHODS

Once a sampling frame has been established, you can choose a method of sampling. These fall into two categories:
random sampling e.g. simple, systematic, stratified;
non-random sampling e.g. quota, cluster.

Simple random sampling
Suppose a population consists of N sampling units and you require a sample of n of these units. A sample of size n is called a simple random sample if all possible samples of size n are equally likely to be selected. Some form of random process must be used to make the selection.
If the unit selected at each draw is replaced into the population before the next draw, then it can appear more than once in the sample. This is known as sampling with replacement.
If the unit selected at each draw is not replaced into the population before the next draw, this is known as sampling without replacement.
The second method, sampling without replacement, is known as simple random sampling.
Two methods of simple random sampling are commonly used:
drawing lots,
random number sampling.
For each, make a list of all N members of the population and give each member a different number.

Drawing lots
For each member, place a coloured ball into a container and then draw n balls out of the container at random and without replacement. If you wanted a sample of size 20, you would draw out 20 balls. This is suitable for a small population. Note, however, that the sample must be large enough to provide sufficiently accurate information about the population.
The sample should be selected at random. Any hint of possible bias should be avoided.
If the population is large then the method of drawing lots, sometimes described as 'drawing out of a hat', is not practical. You could instead make the choice by referring to random number tables. For your reference, a set is printed on page 653.

Using random number tables
Random number tables consist of lists of digits 0, 1, 2, 3, ..., 9, such that each digit has an equal chance of occurring, so for example, the probability that a 3 occurs is 0.1. In random number tables the digits may appear singly or be grouped in some way. This is solely for convenience of printing.

Example 9.2
Here is an extract from a set of random number tables
6 8 7 2 5 3 8 1 5 9
2 5 3 4 7 0 5 4 9 5
3 2 6 8 7 4 4 7 0 5
Use it to select a random sample of
(a) eight people from a group of 100 people,
(b) eight people from a group of 60.

Solution 9.2
(a) To select a group of eight people from a target population of 100 people, allocate a two-digit number to each person, for example allocate 01 to the first on the list, 02 to the second, ... up to 98, 99, 00, calling the hundredth person 00 for convenience.
Using the list, starting at the beginning of the first row and reading along the rows, you would select people corresponding to the following numbers:
68 72 53 81 59 25 34 70
Alternatively, you could decide to read the digits backwards, from bottom right, in which case your sample would consist of people corresponding to the numbers
50 74 47 86 23 59 45 07
(b) To select a group of eight from a target population of 60 people, allocate each person a number from 01 to 60.
Using the tables, disregard any two-digit number outside the range.
Starting at the beginning of the first row and grouping in pairs gives
(68) (72) 53 (81) 59 25 34 (70) 54 (95) 32 (68) (74) 47 05
where the numbers in brackets are outside the range and are disregarded.
So you would choose people corresponding to the numbers
53, 59, 25, 34, 54, 32, 47, 05.

Example 9.3
Use the following extract from random number tables to select a random sample of 12 numbers, each to two decimal places, from the continuous range 0 ≤ x < 10.
52 74 54 80 68 72 51 96 08 00
02 52 09 93 60 43 57 42 13 [illegible]

Solution 9.3
Since the sample values are required to two decimal place accuracy, consider groups of three digits, inserting the decimal point between the first and second digit.
In this case your sample would consist of the values
5.27, 4.54, 8.06, 8.72, 5.19, 6.08, 0.00, 2.52, 0.99, 3.60, 4.35, 7.42

Example 9.4
Here is a set of random numbers
848051 386103 153842 242330 580007 479971
Use it to select a random sample of four numbers, each to three decimal places, from the continuous range 0 ≤ x < 5.

Solution 9.4
Consider groups of four digits, inserting the decimal point between the first and second digit. Disregard any values that are out of range. This gives
(8.480) (5.138) (6.103) 1.538 4.224 2.330 (5.800) 0.747
where the values in brackets are out of range.
So the numbers chosen are 1.538, 4.224, 2.330, 0.747.

Calculator random number generator
You probably have a random number generator key on your calculator, which produces a number, for example 0.398, every time you press it. The numbers generated are in fact obtained using a mathematical formula and are really pseudo random numbers, but they suit the purpose very well indeed.
Suppose you want to use your calculator to select a random sample of six numbers between 1 and 49 for your entry in the National Lottery.
To do this, you probably need to press [Shift] then [Ran#].
Suppose the numbers you get are
0.730, 0.798, 0.369, 0.499, 0.491, 0.310, 0.135, 0.112, 0.593, 0.652, 0.015, 0.346
You can interpret them in various ways, for example:
- If you decide to use the first two digits to the right of the decimal point each time, you would obtain the numbers 73, 79, 36, 49, 49, 31, 13, 11, 59, 65, 01, 34.
Ignoring repeats and numbers bigger than 49, the six numbers would be
36, 49, 31, 13, 11, 1.
- Suppose instead you decide to choose the second and third digits to the right of the decimal point and ignore repeats and numbers bigger than 49. In this case your numbers would be
30, 10, 35, 12, 15, 46.
- If you decide to use all the digits after the decimal point, you would be choosing from the digits 730798369499491310135112593652015346. Grouping these as two-digit numbers gives 73, 07, 98, 36, 94, 99, 49, 13, 10, 13, 51, 12, 59, 36, 52, 01, 53, 46.
Ignoring repeats and numbers bigger than 49 gives the six numbers as
7, 36, 49, 13, 10, 12.
The lists are endless!

Systematic sampling
Random sampling from a very large population is very cumbersome.
An alternative procedure is to list the population in some order, for example alphabetically or in order of completion on a production line, and then choose every kth member from the list after obtaining a random starting point. If you choose every tenth member from the list, for example every tenth vehicle passing a checkpoint, you would form a 10% sample. If you choose every twentieth item, for example every twentieth card in an index file, you would form a 5% sample.

Example 9.5
Describe how to choose a systematic sample of eight members from a list of 300.

Solution 9.5
Since you are going to choose every kth member, you need to find a suitable value for k. To do this, choose a convenient value close to N/n.
In this case, N/n = 300/8 = 37.5, so k = 40 will do.
Now choose a random starting point, for example if [Ran#] on your calculator gives 0.870 take the first member of the sample as 87 and then add 40 each time. The other members are 127, 167, 207, 247, 287, 27 and 67. Note that when you reach the end of the list, go back to the beginning.
So the sample consists of 27, 67, 87, 127, 167, 207, 247, 287.

The advantages of systematic sampling are that it is quick to carry out and it is easy to check for errors. For large scale sampling, systematic selection is usually used in preference to taking simple random samples.
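The two selection procedures worked by hand in Solution 9.2 and Solution 9.5 can be sketched in Python. This is only an illustrative sketch: the function names select_from_table and systematic_sample are hypothetical, and the convention of reading an all-zero group of digits as the last member of the list is assumed from Solution 9.2(a).

```python
def select_from_table(digits, population_size, sample_size):
    """Read fixed-width numbers from a string of random digits and keep
    those that label a member of the population, ignoring repeats."""
    width = len(str(population_size - 1))      # e.g. two digits for 60 or 100
    chosen = []
    for i in range(0, len(digits) - width + 1, width):
        number = int(digits[i:i + width])
        if number == 0:                        # assumed convention: 00 labels member 100
            number = 10 ** width
        if number <= population_size and number not in chosen:
            chosen.append(number)
        if len(chosen) == sample_size:
            break
    return chosen


def systematic_sample(population_size, sample_size, interval, start):
    """Take every interval-th member from a given starting point,
    going back to the beginning when the end of the list is reached."""
    positions = []
    current = start
    for _ in range(sample_size):
        positions.append((current - 1) % population_size + 1)
        current += interval
    return sorted(positions)


# Extract from Example 9.2, read along the rows
extract = "687253815925347054953268744705"
print(select_from_table(extract, 100, 8))   # part (a)
print(select_from_table(extract, 60, 8))    # part (b)

# Example 9.5: N = 300, n = 8, k = 40, starting point 87
print(systematic_sample(300, 8, 40, start=87))
```

In practice the starting point would itself be chosen at random, for example from the calculator value 0.870 as in Solution 9.5.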
The disadvantage of this system is that there may be a periodic cycle within the frame itself. For example, a machine may operate in such a manner that [illegible in source]. Systematic sampling of every fifth item, starting at 5, would [illegible in source] sample with no faulty items. [illegible in source] could be taken by varying the starting points and the length of the interval between the chosen items.

Stratified sampling
Stratified sampling is used when the population is split into distinguishable layers or strata that are quite different from each other and which together cover the whole population, for example
- age groups,
- occupational groups,
- topographical regions.
Separate random samples are then taken from each stratum and put together to form the sample from the population.
It is usual to represent the population proportionately in the strata, as in the following example.

Example 9.6
Competent Carriers employs 320 drivers, 80 administrative staff and 40 mechanics. A committee to represent all the employees is to be formed. The committee is to have 11 members and the selection is to be made so that there is as close a representation as possible without bias towards any individuals or groups. Explain how this could be done.

Solution 9.6
If you were to take a simple random sample of all 440 employees this would mean that every employee would have an equal chance of being selected. It is possible that the committee would then consist of 11 drivers, in which case it would not be representative of all employees.
A stratified random sample would provide a more accurate representation of the population and could be formed as follows:
Taking into account that drivers make up 320/440 of the work force,
number of drivers = 320/440 × 11 = 8
Similarly
number of administrative staff = 80/440 × 11 = 2
number of mechanics = 40/440 × 11 = 1
The required representation on the committee is eight drivers, two from the administrative staff and one mechanic. The people to be included can then be selected from each stratum by using simple random sampling or systematic sampling.

Non-random sampling

(a) Cluster sampling
Sometimes there is a natural sub-grouping of the population and these subgroups are called clusters. For example, in a population consisting of all children in the country attending state primary schools, the local education authorities form natural clusters. When a sample survey is carried out on a population that can be broken into clusters it is often more convenient to first choose a random sample of clusters and then to sample within each cluster chosen.
Unlike stratified sampling where the strata are as different from each other as possible, each cluster should be as similar to other clusters as possible.
One advantage of cluster sampling is that there is no need to have a complete sampling frame of the whole population. For the primary school children, you would need only a list of the pupils in the chosen local authority. Another advantage is that it is usually far less costly than random sampling. Consider the fees and travelling expenses paid to interviewers. Far less travelling and time is involved in an interviewer visiting individuals in a cluster than visiting individuals in the whole population.
The disadvantage of cluster sampling is that it is non-random. Suppose that a town has 7500 primary school children in 250 classes, each with an average class size of 30. If you want to select a sample of 90 children then you could use simple random sampling. It would however be quicker to use the classes as clusters and to take a sample consisting of three classes. This would give a sample of 90 children. The problem is that within each class there will be a certain amount of similarity between the children in say age, ability, home background. In selecting one whole class or cluster you are in fact selecting 30 similar children instead of 30 randomly chosen children from throughout the town. Therefore three clusters will not give as precise a picture of the whole population as 90 children chosen at random from throughout the town.

(b) Quota sampling
Quota sampling is widely used in market research where the population is divided into groups in terms of age, sex, income level and so on. Then the interviewer is told how many people to interview within each specified group, but is given no specific instructions about how to locate them and fulfil the quota. This is the method generally used in street interview surveys commonly carried out in shopping centres. It is quick to use, complications are kept to a minimum and, unlike random sampling, any member of the sample may be replaced by another member with the same characteristics.
If no sampling frame exists, then quota sampling may be the only practical method of obtaining a sample. The disadvantage of quota sampling, however, is that it is non-random. There is a possibility of bias in the selection process if, for example, the interviewer selects those easiest to question or those who look cooperative. The location of such surveys in shopping centres excludes a substantial part of the population in that area. It is difficult to find out about those who refuse to cooperate and they are simply replaced. One of the reasons put forward to explain the inaccuracy of the opinion polls before the British general election in 1992 was the high refusal rates of Conservative voters to take part in surveys.
Exercise 9a Sampling methods

1. Explain briefly the difference between a census and a sample survey.
Give an example to illustrate the practical use of each method.
A school held an evening disco which was attended by 500 pupils. The disco organisers were keen to assess the success of the evening. Having decided to obtain information from those attending the disco, they were undecided whether to use a census or a sample survey.
Which method would you recommend them to use?
Give one advantage and one disadvantage associated with your recommendation. (L)

2. A school of 1000 pupils is divided into year groups as follows

Year    Number of pupils
7       150
8       150
9       150
10      150
11      150
12      125
13      125

A survey is to be carried out and a committee representative of the school is to be formed consisting of 40 pupils.
It is decided that stratified sampling should be used.
(a) Calculate the number of pupils chosen from each year group.
(b) Explain how to choose the pupils from Year 7.

3. (a) Explain briefly
(i) why it is often desirable to take samples,
(ii) what you understand by a sampling frame.
(b) State two circumstances when you would consider using
(i) clustering,
(ii) stratification,
when sampling from a population.
(c) Give two advantages and two disadvantages associated with quota sampling. (L)

4. (a) A television company wishes to estimate the popularity of a particular television series by street interviews. Describe how the method of quota sampling might be used for this investigation.
(b) A meat canning factory supplies a supermarket with cans of meat in three sizes: large, medium and small. The regular consignment is of 300 large cans, 500 medium cans and 400 small cans. Describe how the supermarket could apply the method of stratified random sampling to a sample of 60 cans to test the quality of these goods.

5. Write brief notes on
(a) simple random sampling,
(b) quota sampling.
Your notes should include a description of each method, and an advantage and a disadvantage associated with it. (L)

6. In a school year group of 140 pupils there are 60 girls and 80 boys. A survey is to be taken to find methods to improve the school's meal services. A sample of 14 members of this group is needed for the survey.
The school decides to use one of the following methods to obtain the names of pupils for the sample:
A: Every tenth name on the year group register is selected for the sample.
B: Each of the 140 names is allocated a different number from 1 to 140 inclusive; the school's computer then picks 14 different random numbers between 1 and 140 inclusive.
(a) State briefly one advantage and one disadvantage of each method.
(b) Explain what is meant by a stratified random sample and describe how method B could be changed to give a stratified random sample.

7. Explain briefly the difference between a census and sample, and give two reasons why a sample may be preferred to a census.
Explain the meaning and purpose of a sampling frame in random sampling.
It is required to obtain the views of the pupils of a school about the school magazine. It is decided to do this by means of a small panel of pupils. Describe briefly how you would select such a panel using
(a) simple random sampling,
(b) stratified random sampling.
State with reason which of these two sampling methods you consider to be the more appropriate for this situation. (AEB)

8. A research study into the use of hormone replacement therapy for women in the United Kingdom involved a survey of women in three general medical practices in Greater London. The designer of the survey describes his method of obtaining his sample as follows.
'I obtained the names and addresses of 5025 women aged between 45 and 65 from the practices' age-sex registers. The women were sent a questionnaire that asked whether they had received hormone replacement therapy.'
Source: British Medical Journal, December 1989
(a) Suggest one advantage and one disadvantage of this sampling method.
(b) Of the 5025 women contacted, 3238 returned a completed questionnaire, and 330 of these had received hormone replacement therapy. Given that there are about 703 000 women in the 45-65 age group living in Greater London, obtain an estimate for the number of 45-65 year-old women in Greater London who have received hormone replacement therapy.
With reference to the sampling method used, comment on the reliability of this estimate.
(c) Suggest an alternative method of obtaining such an estimate. (NEAB)


SIMULATING RANDOM SAMPLES FROM GIVEN DISTRIBUTIONS

A good way to simulate a random sample from a given distribution is to use cumulative proportional frequencies or cumulative probabilities, as illustrated in the following examples.

(a) From a frequency distribution

Example 9.7
Use the sequence of random digits 364294 588330 923918 400300 to generate five simulated observations from the following frequency distribution.

x    1    2    3    4
f    8    12   14   6    Total 40

Solution 9.7
Consider first the cumulative frequencies and then transfer them to cumulative proportional frequencies with a total proportion of 1. Then allocate the random numbers in a convenient way in accordance with the cumulative proportional frequencies.

x    f     Cumulative    Cumulative proportional    Corresponding
           frequency     frequency                  random numbers
1    8     8             8/40 = 0.20                01 to 20
2    12    20            20/40 = 0.50               21 to 50
3    14    34            34/40 = 0.85               51 to 85
4    6     40            40/40 = 1                  86 to 99 and 00
Since the cumulative proportional frequencies contain two decimal places, it is convenient to use two-digit random numbers. Note that 00 has been allocated to the x-value of 4 for convenience.
Take 5 two-digit random numbers from the list: 36, 42, 94, 58, 83
Match these up with the corresponding sample values: 2, 2, 4, 3, 3
So a random sample of size 5 from the given distribution is 2, 2, 3, 3, 4.

(b) From a probability distribution

Example 9.8
Generate a random sample of size 10 from the given probability distribution, using the random numbers 3 7 4 7 6 5 3 3 9 0.

x           0     1     2     3
P(X = x)    0.1   0.2   0.4   0.3

Solution 9.8
Form the cumulative distribution function F(x) and then allocate random numbers in a convenient way.

x    P(X = x)    F(x)    Corresponding random numbers
0    0.1         0.1     1
1    0.2         0.3     2, 3
2    0.4         0.7     4, 5, 6, 7
3    0.3         1       8, 9, 0

Take the 10 random numbers given and convert them to sample values:
Random number    3 7 4 7 6 5 3 3 9 0
Sample values    1 2 2 2 2 2 1 1 3 3
So the sample values are 1, 1, 1, 2, 2, 2, 2, 2, 3, 3.

Example 9.9
Generate a random sample of size 4 from the binomial distribution X ~ B(4, 0.2), using the random numbers 2811 5747 6157 8988.

Solution 9.9
Calculate the cumulative probabilities, either by calculating probabilities first or using cumulative probability tables directly (see page 682).
Remember that P(X = x) = 4Cx × 0.8^(4−x) × 0.2^x for x = 0, 1, 2, 3, 4.

x    P(X = x)                     F(x)      Corresponding random numbers
0    0.8⁴ = 0.4096                0.4096    0001 to 4096
1    4 × 0.8³ × 0.2 = 0.4096      0.8192    4097 to 8192
2    6 × 0.8² × 0.2² = 0.1536     0.9728    8193 to 9728
3    4 × 0.8 × 0.2³ = 0.0256      0.9984    9729 to 9984
4    0.2⁴ = 0.0016                1         9985 to 9999 and 0000

The random number 2811 is in the range 0001 to 4096 and so corresponds to x = 0.
Similarly 5747 corresponds to x = 1
6157 corresponds to x = 1
8988 corresponds to x = 2
So the random sample of four observations from the binomial distribution consists of the values 0, 1, 1, 2.

Example 9.10
Using the random number 8135 take a single random observation from a Poisson distribution with parameter 3.

Solution 9.10
X ~ Po(3).
Using cumulative Poisson probability tables (see page 648) and arranging the results in a table together with a convenient corresponding random number allocation gives:

x            F(x)      Corresponding random numbers
0            0.0498    0001 to 0498
1            0.1991    0499 to 1991
2            0.4232    1992 to 4232
3            0.6472    4233 to 6472
4            0.8153    6473 to 8153
5            0.9161    8154 to 9161
6            0.9665    9162 to 9665
7            0.9881    9666 to 9881
8 or over    1         9882 to 9999 and 0000

The given random number 8135 is in the range 6473 to 8153, so the random observation corresponds to x = 4.
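The allocation of random numbers to cumulative probabilities used in Examples 9.7 to 9.9 can be sketched in Python as follows. The function name simulate is hypothetical, and the sketch assumes the same convention as the tables above, namely that an all-zero random number is allocated to the largest value.

```python
from bisect import bisect_left
from math import comb


def simulate(values, probabilities, random_numbers, digits):
    """Convert random numbers with the given number of digits into
    observations, using the upper limits of the random-number blocks."""
    scale = 10 ** digits
    limits = []
    running = 0.0
    for p in probabilities:
        running += p
        limits.append(round(running * scale))   # e.g. 4096, 8192, ...
    limits[-1] = scale                           # cumulative probability 1
    sample = []
    for r in random_numbers:
        if r == 0:                               # 00...0 is allocated to the top block
            r = scale
        sample.append(values[bisect_left(limits, r)])
    return sample


# Example 9.7: frequencies 8, 12, 14, 6 (total 40), two-digit random numbers
sample_97 = simulate([1, 2, 3, 4], [f / 40 for f in (8, 12, 14, 6)],
                     [36, 42, 94, 58, 83], digits=2)

# Example 9.8: single-digit random numbers
sample_98 = simulate([0, 1, 2, 3], [0.1, 0.2, 0.4, 0.3],
                     [3, 7, 4, 7, 6, 5, 3, 3, 9, 0], digits=1)

# Example 9.9: X ~ B(4, 0.2), four-digit random numbers
binomial = [comb(4, x) * 0.2 ** x * 0.8 ** (4 - x) for x in range(5)]
sample_99 = simulate([0, 1, 2, 3, 4], binomial,
                     [2811, 5747, 6157, 8988], digits=4)

print(sample_97, sample_98, sample_99)
```

The values come out in the order in which the random numbers are read, before any sorting.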
Example 9.11
Using the random digits 723 850 take a random sample of size 2 from the continuous distribution with probability density function
f(x) = (3/8)x² for 0 < x < 2

Solution 9.11
The cumulative distribution function is given by
F(x) = ∫₀ˣ (3/8)t² dt = x³/8
Taking the first three random numbers:
if F(x) = 0.723, then
x³/8 = 0.723
and x = ∛(8 × 0.723) = 1.80 (2 d.p.)
Taking the next three random numbers:
if F(x) = 0.850, then
x³/8 = 0.850
and x = ∛(8 × 0.850) = 1.89 (2 d.p.)
So the two random observations are x = 1.80 and x = 1.89.

Example 9.12
Use the random numbers 382 824 to take a random sample of 2 from the normal distribution N(30, 4).

Solution 9.12
X ~ N(30, 4).
Cumulative probabilities Φ(z) are given in the standard normal tables (see page 649).
Taking the first three digits of the random number list
Φ(z) = 0.382
z = Φ⁻¹(0.382) = −0.3
(x − 30)/2 = −0.3
x = 30 − 0.6 = 29.4
Now take the second three digits
Φ(z) = 0.824
z = Φ⁻¹(0.824) = 0.931
(x − 30)/2 = 0.931
x = 30 + 1.862 = 31.9 (1 d.p.)
So the two random observations are 29.4 and 31.9.


Exercise 9b Simulating random samples from given distributions

In the following, use the random number tables on page 653 if random numbers have not been given.

1. Select a random sample of size 10 (to 3 d.p.) from the continuous range 3 ≤ x < 9.

2. Draw up a random sample of 100 numbers from the discrete integer range 0 to 9. Find the mean and variance of the sample values and compare them with the theoretical mean and variance.

3. The discrete random variable X has probability distribution

x           5      6     7      8      9
P(X = x)    0.15   0.2   0.33   0.21   0.11

Simulate a sample of size 12 from the distribution of X. Compare the mean and variance of this sample with E(X) and Var(X).

4. The discrete random variable X has distribution function
F(x) = (1/4)(x − 2), x = 3, 4, 5, 6.
Using random number tables, generate 10 observations of X, showing your working clearly.
Describe how you would select a random sample of 30 pupils from a school containing 850 pupils.

5. You wish to select a person at random from a group of 58 people. The following procedure is suggested:
Allocate the numbers 1 to 58 to the people. Choose a line in a table of random numbers and take the first two digits x and y. Let z = 10x + y. If 1 ≤ z ≤ 58 then the person who was allocated the number z is selected. Otherwise, the person allocated the number z − 58 is selected.
Comment on this method of selection.

6. Take a random sample of size 6 from the distribution:

x    15   16   17   18   19
f    13   15   12   6    4

7. Take a random sample of size 3 from the distribution:

x    2.3   2.4   2.5   2.6   2.7
f    40    60    90    50    60

8. Take a random sample of size 10 from each of the following probability distributions. In each case, find the sample mean and variance and compare with E(X) and Var(X).
(a)
x           1      2     3      4
P(X = x)    0.11   0.2   0.45   0.24
(b) P(X = x) = kx, x = 0, 1, 2, 3.

9. Take a random sample of size 5 from the distribution of X where F(x) = (1/5)x, x = 2, 3, 4, 5.

10. (a) The discrete random variable X is such that X ~ B(3, 0.4). Take a random sample of size 5 from this distribution, using the random numbers
407 315 401 203 972
(b) Using the random number 6143 take a single random observation from the Poisson distribution with parameter 4.

11. Using the random numbers 267 394 018 take a random sample of size 3 from the normal distribution with mean 35 and variance 9.
12. Using the random numbers 2654 9342, make two random observations from each of the following distributions:
    (a) The number of seeds that germinate in a group of 5 selected at random, given that 75% are expected to germinate.
    (b) The number of goals in a football match, where the number of goals follows a Poisson distribution with variance 2.4.
    (c) The mass of a bag of sugar, where the mass is normally distributed with mean 1010 g and standard deviation 4.5 g.

13. Using the random number 256 construct a random observation of the continuous random variable X where
    $F(x) = \frac{1}{9}x^2$, $0 \le x \le 3$.

14. Take 20 samples, each of size 2, from the following distribution:

    x    1    2    3    4
    f    10   15   25   35

    Calculate the mean of each sample and find the mean and variance of the sample means. Find the mean and variance of the original distribution. Comment.

15. You are given the random number 431. Use this number to obtain a sample observation from
    (a) a binomial distribution with n = 12 and p = 0.4,
    (b) a normal distribution with mean 6.2 and standard deviation 0.1.
    You are expected to explain clearly how you obtain the sample observations. (O)

16. The digits 8453276 are obtained from a table of random digits. Use them to obtain a random observation from each of the following distributions:
    (a) the number of the winning ticket in a lottery in which there are 500 ticket numbers from 1 to 500 and every ticket has the same chance of being selected,
    (b) the number of babies born in a cottage hospital in a week, assuming that on average one baby is born every three days and that births are independent (and ignoring the possibility of multiple births). (O)

SAMPLE STATISTICS
When you are trying to find out information about a population it seems sensible to take random samples and then consider the values obtained from them. It is therefore useful to know how these sample values are distributed.

THE DISTRIBUTION OF THE SAMPLE MEAN

Imagine carrying out the following procedure:

• Take a random sample of n independent observations from a population. Note that from a finite population, sampling should be with replacement to ensure that the observations are independent.
• Calculate the mean of these n sample values. This is known as the sample mean.
• Now repeat the procedure until you have taken all possible samples of size n, calculating the sample mean of each one.
• Form a distribution of all the sample means.

The distribution that would be formed is called the sampling distribution of means.

The mean and variance of the sampling distribution of means

It is possible to work out the mean and variance of this sampling distribution using expectation algebra.

Consider a population X in which $E(X) = \mu$ and $\mathrm{Var}(X) = \sigma^2$.
Take n independent observations $X_1, X_2, \ldots, X_n$ from X.

Since $E(X) = \mu$,
$E(X_1) = \mu,\ E(X_2) = \mu,\ \ldots,\ E(X_n) = \mu$

Since $\mathrm{Var}(X) = \sigma^2$,
$\mathrm{Var}(X_1) = \sigma^2,\ \mathrm{Var}(X_2) = \sigma^2,\ \ldots,\ \mathrm{Var}(X_n) = \sigma^2$

The sample mean is $\bar{X} = \dfrac{1}{n}(X_1 + X_2 + \cdots + X_n)$, so

$E(\bar{X}) = \dfrac{1}{n}\big(E(X_1) + E(X_2) + \cdots + E(X_n)\big) = \dfrac{1}{n}(n\mu) = \mu$

$\mathrm{Var}(\bar{X}) = \dfrac{1}{n^2}\big(\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \cdots + \mathrm{Var}(X_n)\big) = \dfrac{1}{n^2}(n\sigma^2) = \dfrac{\sigma^2}{n}$

So $E(\bar{X}) = \mu$ and $\mathrm{Var}(\bar{X}) = \dfrac{\sigma^2}{n}$.
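The procedure of taking every possible sample can be carried out exactly for a very small population. The sketch below is only an illustration: the population values [1, 2, 3, 4] (assumed equally likely) and the sample size n = 2 are hypothetical choices, not taken from the text. It lists every ordered sample drawn with replacement and compares the mean and variance of the sample means with $\mu$ and $\sigma^2/n$.

```python
from itertools import product
from statistics import fmean, pvariance


def sampling_distribution_of_means(population, n):
    """Return the mean of every ordered sample of size n drawn with replacement."""
    return [fmean(sample) for sample in product(population, repeat=n)]


population = [1, 2, 3, 4]  # hypothetical population, each value equally likely
n = 2                      # hypothetical sample size

means = sampling_distribution_of_means(population, n)

mu = fmean(population)          # population mean
sigma2 = pvariance(population)  # population variance (divisor N)

print("population mean:", mu, " mean of sample means:", fmean(means))
print("sigma^2 / n:", sigma2 / n, " variance of sample means:", pvariance(means))
```

Changing n (or the population list) should leave the two printed pairs in agreement, although the number of samples grows quickly with n.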
The standard deviation of the sampling distribution is $\sqrt{\dfrac{\sigma^2}{n}}$, usually written $\dfrac{\sigma}{\sqrt{n}}$. This is known as the standard error of the mean.

The mean of the sampling distribution is the same as the mean of the population.

The standard deviation of the sampling distribution is smaller than that of the population since $\sigma^2$ has been divided by n. This implies that the sample means are more clustered around $\mu$ than the population values are. In fact, the larger the sample size, the more clustered they are.

The following diagrams help to illustrate the shape of the sampling distribution of means resulting from different sized samples from given populations.

(a) The distribution of $\bar{X}$ when the population of X is normal

[Diagram: distribution of X when $X \sim N(100, 64)$, centred on 100.]
[Diagrams: distribution of $\bar{X}$ when n = 2, 5 and 25 (means of samples of size 2, of size 5 and of size 25), each centred on 100.]

From the diagrams, you can see that if samples are taken from a normal population, the sampling distribution of means is normal for any sample size.

If $X \sim N(\mu, \sigma^2)$ then $\bar{X} \sim N\!\left(\mu, \dfrac{\sigma^2}{n}\right)$.

Example 9.13
At a college the masses of the male students can be modelled by a normal distribution with mean mass 70 kg and standard deviation 5 kg.
Four male students are chosen at random. Find the probability that their mean mass is less than 65 kg.

Solution 9.13
X is the mass, in kilograms, of a male student at the college, and $X \sim N(\mu, \sigma^2)$, with $\mu = 70$ and $\sigma = 5$.
Since the distribution of X is normal, the distribution of $\bar{X}$ is also normal and

$\bar{X} \sim N\!\left(\mu, \dfrac{\sigma^2}{n}\right)$ with $\mu = 70$, $\sigma^2 = 25$, $n = 4$

i.e. $\bar{X} \sim N\!\left(70, \dfrac{25}{4}\right)$
so $\bar{X} \sim N(70, 6.25)$

$P(\bar{X} < 65) = P\!\left(Z < \dfrac{65 - 70}{\sqrt{6.25}}\right)$
$= P(Z < -2)$
$= 1 - \Phi(2)$
$= 1 - 0.9772$
$= 0.0228$
[Diagram: normal curve for $\bar{X}$ with 65 and 70 marked, corresponding to z = -2 and z = 0.]

The probability that the mean mass is less than 65 kg is 0.023 (2 s.f.).
The diagram below shows the distributions of X and $\bar{X}$ drawn to scale.
[Diagram: distributions of X and $\bar{X}$ drawn to scale.]

Example 9.14
The distribution of the random variable X is N(25, 340). The mean of a random sample of size n drawn from this distribution is $\bar{X}$. Find the value of n, correct to two significant figures, given that $P(\bar{X} > 28)$ is approximately 0.005. (C)

Solution 9.14
$X \sim N(25, 340)$
For samples of size n, $\bar{X} \sim N\!\left(25, \dfrac{340}{n}\right)$

$P(\bar{X} > 28) = P\!\left(Z > \dfrac{28 - 25}{\sqrt{340/n}}\right)$
$= P\!\left(Z > \dfrac{3\sqrt{n}}{\sqrt{340}}\right)$
You are given that $P(\bar{X} > 28) = 0.005$,

so $P\!\left(Z > \dfrac{3\sqrt{n}}{\sqrt{340}}\right) = 0.005$

$\therefore\ P\!\left(Z < \dfrac{3\sqrt{n}}{\sqrt{340}}\right) = 1 - 0.005 = 0.995$

$\dfrac{3\sqrt{n}}{\sqrt{340}} = \Phi^{-1}(0.995) = 2.576$

$3\sqrt{n} = 2.576 \times \sqrt{340}$

Squaring both sides
$9n = 2.576^2 \times 340$
$n = 250.68\ldots$
so n = 250 (2 s.f.).
[Diagram: normal curve for $\bar{X}$ with 25 and 28 marked, corresponding to z = 0 and z = 2.576; upper tail area 0.005.]

(b) The distribution of $\bar{X}$ when X is not normally distributed

The following diagrams illustrate the distribution of $\bar{X}$ for samples of different sizes taken from a population X:

(i) Distribution of X when $X \sim B(10, 0.25)$
[Diagram: probability distribution of X for x = 0 to 10.]
Distribution of $\bar{X}$ for samples of size 10, 15 and 30
[Diagrams: means of samples of size 10, of size 15 and of size 30.]

(ii) Distribution of X when $X \sim Po(4)$
[Diagram: probability distribution of X for x = 0 to 10.]
Distribution of $\bar{X}$ for samples of size 10, 15 and 30
[Diagrams: means of samples of size 10, of size 15 and of size 30.]

(iii) Distribution of X when $X \sim R(3, 7)$
[Diagram: rectangular distribution of X between 3 and 7 with height 0.25.]
Distribution of $\bar{X}$ for samples of size 10, 15 and 30
[Diagrams: means of samples of size 10, of size 15 and of size 30.]
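The behaviour shown in these diagrams can be explored by simulation. The following is a minimal sketch, not the method used to produce the book's diagrams: it draws repeated samples from B(10, 0.25), the population in (i), and summarises the sample means for n = 10, 15 and 30. The number of repetitions (5000) and the seed are arbitrary assumptions. For this population $\mu = 2.5$ and $\sigma^2 = 10 \times 0.25 \times 0.75 = 1.875$, so the variance of the sample means should be close to $1.875/n$.

```python
import random
from statistics import fmean, pvariance


def simulate_sample_means(n, trials=5000, seed=1):
    """Simulate `trials` sample means, each from n observations of B(10, 0.25)."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        # One B(10, 0.25) observation is the number of successes in 10 trials.
        sample = [sum(rng.random() < 0.25 for _ in range(10)) for _ in range(n)]
        means.append(fmean(sample))
    return means


for n in (10, 15, 30):
    means = simulate_sample_means(n)
    print(
        f"n={n:2d}  mean of means={fmean(means):.3f}  "
        f"variance of means={pvariance(means):.4f}  sigma^2/n={1.875 / n:.4f}"
    )
```

Plotting a histogram of `means` for each n (with any plotting tool) should show the skewness of the population fading as n increases.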

CENTRAL LIMIT THEOREM
From the diagrams you can see that when samples are taken from a population that is not normally distributed, the sampling distribution takes on the characteristic normal shape as the sample size increases. For large n the distribution of the sample mean is approximately normal.
This result is known as the central limit theorem. It is somewhat surprising, since it holds when the population of X is discrete (as in the binomial and Poisson distributions) and when X is continuous (as in the uniform distribution).

For samples taken from a non-normal population with mean $\mu$ and variance $\sigma^2$, by the central limit theorem, $\bar{X}$ is approximately normal and $\bar{X} \sim N\!\left(\mu, \dfrac{\sigma^2}{n}\right)$, provided that n is large ($n \ge 30$ say).

Example 9.15
Thirty random observations are taken from each of the following distributions and the sample mean calculated. Find, in each case, the probability that the sample mean exceeds 5.
(a) X is the number of telephone calls made in an evening to a counselling service, where $X \sim Po(4.5)$.
(b) X is the number of heads obtained when an unbiased coin is tossed nine times.
(c) X is distributed uniformly throughout the range $2 \le x \le 7$.

Solution 9.15
(a) $X \sim Po(4.5)$
$\mu = \lambda = 4.5$, $\sigma^2 = \lambda = 4.5$
By the central limit theorem, since n is large, $\bar{X}$ is approximately normal,
$\bar{X} \sim N\!\left(\mu, \dfrac{\sigma^2}{n}\right)$ with n = 30
i.e. $\bar{X} \sim N\!\left(4.5, \dfrac{4.5}{30}\right)$
$\bar{X} \sim N(4.5, 0.15)$

$P(\bar{X} > 5) = P\!\left(Z > \dfrac{5 - 4.5}{\sqrt{0.15}}\right)$
$= P(Z > 1.291)$
$= 1 - \Phi(1.291)$
$= 1 - 0.9017$
$= 0.098$ (2 s.f.)
[Diagram: normal curve with 4.5 and 5 marked, corresponding to z = 0 and z = 1.291.]

(b) $X \sim B(9, 0.5)$
$\mu = np = 9 \times 0.5 = 4.5$ (see page 286)
$\sigma^2 = npq = 9 \times 0.5 \times 0.5 = 2.25$
By the central limit theorem, since n is large, $\bar{X}$ is approximately normal and
$\bar{X} \sim N\!\left(\mu, \dfrac{\sigma^2}{n}\right)$ with n = 30
i.e. $\bar{X} \sim N\!\left(4.5, \dfrac{2.25}{30}\right)$
$\bar{X} \sim N(4.5, 0.075)$

$P(\bar{X} > 5) = P\!\left(Z > \dfrac{5 - 4.5}{\sqrt{0.075}}\right)$
$= P(Z > 1.826)$
$= 1 - 0.9660$
$= 0.034$ (2 s.f.)
[Diagram: normal curve with 4.5 and 5 marked, corresponding to z = 0 and z = 1.826.]

(c) When X is uniformly distributed and $a \le x \le b$,
$E(X) = \frac{1}{2}(a + b)$ and $\mathrm{Var}(X) = \frac{1}{12}(b - a)^2$ (see page 363)
Since X is valid for $2 \le x \le 7$, a = 2 and b = 7
$\mu = E(X) = \frac{1}{2}(2 + 7) = 4.5$, $\sigma^2 = \mathrm{Var}(X) = \frac{1}{12}(7 - 2)^2 = \frac{25}{12}$
By the central limit theorem, since n is large, $\bar{X}$ is approximately normal and
$\bar{X} \sim N\!\left(\mu, \dfrac{\sigma^2}{n}\right)$ with n = 30
i.e. $\bar{X} \sim N\!\left(4.5, \dfrac{25}{12 \times 30}\right)$
$\bar{X} \sim N(4.5, 0.0694\ldots)$

$P(\bar{X} > 5) = P\!\left(Z > \dfrac{5 - 4.5}{\sqrt{0.0694\ldots}}\right)$
$= P(Z > 1.897)$
$= 1 - 0.9711$
$= 0.029$ (2 s.f.)
[Diagram: normal curve with 4.5 and 5 marked, corresponding to z = 0 and z = 1.897.]

Exercise 9c The distribution of the sample mean

1. The volumes of wine in bottles are normally distributed with a mean of 758 ml and a standard deviation of 12 ml. A random sample of 10 bottles is taken and the mean volume found.
   Calculate the probability that the sample mean is less than 750 ml.

2. The heights of a new variety of sunflower can be modelled by a normal distribution with mean 2 m and standard deviation of 40 cm.
   (a) A random sample containing 50 sunflowers is taken and the mean height calculated. What is the probability that the sample mean lies between 195 cm and 205 cm?
   (b) A hundred such samples, each of 50 observations, are taken. In how many of these would you expect the sample mean to be greater than 210 cm?

3. In an examination taken by a large number of students the mean mark was 64.5 and the variance was 64. The mean mark in a random sample of 100 scripts is denoted by $\bar{X}$. Find
   (a) $P(\bar{X} > 65.5)$
   (b) $P(63.8 < \bar{X} < 64.5)$

4. The mean of 50 observations of X, where $X \sim B(12, 0.4)$, is $\bar{X}$.
   (a) State the approximate distribution of $\bar{X}$.
   (b) Hence find $P(\bar{X} < 5)$.

5. A normal variable X has standard deviation $\sigma$. The mean of 20 independent observations of X is $\bar{X}$.
   (a) Given that $\mathrm{Var}(\bar{X}) = 3.2$, find the value of $\sigma$.
   (b) Would your answer be different if the variable was not normal?

6. Independent observations are taken from a normal distribution with mean 30 and variance 5.
   (a) Find the probability that the average of 10 observations exceeds 30.5.
   (b) Find the probability that the average of 40 observations exceeds 30.5.
   (c) Find the probability that the average of 100 observations exceeds 30.5.
   (d) Find the least value of n such that the probability that the average of n observations exceeds 30.5 is less than 1%.

7. The standard deviation of the masses of articles in a large population is 4.55 kg. Random samples of size 100 are drawn from the population. Find the probability that a sample mean will differ from the population mean by less than 0.8 kg.

8. The variable X is such that $X \sim N(\mu, 4)$. A random sample of size n is taken from the population. Find the least n such that $P(|\bar{X} - \mu| < 0.5) > 0.95$.

9. (a) A large number of random samples of size n are taken from B(20, 0.2). Approximately 90% of the sample means are less than 4.354. Estimate n.
   (b) A large number of random samples of size n are taken from Po(2.9). Approximately 1% of the sample means are greater than 3.41. Estimate n.

10. The random variable X has standard deviation $\sigma$. The mean of 40 observations of X is $\bar{X}$. Given that $\mathrm{Var}(\bar{X}) = 0.625$, find the value of $\sigma$.

11. The mean of a sample of 100 observations of the random variable X is denoted by $\bar{X}$. The mean of $\bar{X}$ is 20 and the standard deviation of $\bar{X}$ is 0.3. Find the mean and the standard deviation of X.

12. A sample of n independent observations is taken from a normal population with mean 74 and standard deviation 6. The sample mean is denoted by $\bar{X}$.
    (a) Find n if $P(\bar{X} > 75) = 0.282$.
    (b) Find n if $P(\bar{X} < 70.4) = 0.0037$.

13. To estimate the mean and standard deviation of the life of a certain brand of car tyre a large number of random samples of size 50 were tested. The mean and standard error of the sampling distribution obtained were 20 500 km and 250 km respectively. Estimate the mean and standard deviation of the life of this brand of car tyre. Explain what part the use of the central limit theorem has played in the calculations.

14. The diameters, x, of 110 steel rods were measured in centimetres and the results were summarised as follows:
    $\sum x = 36.5$, ...
    Find the mean and standard deviation of these measurements.
    Assuming these measurements are a sample from a normal distribution with this mean and this variance, find the probability that the mean diameter of a sample of size 110 is greater than 0.345 cm. (O & C)

15. In a certain nation, men have heights distributed normally with mean 1.70 m and standard deviation 10 cm. Find the probability that a man chosen randomly has height not less than 1.83 m.
    What is the probability that the average height of 3 men chosen randomly is greater than 1.78 m and the probability that all three will have heights greater than 1.83 m? (MEI)

16. Two red balls and 2 white balls are placed in a bag. Balls are drawn one by one, at random and without replacement. The random variable X is the number of white balls drawn before the first red ball is drawn.
    (a) Show that $P(X = 1) = \frac{1}{3}$, and find the rest of the probability distribution of X.
    (b) Find E(X) and show that $\mathrm{Var}(X) = \frac{5}{9}$.
    (c) The sample mean for 80 independent observations of X is denoted by $\bar{X}$. Using a suitable approximation, find $P(\bar{X} > 0.75)$. (C)

THE DISTRIBUTION OF THE SAMPLE PROPORTION, $P_s$

Suppose a random sample of n observations is taken from a population in which the proportion of successes is p and the proportion of failures is q = 1 - p.

If X is the number of successes in the sample, then X follows a binomial distribution, i.e. $X \sim B(n, p)$ and $E(X) = np$, $\mathrm{Var}(X) = npq$ (see page 286).

The random variable for the proportion of successes in the sample is $\dfrac{X}{n}$.

This can be written $P_s$, where $P_s = \dfrac{X}{n} = \dfrac{1}{n} \times X$.

It is possible to work out the mean and the variance of $P_s$, using expectation algebra as follows:

$E(P_s) = E\!\left(\dfrac{1}{n}X\right) = \dfrac{1}{n}E(X) = \dfrac{1}{n} \times np = p$

$\mathrm{Var}(P_s) = \mathrm{Var}\!\left(\dfrac{1}{n}X\right) = \left(\dfrac{1}{n}\right)^2 \mathrm{Var}(X) = \dfrac{1}{n^2} \times npq = \dfrac{pq}{n}$

The distribution of $P_s$ has mean p and variance $\dfrac{pq}{n}$. For large n it is approximately normal, so $P_s \sim N\!\left(p, \dfrac{pq}{n}\right)$.
[Diagram: normal curve for $P_s$ centred on p, with the standard deviation marked.]

The distribution of $P_s$ is known as the sampling distribution of proportions. The standard deviation of this distribution is $\sqrt{\dfrac{pq}{n}}$ and it is known as the standard error of proportion.

NOTE: When considering the normal approximation to the binomial distribution, a continuity correction of $\pm\frac{1}{2}$ is needed (see page 383).
Since $P_s = \dfrac{1}{n} \times X$, use a continuity correction of $\dfrac{1}{n} \times \left(\pm\dfrac{1}{2}\right)$, i.e. $\pm\dfrac{1}{2n}$.

Example 9.16
It is known that 3% of frozen pies delivered to a canteen are broken. What is the probability that, on a morning when 500 pies are delivered, 5% or more are broken?
Solution 9.16
Let p be the probability that a pie is broken, so p = 0.03.
Let $P_s$ be the proportion of pies in the sample that are broken.
Then $P_s \sim N\!\left(p, \dfrac{pq}{n}\right)$ with p = 0.03, q = 0.97, n = 500.

$P_s \sim N\!\left(0.03, \dfrac{0.03 \times 0.97}{500}\right)$
i.e. $P_s \sim N(0.03, 0.000\,058\,2)$

To find the probability that 5% or more are broken,
find $P(P_s \ge 0.05) \rightarrow P\!\left(P_s > 0.05 - \dfrac{1}{2 \times 500}\right)$ (continuity correction)
$= P(P_s > 0.049)$
$= P\!\left(Z > \dfrac{0.049 - 0.03}{\sqrt{0.000\,058\,2}}\right)$
$= P(Z > 2.491)$
$= 1 - \Phi(2.491)$
$= 1 - 0.9936$
$= 0.0064$
[Diagram: normal curve with 0.03 and 0.049 marked, corresponding to z = 0 and z = 2.491.]

Alternative method for Solution 9.16
Instead of considering $P_s$, the proportion of broken pies, you could consider X, the number of broken pies in the sample.
In this case, $X \sim B(n, p)$ with n = 500, p = 0.03, q = 1 - p = 0.97.
Now np = 500 x 0.03 = 15 and nq = 500 x 0.97 = 485.
Since n is large such that np > 5 and nq > 5, use the normal approximation for the binomial distribution (see page 382),
where $X \sim N(np, npq)$ with np = 15 and npq = 500 x 0.03 x 0.97 = 14.55,
i.e. $X \sim N(15, 14.55)$.

You want the probability that 5% or more are broken.
5% of 500 = 25, so find the probability that 25 or more are broken.

$P(X \ge 25) \rightarrow P(X > 24.5)$ (continuity correction)
$= P\!\left(Z > \dfrac{24.5 - 15}{\sqrt{14.55}}\right)$
$= P(Z > 2.491)$
$= 0.0064$ (as above)
[Diagram: normal curve with 15 and 24.5 marked, corresponding to z = 0 and z = 2.491.]

NOTE: Since the same underlying theory has been used, probabilities of this type can be found either by considering $P_s$, the distribution of sample proportions, or by considering the distribution of the number of successes, and applying the normal approximation to the binomial distribution. In either case, the sample size, n, must be large.
Note that if the continuity correction is used in both cases, or omitted in both cases, the standardised z values will agree exactly.

Exercise 9d Distribution of sample proportions (large samples)

1. 2% of the trees in a plantation are known to have a certain disease. A random sample of 300 trees is checked. Find the probability that the proportion of diseased trees in the sample is
   (a) less than 1%,
   (b) more than 4%.

2. A fair coin is tossed 150 times. Find the probability that
   (a) fewer than 40% of the tosses will result in heads,
   (b) between 40% and 50% (inclusive) of the tosses will result in heads,
   (c) at least 55% of the tosses will result in heads.

3. A fair coin is tossed 300 times.
   Work through part (c) as in question 2.
   Explain why your answer is different from that obtained in question 2.

4. Mr Hand gained 48% of the votes in the District Council elections.
   (a) Find the probability that a poll of 100 randomly selected voters would show over 50% in favour of Mr Hand.
   (b) Find the corresponding probability if the sample consists of 1000 randomly selected voters.

5. Three-quarters of the households in a particular area are connected to the internet. Find the probability that at least 73 of a random sample of 100 households are connected to the internet.

6. A die is biased so that 1 in 5 throws results in a six. Find the probability that, when the die is thrown 300 times, the number that result in a six
   (a) is more than 70,
   (b) is at least 70,
   (c) is less than 57.

7. 70% of the strawberry plants of a particular variety produce more than ten strawberries per plant. Find the probability that a random sample of 50 plants of this variety consists of more than 37 plants which produce more than ten strawberries per plant.

UNBIASED ESTIMATES OF POPULATION PARAMETERS
In order to define a binomial distribution you need to know n and p; to define a Poisson distribution you need to know $\lambda$ and to define a normal distribution you need to know $\mu$ and $\sigma^2$. These are known as the population parameters of the distributions.

Suppose that you do not know the value of a particular parameter of a distribution, for example the mean or the variance or the proportion of successes. It seems sensible that you would take a random sample from the distribution and use it in some way to make an estimate of the value of your unknown parameter.

This estimate is unbiased if the average (or expectation) of a large number of values taken in the same way is the true value of the parameter. There may be several ways of obtaining an unbiased estimate but the best (most efficient) estimate is the one with the smallest variance.

POINT ESTIMATES

If the random sample taken is of size n,

• the best unbiased estimate of p, the proportion of successes in the population, is $\hat{p}$ where
  $\hat{p} = p_s$ ($p_s$ is the proportion of successes in the sample)

• the best unbiased estimate of $\mu$, the population mean, is $\hat{\mu}$ where
  $\hat{\mu} = \bar{x} = \dfrac{\sum x}{n}$

• the best unbiased estimate of $\sigma^2$, the population variance, is $\hat{\sigma}^2$ where
  $\hat{\sigma}^2 = \dfrac{n}{n-1} \times s^2$ ($s^2$ is the variance of the sample)
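The point estimates of the mean and variance can be checked numerically. The sketch below is an illustration under stated assumptions: the five data values are hypothetical, and the three expressions for $\hat{\sigma}^2$ (scaling the sample variance by $n/(n-1)$, summing squared deviations, and working from $\sum x$ and $\sum x^2$) are written out only to show that they agree. Python's `statistics.variance` also uses the divisor n - 1, so it serves as a cross-check.

```python
from statistics import fmean, pvariance, variance

sample = [4, 7, 9, 10, 15]  # hypothetical data, for illustration only
n = len(sample)

x_bar = fmean(sample)   # unbiased estimate of the population mean
s2 = pvariance(sample)  # variance of the sample itself (divisor n)

# Three algebraically equivalent forms of the unbiased variance estimate.
est_scaled = n / (n - 1) * s2
est_deviations = sum((x - x_bar) ** 2 for x in sample) / (n - 1)
est_sums = (sum(x * x for x in sample) - sum(sample) ** 2 / n) / (n - 1)

print("mean estimate:", x_bar)
print("variance estimates:", est_scaled, est_deviations, est_sums)
print("statistics.variance:", variance(sample))
```

The form based on $\sum x$ and $\sum x^2$ is convenient when only summary totals are available, at the cost of being more sensitive to rounding when the data values are large.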

There are alternative formats for $\hat{\sigma}^2$:

$\hat{\sigma}^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$

or

$\hat{\sigma}^2 = \dfrac{1}{n-1}\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right)$

NOTE: if you are using your calculator in SD mode, it is possible to find the value of $\hat{\sigma}$ directly. Look for a key marked $x\sigma_{n-1}$. On some models this is obtained by pressing SHIFT followed by another key. Find the key on your model.

Example 9.17
A railway enthusiast simulates train journeys and records the number of minutes, x, to the nearest minute, trains are late according to the schedule being used. A random sample of 50 journeys gave the following times.

17    5    3   10    4    3   10    5    2   14
 3   14    5    5   21    9   22   36   14   34
22    4   23    6    8   15   41   23   13    7
 6   13   33    8    5   34   26   17    8   43
24   14   23    4   19    5   23   13   12   10

Given that $\sum x = 738$ and $\sum x^2 = 16\,526$, calculate to two decimal places, unbiased estimates of the mean and the variance of the population from which this sample was drawn. (L)

Solution 9.17
X is the number of minutes that the train is late.
Let $E(X) = \mu$ and $\mathrm{Var}(X) = \sigma^2$.

Unbiased estimate of $\mu$
$\hat{\mu} = \bar{x} = \dfrac{\sum x}{n} = \dfrac{738}{50} = 14.76$

Unbiased estimate of $\sigma^2$
$\hat{\sigma}^2 = \dfrac{1}{n-1}\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right)$
$= \dfrac{1}{49}\left(16\,526 - \dfrac{738^2}{50}\right)$
$= 114.961\ldots$
$= 114.96$ (2 d.p.)

Try this using the raw data and your calculator in SD mode (see page 40). Input the data as follows:

Casio 570W/85W/85WA
Set SD mode:       use the MODE key to select SD (the key sequence depends on the model)
Clear memories:    [SHIFT] [Scl] [=]
Input data:        17 [DT]  5 [DT]  3 [DT]  ...
To obtain
$\hat{\sigma}^2 = 114.961\ldots$:   [SHIFT] [$x\sigma_{n-1}$] [$x^2$] [=]
$\bar{x} = 14.76$:   [SHIFT] [$\bar{x}$] [=]
You can also check $\sum x = 738$, $\sum x^2 = 16\,526$, n = 50.
Remember to clear SD mode when you have finished.

Example 9.18
For the data given in Example 9.17, estimate the proportion of trains that are more than 25 minutes late.

Solution 9.18
Number in sample that are more than 25 minutes late = 7
Proportion in sample, $p_s = \frac{7}{50} = 0.14$
Unbiased estimate of population proportion, p, is $\hat{p}$, where $\hat{p} = p_s = 0.14$.

INTERVAL ESTIMATES
Another way of using a sample value to give a good idea of an unknown population parameter is to construct an interval, known as a confidence interval.
In general terms, this is an interval that has a specified probability of including the parameter.
The interval is usually written (a, b) and the end-values, a and b, are known as confidence limits. The probabilities most often used in confidence intervals are 90%, 95% and 99%.
Suppose you do not know the mean $\mu$ of a particular population and you want to work out a 95% confidence interval for it. You would need to construct an interval (a, b) so that
$P(a < \mu < b) = 0.95$.
In this case, the probability that the interval includes $\mu$ is 0.95 or 95%.
The interval that you construct uses the value of the mean of a random sample of size n taken from the population. This mean is denoted by $\bar{x}$.
Before constructing your interval for $\mu$, it is essential to ask the following questions.
- Is the distribution of the population normal or not?
- Do you know the variance of the population?
- Is the sample size large or small?
Your answers will then determine how to proceed. The following theory illustrates various situations.

(a) Confidence interval for $\mu$, the population mean
• of a normal population,
• with known variance $\sigma^2$
• using any size sample, n large or small

Consider first how to calculate the end-values of the most commonly used interval, the 95% confidence interval. The method can then be adapted for other levels of confidence.
Note that it is useful to be able to follow the theory for the derivation of the end-points, but in practice you will probably only need to be able to apply the formula.
As you saw on page 438, for random samples of size n,

if $X \sim N(\mu, \sigma^2)$, then $\bar{X} \sim N\!\left(\mu, \dfrac{\sigma^2}{n}\right)$

Standardising, $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$, where $Z \sim N(0, 1)$

Consider the distribution of Z.
For a 95% confidence interval you need to find the values of z between which the central 95% of the distribution lies. This means that the upper tail probability is 0.025, so the probability below the upper value is 0.975.
$P(Z < z) = 0.975$
$z = \Phi^{-1}(0.975) = 1.96$
The values of z are $\pm 1.96$.
[Diagram: standard normal curve with the central 95% between -1.96 and 1.96 and 2.5% in each tail.]

So $P(-1.96 < Z < 1.96) = 0.95$

i.e. $P\!\left(-1.96 < \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = 0.95$

Now consider the inequality in two parts:

$-1.96 < \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$  and  $\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96$

$-1.96\dfrac{\sigma}{\sqrt{n}} < \bar{X} - \mu$  and  $\bar{X} - \mu < 1.96\dfrac{\sigma}{\sqrt{n}}$

$\mu < \bar{X} + 1.96\dfrac{\sigma}{\sqrt{n}}$  and  $\bar{X} - 1.96\dfrac{\sigma}{\sqrt{n}} < \mu$

Writing these two inequalities in one statement gives

$\bar{X} - 1.96\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\dfrac{\sigma}{\sqrt{n}}$

Therefore the probability statement is

$P\!\left(\bar{X} - 1.96\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\dfrac{\sigma}{\sqrt{n}}\right) = 0.95$

This enables you to construct the 95% confidence interval for $\mu$.
Comparing this with $P(a < \mu < b) = 0.95$, if the mean obtained from your sample is $\bar{x}$, then the end-values, or confidence limits, are

$\bar{x} - 1.96\dfrac{\sigma}{\sqrt{n}}$ and $\bar{x} + 1.96\dfrac{\sigma}{\sqrt{n}}$.

These are sometimes written $\bar{x} \pm 1.96\dfrac{\sigma}{\sqrt{n}}$.

If $\bar{x}$ is the mean of a random sample of any size n taken from a normal population with known variance $\sigma^2$, then a 95% confidence interval for $\mu$ is
$\left(\bar{x} - 1.96\dfrac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\dfrac{\sigma}{\sqrt{n}}\right)$

Example 9.19
The mass of vitamin E in a capsule manufactured by a certain drug company is normally distributed with standard deviation 0.042 mg. A random sample of five capsules was analysed and the mean mass of vitamin E was found to be 5.12 mg. Calculate a symmetric 95% confidence interval for the population mean mass of vitamin E per capsule. Give the values of the end-points of the interval correct to three significant figures. (C)

Solution 9.19
X is the mass, in milligrams, of vitamin E in a capsule.
$X \sim N(\mu, \sigma^2)$ with $\sigma = 0.042$.
$\bar{X} \sim N\!\left(\mu, \dfrac{\sigma^2}{n}\right)$ with n = 5.

The 95% confidence interval for $\mu$ is $\left(\bar{x} - 1.96\dfrac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\dfrac{\sigma}{\sqrt{n}}\right)$

$\bar{x} \pm 1.96\dfrac{\sigma}{\sqrt{n}} = 5.12 \pm 1.96 \times \dfrac{0.042}{\sqrt{5}}$
$= 5.12 \pm 0.0368\ldots$

Lower confidence limit = 5.12 - 0.0368... = 5.08 (3 s.f.)
Upper confidence limit = 5.12 + 0.0368... = 5.16 (3 s.f.)

So the 95% confidence interval for $\mu$, based on the sample mean, is (5.08 mg, 5.16 mg).
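The calculation of confidence limits for a normal population with known standard deviation can be sketched in a few lines. This is an illustrative helper, not part of the original text: the function name `mean_confidence_interval` and its parameters are assumptions, and the critical value is taken from Python's `statistics.NormalDist` rather than from printed tables, so results may differ from hand calculations in the later decimal places.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(x_bar, sigma, n, level=0.95):
    """Symmetric confidence interval for the mean when sigma is known."""
    # The critical value z satisfies P(Z < z) = 1 - (1 - level) / 2.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * sigma / sqrt(n)
    return x_bar - half_width, x_bar + half_width


# Data from Example 9.19: sample mean 5.12 mg, sigma = 0.042 mg, n = 5.
low, high = mean_confidence_interval(5.12, 0.042, 5)
print(round(low, 2), round(high, 2))
```

Passing a different `level` (for example 0.90 or 0.99) changes only the critical value; the rest of the calculation is unchanged.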
NOTE: The probability that the interval (5.08 mg, 5.16 mg) includes, or has trapped, $\mu$ is 0.95, i.e. 95%. If you took another random sample of the same size, you would probably get a different interval. If you took lots of samples in a similar way then, on average, 95% of these intervals would include the true population mean $\mu$.

The following computer simulation illustrates the intervals obtained when 100 confidence intervals are constructed, each with 95% confidence. On average, 5% do not include $\mu$.
In practice, you would only construct one interval. Remember that there is a 5% chance that your interval does not include $\mu$.
The intervals shown in bold are the ones which do not include $\mu$. You will see that in this case just six of the 100 do not include $\mu$. On average 95% of intervals constructed in this way will include the true population mean.
[Diagram: computer simulation of 100 confidence intervals, with the six that miss $\mu$ shown in bold.]

Critical z-values in confidence intervals

The z-value in the confidence interval is known as the critical value and is obtained for different levels of confidence as follows:

In a 90% confidence interval,
the upper tail probability is 0.05
so the lower tail probability is 0.95.
$P(Z < z) = 0.95$
i.e. $\Phi(z) = 0.95$
$z = \Phi^{-1}(0.95) = 1.645$
[Diagram: central 90% of the standard normal curve between -1.645 and 1.645.]

In a 95% confidence interval,
the upper tail probability is 0.025
so the lower tail probability is 0.975.
$P(Z < z) = 0.975$
i.e. $\Phi(z) = 0.975$
$z = \Phi^{-1}(0.975) = 1.96$
[Diagram: central 95% of the standard normal curve between -1.96 and 1.96, with 2.5% in each tail.]

In a 99% confidence interval,
the upper tail probability is 0.005
so the lower tail probability is 0.995.
$P(Z < z) = 0.995$
i.e. $\Phi(z) = 0.995$
$z = \Phi^{-1}(0.995) = 2.576$
[Diagram: central 99% of the standard normal curve between -2.576 and 2.576.]

Summary
90% confidence interval: critical value z = 1.645
95% confidence interval: critical value z = 1.96
99% confidence interval: critical value z = 2.576

Table of critical values

In some tables, the most commonly used critical z-values are summarised as follows. For each value of p, the table gives the value of z such that $P(Z < z) = p$.

p   0.75    0.90    0.95    0.975   0.99    0.995   0.9975  0.999   0.9995
z   0.674   1.282   1.645   1.960   2.326   2.576   2.807   3.090   3.291

- for a 90% confidence interval, $P(Z < z) = 0.95$; p = 0.95 gives z = 1.645,
- for a 95% confidence interval, $P(Z < z) = 0.975$; p = 0.975 gives z = 1.96,
- for a 99% confidence interval, $P(Z < z) = 0.995$; p = 0.995 gives z = 2.576.
If you want a 98% confidence interval, this implies an upper tail probability of 0.01.
$P(Z < z) = 0.99$ gives z = 2.326.
If the p-value for the confidence interval that you require is not in this summary table, you will need to work from the main body of the normal distribution table.
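When a confidence level is not covered by a summary table, the critical value can also be computed directly. The sketch below assumes Python's standard-library `statistics.NormalDist` as a stand-in for the printed normal tables; the helper name `critical_z` is hypothetical.

```python
from statistics import NormalDist


def critical_z(level):
    """Critical z-value for a symmetric confidence interval of the given level."""
    # A central area of `level` leaves (1 - level) / 2 in each tail,
    # so the required value satisfies P(Z < z) = (1 + level) / 2.
    return NormalDist().inv_cdf((1 + level) / 2)


for level in (0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%} confidence: z = {critical_z(level):.3f}")
```

The same function handles less common levels such as 92% or 99.5%, which would otherwise need the main body of the normal distribution table.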
(b) Confidence interval for $\mu$, the population mean
• of a non-normal population,
• with a known variance $\sigma^2$
• using a large sample, $n \ge 30$ say

In this case, since the sample size is large, the central limit theorem can be used.
$\bar{X}$ is approximately normal and $\bar{X} \sim N\!\left(\mu, \dfrac{\sigma^2}{n}\right)$ (see page 442).

If $\bar{x}$ is the mean of a random sample of size n, where n is large ($n \ge 30$), taken from a non-normal population with known variance $\sigma^2$, then a 95% confidence interval for $\mu$ is
$\left(\bar{x} - 1.96\dfrac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\dfrac{\sigma}{\sqrt{n}}\right)$

Example 9.20
The heights of men in a particular district are distributed with mean $\mu$ cm and standard deviation $\sigma$ cm.
On the basis of the results obtained from a random sample of 100 men from the district, the 95% confidence interval for $\mu$ was calculated and found to be (177.22 cm, 179.18 cm).
Calculate
(a) the value of the sample mean,
(b) the value of $\sigma$,
(c) a symmetric 90% confidence interval for $\mu$.

Solution 9.20
Let X be the height, in centimetres, of a man in the district.
The distribution of X is not known, but $E(X) = \mu$ and $\mathrm{Var}(X) = \sigma^2$.
Since the sample size, n, is large (n = 100), using the central limit theorem, the distribution of $\bar{X}$ is approximately normal with mean $\mu$ and variance $\dfrac{\sigma^2}{n}$.

(a) A 95% confidence interval for $\mu$ is given by
$\left(\bar{x} - 1.96\dfrac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\dfrac{\sigma}{\sqrt{n}}\right)$ with n = 100, so $\sqrt{n} = 10$

Since the interval is (177.22, 179.18)
$\bar{x} - 1.96\dfrac{\sigma}{10} = 177.22$   ①
$\bar{x} + 1.96\dfrac{\sigma}{10} = 179.18$   ②

Adding ① and ②:  $2\bar{x} = 356.4$
$\bar{x} = 178.2$
The sample mean is 178.2 cm.

(b) Subtracting ② - ①:  $2 \times 1.96\dfrac{\sigma}{10} = 1.96$
$\sigma = 5$

(c) A symmetric 90% confidence interval for $\mu$ is
$\left(\bar{x} - 1.645\dfrac{\sigma}{\sqrt{n}},\ \bar{x} + 1.645\dfrac{\sigma}{\sqrt{n}}\right)$

$\bar{x} \pm 1.645\dfrac{\sigma}{\sqrt{n}} = 178.2 \pm 1.645 \times \dfrac{5}{10}$
$= 178.2 \pm 0.8225$

So the 90% confidence interval = (178.2 - 0.8225, 178.2 + 0.8225)
= (177.38 cm, 179.02 cm) (2 d.p.)

Example 9.21
A plant produces steel sheets whose weights are known to be normally distributed with a standard deviation of 2.4 kg. A random sample of 36 sheets had a mean weight of 31.4 kg.
Find 99% confidence limits for the population mean. (L)

Solution 9.21
X is the weight, in kilograms, of a steel sheet. Then $X \sim N(\mu, 2.4^2)$.
A sample of size 36 is taken, so n = 36 and the sample mean $\bar{x} = 31.4$.
The end-values of a 99% confidence interval for $\mu$ are

$\bar{x} \pm 2.576\dfrac{\sigma}{\sqrt{n}} = 31.4 \pm 2.576 \times \dfrac{2.4}{\sqrt{36}}$
$= 31.4 \pm 1.0304$

so the 99% confidence interval is (31.4 - 1.0304, 31.4 + 1.0304) = (30.3696, 32.4304)
= (30.4 kg, 32.4 kg) (3 s.f.)

Width of a confidence interval

In Example 9.21,
width of the 99% confidence interval $= 2 \times 2.576 \times \dfrac{\sigma}{\sqrt{n}}$
$= 2 \times 2.576 \times \dfrac{2.4}{\sqrt{36}}$
$= 2.0608$ kg.
[Diagram: interval from $\bar{x} - 2.576\frac{\sigma}{\sqrt{n}}$ to $\bar{x} + 2.576\frac{\sigma}{\sqrt{n}}$, centred on $\bar{x}$, with width $2 \times 2.576\frac{\sigma}{\sqrt{n}}$.]
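The width of a symmetric confidence interval depends only on the critical value, the standard deviation and the sample size. The following sketch is illustrative only: the function name `interval_width` is an assumption, and the critical value comes from `statistics.NormalDist`, so the result for the steel-sheet data (standard deviation 2.4 kg, n = 36, 99% confidence) agrees with a table-based calculation to about three decimal places rather than exactly.

```python
from math import sqrt
from statistics import NormalDist


def interval_width(sigma, n, level):
    """Total width 2 * z * sigma / sqrt(n) of a symmetric confidence interval."""
    z = NormalDist().inv_cdf((1 + level) / 2)  # critical value for this level
    return 2 * z * sigma / sqrt(n)


# Steel-sheet data: sigma = 2.4 kg, n = 36, 99% confidence.
width_99 = interval_width(2.4, 36, 0.99)
print(f"99% interval width: {width_99:.4f} kg")
```

Calling the function with a lower confidence level or a larger n gives a narrower interval, which is a quick way to explore how the two choices trade off.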
For the same data, the 95% confidence interval is
$$\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right)$$
The width of the 95% confidence interval
$$= 2 \times 1.96\frac{\sigma}{\sqrt{n}} = 2 \times 1.96 \times \frac{2.4}{\sqrt{36}} = 1.568 \text{ kg}$$

Determination of sample size

Example 9.22
The result X of a stress test is known to be a normally distributed random variable with mean $\mu$ and standard deviation 1.3. It is required to have a 95% symmetrical confidence interval for $\mu$ with total width less than 2. Find the least number of tests that should be carried out to achieve this. (L)

Solution 9.22
$X \sim N(\mu, 1.3^2)$, and for samples of size n, $\bar{X} \sim N\left(\mu, \frac{1.3^2}{n}\right)$

The 95% confidence interval for $\mu$ is
$$\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right)$$
$$\text{Interval width} = 2 \times 1.96\frac{\sigma}{\sqrt{n}} = 2 \times 1.96 \times \frac{1.3}{\sqrt{n}} \qquad (1)$$
$$= \frac{5.096}{\sqrt{n}}$$
The width of the interval must be less than 2,
$$\frac{5.096}{\sqrt{n}} < 2 \qquad (2)$$
$$\sqrt{n} > \frac{5.096}{2}$$
$$\sqrt{n} > 2.548$$
$$n > 2.548^2$$
i.e. $n > 6.49\ldots$
The least number of tests that should be carried out is 7.

Now suppose that, in Example 9.22, the 95% confidence interval for $\mu$ must have a width less than 1. Will the sample size n be larger or smaller than when the total width was less than 2?
To answer this, look at equation (2). This now becomes
$$\frac{5.096}{\sqrt{n}} < 1$$
i.e. $\sqrt{n} > 5.096$
$n > 25.96\ldots$
So the least number of tests would be 26. The sample size has increased.

Now consider the situation as in Example 9.22, where the total width must be less than 2, but the confidence level is increased to 99%.
Will the sample size n be larger or smaller than that required for the 95% confidence interval?
The calculations needed to find the width are similar to those given in Solution 9.22, equation (1), but the value 1.96 will be replaced by 2.576, the z-value for the 99% confidence interval.
$$\text{So interval width} = 2 \times 2.576 \times \frac{1.3}{\sqrt{n}} = \frac{6.6976}{\sqrt{n}}$$
Therefore
$$\frac{6.6976}{\sqrt{n}} < 2$$
$$\sqrt{n} > \frac{6.6976}{2}$$
$$n > 11.21\ldots$$
For the 99% confidence interval, the least number of tests required is 12, whereas for the 95% confidence interval it was 7, so the sample size must be larger.

(c) Confidence interval for $\mu$, the population mean
• of a normal or non-normal population,
• with unknown variance $\sigma^2$
• using a large sample, n

When calculating confidence intervals it is often the case that the population variance, $\sigma^2$, is not known. Provided that the sample size, n, is large ($n \geq 30$, say) it is permissible to use $\hat{\sigma}^2$, the best unbiased estimate for $\sigma^2$ (see page 447).
Ideally the distribution of X should be normal, but an approximate confidence interval can also be given when the distribution of X is not normal. Remember that in both cases, n must be large.
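The three sample-size results obtained for Example 9.22 (total width less than 2 at 95%, less than 1 at 95%, and less than 2 at 99%) can be checked with a short calculation. The following is only a sketch: the function name `least_sample_size` is a hypothetical helper, not part of the text, and it assumes a known population standard deviation and a z-value read from normal tables.

```python
import math

def least_sample_size(sigma, z, max_width):
    """Smallest whole number n for which 2 * z * sigma / sqrt(n) < max_width.

    sigma: known population standard deviation
    z: z-value for the chosen confidence level (e.g. 1.96 for 95%)
    max_width: the total interval width that must not be reached
    """
    # Rearranging the inequality gives sqrt(n) > 2 * z * sigma / max_width.
    root_n_bound = 2 * z * sigma / max_width
    # n must be strictly greater than the square of this bound.
    return math.floor(root_n_bound ** 2) + 1

print(least_sample_size(1.3, 1.96, 2))    # 95% interval, width less than 2
print(least_sample_size(1.3, 1.96, 1))    # 95% interval, width less than 1
print(least_sample_size(1.3, 2.576, 2))   # 99% interval, width less than 2
```

The three printed values should be 7, 26 and 12, the least numbers of tests found by hand. Halving the permitted width roughly quadruples n, because n depends on the square of the ratio.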
If $\bar{x}$ and $s^2$ are the mean and variance of a large sample ($n \geq 30$) from a population with unknown variance $\sigma^2$, a 95% confidence interval for $\mu$ is
$$\left(\bar{x} - 1.96\frac{\hat{\sigma}}{\sqrt{n}},\ \bar{x} + 1.96\frac{\hat{\sigma}}{\sqrt{n}}\right) \quad \text{where } \hat{\sigma}^2 = \frac{n}{n-1}s^2$$

Example 9.23
The fuel consumption of a new model of car is being tested. In one trial, 50 cars chosen at random were driven under identical conditions and the distances, x km, covered on 1 litre of petrol were recorded. The results gave the following totals:
$\sum x = 525$, $\sum x^2 = 5625$.
Calculate a 95% confidence interval for the mean petrol consumption, in kilometres per litre, of cars of this type.

Solution 9.23
$$\bar{x} = \frac{\sum x}{n} = \frac{525}{50} = 10.5$$
$\sigma^2$ is unknown, so use $\hat{\sigma}^2$ where
$$\hat{\sigma}^2 = \frac{1}{n-1}\left(\sum x^2 - \frac{(\sum x)^2}{n}\right) = \frac{1}{49}\left(5625 - \frac{525^2}{50}\right) = 2.2959\ldots$$
$$\hat{\sigma} = 1.515\ldots$$
95% confidence limits for $\mu$ are
$$\bar{x} \pm 1.96\frac{\hat{\sigma}}{\sqrt{n}} = 10.5 \pm 1.96 \times \frac{1.515\ldots}{\sqrt{50}} = 10.5 \pm 0.42$$
95% confidence interval for $\mu$ = (10.08 km/litre, 10.92 km/litre)

Example 9.24
The height, x cm, of each man in a random sample of 200 men living in the UK was measured. The following results were obtained:
$\sum x = 35\,050$, $\sum x^2 = 6\,163\,109$.
(a) Calculate unbiased estimates of the mean and variance of the heights of men living in the UK.
(b) Determine an approximate 90% confidence interval for the mean height of men living in the UK. Name the theorem that you have assumed. (NEAB)

Solution 9.24
(a) $$\hat{\mu} = \bar{x} = \frac{\sum x}{n} = \frac{35\,050}{200} = 175.25$$
$$\hat{\sigma}^2 = \frac{n}{n-1}s^2 \quad \text{where } s^2 = \frac{\sum x^2}{n} - \bar{x}^2$$
$$= \frac{200}{199}\left(\frac{6\,163\,109}{200} - 175.25^2\right) = 103.5$$
Alternatively
$$\hat{\sigma}^2 = \frac{1}{n-1}\left(\sum x^2 - \frac{(\sum x)^2}{n}\right) \quad \text{(see page 448)}$$
$$= \frac{1}{199}\left(6\,163\,109 - \frac{35\,050^2}{200}\right) = 103.5$$
(b) The confidence limits for the 90% confidence interval for $\mu$ are
$$\bar{x} \pm 1.645\frac{\hat{\sigma}}{\sqrt{n}} = 175.25 \pm 1.645 \times \frac{\sqrt{103.5}}{\sqrt{200}} = 175.25 \pm 1.1833$$
So the 90% confidence interval is (175.25 − 1.1833, 175.25 + 1.1833)
= (174.07 cm, 176.43 cm) (2 d.p.)
The central limit theorem has been used to give an approximate distribution for $\bar{X}$, the sample mean, where $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$.
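The large-sample working in Examples 9.23 and 9.24 follows one pattern: estimate the variance from the totals $\sum x$ and $\sum x^2$, then take the sample mean plus or minus z times the estimated standard error. The sketch below checks that arithmetic. The function name `large_sample_interval` is a hypothetical helper introduced only for illustration, and the z-values 1.96 and 1.645 are the table values used in the solutions.

```python
import math

def large_sample_interval(n, sum_x, sum_x2, z):
    """Approximate confidence interval for the mean from summary totals.

    Uses the unbiased variance estimate (sum_x2 - sum_x**2 / n) / (n - 1),
    so it is intended for large samples (n >= 30, say).
    """
    mean = sum_x / n
    var_hat = (sum_x2 - sum_x ** 2 / n) / (n - 1)
    half_width = z * math.sqrt(var_hat / n)
    return mean - half_width, mean + half_width

# Example 9.23: 50 cars, 95% interval
low, high = large_sample_interval(50, 525, 5625, 1.96)
print(round(low, 2), round(high, 2))

# Example 9.24: 200 men, 90% interval
low, high = large_sample_interval(200, 35050, 6163109, 1.645)
print(round(low, 2), round(high, 2))
```

The printed limits should match the intervals (10.08, 10.92) and (174.07, 176.43) obtained by hand.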
1. The concentrations, in milligrams per litre, of a trace element in 7 randomly chosen samples of water from a spring were
240.8, 237.3, 236.7, 236.6, 234.2, 233.9, 232.5.
Determine the unbiased estimates of the mean and the variance of the concentration of the trace element per litre of water from the spring. (L)

2. Find the best unbiased estimates of the mean $\mu$ and variance $\sigma^2$ of the population from which each of the following samples is drawn. It is a good idea to do parts (a) to (c) both with and without a calculator.
(a) 46, 48, 51, 50, 45, 53, 50, 48
(b) 1.684, 1.691, 1.687, 1.688, 1.689, 1.688, 1.690, 1.693, 1.685
(c)
x    20   21   22   23   24   25
f     4   14   17   26   20    9
(d) $\sum x = 120$, $\sum x^2 = 2102$, $n = 8$
(e) $\sum x = 100$, $\sum x^2 = 1028$, $n = 10$
(f) $n = 34$, $\sum x = 330$, $\sum x^2 = 23\,700$

3. A measuring rule was used to measure the length of a rod of stated length 1 m. On 8 successive occasions the following results, in millimetres, were obtained.
1000, 999, 999, 1002, 1001, 1000, 1002, 1001.
Calculate unbiased estimates of the mean and, to two significant figures, the variance of the errors occurring when the rule is used for measuring a 1 m length. (L)

4. Cartons of orange are filled by a machine. A sample of 10 cartons selected at random from the production contained the following quantities (in millilitres)
201.2  205.0  209.1  202.3  204.6
206.4  210.1  201.9  203.7  207.3
Calculate unbiased estimates of the mean and variance of the population from which the sample was taken. (L)

5. A certain type of tennis ball is known to have a height of bounce which is normally distributed with standard deviation 2 cm. A sample of 60 tennis balls is tested and the mean height of bounce of the sample is 140 cm.
(a) Find a 95% confidence interval for the mean height of bounce of this type of tennis ball.
(b) State any assumptions made in calculating your interval.

6. A random sample of 6 items taken from a normal population with mean $\mu$ and variance 4.5 cm² gave the following data:
Sample values: 12.9 cm, 13.2 cm, 14.6 cm, 12.6 cm, 11.3 cm, 10.1 cm.
(a) Find the 95% confidence interval for $\mu$.
(b) What is the width of this confidence interval?

7. A factory produces cans of meat whose masses are normally distributed with standard deviation 18 g. A random sample of 25 cans is found to have a mean mass of 458 g.
(a) Obtain the 99% confidence interval for the population mean mass of a can of meat produced at the factory.
(b) Explain what the interval means.
(c) Would the interval be wider if a 90% confidence interval was calculated? Explain your reasoning.

8. A random sample of 100 observations from a normal population with mean $\mu$ gave the following data: $\sum x = 8200$, $\sum x^2 = 686\,800$.
(a) Find a 98% confidence interval for $\mu$.
(b) Find a 99% confidence interval for $\mu$.
(c) Would your answers have been different if the population was not normal? Explain your answer.

9. Eighty employees at an insurance company were asked to measure their pulse rates when they woke up in the morning. The researcher then calculated the mean and the standard deviation of the sample and found these to be 69 beats and 4 beats respectively. Calculate a 97% confidence interval for the mean pulse rate of all the employees at the company, stating any assumptions that you have made.

10. One hundred and fifty bags of flour are taken from a production line and found to have a mean mass of 748 g and standard deviation of 3.6 g.
(a) Calculate an unbiased estimate of the standard deviation of a bag of flour produced on this production line.
(b) Calculate a 98% confidence interval for the mean mass of a bag of flour produced on this production line.
(c) State any assumptions you have made.

11. (a) A 95% confidence interval for the mean length of life of a particular brand of light bulb was calculated and the confidence limits were 1023.3 hours and 11… hours. The interval was based on the results of a random sample of 36 light bulbs. Find the 99% confidence interval for $\mu$, the mean length of life of this brand of light bulb.
(b) Forty random samples of 36 light bulbs are taken and a 90% confidence interval for $\mu$ is calculated for each sample. Find the expected number of intervals that contain $\mu$.

12. An efficiency expert wishes to determine the mean time taken to drill a number of holes in a metal sheet. Determine how large a random sample is needed so that the expert can be 95% certain that the sample mean will differ from the true mean time by less than 15 seconds. Assume that it is known from previous studies that the population standard deviation is 40 seconds. (L)

13. A random sample of 60 loaves is taken from a population whose masses are normally distributed with mean $\mu$ and standard deviation 10 g.
(a) Calculate the width of a symmetric 95% confidence interval for $\mu$ based on this sample.
(b) Find the confidence level of a symmetric 95% confidence interval having the same width as before but based on a random sample of 40 loaves.

14. The distribution of measurements of thicknesses of a random sample of yarns produced in a textile mill is shown in the following table.

Yarn thickness in microns
(mid-interval values)        Frequency
 72.5                           6
 77.5                          18
 82.5                          32
 87.5                          57
 92.5                         102
 97.5                          51
102.5                          25
107.5                           9

Illustrate these data on a histogram.
Estimate, to two decimal places, the mean and standard deviation of yarn thickness. Hence estimate the standard error of the mean to two decimal places, and use it to determine approximate symmetric 95% confidence limits, giving your answer to one decimal place. (Min)

15. The age, X, in years at last birthday, of 250 mothers when their first child was born is given in the following table:

X       No. of mothers
18-         14
20-         36
22-         42
24-         57
26-         48
28-         26
30-         17
32-          7
34-          2
36-          0
38-          1

(The notation implies that, for example in row 1, there are 14 mothers for whom the continuous variable X satisfies $18 \leq X < 20$.)
Calculate, to the nearest 0.1 of a year, estimates of the mean and the standard deviation of X.
If the 250 mothers are a random sample from a large population of mothers, find 95% confidence limits for the mean age, $\mu$, of the total population. (C)

16. The lifetimes of 200 electrical components were recorded to the nearest hour and classified in the frequency tabulation.

Lifetime   Frequency      Lifetime   Frequency
   0-         80            600-         4
 100-         48            700-         3
 200-         30            800-         2
 300-         18            900-         0
 400-         10           1000-         0
 500-          5

Draw a histogram of the data and estimate the mean and standard deviation of the distribution.
Calculate a symmetric 90% confidence interval for the population mean, using a suitable normal approximation for the distribution of the sample mean. (MEI)
(d) Confidence interval for $\mu$ when
• the population is normal
• $\sigma^2$ is unknown,
• sample size n is small

When calculating confidence intervals, you have already encountered the situation when large samples ($n \geq 30$) are taken from a normal population with unknown variance $\sigma^2$.
For large samples,
$$\frac{\bar{X} - \mu}{\hat{\sigma}/\sqrt{n}} = Z \quad \text{where } Z \sim N(0, 1)$$
But if the sample size is small ($n < 30$), $\dfrac{\bar{X} - \mu}{\hat{\sigma}/\sqrt{n}}$ no longer has a normal distribution.
For small samples,
$$\frac{\bar{X} - \mu}{\hat{\sigma}/\sqrt{n}} = T \quad \text{where } T \text{ has a } t\text{-distribution.}$$
Before looking at confidence intervals for $\mu$ when the sample size is small, consider further the t-distribution.

THE t-DISTRIBUTION

The distribution of T is a member of a family of t-distributions. All t-distributions are symmetric about zero and have a single parameter $\nu$ (pronounced new) which is a positive integer.
$\nu$ is known as the number of degrees of freedom of the distribution and if, for example, T has a t-distribution with five degrees of freedom, you would write $T \sim t(5)$.
The diagram below shows two curves, t(2) and t(5).

[Diagram: the standardised normal curve N(0, 1) drawn with t-distribution curves, horizontal axis from −3 to 3]

Note that as $\nu$ increases, the corresponding $t(\nu)$ curve resembles the standardised normal distribution N(0, 1). In fact when $\nu \geq 30$, the difference between the $t(\nu)$ distribution and the normal distribution is negligible.
For samples of size n, it can be shown that
$$T = \frac{\bar{X} - \mu}{\hat{\sigma}/\sqrt{n}} \text{ follows a } t\text{-distribution with } (n - 1) \text{ degrees of freedom.}$$
For example, for a sample of size 8, T follows a t-distribution with 7 degrees of freedom. You would write $T \sim t(7)$.

The 95% confidence interval for $\mu$ is obtained as follows:

If $\bar{x}$ and $s^2$ are the mean and variance of a small sample ($n < 30$) from a normal population with unknown mean $\mu$ and unknown variance $\sigma^2$, then a 95% confidence interval for $\mu$ is
$$\left(\bar{x} - t\frac{\hat{\sigma}}{\sqrt{n}},\ \bar{x} + t\frac{\hat{\sigma}}{\sqrt{n}}\right) \quad \text{where } \hat{\sigma}^2 = \frac{n}{n-1}s^2$$
and t is the value from a $t(n-1)$ distribution such that $(-t, t)$ encloses 95% of the $t(n-1)$ distribution.

To find the required value of t, known as the critical value, you will need to use t-distribution tables. These give the t-value such that $P(T \leq t) = p$, for various values of $\nu$. The tables are printed on page 650 and an extract is reproduced here.

p        0.75    0.90    0.95    0.975   0.99    0.995   0.9975  0.999   0.9995
ν = 1    1.000   3.078   6.314   12.71   31.82   63.66   127.3   318.3   636.6
    2    0.816   1.886   2.920   4.303   6.965   9.925   14.09   22.33   31.60
    3    0.765   1.638   2.353   3.182   4.541   5.841   7.453   10.21   12.92
    4    0.741   1.533   2.132   2.776   3.747   4.604   5.598   7.173   8.610
    5    0.727   1.476   2.015   2.571   3.365   4.032   4.773   5.893   6.869
    6    0.718   1.440   1.943   2.447   3.143   3.707   4.317   5.208   5.959
    7    0.711   1.415   1.895   2.365   2.998   3.499   4.029   4.785   5.408
    8    0.706   1.397   1.860   2.306   2.896   3.355   3.833   4.501   5.041
    9    0.703   1.383   1.833   2.262   2.821   3.250   3.690   4.297   4.781
   10    0.700   1.372   1.812   2.228   2.764   3.169   3.581   4.144   4.587
   11    0.697   1.363   1.796   2.201   2.718   3.106   3.497   4.025   4.437
   ...
   30    0.683   1.310   1.697   2.042   2.457   2.750   3.030   3.385   3.646
   40    0.681   1.303   1.684   2.021   2.423   2.704   2.971   3.307   3.551
   60    0.679   1.296   1.671   2.000   2.390   2.660   2.915   3.232   3.460
  120    0.677   1.289   1.658   1.980   2.358   2.617   2.860   3.160   3.373
    ∞    0.674   1.282   1.645   1.960   2.326   2.576   2.807   3.090   3.291

You will see that as $\nu$ increases, the corresponding t-distribution becomes more and more like the normal distribution. Compare the last row, $\nu = \infty$, with the critical values for the normal distribution, printed on page 649.
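The comparison between the last row of the t-table ($\nu = \infty$) and the normal critical values can also be made numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module to find the z-value with $P(Z \leq z) = p$ for each column heading. The variable names are illustrative only.

```python
from statistics import NormalDist

# Column headings p of the t-distribution table
probabilities = [0.75, 0.90, 0.95, 0.975, 0.99, 0.995, 0.9975, 0.999, 0.9995]

# For each p, find z such that P(Z <= z) = p for the standard normal N(0, 1)
standard_normal = NormalDist(mu=0, sigma=1)
z_values = [standard_normal.inv_cdf(p) for p in probabilities]

for p, z in zip(probabilities, z_values):
    print(f"p = {p}: z = {z:.3f}")
```

To three decimal places these values should agree with the last row of the t-table, which is why normal critical values such as 1.645, 1.960 and 2.576 can be used when the sample is large.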
You will find that you use the t-distribution tables in a slightly different way from the normal tables, so you need to ensure that you can use them correctly.
In this extract, the values shown in square brackets are referred to in the text.

p        0.75    0.90     0.95     0.975    0.99     0.995    0.9975  0.999   0.9995
ν = 1    1.000   3.078    6.314    12.71    31.82    63.66    127.3   318.3   636.6
    2    0.816   1.886    2.920    4.303    6.965    9.925    14.09   22.33   31.60
   ...
   10    0.700   1.372    1.812    2.228    2.764    3.169    3.581   4.144   4.587
   11    0.697   1.363   [1.796]  [2.201]   2.718   [3.106]   3.497   4.025   4.437
   12    0.695   1.356    1.782    2.179    2.681    3.055    3.428   3.930   4.318
   13    0.694   1.350    1.771    2.160    2.650    3.012    3.372   3.852   4.221
   14    0.692  [1.345]   1.761    2.145   [2.624]   2.977    3.326   3.787   4.140

Example 9.25(a)
Consider T following a t-distribution with 11 degrees of freedom, i.e. $\nu = 11$ and $T \sim t(11)$.
Find (i) $P(T < 1.796)$
(ii) $P(T > 3.106)$
(iii) $P(|T| < 2.201)$

Solution 9.25(a)
(i) $\nu = 11$, so find row 11 and go across to 1.796, then up to the top of the column. This gives 0.95.
$P(T < 1.796) = 0.95 = 95\%$
(ii) Find row 11, go across to 3.106, which is in column 0.995.
So $P(T < 3.106) = 0.995$
$\therefore P(T > 3.106) = 1 - 0.995 = 0.005$
i.e. $P(T > 3.106) = 0.5\%$
(iii) You need $P(|T| < 2.201)$, i.e. $P(-2.201 < T < 2.201)$.
Find row 11, go across to 2.201, which is in column 0.975.
So $P(T < 2.201) = 0.975$
$P(T > 2.201) = 1 - 0.975 = 0.025$
It follows that $P(T < -2.201) = 0.025$ also.
$\therefore P(-2.201 < T < 2.201) = 1 - 0.025 - 0.025 = 0.95 = 95\%$
The t-values that enclose the central 95% are ±2.201.

Example 9.25(b)
The random variable T has a t-distribution with 14 degrees of freedom, i.e. $T \sim t(14)$.
Find the value of t for which
(i) $P(T < t) = 0.90$
(ii) $P(|T| < t) = 0.98$

Solution 9.25(b)
(i) $\nu = 14$, row 14, column 0.90 gives $t = 1.345$.
(ii) The required value for t corresponds to an upper tail probability of 0.01, so the p-value must be 0.99.
Row 14, column 0.99 gives $t = 2.624$.

Critical t-values for a 95% confidence interval

For a 95% confidence interval, you want t such that $P(|T| < t) = 0.95$, i.e. $P(-t < T < t) = 0.95$.
This corresponds to an upper tail probability of 0.025, so $P(T < t) = 0.975$.

p        0.75    0.90    0.95    0.975
ν = 1    1.000   3.078   6.314   12.71
    2    0.816   1.886   2.920   4.303
    3    0.765   1.638   2.353   3.182
    4    0.741   1.533   2.132   2.776
    5    0.727   1.476   2.015   2.571
    6    0.718   1.440   1.943   2.447
    7    0.711   1.415   1.895   2.365
    8    0.706   1.397   1.860   2.306
    9    0.703   1.383   1.833  [2.262]

So if $T \sim t(9)$ for example, $P(T < t) = 0.975$ gives $t = 2.262$ and the critical values of t for the 95% confidence interval are ±2.262.
NOTE: This value will be used in a later example.

To find the critical values of t
for a 95% confidence interval, look under column 0.975,
for a 90% confidence interval, look under column 0.95,
for a 98% confidence interval, look under column 0.99,
for a 99% confidence interval, look under column 0.995.

The following examples illustrate how to calculate confidence intervals when critical values are found using a t-distribution.

Example 9.26
The mass, m grams, of a packet of biscuits of a particular brand, follows a normal distribution with mean $\mu$. Ten packets of biscuits are chosen at random and their masses noted. The results, in grams, are
397.3, 399.6, 401.0, 392.9, 396.8, 400.0, 397.6, 392.1, 400.8, 400.6
These can be summarised as follows: $\sum x = 3978.7$, $\sum x^2 = 1\,583\,098.3$.
Calculate a 95% confidence interval for $\mu$.

Solution 9.26
X is the mass, in grams, of a packet of biscuits.
$X \sim N(\mu, \sigma^2)$ with both $\mu$ and $\sigma$ unknown.
Since $\sigma^2$ is unknown, find $\hat{\sigma}^2$ (see page 447)
$$\hat{\sigma}^2 = \frac{1}{n-1}\left(\sum x^2 - \frac{(\sum x)^2}{n}\right) \quad \text{with } n = 10$$
$$= \frac{1}{9}\left(1\,583\,098.3 - \frac{3978.7^2}{10}\right) = 10.325\ldots$$
$$\hat{\sigma} = 3.213\ldots$$
The sample mean, $\bar{x} = \dfrac{\sum x}{n} = \dfrac{3978.7}{10} = 397.87$.
Since n is small, a $t(n-1)$ distribution is required.
$n = 10$, so use a t(9) distribution.
The 95% confidence limits for $\mu$ are
$$\bar{x} \pm t\frac{\hat{\sigma}}{\sqrt{n}} \quad \text{where } (-t, t) \text{ enclose 95\% of the } t(9) \text{ distribution.}$$
From tables, as illustrated on page 466, the critical value t is 2.262
$$\therefore \text{confidence limits are } 397.87 \pm 2.262 \times \frac{3.213\ldots}{\sqrt{10}} = 397.87 \pm 2.298\ldots$$
95% confidence interval for $\mu$ = (397.87 − 2.298..., 397.87 + 2.298...)
= (395.6 g, 400.2 g) (1 d.p.)

Note that in Example 9.26, the value of $\hat{\sigma}$ can be obtained directly by using the calculator in standard deviation mode. Look for the key $x\sigma_{n-1}$ (see page 448).
You should practise finding $\hat{\sigma}$ using the data of Example 9.26.

Example 9.27
A student, studying the height of a particular plant, knows that it follows a normal distribution with mean $\mu$ and variance $\sigma^2$, but he does not know the value of either of these parameters. He selects 15 plants at random, measures their heights and calculates that the mean height of the sample is 12.2 cm and the standard deviation is 1.4 cm. Using these values, calculate a 90% confidence interval for $\mu$. Calculate also the width of this interval.

Solution 9.27
X is the height, in centimetres, of a plant, where $X \sim N(\mu, \sigma^2)$
Sample values: $\bar{x} = 12.2$ and $s = 1.4$ where s is the standard deviation.
$$\hat{\sigma}^2 = \frac{n}{n-1}s^2 \quad \text{with } n = 15$$
$$= \frac{15}{14} \times 1.4^2 = 2.1$$
$$\hat{\sigma} = \sqrt{2.1} = 1.449\ldots$$
Since $n = 15$, the t(14) distribution is considered.
For a symmetric 90% confidence interval, column $p = 0.95$ is required in order to find the critical value of t.
When $\nu = 14$, $t = 1.761$, so (−1.761, 1.761) encloses the central 90% of the t(14) distribution.

p        0.75    0.90    0.95
ν = 1    1.000   3.078   6.314
    2    0.816   1.886   2.920
   ...
   12    0.695   1.356   1.782
   13    0.694   1.350   1.771
   14    0.692   1.345  [1.761]
(Extract from tables on page 650)

The 90% confidence limits for $\mu$ are
$$\bar{x} \pm t\frac{\hat{\sigma}}{\sqrt{n}} = 12.2 \pm 1.761 \times \frac{1.449\ldots}{\sqrt{15}} = 12.2 \pm 0.658\ldots$$
90% confidence interval = (12.2 − 0.658..., 12.2 + 0.658...)
= (11.54 cm, 12.86 cm) (2 d.p.)
Width of interval = 2 × 0.658...
= 1.32 cm (2 d.p.)

1. The heights, in metres, of a random sample of 6 policemen from a particular station were as follows:
1.80, 1.76, 1.79, 1.81, 1.83, 1.79.
Assuming that the heights of policemen from that station are normally distributed with mean $\mu$,
(a) calculate a 95% confidence interval for $\mu$,
(b) state the width of this interval.

2. A sample of 8 independent observations of a normally distributed variable gave the following values:
3.6, 3.9, 4.5, 3.8, 4.4, 4.9, 4.2, 3.8.
(a) Determine a 99% confidence interval for the population mean $\mu$.
(b) Find the difference between the widths of a 90% confidence interval for $\mu$ and a 95% confidence interval for $\mu$.

3. Twenty measurements of x, the life, in hours, of a particular make of candle gave the following data:
$\sum x = 172$, $\sum x^2 = 1495.5$.
Assuming that the length of life is modelled by a normal distribution with mean $\mu$, find a 98% confidence interval for $\mu$.

4. A random sample of 8 independent observations of a normal variable gave
$\sum x = 261.2$, $\sum (x - \bar{x})^2 = 3.22$.
Calculate a 95% confidence interval for the population mean.
If 400 such samples were taken, how many of these would you expect not to include the population mean?

5. A random sample of 7 independent observations of a normal variable gave
$\sum x = 35.9$, $\sum x^2 = 186.19$.
Calculate
(a) an unbiased estimate of the population mean,
(b) an unbiased estimate of the population standard deviation,
(c) a 90% confidence interval for the population mean.

6. The masses, in grams, of 13 washers selected from a production line at random are:
15.4, 15.2, 14.6, 16.1, 14.8,
15.3, 15.9, 16.0, 15.4, 14.6,
15.0, 15.5, 16.1.
Calculate 98% confidence limits for the mean mass of the washers on this particular production line, assuming that the mass can be modelled by a normal distribution.

7. Fifteen pupils performed experiments to find the value of g, the acceleration due to gravity. The results were as follows:
9.806, 9.807, 9.810, 9.802, 9.805,
9.806, 9.804, 9.811, 9.801, 9.804,
9.805, 9.808, 9.803, 9.809, 9.807.
Assuming that these are taken from a normal population, calculate 95% confidence limits for the value of g based on these results.

CONFIDENCE INTERVALS FOR THE POPULATION PROPORTION, p

Imagine that you want to find p, the proportion of successes in a particular population. To obtain an idea of its value, you could take a random sample of size n and calculate $p_s$, the proportion of successes in your sample. This would give the best unbiased estimate $\hat{p}$ where $\hat{p} = p_s$ (see page 447). You could also use this value of $p_s$ to obtain an interval estimate of p, known as a confidence interval for p.

The theory needed to derive the confidence interval for p is based on the sampling distribution of proportions, $P_s$, described on page 445.
This states that, provided the sample size n is large ($n \geq 30$), the distribution of $P_s$ is approximately normal, so $P_s \sim N\left(p, \dfrac{pq}{n}\right)$ where $q = 1 - p$.

The standard deviation of the sampling distribution of proportions, $\sqrt{\dfrac{pq}{n}}$, is needed in the calculation of the limits for the confidence interval. The difficulty is, however, that its value isn't known, since p isn't known!
To overcome this, use $\hat{p} = p_s$. Writing $1 - p_s$ as $q_s$, the standard deviation of the sampling distribution is approximately $\sqrt{\dfrac{p_s q_s}{n}}$.
You are then able to find approximate confidence intervals for p as follows:

90%: confidence limits $p_s \pm 1.645\sqrt{\frac{p_s q_s}{n}}$; confidence interval $\left(p_s - 1.645\sqrt{\frac{p_s q_s}{n}},\ p_s + 1.645\sqrt{\frac{p_s q_s}{n}}\right)$; width $2 \times 1.645\sqrt{\frac{p_s q_s}{n}}$
95%: confidence limits $p_s \pm 1.96\sqrt{\frac{p_s q_s}{n}}$; confidence interval $\left(p_s - 1.96\sqrt{\frac{p_s q_s}{n}},\ p_s + 1.96\sqrt{\frac{p_s q_s}{n}}\right)$; width $2 \times 1.96\sqrt{\frac{p_s q_s}{n}}$
99%: confidence limits $p_s \pm 2.576\sqrt{\frac{p_s q_s}{n}}$; confidence interval $\left(p_s - 2.576\sqrt{\frac{p_s q_s}{n}},\ p_s + 2.576\sqrt{\frac{p_s q_s}{n}}\right)$; width $2 \times 2.576\sqrt{\frac{p_s q_s}{n}}$

Remember that the sample size, n, should be large ($n \geq 30$, say), since the normal approximation to the binomial distribution is used in obtaining the distribution of sample proportions. Also, since a continuous distribution has been used as an approximation for a discrete distribution, continuity corrections should be used. These are usually omitted, however, when calculating confidence intervals.

Example 9.28
A manufacturer wants to assess the proportion of defective items in a large batch produced by a particular machine. He tests a random sample of 300 items and finds that 45 items are defective.
(a) Calculate an approximate 95% confidence interval for the proportion of defective items in the batch.
(b) If 200 such tests are performed and a 95% confidence interval calculated for each, how many would you expect to include the proportion of defective items in the batch?
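As a numerical illustration of the approximate limits $p_s \pm z\sqrt{p_s q_s / n}$, the sketch below evaluates them for the figures in Example 9.28 (45 defective items in a sample of 300). The function name `proportion_interval` and the dictionary `Z_VALUES` are hypothetical names chosen for this illustration; the z-values are the table values 1.645, 1.96 and 2.576, and no continuity correction is applied.

```python
import math

# z-values for the three confidence levels used in this section
Z_VALUES = {90: 1.645, 95: 1.96, 99: 2.576}

def proportion_interval(successes, n, level):
    """Approximate confidence interval for a population proportion.

    successes: number of successes observed in the sample
    n: sample size (should be large, n >= 30 say)
    level: confidence level as a whole-number percentage (90, 95 or 99)
    """
    p_s = successes / n
    q_s = 1 - p_s
    half_width = Z_VALUES[level] * math.sqrt(p_s * q_s / n)
    return p_s - half_width, p_s + half_width

low, high = proportion_interval(45, 300, 95)
print(round(low, 4), round(high, 4))
```

For these figures the limits come out as approximately 0.1096 and 0.1904. Changing the third argument to 90 or 99 narrows or widens the interval about the same sample proportion, 0.15.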
Solution 9.28
(a) $p_s = \dfrac{45}{300} = 0.15$, $q_s = 1 - p_s = 0.85$, $n = 300$.
The 95% confidence limits for p are
$$p_s \pm 1.96\sqrt{\frac{p_s q_s}{n}} = 0.15 \pm 1.96\sqrt{\frac{0.15 \times 0.85}{300}} = 0.15 \pm 0.0404$$
95% confidence interval = (0.15 − 0.0404, 0.15 + 0.0404)
= (0.1096, 0.1904).
(b) The expected number of tests that include the proportion of defective items in the batch
= 200 × 0.95 = 190.

Example 9.29
In a random sample of 400 carpet shops, it was discovered that 136 of them sold carpets at below the list prices recommended by the manufacturer.
(a) Estimate the percentage of all carpet shops selling below list price.
(b) Calculate an approximate 90% confidence interval for the proportion of shops that sell below list price and explain briefly what this means.
(c) What size sample would have to be taken in order to estimate the percentage to within ±2%, with 90% confidence?

Solution 9.29
(a) $p_s = \dfrac{136}{400} = 0.34$
An estimate of p, the proportion of all carpet shops selling below list price, is $\hat{p}$ where $\hat{p} = p_s = 0.34$. So an estimate of the percentage of shops is 34%.
(b) An approximate 90% confidence interval for p is given by
$$p_s \pm 1.645\sqrt{\frac{p_s q_s}{n}} = 0.34 \pm 1.645 \times \sqrt{\frac{0.34 \times 0.66}{400}} = 0.34 \pm 0.039$$
= (0.301, 0.379)
= (30.1%, 37.9%)
The probability that the interval (30.1%, 37.9%) includes the true population percentage is 0.90. If a large number of intervals are calculated in the same way, 90% of them would include the true percentage.
(c) In part (b) the percentage of shops was estimated to within ±3.9%.
You now require n such that the percentage is to within ±2%,
so that
$$p_s \pm 1.645\sqrt{\frac{p_s q_s}{n}} = p_s \pm 0.02$$
Taking the + sign on both sides
$$p_s + 1.645\sqrt{\frac{p_s q_s}{n}} = p_s + 0.02$$
$$\therefore 1.645\sqrt{\frac{p_s q_s}{n}} = 0.02$$
$$\text{i.e. } 1.645 \times \sqrt{\frac{0.34 \times 0.66}{n}} = 0.02$$
$$\frac{1.645 \times \sqrt{0.34 \times 0.66}}{0.02} = \sqrt{n}$$
$$\sqrt{n} = 38.96\ldots$$
$$n = 1520 \text{ (3 s.f.)}$$
1520 shops would have to be sampled.

Exercise 9g Confidence intervals for p

1. In a survey of a random sample of 250 households in a large city, 170 households owned at least one pet.
(a) Find an approximate 95% confidence interval for the proportion of households in the city that own at least one pet.
(b) Explain why the interval is approximate.

2. In order to assess the probability of a successful outcome, an experiment was performed 200 times. The number of successful outcomes was 72.
(a) Find a 95% confidence interval for p, the probability of a successful outcome.
(b) Find a 99% confidence interval for p.

3. A survey was undertaken of the use of the internet by residents in a large city and it was discovered that in a random sample of 150 residents, 45 logged on to the internet at least once a day.
(a) Calculate an approximate 90% confidence interval for p, the proportion of residents in the city that log on to the internet at least once a day.
(b) One hundred similar surveys are carried out and the 90% confidence interval calculated for each survey. State the expected number of intervals that include p.

4. Recruits are issued with boots when they join the army. The last 50 pairs of boots issued were the following sizes:
 8   9   8  10  11   8   7  12  12   9
 9   8  11   8   9   7  11  12  11  10
 9  10  10  10   8   8   7  12   9   9
10  13   7   8   9   9  10  10   8  12
 9   9  10  10  11  12   9   9  10   9
(a) Find the proportion in the sample requiring size 9.
(b) Assuming that the last 50 recruits can be regarded as a random sample of all recruits, calculate an approximate 90% confidence interval for the proportion, p, of all recruits requiring size 9 boots.
(c) Explain why the interval is approximate.

5. In a market research survey, 25 people out of a random sample of 100 in a certain town said that they regularly used a particular brand of soap. Find approximate 97% confidence limits for the proportion of people in the town who regularly use this brand of soap.

6. A college principal decides to consult the students about a proposed change in the times of lectures. She finds that, out of a random sample of 80 students, 57 are in favour of the change.
(a) Find an approximate 90% confidence interval for the proportion of students who are not in favour of the change.
(b) State the effect on the width of such a confidence interval when the confidence level is increased.
7. In an opinion poll, 2000 people were interviewed and 527 said that they preferred white chocolate to milk chocolate.
(a) Calculate an approximate 95% confidence interval for the proportion of the population who prefer white chocolate. State any assumptions you have made.
(b) The a% confidence limits for the proportion preferring white chocolate, based on a sample of size 500, are 0.2278 and 0.2922. Calculate
(i) the proportion of people in the sample of 500 who preferred white chocolate,
(ii) the value of a.

8. The results of a survey showed that 3600 out of 10 000 families regularly purchased a specific weekly magazine.
(a) Find approximate 95% confidence limits for the proportion of families buying the magazine.
(b) Estimate the additional number of families to be contacted if the probability that the estimated proportion is in error by more than 0.01 is to be at most 1%. (AEB)

9. The probability of success in each of a long series of n independent trials is constant and equal to p. Explain how an approximate 95% confidence interval for p may be obtained.
In an opinion poll carried out before a local election, 501 people out of a random sample of 925 voters declare that they will vote for a particular one of the two candidates contesting the election. Find approximate 95% confidence limits for the proportion of all voters in favour of this candidate. (AEB)

Summary

Point estimates: unbiased estimates for
population mean $\mu$: $\hat{\mu} = \bar{x}$, the sample mean
population proportion p: $\hat{p} = p_s$, the sample proportion
population variance $\sigma^2$: $\hat{\sigma}^2 = \dfrac{n}{n-1}s^2$ ($s^2$ is the sample variance)
$$= \frac{n}{n-1}\left(\frac{\sum x^2}{n} - \bar{x}^2\right) = \frac{1}{n-1}\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)$$
$\hat{\sigma}$ is given by $x\sigma_{n-1}$ on your calculator.

Distribution of the sample mean
If a number of random samples, each of the same size n, is taken from a parent population and the mean $\bar{x}$ is calculated for each sample, then these means form a distribution called the sampling distribution of means:
(a) when the samples are taken from a normal population $X \sim N(\mu, \sigma^2)$ with $\sigma^2$ known, then for samples of any size n, the distribution of $\bar{X}$ is also normal such that $\bar{X} \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$.
$\dfrac{\sigma}{\sqrt{n}}$ is called the standard error of the mean.
(b) when the samples are taken from a non-normal population with known variance $\sigma^2$ then for large values of n, the distribution of $\bar{X}$ is approximately normal such that $\bar{X} \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$.
This is known as the central limit theorem.
In both these cases, if the population variance, $\sigma^2$, is unknown, then $\hat{\sigma}^2$ can be used instead, provided that n is large.

Distribution of the sample proportion
If a number of random samples, each of the same size n, is taken from a parent population and the proportion of successes, $p_s$, calculated for each sample, then these proportions form a distribution called the sampling distribution of proportions.
Provided that n is large, the sampling distribution of proportions is approximately normal such that $P_s \sim N\left(p, \dfrac{pq}{n}\right)$ where $q = 1 - p$.
$\sqrt{\dfrac{pq}{n}}$ is called the standard error of proportion.

Interval estimates:

Confidence interval for the population mean $\mu$

Conditions: Normal population
- with known variance $\sigma^2$
- sample size n large or small
- sample mean $\bar{x}$
95% confidence interval for $\mu$: $\left(\bar{x} - 1.96\dfrac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\dfrac{\sigma}{\sqrt{n}}\right)$

Conditions: Non-normal population
- with known variance $\sigma^2$
- sample size n large ($n \geq 30$)
- sample mean $\bar{x}$
95% confidence interval for $\mu$: $\left(\bar{x} - 1.96\dfrac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\dfrac{\sigma}{\sqrt{n}}\right)$

Conditions: Non-normal population
- with unknown variance $\sigma^2$
- sample size n large ($n \geq 30$)
- sample mean $\bar{x}$
- sample variance $s^2$
95% confidence interval for $\mu$: $\left(\bar{x} - 1.96\dfrac{\hat{\sigma}}{\sqrt{n}},\ \bar{x} + 1.96\dfrac{\hat{\sigma}}{\sqrt{n}}\right)$ where $\hat{\sigma}^2 = \dfrac{n}{n-1}s^2$

Conditions: Normal population
- with unknown variance
- sample size n small ($n < 30$)
- sample mean $\bar{x}$
- sample variance $s^2$
95% confidence interval for $\mu$: $\left(\bar{x} - t\dfrac{\hat{\sigma}}{\sqrt{n}},\ \bar{x} + t\dfrac{\hat{\sigma}}{\sqrt{n}}\right)$ where $\hat{\sigma}^2 = \dfrac{n}{n-1}s^2$ and $(-t, t)$ encloses 95% of the $t(n-1)$ distribution

Note that the width of the confidence interval can be reduced in one of these ways:
- by increasing the sample size (making n larger),
- by decreasing the percentage confidence (e.g. choosing a confidence level of 90% instead of 95%),
- by reducing the size of the population variance if possible (making $\sigma$ smaller).

Confidence interval for the population proportion p

Conditions:
- sample size n large
- sample proportion $p_s$
95% confidence interval for p: $\left(p_s - 1.96\sqrt{\dfrac{p_s q_s}{n}},\ p_s + 1.96\sqrt{\dfrac{p_s q_s}{n}}\right)$ where $q_s = 1 - p_s$
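To illustrate the small-sample case in the summary (normal population, unknown variance, $n < 30$), the sketch below recomputes the interval for the ten biscuit-packet masses of Example 9.26. The function name `small_sample_interval` is a hypothetical helper. The critical value 2.262 is not computed: it is read from the t(9) row, column 0.975, of the t-distribution table, since Python's standard library has no t-distribution.

```python
import math

# Masses, in grams, of the ten packets in Example 9.26
masses = [397.3, 399.6, 401.0, 392.9, 396.8, 400.0, 397.6, 392.1, 400.8, 400.6]

# Critical value for a 95% interval with 9 degrees of freedom (from tables)
T_CRITICAL_9_DF = 2.262

def small_sample_interval(data, t_critical):
    """Confidence interval for the mean of a normal population, small sample.

    The variance is estimated with divisor (n - 1), and t_critical must be
    taken from the t(n - 1) row of the tables for the required level.
    """
    n = len(data)
    mean = sum(data) / n
    var_hat = sum((x - mean) ** 2 for x in data) / (n - 1)
    half_width = t_critical * math.sqrt(var_hat / n)
    return mean - half_width, mean + half_width

low, high = small_sample_interval(masses, T_CRITICAL_9_DF)
print(round(low, 1), round(high, 1))
```

To one decimal place the limits are 395.6 g and 400.2 g. Using the normal value 1.96 in place of 2.262 would give a narrower interval, which is why the t-distribution is needed when n is small and the variance is estimated from the sample.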
Miscellaneous Worked Examples

Example 9.30

Each of a random sample of 50 one-pound coins was weighed and their masses, x grams, are summarised by
$\sum x = 474.51$, $\sum x^2 = 4503.8276$.
(a) Use an unbiased estimate of variance to calculate an approximate 90% confidence interval for the mean mass (in grams) of all one-pound coins, giving the end-values of the interval to two decimal places.
(b) Estimate the size of a random sample of one-pound coins that would be required to give a 95% confidence interval whose width is half that of the interval calculated in (a).
(c) It was found later that the scales were consistently underweighing by 0.05 grams. State which of the results of (a) and (b) should be amended and which should not. Give the amended values. (C)

Solution 9.30

X is the mass, in grams, of a one-pound coin.

(a) $\hat{\sigma}^2 = \frac{1}{n-1}\left(\sum x^2 - \frac{(\sum x)^2}{n}\right) = \frac{1}{49}\left(4503.8276 - \frac{474.51^2}{50}\right) = 0.01291\ldots$
$\hat{\sigma} = \sqrt{0.01291\ldots} = 0.1136\ldots$
$\bar{x} = \frac{\sum x}{n} = \frac{474.51}{50} = 9.4902$
By the central limit theorem, and using $\hat{\sigma}$ since $\sigma$ is unknown, 90% confidence limits for $\mu$ are
$\bar{x} \pm 1.645\frac{\hat{\sigma}}{\sqrt{n}} = 9.4902 \pm 1.645 \times \frac{0.1136\ldots}{\sqrt{50}} = 9.4902 \pm 0.0264\ldots$
90% confidence interval = (9.46 g, 9.52 g) (2 d.p.).

(b) Width of 90% confidence interval $= 2 \times 1.645\frac{\hat{\sigma}}{\sqrt{n}} = 2 \times 0.0264\ldots = 0.05287\ldots$
Width of required interval $= \frac{1}{2} \times 0.05287\ldots = 0.0264\ldots$
z-value required for 95% confidence interval = 1.96
$2 \times 1.96 \times \frac{0.1136\ldots}{\sqrt{n}} = 0.0264\ldots$
$\sqrt{n} = \frac{0.445\ldots}{0.0264\ldots} = 16.85\ldots$
$n = 283.9\ldots$
The sample size required is 280 (2 s.f.).

(c) When the scales are underweighing by 0.05 g,
- the confidence interval in part (a) would be amended. It would be shifted 0.05 units to the right. The new confidence interval would be (9.51, 9.57).
- the result in part (b) would remain the same, since this uses the estimate of the variance, which would not be altered if all the readings were increased by 0.05 g.

Example 9.31

Out of a random sample of 1000 French people interviewed during Autumn 1996, 410 supported a single European Currency.
(a) Calculate an approximate 99% confidence interval for the population proportion, p, of French people who supported a single European Currency.
(b) Estimate the size of a sample that would have provided a 99% confidence interval of width 0.04 for p.
(c) Give one reason (other than rounding) why your answer to (b) is only an estimate. (C)

Solution 9.31

(a) $p_s = \frac{410}{1000} = 0.41$ and $q_s = 1 - p_s = 0.59$.
In a sample of size 1000, the 99% confidence limits for p are
$p_s \pm 2.576\sqrt{\frac{p_s q_s}{n}} = 0.41 \pm 2.576 \times \sqrt{\frac{0.41 \times 0.59}{1000}} = 0.41 \pm 0.04006\ldots$
99% confidence interval = (0.37, 0.45) (2 s.f.).

(b) For a width of 0.04, the confidence interval would be $0.41 \pm 0.02$,
i.e. $2.576\sqrt{\frac{0.41 \times 0.59}{n}} = 0.02$
[Diagram: interval from 0.41 - 0.02 to 0.41 + 0.02, centred on 0.41, width = 0.04]
$\frac{1.266\ldots}{\sqrt{n}} = 0.02$
$\sqrt{n} = \frac{1.266\ldots}{0.02} = 63.34\ldots$
$n = 4012.9\ldots$
Sample size = 4010 (3 s.f.).
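The interval and sample-size calculations in Examples 9.30 and 9.31 can be reproduced with a short script. This is a sketch only, assuming Python 3.8 or later for `statistics.NormalDist`; the helper names `z_value`, `mean_ci_from_sums`, `proportion_ci` and `n_for_proportion_width` are illustrative and not from the text. It uses exact z-values rather than the rounded table values 1.645 and 2.576, so later decimal places may differ slightly from the hand working.

```python
from math import sqrt
from statistics import NormalDist

def z_value(level):
    """z such that (-z, z) encloses the central `level` of N(0, 1)."""
    return NormalDist().inv_cdf(0.5 + level / 2)

def mean_ci_from_sums(n, sum_x, sum_x2, level):
    """Large-sample interval for the mean from n, sum of x and sum of x squared."""
    mean = sum_x / n
    var_hat = (sum_x2 - sum_x ** 2 / n) / (n - 1)   # unbiased variance estimate
    half = z_value(level) * sqrt(var_hat / n)
    return mean - half, mean + half

def proportion_ci(successes, n, level):
    """Approximate interval for a population proportion (n large)."""
    p_s = successes / n
    half = z_value(level) * sqrt(p_s * (1 - p_s) / n)
    return p_s - half, p_s + half

def n_for_proportion_width(p_s, width, level):
    """Solve 2 * z * sqrt(p_s * q_s / n) = width for n."""
    return (2 * z_value(level)) ** 2 * p_s * (1 - p_s) / width ** 2

coin_ci = mean_ci_from_sums(50, 474.51, 4503.8276, 0.90)   # Example 9.30 (a)
poll_ci = proportion_ci(410, 1000, 0.99)                   # Example 9.31 (a)
poll_n = n_for_proportion_width(0.41, 0.04, 0.99)          # Example 9.31 (b)

print(coin_ci, poll_ci, poll_n)
```

Because the sample-size formula uses the sample proportion in place of the unknown p, the value it returns is itself only an estimate, as part (c) of Example 9.31 points out.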

(c) The answer is only an estimate because the estimate for p, $\hat{p} = p_s$, was used to obtain an approximate value for the standard deviation of the sampling distribution, $\sqrt{\frac{p_s q_s}{n}}$.
Also, in the sampling distribution of proportions (from which the confidence interval is obtained) a normal approximation is used for a binomial distribution.

Example 9.32

It may be assumed that the breaking strength of paving slabs laid in public areas is normally distributed with mean 50 units and standard deviation 8 units. Random samples of n paving slabs are taken. The mean breaking strength for a sample is denoted by $\bar{X}$.
(a) State the distribution of $\bar{X}$, giving its mean and variance.
(b) Find the probability that $\bar{X}$ exceeds 54 units in the case n = 25.
(c) Find the smallest possible sample size if the probability that $\bar{X}$ exceeds 54 units is less than 0.01.
Suppose that the breaking strength of paving slabs laid in public areas has mean 50 units and standard deviation 8 units, but that the form of the distribution of breaking strengths is not known. Random samples of n paving slabs are taken. What can be said about the form of the distribution of the mean breaking strength of these samples in the case when n is large, and also in the case when n is small? (C)

Solution 9.32

X is the breaking strength, $X \sim N(50, 8^2)$.

(a) $\bar{X} \sim N\left(50, \frac{8^2}{n}\right)$,
so $\bar{X}$ follows a normal distribution with mean 50 and variance $\frac{64}{n}$.

(b) When n = 25, $\bar{X} \sim N\left(50, \frac{64}{25}\right)$.
Standard deviation of $\bar{X}$ is $\frac{8}{5}$.
$P(\bar{X} > 54) = P\left(Z > \frac{54 - 50}{8/5}\right) = P(Z > 2.5) = 0.0062$

(c) $P(\bar{X} > 54) < 0.01$
$P\left(Z > \frac{54 - 50}{8/\sqrt{n}}\right) < 0.01$
So $P\left(Z < \frac{54 - 50}{8/\sqrt{n}}\right) > 0.99$
[Diagram: standard normal curve with area 0.99 to the left of z = 2.326 and upper tail area 0.01]
Now $\Phi^{-1}(0.99) = 2.326$, so a z-value of 2.326 gives an upper tail probability of 0.01.
$\frac{54 - 50}{8/\sqrt{n}}$ must lie to the right of 2.326.
So $\frac{54 - 50}{8/\sqrt{n}} > 2.326$
$4 > 2.326 \times \frac{8}{\sqrt{n}}$
$\sqrt{n} > 4.652\ldots$
$n > 21.64\ldots$
Smallest sample size is 22.

When n is large, by the central limit theorem $\bar{X}$ is approximately normal and $\bar{X} \sim N\left(50, \frac{8^2}{n}\right)$.
When n is small, you cannot say what the distribution of $\bar{X}$ is. You only know that its mean is 50 and its variance is $\frac{8^2}{n}$.

Example 9.33

The 'reading age' of children about to start secondary school is a measure of how good they are at reading and understanding printed text. A child's reading age, measured in years, is denoted by the random variable X. The distribution of X is assumed to be $N(\mu, \sigma^2)$. The reading ages of a random sample of 20 children were measured and the data obtained is summarised by $\sum x = 232.6$, $\sum x^2 = 2756.22$.
(a) Calculate unbiased estimates of $\mu$ and $\sigma^2$, giving your answers correct to two decimal places.
(b) Calculate a symmetric 95% confidence interval for $\mu$. (C)

Solution 9.33

(a) $\hat{\mu} = \bar{x} = \frac{\sum x}{n} = \frac{232.6}{20} = 11.63$
$\hat{\sigma}^2 = \frac{1}{n-1}\left(\sum x^2 - \frac{(\sum x)^2}{n}\right) = \frac{1}{19}\left(2756.22 - \frac{232.6^2}{20}\right) = 2.688\ldots = 2.69$ (2 d.p.).
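Parts (b) and (c) of Example 9.32 can be checked with a few lines of Python. This is a hedged sketch, assuming Python 3.8 or later for `statistics.NormalDist`; the function name `prob_mean_exceeds` is illustrative and not from the text.

```python
from math import floor, sqrt
from statistics import NormalDist

mu, sigma = 50, 8   # population mean and standard deviation from Example 9.32

def prob_mean_exceeds(threshold, n):
    """P(X-bar > threshold) when X-bar ~ N(mu, sigma^2 / n)."""
    return 1 - NormalDist(mu, sigma / sqrt(n)).cdf(threshold)

# Part (b): n = 25
p_b = prob_mean_exceeds(54, 25)

# Part (c): need (54 - mu) / (sigma / sqrt(n)) > z, where z has upper tail 0.01,
# which rearranges to n > (z * sigma / (54 - mu)) ** 2
z = NormalDist().inv_cdf(0.99)
n_bound = (z * sigma / (54 - mu)) ** 2
smallest_n = floor(n_bound) + 1   # least integer strictly greater than the bound

print(p_b, n_bound, smallest_n)
```

Using `floor(n_bound) + 1` rather than rounding up keeps the inequality strict, so the sketch would still give the right answer if the bound happened to be a whole number.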

(b) Since the population is normal, with variance unknown, and the sample size is small, use the t-distribution.
The 95% confidence limits for $\mu$ are $\bar{x} \pm t\frac{\hat{\sigma}}{\sqrt{n}}$
where (-t, t) encloses the central 95% of the t(n - 1) distribution.
Since n = 20, consider t(19).

Extract from t-distribution tables:
p         0.75    0.90    0.95    0.975
ν = 17    0.689   1.333   1.740   2.110
ν = 18    0.688   1.330   1.734   2.101
ν = 19    0.688   1.328   1.729   2.093

From tables, the critical t-value is found from column 0.975, ν = 19 and it is t = 2.093.
95% confidence limits are $11.63 \pm 2.093 \times \frac{\sqrt{2.688\ldots}}{\sqrt{20}} = 11.63 \pm 0.767\ldots$
95% confidence interval for $\mu$ = (11.63 - 0.767..., 11.63 + 0.767...)
= (10.9 years, 12.4 years) (1 d.p.).

Miscellaneous exercise 9h

1. The mass of a certain brand of chocolate bar has a normal distribution with mean $\mu$ grams and standard deviation 0.85 grams. The masses, in grams, of 5 randomly chosen bars are
124.31, 125.14, 124.23, 125.41, 125.76.
Calculate a symmetric 90% confidence interval for $\mu$, giving the end-points correct to two decimal places.
Forty random samples of 5 bars are taken, and a 90% confidence interval for $\mu$ is calculated for each sample. Find the expected number of intervals that do not contain $\mu$. (C)

2. A telephone company selected a random sample of size 150 from those customers who had not paid their bills one month after they had been sent out. The mean amount owed by the customers in the sample was £97.50 and the standard deviation was £29.00.
Calculate a 90% confidence interval for the mean amount owed by all customers who had not paid their bills one month after they had been sent out. (AEB)

3. A catering company asked 50 randomly selected college students to state the amount of money, \$x, which they spent daily on lunch, and the results were summarised by $\sum x = 56.50$ and $\sum x^2 = 66.80$. Calculate unbiased estimates of the mean and the variance of the amount spent daily on lunch by students at the college, giving your answers correct to three significant figures.
Hence find a symmetric 90% confidence interval for the mean amount spent daily on lunch, giving the end-points correct to the nearest \$0.01.
Justify the use of the normal distribution in constructing the confidence interval. (C)

4. A random sample of 250 adult men undergoing a routine medical inspection had their heights (x cm) measured to the nearest centimetre, and the following data were obtained:
$\sum x = 43\,205$, $\sum x^2 = 7\,469\,107$.
Calculate an unbiased estimate of the population variance. Calculate also a symmetric 99% confidence interval for the population mean. (C)

5. A random sample of 600 was chosen from the adults living in a town in order to investigate the number x of days of work lost through illness. Before taking the sample it was decided that certain categories of people would be excluded from the analysis of the number of working days lost although they would not be excluded from the sample. In the sample 180 were found to be from these categories. For the remaining 420 members of the sample:
$\sum x = 1260$, $\sum x^2 = 46\,000$.
(a) Estimate the mean number of days lost through illness, for the restricted population, and give a 95% confidence interval for the mean.
(b) Estimate the percentage of people in the town who fall into the excluded categories, and give a 99% confidence interval for this percentage.
(c) Give two examples, with reasons, of people who might fall into the excluded categories. (O)

6. The proportion of bruised apricots in a large consignment is denoted by p. A sample of 100 apricots is examined and 11 apricots are found to be bruised.
(a) Give an assumption under which it would be valid to calculate an approximate confidence interval for p.
(b) Given that the assumption in part (a) is justified, calculate an approximate 90% confidence interval for p. Give the end-points correct to two decimal places. (C)

7. The lifetimes of light bulbs of a certain type have standard deviation 25.3 hours. Each bulb in a randomly chosen box of 12 was tested to failure and the mean lifetime was found to be 1785.7 hours.
(a) State two assumptions which are required so that a symmetric 90% confidence interval for the population mean lifetime of the bulbs can be calculated.
(b) Calculate a symmetric 90% confidence interval, given the validity of the assumptions. The values of the end-points should be given to the nearest integer. (C)

8. A consumer group wishes to estimate the proportion, p, of packages of sausages whose fat content is greater than that stated on the label. A random sample of 40 packages was tested and nine packages were found to contain more fat than stated on the label.
(a) Estimate the number of packages that would have to be tested in order that a 95% confidence interval for p should have a width of 0.1.
(b) State, giving a reason, whether the number of packages to be tested would be larger or smaller than the answer in (a) if the confidence level were changed to 90%. (C)

9. In June 1996, 150 randomly chosen people aged sixteen or more were asked whether they smoked cigarettes and 34 said that they did. Assuming that the responses were truthful, calculate an approximate 99% confidence interval for the population proportion of people aged sixteen or more who smoked cigarettes.
Give a reason why this interval might not contain the true population proportion. (C)

10. A certain type of yarn is known to have a breaking strength with a mean of 25 newtons. In an attempt to increase its breaking strength the yarn is treated with a chemical. Each piece of yarn in a random sample of 80 treated pieces has its breaking strength, x newtons, measured, producing the following summarised data:
$\sum x = 2122$, $\sum x^2 = 56\,384$.
(a) Obtain unbiased estimates of the mean, $\mu$, and variance, $\sigma^2$, of the breaking strengths of pieces of yarn treated with the chemical.
(b) Construct a symmetric 99% confidence interval for $\mu$.
(c) Hence state, with a reason, whether or not the manufacturer of the yarn is justified in claiming that the treatment increases the mean breaking strength of this type of yarn.
(d) Explain why you were able to construct your confidence interval without knowing the form of the distribution of the breaking strength of a piece of yarn. (NEAB)

11. Shoe shop staff routinely measure the length of their customers' feet. Measurements of the length of one foot (without shoes) from each of 180 adult male customers yielded a mean length of 29.2 cm and a standard deviation of 1.47 cm.
(a) Calculate a 95% confidence interval for the mean length of male feet.
(b) Why was it not necessary to assume that the lengths of feet are normally distributed in order to calculate the confidence interval in part (a)?
(c) What assumption was it necessary to make in order to calculate the confidence interval in part (a)?
(d) Given that the lengths of male feet may be modelled by a normal distribution, and making any other necessary assumptions, calculate an interval within which 90% of the lengths of male feet will lie.
(e) In the light of your calculations in parts (a) and (d), discuss, briefly, the question 'is a foot a foot long?' (One foot is 30.5 cm.) (AEB)
12. Before its annual overhaul, the mean operating time of an automatic machine was 103 seconds. After the annual overhaul, the following random sample of operating times (in seconds) was obtained:
90, 97, 101, 92, 101, 95, 95, 98, 96, 95.
Assuming that the time taken by the machine to perform the operation is a normally distributed random variable with a known standard deviation of 5 seconds, find 98% confidence limits for the mean operating time after the overhaul.
Comment on the magnitude of these limits relative to the mean operating time before the overhaul. (AEB)

13. Packets of baking powder have a nominal weight of 200 g. The distribution of weights is normal and the standard deviation is 7 g. Average quantity system legislation states that, if the nominal weight is 200 g,
(i) the average weight must be at least 200 g,
(ii) not more than 2.5% of packages may weigh less than 191 g,
(iii) not more than 1 in 1000 packages may weigh less than 182 g.
A random sample of 30 packages had the following weights:
218, 207, 214, 189, 211, 206, 203, 217,
183, 186, 219, 213, 207, 214, 203, 204,
195, 197, 213, 212, 188, 221, 217, 184,
186, 216, 198, 211, 216, 200.
(a) Calculate a 95% confidence interval for the mean weight.
(b) Find the proportion of packets in the sample weighing less than 191 g and use your result to calculate an approximate 95% confidence interval for the proportion of all packets weighing less than 191 g. (AEB)

14. A company manufactures bars of soap. In a random sample of 70 bars, 18 were found to be mis-shaped. Calculate an approximate 99% confidence interval for the proportion of mis-shaped bars of soap.
Explain what you understand by a 99% confidence interval by considering
(a) intervals in general based on the above method,
(b) the interval you have calculated.
The bars of soap are either pink or white in colour and differently shaped according to colour. The masses of both types of soap are known to be normally distributed, the mean mass of the white bars being 176.2 g. The standard deviation for both bars is 6.46 g. A sample of 12 of the pink bars of soap had masses, measured to the nearest gram, as follows:
174, 164, 182, 169, 171, 187,
176, 177, 168, 171, 180, 175.
Find a 95% confidence interval for the mean mass of pink bars of soap.
Calculate also an interval within which approximately 90% of the masses of the white bars of soap will lie. (AEB)

15. An experimental physicist needs to estimate the true viscosity, $\mu$ Pascal seconds (Pa s), of a light machine oil. Using the same apparatus he takes 12 independent measurements, x Pa s, of the viscosity of the oil, obtaining the values below:
25.8, 25.2, 24.7, 25.5, 25.3, 25.4,
25.2, 25.3, 25.8, 25.9, 25.2, 24.9.
($\sum x = 304.2$, $\sum x^2 = 7712.9$)
When using this apparatus, measurements of the oil's viscosity are distributed with mean $\mu$ and variance $\sigma^2$.
Obtain unbiased estimates of $\mu$ and $\sigma^2$. Hence obtain a symmetric 95% confidence interval for $\mu$. State any distributional assumptions you have made in obtaining your confidence interval.
The physicist explained the meaning of his confidence interval by saying there was a probability of 0.95 that $\mu$ lay between the limits of the interval. Explain why this interpretation is wrong and provide a correct explanation of 95% confidence as used in this context.
The manufacturer of the oil quotes a viscosity of 25.5 Pa s for the oil. With reference to your confidence interval, state any conclusion you can come to regarding the validity of this figure. (NEAB)

16. Three weeks before an election in a certain constituency an opinion poll was conducted using a random sample of 800 voters selected from the electoral roll. The numbers of persons who said they would vote for parties A, B, C are recorded below; the remainder were categorised as 'Don't know'.

Party A    Party B    Party C    Don't know
264        256        144        136

(a) Calculate an approximate 90% symmetric confidence interval for the proportion of the total electorate in the constituency that will vote for party A in the election.
(b) Give a very brief description of how the sample might have been selected, to ensure that it was random.
(c) In the actual election, 41% of the total electorate voted for party A. Give two possible explanations for the fact that this value is not contained within the confidence interval calculated in (a). (NEAB)

17. In an investigation to assess the difference in use between a credit card and a store card a random sample of 20 people, each using both cards, was selected. They supplied information from which, in 1994, the difference between each person's mean monthly spending on the credit and store cards, £d, was calculated. The following summary data were then calculated.
$\sum d = 1664$ and $\sum d^2 = 426\,445$.
Stating all necessary distributional assumptions, calculate a symmetric 90% confidence interval for the mean difference between the mean monthly spending for all users of the two cards. (NEAB)

18. The mass, x milligrams, of each of 10 randomly selected units of a new cancer drug was measured and the following results obtained:
35.9, 35.2, 35.0, 34.9, 35.4,
34.8, 35.0, 35.1, 35.3, 35.1.
Assuming that the masses are normally distributed with mean $\mu$, calculate an 80% confidence interval for $\mu$.

19. Ten random samples of nylon fibre were tested for the amount of stretching under tension. Each fibre had the same length and diameter and was stretched by applying a standard load. The increases in length, in millimetres, were as follows:
13.52, 14.06, 13.19, 14.77, 12.80,
12.06, 15.12, 14.39, 15.81, 13.38.
Calculate a 95% confidence interval for the mean increase in length of the population of fibres, assuming that the increase in length can be modelled by a normal distribution.

20. During a particular evening, 10 babies were born on a particular maternity ward in a large hospital. The lengths, in centimetres, of the babies were noted:
50, 51, 45, 47, 49, 48, 54, 53, 45, 50.
Assuming that the sample came from an underlying normal population, calculate a 95% confidence interval for the mean of the population.

21. The external diameters (measured in units of 0.01 mm above a nominal value) of a sample of piston rings produced on the same machine were:
11, 9, 32, 18, 29, 1, 21, 19, 6.
Assuming a normal distribution, calculate a 95% confidence interval for the population mean. (AEB)

Mixed test 9A

1. A random sample of 40 nails is drawn from a population whose lengths are normally distributed with mean $\mu$ mm and standard deviation 0.48 mm.
(a) Calculate the width of a symmetric 99% confidence interval for $\mu$ based on this sample.
(b) Find the confidence level of a symmetric confidence interval having the same width as before, but based on a random sample of 20 nails. (C)

2. From time to time a firm manufacturing pre-packed furniture needs to check the mean distance between pairs of holes drilled by machine in pieces of chipboard to ensure that no change has occurred. It is known from experience that the standard deviation of the distance is 0.43 mm. The firm intends to take a random sample of size n, and to calculate a 99% confidence interval for the mean of the population. The width of this interval must be no more than 0.60 mm. Calculate the minimum value of n. (L)

3. Out of 248 cars parked in a car park, 72 were fitted with an anti-theft device on the steering wheel. Assuming that the cars form a random sample of parked cars, calculate an approximate 95% confidence interval for the population proportion of parked cars fitted with an anti-theft device on the steering wheel.
Give a reason, other than rounding in the calculations, why the interval is approximate.
Give a reason why the assumption of randomness might not be valid. (C)

4. The fat content of a well-known brand of beefburger was investigated by measuring the percentage of fat, X, in each of 12 randomly selected beefburgers. The results were summarised as follows:
$\sum x = 228$, $\sum x^2 = 4448$.
Assuming the percentage fat content to be normally distributed, find a 90% confidence interval for the population mean $\mu$.
Mixed test 9B

1. A group of 65 students is asked to guess the length of a particular object and their answers are recorded as x cm, with the following results:
$\sum x = 6019.0$ and $\sum x^2 = 557\,733.8$.
(a) Show that the estimated standard error of the sample mean is 0.3 cm.
(b) Determine an approximate symmetric 95% confidence interval for the mean of the population of all such guesses, giving your limits correct to two decimal places.
(c) State one assumption which you have made in your calculations. (NEAB)

2. A survey was carried out by a County Meals Service in order to gauge the response to a new 'healthy eating' menu. A random sample of 200 schoolchildren was selected from schools using the menu and it was found that 84 children approved of it. Calculate an approximate 95% confidence interval for the population proportion, p, who approve of the new menu.
It is given that p = 0.38. Use a suitable approximation to calculate the probability that, in a random sample of 200 children, the proportion who approve of the new menu will be at least 0.42. (C)

3. A researcher is designing a study to standardise a new intelligence test. It is known that scores on this type of test are normally distributed with a standard deviation of 15.0.
(a) Write down in terms of $\bar{x}$, the sample mean, and n, the sample size, an expression for a 99% symmetric confidence interval for the mean test score.
(b) Calculate, to the nearest 100, the value of n such that the width of this confidence interval will be less than 1.0. (NEAB)

4. In Tesbury's supermarket, economy packs of butter are marked 250 g. An inspector takes a random sample of 12 packs and weighs them. Correct to the nearest 0.1 g, the weights, in grams, were:
246.5, 240.9, 245.3, 250.5, 248.7, 249.1,
251.0, 249.8, 249.8, 247.6, 246.2, 241.4.
(a) Making any necessary assumptions, which should be stated, calculate a 99% confidence interval for the mean weight of the packs of butter.
(b) Calculate the width of the 99% confidence interval.
(c) How is the width affected when calculating a 90% confidence interval?

Hypothesis tests: discrete distributions

In this chapter you will learn about
• the language of hypothesis testing
• how to perform a test
  - for the parameter p of a binomial distribution (small sample)
  - for the mean $\lambda$ of a Poisson distribution
• Type I and Type II errors associated with hypothesis tests

Background knowledge
You will need to be able to
• recognise the conditions needed for a situation to be modelled by a binomial distribution or a Poisson distribution.
• find related probabilities by direct evaluation or by using cumulative probability tables.

HYPOTHESIS TEST FOR A BINOMIAL PROPORTION, p


(small sample size)

Sid says that he has psychic powers and can read people's thoughts. To test this claim, a
volunteer from the audience sits on the stage while Sid sits in a separate room off stage. The
volunteer chooses a card from a well-shuffled pack and concentrates on the card for five
seconds. At the same time, Sid writes down the suit of the card, either hearts, diamonds,
spades or clubs. The card is replaced in the pack, the pack is shuffled and another card drawn.
The procedure is repeated until 20 cards have been drawn.
There are four suits, so Sid has a one in four chance of writing down the correct suit if he
guesses the answer. If he isn't guessing, you would expect him to get more than one in four
correct. So if he gets five (or fewer) correct answers out of the 20, you would definitely say
that he is just guessing but if he gets as many as 19 or 20 correct you would have no
hesitation in saying that he could read people's thoughts.
But what about other values? If he gets 12 correct answers, would this be very unusual? What would you say if he got 10 correct? What about 8 correct?

Somehow you have to decide on a cut-off point, c. This would be the least value you could find such that the probability of getting c or more correct answers would be very small. It would be considered a rare event to get c or more correct answers.

[Diagram: number line from 0 to 20 with the cut-off point c marked; c or more correct answers form the rare events]

To decide on the value of c, you could choose a number that seemed reasonable. If however you perform a hypothesis (or significance) test you will be able to back up your argument and conclusion with statistical theory.

Suppose that X is the number of correct answers that Sid writes down for the suits of the 20 cards. If you assume that Sid is just guessing, the probability that he writes down the correct suit is 0.25. The experiment is performed 20 times, so there are 20 independent trials, each with a probability of 0.25 of success. This suggests a binomial situation (see page 279). In fact, on the assumption that Sid is guessing, X can be modelled by a binomial distribution with n = 20 and p = 0.25, i.e. $X \sim B(20, 0.25)$.

You now need to look for a value, c, in this distribution such that $P(X \geq c)$ is very small.

Binomial probabilities can be calculated directly (see page 279) or found from cumulative probability tables which give $P(X \leq r)$ for various values of n and p, where $X \sim B(n, p)$. The extract here relates to $X \sim B(20, 0.25)$ and has been reproduced from page 646.

$P(X \leq r)$ for $X \sim B(20, 0.25)$
n = 20, p = 0.25
r     $P(X \leq r)$
0     0.0032
1     0.0243
2     0.0913
3     0.2252
4     0.4148
5     0.6172
6     0.7858
7     0.8982
8     0.9591
9     0.9861
10    0.9961
11    0.9991
12    0.9998
13    1.0000

The tables give probabilities to four decimal places, indicating that, to four decimal places, $P(X \leq 13) = 1.0000$. This implies that $P(X \geq 14) = 1 - P(X \leq 13)$ tends to 0. If he is just guessing, it would be almost impossible for Sid to give 14, 15, 16, 17, 18, 19 or 20 correct answers. So if he gives, for example, 14 correct answers you would certainly have to conclude that he is able to read people's thoughts in some way!

Similarly $P(X \geq 13) = 1 - P(X \leq 12) = 1 - 0.9998 = 0.0002$.
Getting 13 or more correct answers would be a very rare event.

$P(X \geq 12) = 1 - P(X \leq 11) = 1 - 0.9991 = 0.0009$.
Getting 12 or more correct answers is still a very rare event.

$P(X \geq 11) = 1 - P(X \leq 10) = 1 - 0.9961 = 0.0039$.
On about four occasions in every thousand Sid might give 11 or more correct answers. This is still a rare event.

$P(X \geq 10) = 1 - P(X \leq 9) = 1 - 0.9861 = 0.0139 \approx 1\%$.
It would be very unlikely for Sid to give 10 or more correct answers but it could happen on about one occasion in every hundred.

$P(X \geq 9) = 1 - P(X \leq 8) = 1 - 0.9591 = 0.0409 \approx 4\%$.
It would still be unlikely for Sid to give 9 or more correct answers.

$P(X \geq 8) = 1 - P(X \leq 7) = 1 - 0.8982 = 0.1018 \approx 10\%$.
This probability is not that small. If Sid is just guessing, on 10% of occasions he could give 8 or more correct answers.

You have to make a decision about the value of the probability that is considered to imply an unlikely or rare event. This probability is called the significance level of the test. As a guide, events that have a probability of 5% or less are regarded as unlikely and events having a probability of 1% or less are regarded as very unlikely. Often a significance test at the 5% level is carried out.

The cut-off point c is known as the critical value and the group of observations that are considered to be unusual or unlikely (rare) events is called the critical (or rejection) region. The critical value and critical region depend on the significance level chosen.

Suppose you choose a significance level of 5% to test Sid's claim. From the working above $P(X \geq 8) \approx 10\%$. This is greater than 5%, so $x \geq 8$ is not the critical region; getting eight correct answers would not be considered an unlikely or rare event.

But $P(X \geq 9) \approx 4\%$, which is less than 5%, so getting nine correct answers would be considered an unlikely or rare event. Therefore the critical value for a 5% level of significance is 9 and the critical region is $x \geq 9$, i.e. 9, 10, 11, 12, ..., 19 or 20 correct answers.

[Diagram: number line from 0 to 20 with the critical region, 9 to 20, marked]

The language used in hypothesis testing

The assumption that Sid is guessing is called the null hypothesis and it is written $H_0$. The null hypothesis is very important as it provides the model for the calculations. You would write
$H_0$: p = 0.25

If Sid has psychic powers, then he should get more than one in four correct and the probability that he gives the correct suit will be more than 0.25. This is called the alternative hypothesis and is denoted by $H_1$. You would write
$H_1$: p > 0.25

Since you are interested in whether the probability is greater than 0.25, the critical region in this example is at the right-hand end of the distribution. This is known as the upper tail.

The variable X, the number of correct answers, is the test statistic. The number of correct answers that Sid gives in the experiment is the test value. To perform the hypothesis test you need to work out whether the test value lies in the critical region or not.

If the test value lies in the critical (or rejection) region, reject $H_0$ in favour of $H_1$. This means that you will reject that Sid is guessing in favour of the alternative hypothesis that he has psychic powers.
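The search for the critical value described above can also be carried out by direct evaluation instead of tables. The following is a minimal Python sketch, assuming an upper-tail test; the function names `binom_upper_tail` and `upper_critical_value` are illustrative and not from the text.

```python
from math import comb

def binom_upper_tail(n, p, c):
    """P(X >= c) for X ~ B(n, p), summed directly from the binomial formula."""
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(c, n + 1))

def upper_critical_value(n, p, alpha):
    """Least c such that P(X >= c) < alpha (upper-tail test)."""
    # c = n + 1 gives an empty sum (probability 0), so the loop always returns
    for c in range(n + 2):
        if binom_upper_tail(n, p, c) < alpha:
            return c

n, p = 20, 0.25   # Sid's experiment under the assumption that he is guessing
for c in range(8, 12):
    print(c, round(binom_upper_tail(n, p, c), 4))

critical = upper_critical_value(n, p, 0.05)
print("critical value at the 5% level:", critical)   # 9
```

Changing `alpha` to 0.01 in the last call shows how the critical value, and hence the critical region, depends on the significance level chosen.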
If the test value does not lie in the critical region, do not reject $H_0$. There is not enough evidence to say that he has psychic powers. Writing this another way, if the test value lies in the acceptance region, accept $H_0$ and conclude that Sid is guessing.

Suppose that in the experiment Sid gives seven correct answers and he says that this proves that he has psychic powers. Is this enough evidence statistically? From the critical region diagram, you can see that, at the 5% level of significance, x = 7 does not lie in the critical region. Therefore $H_0$ is not rejected. This means that there is not enough evidence to say that p > 0.25, i.e. to say Sid has psychic powers. You would conclude that he is just guessing.

PROCEDURE FOR CARRYING OUT A HYPOTHESIS TEST

To find whether the test value is in the critical region you can work out the critical region as described above. This is a useful method as it gives a lot of information, but its disadvantage is that it can be rather time-consuming.

In this example, it may be quicker instead to calculate the probability that X is greater than or equal to the test value. If this probability is less than 5%, this means that the test value is in the upper tail 5% of the distribution, i.e. it is in the critical region.

This method is illustrated in the working below, which tests the sample value x = 7 and assumes that you have not found the critical region first. Note that the stages of the test are labelled and additional commentary is given alongside.

[Variable] Let X be the number of correctly identified suits out of the 20 trials. Assuming that the pack is well shuffled between each trial and the trials are independent, X can be modelled by a binomial distribution, where $X \sim B(20, p)$.

[$H_0$ and $H_1$]
$H_0$: p = 0.25 (Sid is guessing)
$H_1$: p > 0.25 (Sid has psychic powers)
If $H_0$ is true, then $X \sim B(20, 0.25)$.

[Level and type of test] Use a one-tailed (upper tail) test, at the 5% level.
The test value, x, will lie in the critical region (the upper tail 5% of the distribution) if $P(X \geq x) < 5\%$.
Reject $H_0$ if $P(X \geq x) < 5\%$, where x is the test value.

[Required probability] Sid gives 7 correct answers, so test x = 7 and find $P(X \geq 7)$.
From cumulative binomial tables
$P(X \geq 7) = 1 - P(X \leq 6) = 1 - 0.7858 = 0.2142 \approx 21\%$

$P(X \leq r)$ for $X \sim B(20, 0.25)$
n = 20, p = 0.25
r     $P(X \leq r)$
0     0.0032
1     0.0243
2     0.0913
3     0.2252
4     0.4148
5     0.6172
6     0.7858   <--- $P(X \leq 6)$
7     0.8982
8     0.9591

If you do not have cumulative tables, work out the binomial probabilities as follows:
$P(X \geq 7) = 1 - P(X \leq 6)$
$= 1 - \left(0.75^{20} + 20 \times 0.75^{19} \times 0.25 + {}^{20}C_2 \times 0.75^{18} \times 0.25^2 + {}^{20}C_3 \times 0.75^{17} \times 0.25^3 + {}^{20}C_4 \times 0.75^{16} \times 0.25^4 + {}^{20}C_5 \times 0.75^{15} \times 0.25^5 + {}^{20}C_6 \times 0.75^{14} \times 0.25^6\right)$
$= 0.2142 \approx 21\%$

Since $P(X \geq 7) > 5\%$, the test value x = 7 is not in the critical region. There is not enough evidence to reject $H_0$.
You would conclude that Sid is just guessing; he does not have psychic powers.

NOTE: when you are testing the value x = 7, it may seem strange that you have to work out $P(X \geq 7)$ rather than just $P(X = 7)$. Remember that this is necessary as you are essentially looking for the critical region to see whether the test value lies in this region or not.

The probabilities and critical region can be illustrated diagrammatically. Below is the probability distribution for $X \sim B(20, 0.25)$. Note that it is positively skewed and the probabilities for 12 to 20 correct answers are so small that they cannot be shown on the diagram. The test value has been circled.

[Diagram: probability distribution of $X \sim B(20, 0.25)$ for x = 0 to 20, with the test value 7 circled, the boundary for the critical region (5%) drawn between 8 and 9, and the critical region 9 to 20 marked]

Since $P(X \geq 8) \approx 10\%$ and $P(X \geq 9) \approx 4\%$, the 5% boundary comes between 8 and 9. Note that with discrete distributions you will probably not get a perfect 5% in your calculations.

Example 10.1

A drugs company produced a new pain-relieving drug for migraine sufferers and its advertisements stated that the drug had a 90% success rate. A doctor doubted whether the drug would be as successful as the company claimed. She prescribed the drug for 15 of her patients. After six months, 11 of these patients said that their migraine symptoms had been relieved by the drug.
(a) Test the drug company's claim, at the 5% level of significance.
(b) Should the doctor continue to prescribe the drug?
Solution 10.1

(a)
1. Define the variable.
Let $X$ be the number of patients in 15 whose symptoms are relieved by the drug. Assuming that the effect of the drug on a patient is independent of the effect on other patients, $X$ can be modelled by a binomial distribution, where $X \sim B(15, p)$.

2. State $H_0$ and $H_1$.
$H_0$: $p = 0.9$ (The success rate of the drug is 90%.)
$H_1$: $p < 0.9$ (The success rate is less than 90% and the drug is not as successful as the company claims.)

3. State the distribution according to $H_0$.
If $H_0$ is true, then $X \sim B(15, 0.9)$.

4. State the level and type of test.
Since the alternative hypothesis is $p < 0.9$, the critical region is in the lower tail of the distribution, so use a one-tailed (lower tail) test, at the 5% level.

5. State the rejection criterion.
The test value, $x$, will lie in the critical region (the lower tail 5% of the distribution) if $P(X \le x) < 5\%$.
Reject $H_0$ if $P(X \le x) < 5\%$, where $x$ is the test value.

6. Calculate the required probability.
The test value is $x = 11$, so find $P(X \le 11)$.
Using cumulative binomial tables: if the tables give only values of $p$ up to 0.5, you need to use symmetry properties as illustrated on page 284.

$$P(X \le 11 \mid p = 0.9) = P(X \ge 4 \mid p = 0.1) = 1 - P(X \le 3) = 1 - 0.9444 = 0.0556 \approx 5.6\%$$

$P(X \le r)$ for $X \sim B(15, 0.1)$ ($n = 15$, $p = 0.1$)

r    P(X ≤ r)
0    0.2059
1    0.5490
2    0.8159
3    0.9444   <- P(X ≤ 3)
4    0.9873
5    0.9978

If you are calculating the probabilities directly:

$$\begin{aligned}
P(X \le 11) &= 1 - P(X \ge 12) \\
&= 1 - \left({}^{15}C_{12} \times 0.1^{3} \times 0.9^{12} + {}^{15}C_{13} \times 0.1^{2} \times 0.9^{13} + 15 \times 0.1 \times 0.9^{14} + 0.9^{15}\right) \\
&= 1 - 0.944\ldots \\
&= 0.0556 \text{ (4 d.p.)} \approx 5.6\%
\end{aligned}$$

7. Make your conclusion.
$P(X \le 11)$ is greater than 5%. This means that the boundary for the critical region (the lower tail 5% of the distribution) will be slightly to the left of $x = 11$. So $x = 11$ is not in the critical region.
$H_0$ is not rejected and the drugs company's claim of a 90% success rate is upheld.

(b) $P(X \le 11)$ is only just greater than 5%. With safety in mind, it would be wise to suggest that the doctor errs on the side of caution and carries out further tests before accepting that the success rate is 90%.

It is time-consuming to draw a probability diagram when carrying out the hypothesis test, but it can be helpful in illustrating the results. The distribution $X \sim B(15, 0.9)$ is very negatively skewed, with the probabilities in the lower tail being too small to show in the diagram.

[Figure: probability distribution of $X \sim B(15, 0.9)$, horizontal axis 0 to 15, test value 11 circled, boundary for the critical region (5%) marked.]

ONE-TAILED AND TWO-TAILED TESTS

One-tailed test

In the examples so far, one-tailed tests have been considered with either the upper or lower tail being used for the critical region, depending on the alternative hypothesis.

In general, for a significance level of $\alpha\%$, null hypothesis $H_0$: $p = p_0$ and a test value $x$,

- if $H_1$ involves a $>$ sign, indicating that you are looking for an increase in $p$, use the upper tail to find whether $P(X \ge x) < \alpha\%$,
- if $H_1$ involves a $<$ sign, indicating that you are looking for a decrease in $p$, use the lower tail to find whether $P(X \le x) < \alpha\%$.

Remember that in both cases you use $P(X \ldots) < \alpha\%$.

Two-tailed test

A two-tailed test is carried out when the alternative hypothesis looks for a change in $p$, not specifically an increase or a decrease. If the significance level is $\alpha\%$, then the critical region is in two parts, half in the lower tail and half in the upper tail.

- If $H_1$ involves a $\ne$ sign, indicating that you are looking for a change in $p$, then
  in the lower tail, the critical region consists of values less than or equal to $c_1$ such that $P(X \le c_1) < \tfrac{1}{2}\alpha\%$;
  in the upper tail, the critical region consists of values greater than or equal to $c_2$ such that $P(X \ge c_2) < \tfrac{1}{2}\alpha\%$.

For example, for a 5% significance level (two-tailed test) the probability distribution and critical region might look like this:

[Figure: probability distribution with a critical region in each tail, $x \le c_1$ in the lower tail and $x \ge c_2$ in the upper tail.]

Example 10.2

In last year's local elections, the Purple party gained 35% of the vote. Prior to this year's election, the party asked a researcher to find out whether support of the party had changed. Out of twelve voters selected at random, one said that he would vote for the Purple party.

(a) Test, at the 10% level, whether support for the Purple party has changed.
(b) Find the critical region for the test.

Solution 10.2

(a)
1. Define the variable.
Let $X$ be the number of voters in 12 who say that they will vote for the Purple party. Assuming that each person votes independently, $X$ can be modelled by a binomial distribution, where $X \sim B(12, p)$.

2. State $H_0$ and $H_1$.
$H_0$: $p = 0.35$ (The support has not changed)
$H_1$: $p \ne 0.35$ (The support has changed)

3. State the distribution according to $H_0$.
If $H_0$ is true, then $X \sim B(12, 0.35)$.

4. State the level and type of test.
Since the alternative hypothesis is $p \ne 0.35$, consider both tails of the distribution and perform a two-tailed test at the 10% level.
In this case the 10% for the significance level is distributed evenly between the upper and lower tails, with 5% at each tail.

5. State the rejection criterion.
The test value, $x$, will lie in the critical region (the lower tail 5% or the upper tail 5%) if $P(X \le x) < 5\%$ or $P(X \ge x) < 5\%$.
Reject $H_0$ if $P(X \le x) < 5\%$ or $P(X \ge x) < 5\%$.

6. Calculate the required probability.
The test value is $x = 1$, so you need to look at the lower tail part of the critical region and find $P(X \le 1)$.

$$\begin{aligned}
P(X \le 1) &= P(X = 0) + P(X = 1) \\
&= 0.65^{12} + 12 \times 0.65^{11} \times 0.35 \\
&= 0.04244\ldots \approx 4.2\%
\end{aligned}$$

(You can use cumulative binomial tables if they are available.)

7. Make your conclusion.
Since $P(X \le 1) < 5\%$, the sample value $x = 1$ lies in the critical region, so $H_0$ is rejected in favour of $H_1$. At the 10% significance level, there is evidence that support for the Purple party has changed.

(b) To find the critical region, consider separately the upper and lower tails of the distribution.

Critical region in lower tail:
Find the maximum value of $c$ such that $P(X \le c) < 0.05$.
You already know that $P(X \le 1) < 0.05$, i.e. that 0 and 1 lie in the critical region, so try $c = 2$.

$$\begin{aligned}
P(X \le 2) &= 0.0424\ldots + P(X = 2) \\
&= 0.0424\ldots + {}^{12}C_2 \times 0.65^{10} \times 0.35^{2} \\
&= 0.1512\ldots \approx 15\%
\end{aligned}$$

$P(X \le 2)$ is greater than 5%, indicating that $x = 2$ is not in the critical region.
So the lower tail part of the critical region is $x = 0, 1$.

Critical region in upper tail:
Find the minimum value of $c$ such that $P(X \ge c) < 0.05$. By guesswork, try $c = 8$.

$$\begin{aligned}
P(X \ge 8) &= {}^{12}C_8 \times 0.65^{4} \times 0.35^{8} + {}^{12}C_9 \times 0.65^{3} \times 0.35^{9} + {}^{12}C_{10} \times 0.65^{2} \times 0.35^{10} \\
&\quad + 12 \times 0.65 \times 0.35^{11} + 0.35^{12} \\
&= 0.0255\ldots < 5\%
\end{aligned}$$

This indicates that $x \ge 8$ is in the critical region.
But is 8 the smallest value in the critical region? To check this, try $c = 7$.

$$P(X \ge 7) = {}^{12}C_7 \times 0.65^{5} \times 0.35^{7} + 0.0255\ldots = 0.084\ldots > 5\%$$

This indicates that $x = 7$ is not in the critical region.
So the upper tail part of the critical region consists of $x \ge 8$.

Therefore the critical region is $x = 0, 1, 8, 9, 10, 11, 12$.

[Figure: probability distribution of $X \sim B(12, 0.35)$, horizontal axis $x$ from 0 to 12, with a 5% critical region marked in each tail.]
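The search for the two-tailed critical region in Solution 10.2(b) can be checked numerically. The following is a minimal sketch, not part of the original text: the function names `binom_pmf` and `two_tailed_critical_region` are illustrative, and the probabilities are computed exactly from the binomial formula rather than read from tables.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_tailed_critical_region(n, p0, alpha):
    """Return (lower, upper) lists of values in the critical region.

    A value x is in the lower part if P(X <= x) < alpha/2 and in the
    upper part if P(X >= x) < alpha/2, where X ~ B(n, p0).
    """
    half = alpha / 2
    pmf = [binom_pmf(k, n, p0) for k in range(n + 1)]
    lower = [x for x in range(n + 1) if sum(pmf[: x + 1]) < half]
    upper = [x for x in range(n + 1) if sum(pmf[x:]) < half]
    return lower, upper


# Example 10.2: n = 12, H0: p = 0.35, two-tailed test at the 10% level.
lower, upper = two_tailed_critical_region(12, 0.35, 0.10)
print("lower tail:", lower)
print("upper tail:", upper)
```

Splitting the significance level equally between the tails mirrors the rule used in the worked solution; other conventions for two-tailed tests on skewed discrete distributions exist, so this is only one reasonable choice.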
• Define $X$, the binomial variable being considered and the general form of its distribution, for example $X \sim B(12, p)$.

• State the null hypothesis $H_0$ and the alternative hypothesis $H_1$ concerning $p$, for example
  $H_0$: $p = p_0$
  $H_1$: $p < p_0$

• State the distribution of $X$ assuming that the null hypothesis is true, i.e. assuming that $X$ does follow a binomial distribution with the value of $p$ specified in $H_0$, for example $X \sim B(12, p_0)$.

• State the type of test (one-tailed or two-tailed). This depends on whether the alternative hypothesis looks for an increase or a decrease (one-tailed) or a change (two-tailed) in $p$, for example, with $H_0$: $p = p_0$ in each case,

  Alternative hypothesis    Type of test       Tail considered for critical region
  $H_1$: $p < p_0$          one-tailed test    lower tail
  $H_1$: $p > p_0$          one-tailed test    upper tail
  $H_1$: $p \ne p_0$        two-tailed test    both tail ends

• State the significance level of the test, $\alpha\%$. Remember that this defines the critical region.

• State the criterion for rejection of $H_0$, for example, for test value $x$,
  - lower tail: reject $H_0$ if $P(X \le x) < \alpha\%$, i.e. if $x$ lies in the lower tail $\alpha\%$ of the distribution;
  - upper tail: reject $H_0$ if $P(X \ge x) < \alpha\%$, i.e. if $x$ lies in the upper tail $\alpha\%$ of the distribution;
  - two-tailed: reject $H_0$ if $P(X \le x) < \tfrac{1}{2}\alpha\%$ or if $P(X \ge x) < \tfrac{1}{2}\alpha\%$, i.e. if $x$ lies in the lower or upper tail $\tfrac{1}{2}\alpha\%$ of the distribution.

• Obtain the test value, $x$.

• Calculate the required probability to see whether $x$ lies in the critical (rejection) region.

• Make your conclusion by rejecting $H_0$ or not. Then relate this to the context of the situation being tested.

NOTE: The method is essentially the same for a large sample, but, in the case of large samples, use is made of the normal approximation to the binomial distribution. This test is described on page 528.

TYPE I AND TYPE II ERRORS

When you perform a significance test, there are four possible conclusions, and these are shown in the table below.
Two of the conclusions lead to correct decisions and the other two lead to wrong ones. The errors associated with wrong decisions are called Type I and Type II errors.

The outcomes and errors are summarised as follows:

(a) $H_0$ is true and your test leads you to accept $H_0$ - correct decision
(b) $H_0$ is true but your test leads you to reject $H_0$ - wrong decision - Type I error
(c) $H_0$ is false but your test leads you to accept $H_0$ - wrong decision - Type II error
(d) $H_0$ is false and your test leads you to reject $H_0$ - correct decision

It can be helpful to see these on a diagram:

                                  Test decision
                                  Accept $H_0$      Reject $H_0$
Actual       $H_0$ is true        correct           Type I error
situation    $H_0$ is false       Type II error     correct

You reject $H_0$ if the test value lies in the critical region. This region is fixed according to the level of significance of the test, so the probability that the test value lies in the critical region when $H_0$ is true is the same as the significance level. So for tests carried out at the $\alpha\%$ level of significance,

$$P(\text{Type I error}) = \alpha\%$$

To calculate the probability of making a Type II error, a specific value for $H_1$ is stated. The error is then calculated as follows:

$$P(\text{Type II error}) = P(\text{accept } H_0 \text{ when } H_1 \text{ is true})$$

This is illustrated in the following example.

Example 10.3

A random observation is taken from a binomial distribution $X \sim B(20, p)$ and used to test the null hypothesis $p = 0.8$ against the alternative hypothesis $p > 0.8$.
The critical region is chosen to be $x \ge 19$.

(a) What is the significance level of the test?
(b) What is the probability of making a Type I error?
(c) Find the probability of making a Type II error if, in fact, $p = 0.85$.

Solution 10.3

You are given that $X \sim B(20, p)$.

(a) $H_0$: $p = 0.8$
    $H_1$: $p > 0.8$

The critical region is $x \ge 19$, so to find the significance level of the test, find $P(X \ge 19)$.

$$\begin{aligned}
P(X \ge 19) &= P(X = 19) + P(X = 20) \\
&= {}^{20}C_{19} \times 0.2 \times 0.8^{19} + 0.8^{20} \\
&= 0.0691\ldots \approx 7\%
\end{aligned}$$

The significance level is approximately 7%.

(b) $P(\text{Type I error}) = P(\text{reject } H_0 \text{ when } H_0 \text{ is true})$

$H_0$ is rejected if $x \ge 19$, so
$P(\text{Type I error}) = P(X \ge 19 \text{ when } p = 0.8) = 7\%$ (found above).

Note that this could have been stated directly from part (a), since the probability of a Type I error is the same as the significance level of the test.

(c) You make a Type II error when you accept $H_0$ (which you will do if $x < 19$) when $p$ is the value specified in $H_1$ (not the value given by $H_0$).

The hypotheses are now
$H_0$: $p = 0.8$
$H_1$: $p = 0.85$

$$\begin{aligned}
P(\text{Type II error}) &= P(\text{accept } H_0 \text{ when } H_1 \text{ is true}) \\
&= P(X < 19 \text{ when } p = 0.85) \\
&= P(X < 19 \text{ when } X \sim B(20, 0.85))
\end{aligned}$$

$$\begin{aligned}
P(X \le 18) &= 1 - P(X = 19) - P(X = 20) \\
&= 1 - {}^{20}C_{19} \times 0.15 \times 0.85^{19} - 0.85^{20} \\
&= 0.8244\ldots \approx 82\%
\end{aligned}$$

The probability of making a Type II error is 82%.

This is a very high probability. To make it smaller, you could increase the significance level of the test. But this would of course increase the probability of making a Type I error!

Exercise 10a - Testing p in a binomial distribution (small samples)

1. A certain type of seed has a germination rate of 70%. The seeds undergo a new treatment after which 9 germinate in a packet of 10 seeds. Stating suitable null and alternative hypotheses, test, at the 5% level, whether this is evidence of an increase in the germination rate.

2. Hester suspected that a die was biased in favour of a four occurring. She decided to carry out a hypothesis test.
   (a) State suitable null and alternative hypotheses for the test.
   When she threw the die 15 times, she obtained a four on 6 occasions.
   (b) Carry out the test, at the 5% level, stating your conclusion clearly.

3. The random variable $X$ can be modelled by a binomial distribution with $n = 10$. A random observation, $x$, is taken from the distribution.
   Test, at the […]% level, the hypothesis that $p = 0.45$ against the alternative hypothesis $p \ne 0.45$, (a) when $x = 7$, (b) when $x = 1$.

4. Records kept in a hospital show that 3 out of every 10 casualties who come to the casualty department have to wait more than half an hour before receiving medical attention. The hospital decided to increase the staffing in the department by one person and it was then found that, of the next 20 casualties, 2 had to wait for more than half an hour for medical attention.
   Test whether the new staffing has decreased the number of casualties who have to wait more than half an hour for medical attention.
   Perform the test (a) at the 5% level, (b) at the 2% level. (L)

5. The random variable $X$ can be modelled by a binomial distribution with parameters $n = 9$ and $p$, whose value is unknown.
   (a) Find, at the 10% level of significance, the critical region to test the null hypothesis that $p = 0.3$ against the alternative hypothesis that $p > 0.3$.
   (b) Explain what is meant by a Type I error.
   (c) State the probability of making a Type I error in the test described in (a).

6. In each of the following, a random observation $x$ is taken from a binomial distribution $X \sim B(n, p)$. Test the given hypotheses at the significance level stated.

        x     n     Hypotheses                                   Level of significance
   (a)  6     8     $H_0$: $p = 0.45$, $H_1$: $p > 0.45$         5%
   (b)  1     10    $H_0$: $p = 0.45$, $H_1$: $p < 0.45$         5%
   (c)  9     15    $H_0$: $p = 0.35$, $H_1$: $p > 0.35$         5%
   (d)  9     15    $H_0$: $p = 0.35$, $H_1$: $p \ne 0.35$       5%
   (e)  2     9     $H_0$: $p = 0.45$, $H_1$: $p < 0.45$         5%
   (f)  16    20    $H_0$: $p = 0.45$, $H_1$: $p > 0.45$         1%
   (g)  5     7     $H_0$: $p = 0.4$, $H_1$: $p > 0.4$           10%
   (h)  2     20    $H_0$: $p = 0.3$, $H_1$: $p < 0.3$           1%

7. A driving instructor claims that 95% of his pupils pass their driving test at the first attempt. Tom is considering having lessons with this instructor but wonders whether 95% is an overestimate. He decides to conduct a significance test at the 5% level and discovers that last month, out of the 15 pupils who took the test for the first time, 11 passed.
   (a) What would Tom decide about the driving instructor's claim?
   (b) Find the critical region for the number of failures last month.

8. In a test of ten true-false questions, Sian got 8 correct. Test at the 5% level whether she could have obtained this score by guessing all the answers.

9. At a particular hospital it was found from past records that the probability that a patient does not turn up for an appointment is 0.3.
   Following a campaign to make patients more aware of the problems caused by missed appointments, a significance test at the 10% level was carried out to decide whether the campaign had been effective in reducing the number of patients who did not turn up for an appointment. A random sample of 16 patients was surveyed.
   (a) Find the critical region for the test.
   (b) Find the probability of making a Type II error in the test described in (a) if in fact the probability that a patient does not turn up for an appointment is now 0.25.

10. Jessica is trying to find out whether a particular coin is biased, so she performs a significance test. She decides that she will say that the coin is biased in favour of heads if, when she tosses it 15 times, at least two-thirds of the tosses result in heads.
   (a) What significance level did she use for her test?
   (b) What is the probability that she makes a Type I error?
   (c) If, in fact, the coin is biased with probability of 0.7 of obtaining a head, what is the probability that she makes a Type II error?

11. A random observation is taken from a binomial distribution $X \sim B(25, p)$ and used to test the null hypothesis $p = 0.4$ against the alternative hypothesis $p < 0.4$.
   The critical region is chosen to be $x \le 6$.
   (a) At what significance level is the test carried out?
   (b) What is the probability of making a Type I error?
   (c) Find the probability of making a Type II error if, in fact, $p = 0.3$.
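The Type I and Type II error probabilities found in Solution 10.3 can be reproduced with a short calculation. This is a minimal sketch under the assumptions of that example (critical region $x \ge 19$ for $X \sim B(20, p)$, with $p = 0.8$ under $H_0$ and $p = 0.85$ as the specific alternative); the helper name `binom_upper_tail` is illustrative only.

```python
from math import comb


def binom_upper_tail(c, n, p):
    """P(X >= c) for X ~ B(n, p), summed directly from the binomial formula."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c, n + 1))


n, c = 20, 19  # sample size and smallest value in the critical region

# Type I error: the test value falls in the critical region although H0 is true.
type_1 = binom_upper_tail(c, n, 0.80)

# Type II error: the test value misses the critical region although p = 0.85.
type_2 = 1 - binom_upper_tail(c, n, 0.85)

print(f"P(Type I error)  = {type_1:.4f}")
print(f"P(Type II error) = {type_2:.4f}")
```

Changing `c` to 18 in this sketch shows the trade-off described in the solution: the Type II error probability falls while the Type I error probability rises.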

SIGNIFICANCE TEST FOR A POISSON MEAN $\lambda$

This follows the same pattern as that for the binomial parameter $p$, as follows:

• Define $X$, the Poisson variable being considered and the general form of its distribution, for example $X \sim Po(\lambda)$.

• State the null hypothesis $H_0$ and the alternative hypothesis $H_1$ concerning $\lambda$, for example
  $H_0$: $\lambda = \lambda_0$
  $H_1$: $\lambda > \lambda_0$

• State the distribution of $X$ assuming that the null hypothesis is true, i.e. assuming that $X$ does follow a Poisson distribution with the value of $\lambda$ specified in $H_0$, for example $X \sim Po(\lambda_0)$.

• State the type of test (one-tailed or two-tailed), for example, with $H_0$: $\lambda = \lambda_0$ in each case,

  Alternative hypothesis                Type of test       Tail considered for critical region
  $H_1$: $\lambda < \lambda_0$          one-tailed test    lower tail
  $H_1$: $\lambda > \lambda_0$          one-tailed test    upper tail
  $H_1$: $\lambda \ne \lambda_0$        two-tailed test    both tail ends

• State the significance level of the test, for example $\alpha\%$. This defines the critical region.

• State the criterion for rejection of $H_0$, for example
  - lower tail: reject $H_0$ if $P(X \le x) < \alpha\%$, i.e. if $x$ lies in the lower tail $\alpha\%$ of the distribution;
  - upper tail: reject $H_0$ if $P(X \ge x) < \alpha\%$, i.e. if $x$ lies in the upper tail $\alpha\%$ of the distribution;
  - two-tailed: reject $H_0$ if $P(X \le x) < \tfrac{1}{2}\alpha\%$ or if $P(X \ge x) < \tfrac{1}{2}\alpha\%$, i.e. if $x$ lies in the lower or upper tail $\tfrac{1}{2}\alpha\%$ of the distribution.

• Obtain your test value, $x$.

• Calculate the required probability to see whether $x$ lies in the critical (rejection) region.

• Make your conclusion by rejecting $H_0$ or not. Then relate this to the context of the situation being tested.

When $\lambda$ is large, a normal approximation to the Poisson distribution can be applied. The test is similar to the one for the normal approximation to the binomial described on page 528.

Example 10.4

The number of misprints in the classified advertisements pages of the Daily Informer is found to have a Poisson distribution with average 6.5 misprints per page. A new proof reader is employed and the number of misprints on a page was found to be 12. The editor said that the average number of misprints had increased. Test this claim at the 5% level.

Solution 10.4

1. Define the variable.
Let $X$ be the number of misprints on the classified advertisements page. Assuming that misprints occur randomly, $X$ can be modelled by a Poisson distribution, where $X \sim Po(\lambda)$.

2. State $H_0$ and $H_1$.
$H_0$: $\lambda = 6.5$
$H_1$: $\lambda > 6.5$ (the number of misprints has increased)

3. State the distribution according to $H_0$.
If $H_0$ is true, then $X \sim Po(6.5)$.

4. State the level and type of test.
Since the alternative hypothesis is $\lambda > 6.5$, use a one-tailed test at the 5% level and consider the upper tail of the distribution for the critical region.

5. State the rejection criterion.
At the 5% level, the sample value $x$ will lie in the critical region if $P(X \ge x) < 5\%$.
Reject $H_0$ if $P(X \ge x) < 5\%$, where $x$ is the test value.

6. Calculate the required probability.
The test value is $x = 12$, so find $P(X \ge 12)$.
Using cumulative Poisson tables (page 648),

$$P(X \ge 12) = 1 - P(X \le 11) = 1 - 0.9661 = 0.0339 = 3.39\%$$

$P(X \le r)$ where $X \sim Po(6.5)$ ($\lambda = 6.5$)

r     P(X ≤ r)
0     0.0015
1     0.0113
2     0.0430
3     0.1118
4     0.2237
5     0.3690
6     0.5265
7     0.6728
8     0.7916
9     0.8774
10    0.9332
11    0.9661
12    0.9840
13    0.9929
14    0.9970

7. Make your conclusion.
Since $P(X \ge 12) < 5\%$, the sample value of 12 misprints lies in the critical region, so reject $H_0$ in favour of $H_1$.
There is evidence, at the 5% level, that the average number of misprints has increased.

NOTE: by further calculation you will find that $P(X \ge 11) = 6.68\% > 5\%$, so the boundary for the critical region comes between 11 and 12. The critical region is therefore $x \ge 12$ and the null hypothesis will be rejected if 12 or more misprints are found in the sample.

[Figure: probability distribution of $X \sim Po(6.5)$, horizontal axis 0 to 17, test value 12 circled, 5% boundary marked between 11 and 12.]
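The upper-tail probability and the critical region in Solution 10.4 can be checked without tables. The sketch below is illustrative rather than part of the original text: the function names are hypothetical, and the cumulative probabilities are summed directly from the Poisson formula $P(X = k) = e^{-\lambda}\lambda^k / k!$.

```python
from math import exp, factorial


def poisson_cdf(r, lam):
    """P(X <= r) for X ~ Po(lam); returns 0 when r is negative."""
    return exp(-lam) * sum(lam**k / factorial(k) for k in range(r + 1))


def upper_tail(x, lam):
    """P(X >= x) for X ~ Po(lam)."""
    return 1 - poisson_cdf(x - 1, lam)


def upper_critical_value(lam, alpha):
    """Smallest c with P(X >= c) < alpha, so the critical region is x >= c."""
    c = 0
    while upper_tail(c, lam) >= alpha:
        c += 1
    return c


# Example 10.4: H0: lambda = 6.5, upper-tail test at the 5% level, test value 12.
p_value = upper_tail(12, 6.5)
print(f"P(X >= 12) = {p_value:.4f}")
print("critical region: x >=", upper_critical_value(6.5, 0.05))
```

The loop in `upper_critical_value` relies on the upper-tail probability decreasing as `c` grows, so it stops at the first value that falls below the significance level.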

Example 10.5

The number of breakdowns per week of an office photocopier can be modelled by a Poisson distribution with mean 4.5. The photocopier was serviced and during the next week it broke down just twice. Is there evidence, at the 10% level, of an improvement in the reliability of the photocopier?

Solution 10.5

1. Define the variable.
Let $X$ be the number of breakdowns in a week, where $X \sim Po(\lambda)$.

2. State $H_0$ and $H_1$.
$H_0$: $\lambda = 4.5$
$H_1$: $\lambda < 4.5$ (the number of breakdowns has decreased, implying that the photocopier is more reliable)

3. State the distribution according to $H_0$.
If $H_0$ is true, then $X \sim Po(4.5)$.

4. State the level and type of test.
Since the alternative hypothesis is $\lambda < 4.5$, use a one-tailed test and consider the lower tail of the distribution for the critical region.

5. State the rejection criterion.
At the 10% level, the sample value $x$ will lie in the critical region if $P(X \le x) < 10\%$.
Reject $H_0$ if $P(X \le x) < 10\%$, where $x$ is the test value.

6. Calculate the required probability.
The test value is 2, so find $P(X \le 2)$.
Using cumulative Poisson tables (page 647),

$$P(X \le 2) = 0.1736 = 17.36\%$$

$P(X \le r)$ where $X \sim Po(4.5)$ ($\lambda = 4.5$)

r    P(X ≤ r)
0    0.0111
1    0.0611
2    0.1736
3    0.3423
4    0.5321
5    0.7029
6    0.8311

NOTE: if you do not have access to tables, then calculate $P(X \le 2)$ as follows (see page 292):

$$\begin{aligned}
P(X \le 2) &= P(X = 0) + P(X = 1) + P(X = 2) \\
&= e^{-4.5}\left(1 + 4.5 + \frac{4.5^{2}}{2!}\right) \\
&= 0.17357\ldots = 17.36\%
\end{aligned}$$

7. Make your conclusion.
Since $P(X \le 2) > 10\%$, the test value of two breakdowns does not lie in the critical region, so $H_0$ is not rejected.
There is no evidence, at the 10% level, of an improvement in the reliability of the photocopier.

Example 10.6

The number of flaws per metre of fabric follows a Poisson distribution with mean 2. With the aim of reducing the number of flaws, the fabric is subjected to a different treatment process. After this treatment a significance test is devised to gauge whether it has been successful. The test states that the number of flaws has decreased if a randomly selected 4-metre length of cloth contains fewer than five flaws.

(a) State the null and alternative hypotheses for this significance test.
(b) Find the probability of making a Type I error when the test is carried out.
(c) State the significance level of the test.
(d) If, in fact, the treatment has reduced the number of flaws to one per metre, find the probability of making a Type II error when applying the test described above.

Solution 10.6

See page 493 for the definitions of Type I and Type II errors.

(a) Let $X$ be the number of flaws in a 4-metre length of fabric. Assuming that the flaws occur randomly, $X \sim Po(\lambda)$.
In a 4-metre length, the expected number of flaws is eight. The hypotheses for the significance test would be

$H_0$: $\lambda = 8$
$H_1$: $\lambda < 8$ (The mean number of flaws in 4 metres is less than eight and the treatment has been successful in reducing the number per metre.)

(b) If $H_0$ is true, then $X \sim Po(8)$.
You are told that values of $x < 5$ are in the critical region. So a test value of $x < 5$ would lead to the null hypothesis being rejected.

$$\begin{aligned}
P(\text{Type I error}) &= P(\text{reject } H_0 \text{ when } H_0 \text{ is true}) \\
&= P(X < 5 \text{ when } X \sim Po(8))
\end{aligned}$$

From tables,

$$P(X < 5) = P(X \le 4) = 0.0996 \approx 10\%$$

$P(\text{Type I error}) = 0.10$ (2 d.p.)

$P(X \le r)$ where $X \sim Po(8)$ ($\lambda = 8.0$)

r    P(X ≤ r)
0    0.0003
1    0.0030
2    0.0138
3    0.0424
4    0.0996
5    0.1912

(c) The significance level of the test is the same as the probability of making a Type I error, so the significance level is 10%.

(d) If, in fact, the number of flaws per metre has been reduced to 1, then the expected number of flaws in a 4-metre length is four. The hypotheses become

$H_0$: $\lambda = 8$
$H_1$: $\lambda = 4$

$P(\text{Type II error}) = P(\text{accept } H_0 \text{ when } H_1 \text{ is true})$

You reject $H_0$ when $x < 5$, so accept $H_0$ when $x \ge 5$.

$$\begin{aligned}
P(\text{Type II error}) &= P(X \ge 5 \text{ when } \lambda = 4) \\
&= 1 - P(X \le 4 \text{ when } X \sim Po(4)) \\
&= 1 - 0.6288 \\
&= 0.3712 \approx 37\%
\end{aligned}$$

$P(X \le r)$ where $X \sim Po(4)$ ($\lambda = 4.0$)

r    P(X ≤ r)
0    0.0183
1    0.0916
2    0.2381
3    0.4335
4    0.6288
5    0.7851
6    0.8893

The probability of making a Type II error is approximately 37%.
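The two error probabilities in Solution 10.6 differ only in the Poisson mean used, which a short calculation makes explicit. This is a minimal sketch assuming the test of that example (reject $H_0$ when fewer than five flaws are found in 4 metres); the helper name `poisson_cdf` is illustrative.

```python
from math import exp, factorial


def poisson_cdf(r, lam):
    """P(X <= r) for X ~ Po(lam), summed from the Poisson formula."""
    return exp(-lam) * sum(lam**k / factorial(k) for k in range(r + 1))


largest_rejected = 4  # critical region is x < 5, i.e. x <= 4

# Under H0 the mean is 2 flaws per metre, so 8 flaws in a 4-metre length.
type_1 = poisson_cdf(largest_rejected, 8)

# Under the specific alternative the mean is 1 per metre, so 4 in 4 metres.
# H0 is accepted when x >= 5.
type_2 = 1 - poisson_cdf(largest_rejected, 4)

print(f"P(Type I error)  = {type_1:.4f}")
print(f"P(Type II error) = {type_2:.4f}")
```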

1. The number of white corpuscles on a slide has a Poisson distribution with mean 3.5. After treatment, a sample was taken and the number of white corpuscles was found to be 8. Test, at the 5% level of significance, whether the number of white corpuscles has increased.

2. The number of telephone calls to an office on a weekday follows a Poisson distribution with a mean number of six per hour.
   (a) On Monday there were 5 calls between 10.00 a.m. and 10.30 a.m. Test, at the 5% level, whether the number of calls has increased.
   (b) On Wednesday there were 3 calls between 11.00 a.m. and 12.30 p.m. Test, at the 5% level, whether the number of calls has decreased.

3. Over a long period of time, Jane has found that the bus taking her to school arrives late on average 9 times a month. In the month following the start of the new summer schedules, Jane finds that her bus arrives late 13 times.
   Assuming that the number of times the bus is late can be modelled by a Poisson distribution, test, at the 5% level of significance, whether the new schedules have in fact increased the number of times on which the bus is late. State clearly your null and alternative hypotheses. (L)

4. A single observation is to be taken from a Poisson distribution with mean $\lambda$ and used to test the null hypothesis $\lambda = 8$ against the alternative hypothesis $\lambda < 8$.
   The critical region is chosen to be $x \le 3$.
   (a) Find the probability of making a Type I error.
   (b) Find the probability of making a Type II error if, in fact, $\lambda = 6$.

5. For each of the following, an observation, $x$, is taken from a Poisson distribution, where $X \sim Po(\lambda)$.
   Test the hypotheses at the level of significance stated.

        x     Hypotheses                                              Level of significance
   (a)  11    $H_0$: $\lambda = 7$, $H_1$: $\lambda > 7$              5%
   (b)  12    $H_0$: $\lambda = 7$, $H_1$: $\lambda \ne 7$            5%
   (c)  4     $H_0$: $\lambda = 10$, $H_1$: $\lambda < 10$            1%
   (d)  18    $H_0$: $\lambda = 10$, $H_1$: $\lambda > 10$            5%
   (e)  2     $H_0$: $\lambda = 6.5$, $H_1$: $\lambda \ne 6.5$        5%
   (f)  2     $H_0$: $\lambda = 6.5$, $H_1$: $\lambda < 6.5$          5%

6. In a particular city it was found, over a period of time, that $X$, the number of cases of a certain medical condition reported in a month, has a Poisson distribution with mean 3.5. During the month of August, seven cases were reported.
   Stating a necessary assumption, perform a significance test at the 5% level to decide whether or not this number of reported cases suggests that the number of occurrences of the medical condition has increased. State your hypotheses and conclusions clearly.

The following summary shows the stages of the hypothesis test. For details relating to the particular distributions, see page 492 for the binomial test and page 496 for the Poisson test:

1. State the variable being considered.
2. State the null hypothesis $H_0$ and the alternative hypothesis $H_1$.
   If you are looking for
   an increase, then $H_1$: $\ldots > \ldots$ (one-tailed test, upper tail)
   a decrease, then $H_1$: $\ldots < \ldots$ (one-tailed test, lower tail)
   a change, then $H_1$: $\ldots \ne \ldots$ (two-tailed test, upper and lower tails)
3. Consider the appropriate distribution if the null hypothesis is true.
4. Decide on the significance level of the test. This fixes the critical (rejection) region.
5. Decide on your rejection criterion.

Now consider the value of the test statistic.

6. Perform any calculations necessary to find out whether the test statistic is in the critical region.
7. Make your conclusion:
   - If the test statistic is in the critical region, reject $H_0$ in favour of $H_1$.
   - If the test statistic is not in the critical region, do not reject $H_0$.
   Then relate this to the context of the situation being tested.

                                  Test decision
                                  Accept $H_0$      Reject $H_0$
Actual       $H_0$ is true        correct           Type I error
situation    $H_0$ is false       Type II error     correct

$$\begin{aligned}
P(\text{Type I error}) &= P(\text{reject } H_0 \text{ given that } H_0 \text{ is true}) \\
&= \alpha\% \text{ (where } \alpha\% \text{ is the significance level)}
\end{aligned}$$

$$\begin{aligned}
P(\text{Type II error}) &= P(\text{accept } H_0 \text{ given that } H_0 \text{ is false}) \\
&= P(\text{accept } H_0 \text{ given that } H_1 \text{ is true})
\end{aligned}$$

Miscellaneous worked examples

Example 10.7

When I used to play darts regularly I scored a bull's-eye on average on 40% of attempts. After a break of three months, I play darts one evening and score two bull's-eyes in 12 attempts. I wish to test whether the percentage of attempts on which I score a bull's-eye has decreased.

(a) Stating a necessary assumption, use an exact binomial distribution to carry out the test, using a 10% significance level.
(b) Comment on the validity of the assumptions made in (a). (C)

Solution 10.7

(a) Let $X$ be the number of bull's-eyes scored in 12 attempts.
Assuming that the result of an attempt is independent of the results of all other attempts, $X$ can be modelled by a binomial distribution, where $X \sim B(12, p)$.
$H_0$: $p = 0.4$ (I score a bull's-eye on 40% of the attempts.)
$H_1$: $p < 0.4$ (The percentage of attempts on which I score a bull's-eye has decreased.)
If $H_0$ is true, then $X \sim B(12, 0.4)$.
Carry out a one-tailed (lower tail) test at the 10% level.
Reject $H_0$ if $P(X \le x) < 10\%$, where $x$ is the test value.

The test value is $x = 2$.

Now
$$\begin{aligned}
P(X \le 2) &= P(X = 0) + P(X = 1) + P(X = 2) \\
&= 0.6^{12} + 12 \times 0.6^{11} \times 0.4 + {}^{12}C_2 \times 0.6^{10} \times 0.4^{2} \\
&= 0.08344\ldots \\
&= 8.3\%
\end{aligned}$$
(You can use cumulative binomial tables if they are available.)

Since $P(X \le 2) < 10\%$, the sample value $x = 2$ lies in the critical region, so $H_0$ is rejected in favour of $H_1$. At the 10% significance level, there is evidence that the percentage of attempts on which I score a bull's eye has decreased.

(b) One usually improves when making several attempts, so it is unlikely that the attempts are independent. If they are not independent, then a binomial model is not suitable and the test is not valid.


Example 10.8

A die is suspected of bias towards showing more sixes than would be expected of an ordinary die. In order to test this, it is decided to throw the die 12 times. The null hypothesis $p = \frac{1}{6}$, where $p$ is the probability of the die showing a six, will be rejected in favour of the alternative hypothesis $p > \frac{1}{6}$ if the number of sixes obtained is 4 or more. Calculate, to three decimal places, the probability of making

(a) a Type I error,
(b) a Type II error if, in fact, $p = \frac{1}{2}$. (C)

Solution 10.8

(a) Let $X$ be the number of sixes obtained when the die is thrown 12 times.
Then $X \sim B(12, p)$.
$H_0: p = \frac{1}{6}$ (The die is fair)
$H_1: p > \frac{1}{6}$ (The die is biased in favour of sixes)
P(Type I error) = P(reject $H_0$ when $H_0$ is true)
If $H_0$ is true, then $X \sim B(12, \frac{1}{6})$.
Also $H_0$ is rejected if $x \ge 4$, so
$$\begin{aligned}
P(\text{Type I error}) &= P(X \ge 4 \text{ when } X \sim B(12, \tfrac{1}{6})) \\
&= 1 - P(X < 4) \\
&= 1 - \left[ \left(\tfrac{5}{6}\right)^{12} + 12 \times \left(\tfrac{5}{6}\right)^{11} \times \tfrac{1}{6} + {}^{12}C_2 \times \left(\tfrac{5}{6}\right)^{10} \times \left(\tfrac{1}{6}\right)^{2} + {}^{12}C_3 \times \left(\tfrac{5}{6}\right)^{9} \times \left(\tfrac{1}{6}\right)^{3} \right] \\
&= 0.1251\ldots \\
&= 0.13 \text{ (2 s.f.)}
\end{aligned}$$

(b) The hypotheses now become
$H_0: p = \frac{1}{6}$
$H_1: p = \frac{1}{2}$
P(Type II error) = P(accept $H_0$ when $H_1$ is true)
If $H_1$ is true, then $p = \frac{1}{2}$ and $X \sim B(12, \frac{1}{2})$.
$H_0$ is accepted if $x < 4$, so
$$\begin{aligned}
P(\text{Type II error}) &= P(X < 4 \text{ when } X \sim B(12, \tfrac{1}{2})) \\
&= \left(\tfrac{1}{2}\right)^{12} + 12 \times \left(\tfrac{1}{2}\right)^{12} + {}^{12}C_2 \times \left(\tfrac{1}{2}\right)^{12} + {}^{12}C_3 \times \left(\tfrac{1}{2}\right)^{12} \\
&= 0.073 \text{ (2 s.f.)}
\end{aligned}$$


Example 10.9

A manufacturer of windows has used a process which produced flaws in the glass randomly at a rate of 0.5 per m$^2$. In an attempt to reduce the number of flaws produced, a new process is tried out. A randomly chosen window produced using this new process has an area of 8 m$^2$ and contains only one flaw.

(a) Stating your hypotheses clearly test, at the 10% level of significance, whether or not the rate of occurrence of flaws using the new procedure has decreased.

The new procedure actually produces flaws at a rate of 0.3 per m$^2$.

(b) Find the probability of making a Type II error using the test in part (a). (L)

Solution 10.9

Let $X$ be the number of flaws in an 8 m$^2$ window. Then $X \sim Po(\lambda)$.

(a) $H_0: \lambda = 4$
$H_1: \lambda < 4$ (the number of flaws has decreased)
If $H_0$ is true, then $X \sim Po(4)$.
Use a one-tailed test and consider the lower tail for the critical region.
At the 10% level the value $x = 1$ will be in the critical region if $P(X \le 1) < 10\%$, therefore reject $H_0$ if $P(X \le 1) < 10\%$.
$$\begin{aligned}
P(X \le 1) &= P(X = 0) + P(X = 1) \\
&= e^{-4}(1 + 4) \\
&= 0.0915\ldots
\end{aligned}$$
Since $P(X \le 1) < 10\%$, reject $H_0$ in favour of $H_1$.
The rate of occurrence of flaws using the new procedure has decreased.

(b) The hypotheses now become
$H_0: \lambda = 4$
$H_1: \lambda = 2.4$
If $H_1$ is true, then $X \sim Po(2.4)$.
P(Type II error) = P(accept $H_0$ when $H_1$ is true)
The critical region is $x \le 1$, so you would accept $H_0$ if $x > 1$.
$$\begin{aligned}
P(\text{Type II error}) &= P(X > 1 \text{ when } X \sim Po(2.4)) \\
&= 1 - e^{-2.4}(1 + 2.4) \\
&= 0.6916 \\
&= 69\%
\end{aligned}$$
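
The tail probabilities in Examples 10.8 and 10.9 can be checked numerically. The following is a minimal Python sketch and not part of the original text: the helper names `binom_cdf` and `poisson_cdf` are illustrative assumptions, and only the standard library is used.

```python
from math import comb, exp, factorial

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p), summed term by term."""
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(k + 1))

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Po(lam), summed term by term."""
    return sum(exp(-lam) * lam**r / factorial(r) for r in range(k + 1))

# Example 10.8: H0 is rejected when the number of sixes is 4 or more.
type1_die = 1 - binom_cdf(3, 12, 1 / 6)   # P(X >= 4) when p = 1/6
type2_die = binom_cdf(3, 12, 1 / 2)       # P(X <= 3) when p = 1/2

# Example 10.9: H0 is rejected when the window has 1 flaw or fewer.
p_lower_tail = poisson_cdf(1, 4)          # P(X <= 1) when lambda = 4
type2_flaws = 1 - poisson_cdf(1, 2.4)     # P(X > 1) when lambda = 2.4

print(round(type1_die, 4), round(type2_die, 4))
print(round(p_lower_tail, 4), round(type2_flaws, 4))
```

Summing the probability function directly mirrors the hand calculation; cumulative tables, where available, give the same values.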

Miscellaneous exercise 10c - Binomial and Poisson tests

1. Before I sat an examination, my teacher told me that I had a 60% chance of obtaining a grade A, but I thought I had a better chance than that. In preparation for the examination, we did seven tests, each of the same standard as the examination. Assuming my teacher is right, find the probability that I would get a grade A on
   (a) all 7 tests,
   (b) exactly 6 tests out of 7,
   (c) exactly 5 tests out of 7.
   In fact I got a grade A on 6 tests out of 7. State suitable null and alternative hypotheses and carry out a statistical test to determine whether or not there is evidence that my teacher is underestimating my chances of a grade A. (MEI)

2. Harry Hotspur is a footballer who likes to take penalty kicks. On past performance he reckons that on average he scores 7 times out of 10. Assume that Harry is correct, and consider the next 8 penalty kicks he takes.
   (a) Find the probability that he will score at least 6 times.
   (b) Find the modal score and state its probability.
   (c) What further assumption have you made in calculating the probabilities in (a) and (b)?
   After a period of intense practice, Harry reckons that he has improved his penalty taking.
   (d) Write down suitable null and alternative hypotheses for testing the value of p, the probability that Harry scores from a penalty kick.
   He takes 15 penalty kicks and scores from 13 of them.
   (e) Carry out the hypothesis test, at the 10% level of significance, stating your conclusion clearly.
   (f) Harry takes a further set of 15 penalty kicks. Out of the total of 30 kicks he scores from 26. Without further calculation explain carefully whether this additional information strengthens Harry's case or not. (MEI)

3. The manufacturers of a certain type of microwave oven claim that at least 95% of their ovens will not fail during the first two years of use. In order to test this claim, a Consumer Agency purchased a random sample of 15 ovens and ran them under similar conditions over a two-year period. It was found that 12 ovens had not failed during that period.
   Test the manufacturer's claim using an exact binomial distribution. The significance level should be as close as possible to 5%.
   Explain why an exact 5% significance level is not possible. (C)

4. The ABC School of Motoring claim that at least 80% of their pupils pass the driving test first time. The XYZ School of Motoring suspect that more than 20% of ABC's pupils fail first time. They test this suspicion by checking the results of a random sample of 25 former ABC pupils, finding out how many failed first time.
   (a) State suitable null and alternative hypotheses to be used in the test.
   (b) Identify the model that should be used for the distribution of the number of failures.
   (c) Find the smallest number of failures which would allow ABC's claim to be rejected at the 5% level of significance. (NEAB)

5. For most small birds, the ratio of males to females may be expected to be about 1 : 1. In one ornithological study birds are trapped by setting fine-mesh nets. The trapped birds are counted and then released. The catch may be regarded as a random sample of the birds in the area.
   The ornithologists want to test whether the sex ratio of blackbirds is, in fact, 1 : 1.
   (a) Assuming that the sex ratio of blackbirds is 1 : 1, find the probability that a random sample of 16 blackbirds contains
       (i) 12 males
       (ii) at least 12 males
       (iii) at least 12 of the same sex.
   (b) State the null and alternative hypotheses the ornithologists should use, clearly indicating why the alternative hypothesis takes the form it does.
   In one sample of 16 blackbirds there are 12 males and 4 females.
   (c) Carry out a suitable test using these data at the 5% significance level, stating your conclusion clearly. Find the critical region for the test.
   (d) Another ornithologist points out that, because female birds spend much time sitting on the nest, females are less likely to be caught than males. Explain how this would affect your conclusions. (MEI)

6. Over many years it has been found that at a particular station 20% of trains arrive late. A consumer group wishes to test whether the percentage of trains arriving late has increased recently. It decides to observe 20 trains. If more than four of the trains arrive late it will claim that the percentage of trains arriving late has increased.
   (a) In the case where the percentage of trains arriving late has remained at 20%, find the probability that the consumer group makes a Type I error.
   (b) In the case where the percentage of trains arriving late has increased to 25%, find the probability that the consumer group makes a Type II error.
   (c) (i) Comment on your answer to part (a).
       (ii) Suggest an improvement to the procedure used by the consumer group. (NEAB)

7. A firm producing mugs has a quality control scheme in which a random sample of 10 mugs from each batch is inspected. For 50 such samples, the numbers of defective mugs are as follows:

   Number of defective mugs   0    1    2    3    4    5    6+
   Number of samples          5    13   15   12   4    1    0

   (a) Find the mean and standard deviation of the number of defective mugs per sample.
   (b) Show that a reasonable estimate for p, the probability that a mug is defective, is 0.2. Use this figure to calculate the probability that a randomly chosen sample will contain exactly 2 defective mugs. Comment on the agreement between this value and the observed data.
   The management is not satisfied with 20% of mugs being defective and introduces a new process to reduce the proportion of defective mugs.
   (c) A random sample of 20 mugs, produced by the new process, contains just one which is defective. Test, at the 5% level, whether it is reasonable to suppose that the proportion of defective mugs has been reduced, stating your null and alternative hypotheses clearly.
   (d) What would the conclusion have been if the management had chosen to conduct the test at the 10% level? (MEI)

8. In a certain country, 90% of letters are delivered the day after posting.
   A resident posts 8 letters on a certain day.
   Find the probability that:
   (a) all 8 letters are delivered the next day,
   (b) at least 6 letters are delivered the next day,
   (c) exactly half the letters are delivered the next day.
   It is later suspected that the service has deteriorated as a result of mechanisation. To test this, 17 letters are posted and it is found that only 13 of them arrive the next day. Let p denote the probability, after mechanisation, that a letter is delivered the next day.
   (d) Write down suitable null and alternative hypotheses for the value of p. Explain why the alternative hypothesis takes the form it does.
   (e) Carry out the hypothesis test, at the 5% level of significance, stating your results clearly.
   (f) Write down the critical region for the test, giving a reason for your choice. (MEI)

9. It is known that the number of defects in a one-metre length of steel pipe has mean 2.4. It has been suggested that a Poisson distribution would be a reasonable model for the number of defects in a randomly chosen one-metre length of this steel pipe.
   (a) State two assumptions that would need to be made for a Poisson distribution to be an appropriate model in this case.
   (b) Using this Poisson model, calculate the probability that in a randomly chosen one-metre length of steel pipe there are:
       (i) exactly 3 defects,
       (ii) more than 3 defects.
   (c) Determine the probability that there are exactly 6 defects in a randomly chosen two-metre length of the same type of steel pipe.
   (d) It is believed that the manufacturing process may now be producing more defects than before. In a quality control experiment a one-metre length of the steel pipe is chosen and is found to have 7 defects. Test, at the 5% level of significance, the hypothesis that the number of defects in this type of steel pipe has increased. State your hypotheses clearly. (O)

10. (a) The number, X, of breakdowns per day of the lifts in a large block of flats has a Poisson distribution with mean 0.2. Find, to three decimal places, the probability that on a particular day
        (i) there will be at least one breakdown,
        (ii) there will be at most two breakdowns.
    (b) Find, to three decimal places, the probability that, during a 20-day period, there will be no lift breakdowns.
    (c) The maintenance contract for the lifts is given to a new company. With this company it is found that there are two breakdowns over a period of 30 days. Perform a significance test at the 5% level to decide whether or not the number of breakdowns has decreased. (L)

11. The number, X, of emergency telephone calls to a gas board office in t minutes at weekends is known to follow a Poisson distribution with mean $\frac{1}{90}t$. Given that the telephone in that office is unmanned for 10 minutes, calculate, to two significant figures, the probability that there will be at least 2 emergency telephone calls to the office during that time.
    Find, to the nearest minute, the length of time that the telephone can be left unmanned for there to be a probability of 0.9 that no emergency telephone call is made to the office during the period the telephone is unmanned.
    During a week of very cold weather it was found that there had been 10 emergency telephone calls to the office in the first 12 hours of the weekend. Using tables, or otherwise, determine whether the increase in the average number of emergency telephone calls to that office is significant at the 5% level. (L)
Mixed test 10A (Binomial)

1. The random variable, R, can be modelled by a binomial distribution with parameters n = 10 and p, whose value is unknown.
   Find the critical region for the test of
   $H_0: p = 0.5$ against $H_1: p \neq 0.5$
   at the 10% level of significance. (NEAB)

2. A large college introduced a new procedure to try to ensure that staff arrived on time for the start of lectures. A recent survey by the students had suggested that in 15% of cases the staff arrived late for the start of a lecture. In the first week following the introduction of this new procedure a random sample of 35 lectures was taken and in only one case did the member of staff arrive late.
   (a) Stating your hypotheses clearly test, at the 5% level of significance, whether or not there is evidence that the new procedure has been successful.
   A student complained that this sample did not give a true picture of the effectiveness of the new procedure.
   (b) Explain briefly why the student's claim might be justified and suggest how a more effective check on the new procedure could be made. (L)

3. An enthusiastic gardener claimed that she could never work in the garden at the weekend because 'It always rains on Saturday and Sunday when I'm at home and it's always fine on weekdays when I'm not!' She noted the weather for the next month and recorded that, out of 10 wet days, five were either a Saturday or a Sunday.
   The gardener's claim may be modelled by regarding her observation as a single sample from a B(10, p) distribution. Given that one would expect 2 out of every 7 wet days to be either a Saturday or a Sunday, the null hypothesis, $p = \frac{2}{7}$, may be tested against the alternative hypothesis, $p > \frac{2}{7}$. Carry out a hypothesis test to test her claim at the 10% significance level. (C)


Mixed test 10B (Poisson)

1. The mean number of serious accidents at a motorway interchange is 2.1 per week.
   (a) State the probability distribution which may reasonably be used to model the weekly number of serious accidents at this motorway interchange, and give its parameter.
   (b) Use an appropriate distribution to determine the probability that the number of serious accidents is:
       (i) two or fewer in a randomly selected week;
       (ii) exactly one on a randomly selected day.
   (c) Given that there were 6 serious accidents during one wet winter week, test, at the 5% level of significance, the hypothesis that the accident rate is higher in wet weather. (O)

2. (a) The number of bacterial colonies that develop in dishes of nutrient exposed to an infected environment has a Poisson distribution with mean 7.5.
       (i) Calculate the probability that, in one such dish, the number of bacterial colonies that develop will be greater than 10.
       (ii) Calculate the probability that, in two such dishes, the total number of bacterial colonies that develop will be between 10 and 20 inclusive.
   (b) Experiments were conducted to determine the effectiveness of an antibiotic spray in reducing the number of bacterial colonies that develop.
       In one experiment in which one dish was sprayed, the number of bacterial colonies that developed was 3. Stating suitable null and alternative hypotheses, determine whether or not this result provides significant evidence at the 5% level that the spray is effective. (NEAB)

3. A single observation is taken from a Poisson distribution with mean $\mu$ and used to test the hypothesis $\mu = 6$ against the alternative hypothesis $\mu > 6$.
   The critical region is chosen to be $x \ge 11$.
   (a) At what significance level is the test carried out?
   (b) Find the probability of making a Type II error if, in fact, $\mu = 8.5$.


Hypothesis testing (z-tests and t-tests)

In this chapter you will
• be reminded about the language of hypothesis (significance) testing introduced in Chapter 10
• be reminded about Type I and Type II errors
• learn how to perform the following hypothesis tests:

Test 1: Testing $\mu$, the mean
   1a: of a normal distribution with known variance, any size sample (z-test)
   1b: of a distribution with known variance, large sample (z-test)
   1c: of a distribution with unknown variance, large sample (z-test)
   1d: of a normal distribution with unknown variance, small sample (t-test)
Test 2: Testing p, the proportion of a binomial distribution, n large (z-test)
Test 3: Testing $\mu_1 - \mu_2$, the difference between means of two normal distributions
   3a: when population variances are known (z-test)
   3b: when there is a known common population variance (z-test)
   3c: when the common population variance is unknown,
       - large samples (z-test)
       - small samples (t-test)

Background knowledge:
For the z-tests you will need to be familiar with
   the normal distribution and the use of the standard normal tables (see page 362)
   the distribution of the sample mean (see page 436)
   the unbiased estimate for the population variance (see page 447)
   the normal approximation to the binomial distribution (see page 382)
For the t-tests you will need to be familiar with
   the use of the t-distribution tables (see page 463)

HYPOTHESIS TESTING

If you have worked through Chapter 10 you will be familiar with the terminology and methods used to carry out hypothesis tests relating to discrete distributions. For those new to the topic, these are described again in the following text, but this time in relation to continuous distributions. The following example illustrates the hypothesis test for the mean of a normal distribution.

In the production of ice packs for use in cool boxes, a machine fills packs with liquid and the packs are then frozen. Since space is needed in the packs for the liquid to expand, it is important that they are not over-filled. The volume of liquid in the packs follows a normal distribution with mean 524 ml and standard deviation 3 ml.

The machine breaks down and is repaired. In the next batch of production, there is a suspicion that the mean volume of liquid dispensed by the machine into the packs has increased and is greater than 524 ml. In order to investigate this, the supervisor takes a random sample of 50 packs and finds that the mean volume of liquid in these is 524.9 ml. Does this provide evidence that the machine is over-dispensing?

The mean volume of the sample, 524.9 ml, is higher than the established mean of 524 ml. But is it high enough to say that the mean volume of all the packs filled by the machine has increased? Perhaps the mean is still 524 ml and this higher value has occurred just because of sampling variation. A hypothesis (or significance) test will enable a decision to be made that is backed by statistical theory, not just based on a suspicion.

Let $X$ be the volume, in millilitres, of liquid dispensed into a pack after the machine has been repaired and let the mean of $X$ be $\mu$, where $\mu$ is unknown. Assuming that the standard deviation remains unchanged, $X \sim N(\mu, \sigma^2)$ with $\sigma = 3$.

The hypothesis is made that $\mu$ is 524 ml, i.e. the mean has remained the same as it was prior to the repair. This is known as the null hypothesis, $H_0$, and is written

$H_0: \mu = 524$

Since it is suspected that the mean volume has increased, the alternative hypothesis, $H_1$, is that the mean is greater than 524 ml. This is written

$H_1: \mu > 524$

To carry out the test, the focus moves from $X$, the volume of liquid in a pack, to the distribution of $\bar{X}$, the mean volume of a sample of 50 packs. In this test, $\bar{X}$ is known as the test statistic and its distribution is needed. The distribution of $\bar{X}$ is known as the sampling distribution of means.

In Chapter 9 you saw that if $X \sim N(\mu, \sigma^2)$, then, for samples of size $n$, $\bar{X} \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$.

The hypothesis test starts by assuming that the value stated in the null hypothesis is true, so $\mu = 524$.

Since $\sigma = 3$ and $n = 50$,

$\bar{X} \sim N\left(524, \dfrac{9}{50}\right)$, i.e. $\bar{X} \sim N(524, 0.18)$.

The sampling distribution of means, therefore, follows a normal distribution with mean 524 ml and variance 0.18 ml$^2$. The standard deviation is $\sqrt{0.18}$ ml.

NOTE: This is sometimes left in the form $\dfrac{\sigma}{\sqrt{n}} = \dfrac{3}{\sqrt{50}}$.

The result of the test depends on the whereabouts in the sampling distribution of the test value of 524.9 ml, the mean volume of the sample of 50 packs taken by the supervisor. She would need to find out whether 524.9 is close to 524 or far away from 524.

If it is close to 524 then it is likely to have come from a distribution with mean 524 ml and there would not be enough evidence to say that the mean volume has increased.

[Figure: sampling distribution of $\bar{X}$ with 524.9 close to 524]

If it is far away from 524, i.e. in the right-hand (upper) tail of the distribution, then it is unlikely to have come from a distribution with a mean of 524 ml. The mean is likely to be higher than 524 ml.

[Figure: sampling distribution of $\bar{X}$ with 524.9 far away from 524]

Note that the upper tail is being used because the supervisor suspects that there is an increase in $\mu$. This type of test is called a one-tailed (upper tail) test.

A decision needs to be taken about the cut-off point, $c$, known as the critical value, which indicates the boundary of the region where values of $\bar{X}$ would be considered to be too far away from 524 ml and therefore would be unlikely to occur. The region is known as the critical region or rejection region.

The critical value and region are fixed using probabilities linked to the significance level of the test. In general, for an upper tail test at the $\alpha\%$ level, the critical value $c$ is fixed so that $P(\bar{X} > c) = \alpha\%$ and the critical (rejection) region is $\bar{x} > c$.

[Figure: upper tail critical (rejection) region, $\bar{x} > c$]

The hypothesis test involves finding whether or not the sample value $\bar{x}$ lies in the critical region of the sampling distribution of means, $\bar{X}$.

In this example, if $\bar{x}$ lies in the critical region, then a decision is taken that it is too far away from 524 ml to have come from a distribution with this mean. In statistical language you would reject the null hypothesis, $H_0$ (that the mean is 524 ml), in favour of the alternative hypothesis, $H_1$ (that the mean is greater than 524 ml).

If $\bar{x}$ does not lie in the critical region, there is not enough evidence to reject $H_0$, so $H_0$ is accepted. In this example, $\bar{x} < c$ is known as the acceptance region.

For a significance level of $\alpha\%$, if the sample mean lies in the critical (or rejection) region, then the result is said to be significant at the $\alpha\%$ level.

Note that if a result is significant at, say, the 1% level, then it is automatically significant at any level greater than 1%, for example 5% or 10%.

Say that the supervisor chooses a significance level of 5%. She will then reject $H_0$ if the test value (i.e. the mean volume of the sample of 50 packs) lies in the upper tail 5% of the distribution of sample means.
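
The supervisor's decision rule can be sketched numerically. The snippet below is an illustrative check rather than part of the original text: it assumes Python's standard-library `statistics.NormalDist`, and the variable names are chosen here for readability.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n = 524, 3, 50   # values from the ice-pack example
alpha = 0.05                 # 5% significance level, upper tail

# Sampling distribution of the mean when H0 is true: N(524, 9/50)
xbar_dist = NormalDist(mu=mu0, sigma=sigma / sqrt(n))

# Critical value c, chosen so that P(Xbar > c) = alpha
c = xbar_dist.inv_cdf(1 - alpha)

xbar = 524.9                 # observed sample mean (the test value)
reject_h0 = xbar > c         # True when the test value is in the critical region

print(round(c, 1), reject_h0)
```

Working directly with the distribution of the sample mean gives the critical value in millilitres; the text goes on to reach the same decision using standardised values.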
Since this distribution is normal, instead of finding $c$, the critical $\bar{x}$ value, it is possible to work in standardised values and find the z-value that gives 5% in the upper tail. Using standard normal tables (page 649),

if $P(Z > z) = 0.05$
then $P(Z < z) = 1 - 0.05 = 0.95$
i.e. $\Phi(z) = 0.95$
$z = \Phi^{-1}(0.95) = 1.645$

[Figure: $\bar{X} \sim N\left(524, \frac{9}{50}\right)$ and $Z \sim N(0, 1)$, with the 5% upper tail critical region starting at $z = 1.645$]

So z-values that are greater than 1.645 lie in the upper tail 5% of the distribution.

This enables a statement to be made, known as the rejection criterion, which tells you when to reject the null hypothesis:

Reject $H_0$ if $z > 1.645$, where $z$ is the standardised value of the mean of the sample of 50 packs, i.e.
$$z = \frac{\bar{x} - 524}{3/\sqrt{50}}$$

Note that to avoid being influenced by sample readings, it is important that the rejection criterion is decided upon before any sample values are taken.

When the sample was taken, it was found that $\bar{x} = 524.9$, so
$$z = \frac{524.9 - 524}{3/\sqrt{50}} = 2.12 \text{ (2 d.p.)}$$

[Figure: $Z \sim N(0, 1)$ with critical value 1.645 and test value $z = 2.12$ in the 5% upper tail]

The result of the test is now stated in statistical terms and then related to the context of the test, as follows:

Since $z > 1.645$, $H_0$ is rejected in favour of $H_1$. The supervisor would conclude that the mean volume of liquid being dispensed by the machine is not 524 ml, but has increased. She would be wise therefore to stop production so that the setting on the machine could be adjusted.

Note that the critical $\bar{x}$-value, $c$, can be found by de-standardising the critical z-value of 1.645, where
$$\frac{c - 524}{3/\sqrt{50}} = 1.645$$
$$c = 524 + 1.645 \times \frac{3}{\sqrt{50}} = 524.7$$

[Figure: distribution of $\bar{X}$ with critical value 524.7 and test value $\bar{x} = 524.9$]

Since the test value of 524.9 is greater than 524.7, it lies in the critical region, confirming the result obtained above.

If you want even more information, you can find out exactly where the sample mean lies in the distribution of $\bar{X}$. Note that this is the method used in Chapter 10 for discrete variables.

$\bar{X} \sim N\left(524, \dfrac{9}{50}\right)$, so
$$\begin{aligned}
P(\bar{X} > 524.9) &= P\left(Z > \frac{524.9 - 524}{3/\sqrt{50}}\right) \\
&= P(Z > 2.1213\ldots) \\
&= 1 - 0.9831 \\
&= 0.0169 \\
&\approx 1.7\%
\end{aligned}$$

[Figure: distribution of $\bar{X}$ showing the boundary for 5% and the 1.7% tail beyond 524.9]

This probability is less than 5%, implying that the boundary for the critical region must lie to the left of the sample value of 524.9 and confirming that 524.9 lies in the critical region. This method also tells you that the test value of 524.9 will lie in the critical region for any level of significance above 1.7%.

This probability method can be used, if preferred, in the hypothesis test to find whether the sample value lies in the rejection region. In this example, for a 5% level of significance, the rejection criterion would be to reject $H_0$ if $P(\bar{X} > \bar{x}) < 0.05$, where $\bar{x}$ is the sample mean.


ONE-TAILED AND TWO-TAILED TESTS

Say that the null hypothesis is $\mu = \mu_0$.

In a one-tailed test, the alternative hypothesis $H_1$ looks for an increase or a decrease in $\mu$:

for an increase, $H_1$ is $\mu > \mu_0$ and the critical region is in the upper tail,

[Figure: critical region in the upper tail, to the right of $\mu_0$]

for a decrease, $H_1$ is $\mu < \mu_0$ and the critical region is in the lower tail.

[Figure: critical region in the lower tail, to the left of $\mu_0$]

In a two-tailed test, the alternative hypothesis $H_1$ looks for a change in $\mu$ without specifying whether it is an increase or a decrease and $H_1$ is $\mu \neq \mu_0$. The critical region is in two parts:

[Figure: critical regions in both tails, either side of $\mu_0$]
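
As a rough illustration of how the direction of $H_1$ changes the calculation, the following Python sketch standardises a sample mean and returns the tail probability used in the probability method. It is not from the original text: the function name `z_test` and the labels `"greater"`, `"less"` and `"two-sided"` are assumptions made for this example, and doubling the smaller tail for a two-tailed test is one common convention.

```python
from math import sqrt
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, alternative):
    """Return (z, tail_prob) for H0: mu = mu0 when sigma is known.

    alternative: "greater" (upper tail), "less" (lower tail)
    or "two-sided" (both tails).
    """
    z = (xbar - mu0) / (sigma / sqrt(n))   # standardised test value
    phi = NormalDist().cdf(z)              # Phi(z) = P(Z < z)
    if alternative == "greater":
        tail_prob = 1 - phi
    elif alternative == "less":
        tail_prob = phi
    elif alternative == "two-sided":
        tail_prob = 2 * min(phi, 1 - phi)  # smaller tail, doubled
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, tail_prob

# Ice-pack example: H1 is mu > 524, so the upper tail is used.
z, tail_prob = z_test(524.9, 524, 3, 50, "greater")
print(round(z, 2), round(tail_prob, 4))
```

Under these assumptions, $H_0$ is rejected at a given significance level when the returned tail probability is smaller than that level.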

CRITICAL z-VALUES

Critical values depend on the significance level and also whether the test is one- or two-tailed. The method of working in standardised values is widely used for tests involving the normal distribution because the critical z-values can be found easily from standard normal tables, as described on page 529. Sometimes the most commonly needed values are summarised in a critical value table. One such table is shown below and it is also printed in the appendix at the bottom of page 649. It gives the z-values for various values of $p$, where $p = P(Z < z) = \Phi(z)$.

p    0.75   0.90   0.95   0.975  0.99   0.995  0.9975  0.999  0.9995
z    0.674  1.282  1.645  1.960  2.326  2.576  2.807   3.090  3.291

For example, for a one-tailed test, at the 1% level: you want to find $z$ such that $\Phi(z) = 0.99$, so look up $p = 0.99$.
From the table, $z = 2.326$. Therefore the upper tail critical value is 2.326. By symmetry, the lower tail critical value is $-2.326$.

[Figure: $Z \sim N(0, 1)$ with 1% in the upper tail beyond 2.326, and with 1% in the lower tail below $-2.326$]

For a two-tailed test, at the 1% level: the 1% in the tails is split evenly between the upper and lower tails with 0.5% in each. There are two critical values.
To find the upper tail value, you need to find $z$ such that $\Phi(z) = 0.995$, so look up $p = 0.995$.
From the table $z = 2.576$.
So the upper tail critical value is 2.576 and the lower tail value (by symmetry) is $-2.576$.

[Figure: $Z \sim N(0, 1)$ with 0.5% in each tail; critical values $-2.576$ and $2.576$]


SUMMARY OF CRITICAL VALUES AND REJECTION CRITERIA

The summary below shows the critical z-values and rejection criteria for the most commonly used levels of significance: 10%, 5% and 1%.

                        | One-tailed test (lower tail) | One-tailed test (upper tail) | Two-tailed test
Hypotheses              | $H_0: \mu = \mu_0$, $H_1: \mu < \mu_0$ | $H_0: \mu = \mu_0$, $H_1: \mu > \mu_0$ | $H_0: \mu = \mu_0$, $H_1: \mu \neq \mu_0$
10% significance level  | Reject $H_0$ if $z < -1.282$ | Reject $H_0$ if $z > 1.282$ | Reject $H_0$ if $z < -1.645$ or $z > 1.645$ (written $|z| > 1.645$)
5% significance level   | Reject $H_0$ if $z < -1.645$ | Reject $H_0$ if $z > 1.645$ | Reject $H_0$ if $z < -1.96$ or $z > 1.96$ (written $|z| > 1.96$)
1% significance level   | Reject $H_0$ if $z < -2.326$ | Reject $H_0$ if $z > 2.326$ | Reject $H_0$ if $z < -2.576$ or $z > 2.576$ (written $|z| > 2.576$)


STAGES IN THE HYPOTHESIS TEST

When carrying out a hypothesis test, it is useful to work through the following stages. This is essentially the same procedure as in the tests for parameters of discrete distributions described in Chapter 10.

1. State the variable being considered.
2. State the null hypothesis $H_0$ and the alternative hypothesis $H_1$.
   Remember that if you are looking for
   a decrease, then $H_1$: ... < ... (one-tailed test, lower tail)
   an increase, then $H_1$: ... > ... (one-tailed test, upper tail)
   a change, then $H_1$: ... $\neq$ ... (two-tailed test, upper and lower tails)
3. Consider the distribution of the test statistic, assuming that the null hypothesis is true. If you are testing a sample mean, then the test statistic is $\bar{X}$, and the sampling distribution of means is considered.
4. State the type of test (i.e. whether it is one-tailed or two-tailed) and decide the significance level of the test.
5. Decide on your rejection criterion, remembering that you will reject $H_0$ if the test value lies in the critical (or rejection) region fixed by the significance level.
   Now consider the value of the test statistic.
6. Perform any calculations necessary to find out whether the test value is in the critical region.
7. Make your conclusion in statistical terms:
   - If the test value is in the critical region, reject $H_0$ in favour of $H_1$.
   - If the test value is not in the critical region, do not reject $H_0$.
   Then relate your conclusion to the situation being tested.

There are several hypothesis tests involving continuous distributions and some of these are illustrated in the following text.


HYPOTHESIS TEST 1: TESTING $\mu$ (THE MEAN OF A POPULATION)

Consider a population $X$ with unknown mean $\mu$ and variance $\sigma^2$.
A value for $\mu$, call it $\mu_0$, is specified in the hypotheses, for example

$H_0: \mu = \mu_0$
$H_1: \mu < \mu_0$ (or $\mu > \mu_0$ or $\mu \neq \mu_0$)

To test the hypotheses, take a sample of size $n$ from the population and calculate the sample mean, $\bar{x}$. The test statistic is $\bar{X}$, and the sampling distribution of means is considered.
There are now several cases that may occur, depending on whether the population is normal or not, whether the sample size is large or small and whether the population variance is known or not.


Test 1a: Testing $\mu$ when the population X is normal and the variance $\sigma^2$ is known (any size sample)

Since the population is normally distributed, $X \sim N(\mu, \sigma^2)$. The sampling distribution of means, $\bar{X}$, is also normal for all sample sizes, with mean $\mu_0$ (as specified in the null hypothesis).

When testing the mean of a normal population with known variance $\sigma^2$, using samples of size $n$, the test statistic is $\bar{X}$, where
$$\bar{X} \sim N\left(\mu_0, \frac{\sigma^2}{n}\right)$$
In standardised form the test statistic is
$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}, \quad \text{where } Z \sim N(0, 1)$$


Example 11.1

Each year trainees throughout the country sit a test. Over a period of time it has been established that the marks can be modelled by a normal distribution with mean 70 and standard deviation 6.

This year it was thought that trainees from a particular county did not perform as well as expected. The marks of a random sample of 25 trainees from the county were scrutinised and it was found that their mean mark was 67.3.

Does this provide evidence, at the 5% significance level, that trainees from this county did not perform as well as expected?

Solution 11.1

The stages of the hypothesis test are numbered and additional comments are given after each stage.

1. Let $X$ be the mark of a trainee from the particular county and let the population mean mark be $\mu$.
   Assuming that the standard deviation has not changed, then $X \sim N(\mu, \sigma^2)$ with $\sigma = 6$.

2. $H_0: \mu = 70$ (The trainees have performed as expected)
   $H_1: \mu < 70$ (The trainees have not performed as well as expected)

3. The test is carried out based on the value of the sample mean, $\bar{x}$. The test statistic is $\bar{X}$ and you need to consider the sampling distribution of means.
   For samples of size $n$, $\bar{X} \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$ with $\sigma = 6$ and $n = 25$.
   You now use the value of $\mu$ given by the null hypothesis.
   If $H_0$ is true, then $\mu = 70$, so $\bar{X} \sim N\left(70, \dfrac{36}{25}\right)$, i.e. $\bar{X} \sim N(70, 1.44)$.
   Note that the standard deviation is $\sqrt{1.44} = 1.2$.
   The standard deviation is sometimes left in its uncalculated form: $\dfrac{\sigma}{\sqrt{n}} = \dfrac{6}{\sqrt{25}}$.

4. Use a one-tailed (lower tail) test at the 5% level.
   Note that the test is one-tailed (lower tail) since you are looking for a decrease in $\mu$.

5. You need to find out whether the sample mean of 67.3 (known as the test value) lies in the critical region. To state your rejection criterion, find the critical z-value for the 5% lower tail. This is $-1.645$ (see page 513).
   Reject $H_0$ if $z < -1.645$, where $z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \dfrac{\bar{x} - 70}{6/\sqrt{25}}$.

6. The test value is $\bar{x} = 67.3$, so
   $$z = \frac{67.3 - 70}{6/\sqrt{25}} = -2.25$$
   [Figure: $Z \sim N(0, 1)$ with the 5% lower tail critical region below $-1.645$ and the test value $z = -2.25$ inside it]

7. State the conclusion statistically (either reject $H_0$, or do not reject $H_0$) and then relate it to the context of the question.
Since $z < -1.645$, $H_0$ is rejected in favour of $H_1$.

There is evidence, at the 5% level, that the trainees in this area have not performed as well as expected.

NOTE 1:
To find the critical region, calculate the critical $\bar{x}$ value, $c$, as follows:

$\dfrac{c - 70}{6/\sqrt{25}} = -1.645$

$c = 70 - 1.645 \times \dfrac{6}{\sqrt{25}}$

$\therefore c = 68.026$

So the critical region is $\bar{x} < 68.026$. This means that any test value less than 68.026 would result in the null hypothesis being rejected.

[Figure: sampling distribution of $\bar{X}$ centred on 70, with the 5% critical region below 68.026 and the test value 67.3 inside it.]

NOTE 2:
If you prefer to use the probability method to decide whether $\bar{x}$ lies in the critical region, then, since the significance level is 5%, the rejection criterion would be to reject $H_0$ if $P(\bar{X} < 67.3) < 0.05$.

Now

$P(\bar{X} < 67.3) = P\left(Z < \dfrac{67.3 - 70}{6/\sqrt{25}}\right) = P(Z < -2.25) = 0.0122 \approx 1.2\%$

Since $P(\bar{X} < 67.3) < 0.05$, reject $H_0$ (as before).

This method also tells you that $H_0$ would be rejected at any significance level above 1.2%.

Example 11.2
A sample of size 16 is taken from the distribution of $X \sim N(\mu, 3^2)$ and a hypothesis test is carried out at the 1% level of significance. On the basis of the value of the sample mean $\bar{x}$, the null hypothesis $\mu = 100$ is rejected in favour of the alternative hypothesis $\mu > 100$.
What can be said about the value of $\bar{x}$?

Solution 11.2
In this question you are being asked to find the critical region in terms of $\bar{x}$.

It is given that $X \sim N(\mu, \sigma^2)$ with $\sigma = 3$.

The hypotheses are
$H_0: \mu = 100$
$H_1: \mu > 100$

Considering the sampling distribution of means for samples of size $n$,

$\bar{X} \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$ with $\sigma = 3$ and $n = 16$.

If $H_0$ is true, then $\mu = 100$, so $\bar{X} \sim N\left(100, \dfrac{9}{16}\right)$.

Note that the standard deviation is $\dfrac{\sigma}{\sqrt{n}} = \dfrac{3}{\sqrt{16}}$ $(= 0.75)$.

Performing a one-tailed (upper tail) test at the 1% level, you are told that $H_0$ is rejected. This means that the sample mean, $\bar{x}$, must lie in the critical region, i.e. $\bar{x}$ must be greater than the critical value, $c$.

Working first in standardised values, the critical z-value that gives 1% in the upper tail is $z = 2.326$ (see page 649).

De-standardising to give $c$:

$\dfrac{c - 100}{3/\sqrt{16}} = 2.326$

$c = 100 + 2.326 \times \dfrac{3}{\sqrt{16}}$

$c = 101.745$

So the critical region is $\bar{x} > 101.745$.

Since the null hypothesis is rejected, the sample mean, $\bar{x}$, is greater than 101.745.

Test 1b: Testing $\mu$ when the population $X$ is not normal, the variance $\sigma^2$ is known and the sample size $n$ is large

Since the population is not normal, you cannot say that the distribution of $\bar{X}$ is normal for all sample sizes. If the sample size $n$ is large, however, you can apply the central limit theorem (see page 442). This states that for large samples taken from a non-normal population, the sampling distribution of means $\bar{X}$ is approximately normal, whatever the distribution of the parent population.

When testing the mean of a non-normal population $X$ with known variance $\sigma^2$, provided the sample size $n$ is large, the test statistic is $\bar{X}$, where $\bar{X}$ is approximately normal, $\bar{X} \sim N\left(\mu_0, \dfrac{\sigma^2}{n}\right)$.

In standardised form, the test statistic is $Z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$, where $Z \sim N(0, 1)$.

Example 11.3
The management of a large hospital states that the mean age of its patients is 45 years. Records of a random sample of 100 patients give a mean age of 48.4 years. Using a population standard deviation of 18 years, test at the 5% significance level whether there is evidence that the management's statement is incorrect. State clearly your null and alternative hypotheses. (C)
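The standardised test statistic used in Tests 1a and 1b lends itself to a short numerical check. The following Python sketch is illustrative only: the function name `z_test` and its arguments are assumptions, not part of the text, and it relies on `statistics.NormalDist` from the standard library (Python 3.8 or later). It is applied to the figures of Example 11.1 ($\bar{x} = 67.3$, $\mu_0 = 70$, $\sigma = 6$, $n = 25$, lower tail, 5% level).

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, mu0, sigma, n, alpha, tail):
    """One-sample z-test with known sigma.

    tail is 'lower', 'upper' or 'two'. Returns (z, p_value, reject_H0).
    """
    standard_error = sigma / sqrt(n)
    z = (sample_mean - mu0) / standard_error
    std_normal = NormalDist()  # N(0, 1)
    if tail == "lower":
        p_value = std_normal.cdf(z)
    elif tail == "upper":
        p_value = 1 - std_normal.cdf(z)
    else:
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return z, p_value, p_value < alpha


# Figures from Example 11.1 (hypothetical helper, real data from the example)
z, p_value, reject = z_test(67.3, 70, 6, 25, alpha=0.05, tail="lower")
print(round(z, 2), round(p_value, 4), reject)  # z = -2.25, p about 0.012

# Critical sample mean c for the 5% lower tail, as in NOTE 1
c = 70 + NormalDist().inv_cdf(0.05) * 6 / sqrt(25)
print(round(c, 3))
```

Comparing the p-value with the significance level is the "probability method" of NOTE 2; comparing the sample mean with `c` is the critical-region method of NOTE 1. Both lead to the same decision.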
Solution 11.3
1. Define the variable.
Let $X$ be the age, in years, of a patient and let the population mean age be $\mu$.
The population standard deviation $\sigma = 18$.

2. State $H_0$ and $H_1$.
$H_0: \mu = 45$ (The management's claim is correct)
$H_1: \mu \neq 45$ (The management's claim is incorrect)

3. State the distribution of the test statistic according to $H_0$.
You are performing a test based on a sample mean, $\bar{x}$, so you need to consider the sampling distribution of means, $\bar{X}$.
The sample size is 100. Since $n$ is large, by the central limit theorem, $\bar{X}$ is approximately normal, so

$\bar{X} \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$ with $\sigma = 18$ and $n = 100$.

If $H_0$ is true, then $\mu = 45$, so $\bar{X} \sim N\left(45, \dfrac{18^2}{100}\right)$.

s.d. $= \dfrac{\sigma}{\sqrt{n}} = \dfrac{18}{\sqrt{100}} = 1.8$

4. State the level of the test.
Use a two-tailed test, at the 5% level.
The test is two-tailed since you are looking for a change in $\mu$, not specifically an increase or a decrease.

5. Decide on your rejection criterion.
The critical z-values for a two-tailed test at the 5% level are $\pm 1.96$ (see page 649).
Remember that the 5% is shared evenly between the tails.
Reject $H_0$ if $z < -1.96$ or $z > 1.96$, i.e. if $|z| > 1.96$,

where $z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \dfrac{\bar{x} - 45}{18/\sqrt{100}}$.

6. Perform the required calculation.
The test value is $\bar{x} = 48.4$,

so $z = \dfrac{48.4 - 45}{18/\sqrt{100}} = 1.888\ldots$

[Figure: standard normal curve with 2.5% critical regions below $z = -1.96$ and above $z = 1.96$; the test value $z = 1.88\ldots$ lies outside them.]

7. Make your conclusion.
Since $|z| < 1.96$, do not reject $H_0$.
There is not sufficient evidence, at the 5% level of significance, to reject the management's claim that the mean age is 45 years.

Test 1c: Testing the mean $\mu$ of a population $X$ where the variance $\sigma^2$ is unknown and the sample size $n$ is large

The variance of the population, $\sigma^2$, is unknown, but, as you saw on page 447, an unbiased estimate, $\hat{\sigma}^2$, can be used instead,

where $\hat{\sigma}^2 = \dfrac{n}{n-1} \times s^2$ ($s^2$ is the sample variance).

Alternative formats:

$\hat{\sigma}^2 = \dfrac{1}{n-1}\sum (x - \bar{x})^2$ or $\hat{\sigma}^2 = \dfrac{1}{n-1}\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right)$

Ideally the population distribution should be normal, but if it is not, then the central limit theorem can be applied, since the sample size is large.

When testing the mean of a population $X$ with unknown variance $\sigma^2$, provided the sample size $n$ is large, the test statistic is $\bar{X}$, where $\bar{X} \sim N\left(\mu_0, \dfrac{\hat{\sigma}^2}{n}\right)$.

In standardised form, the test statistic is $Z = \dfrac{\bar{X} - \mu_0}{\hat{\sigma}/\sqrt{n}}$, where $Z \sim N(0, 1)$.

Example 11.4
The packaging on an electric light bulb states that the average length of life of bulbs is 1000 hours. A consumer association thinks that this is an overestimate and tests a random sample of 64 bulbs, recording the life, $x$ hours, of each bulb.
The results are summarised as follows:

$\sum x = 63\,910.4$, $\sum x^2 = 63\,824\,061$.

(a) Calculate the sample mean, $\bar{x}$.
(b) Calculate an unbiased estimate for the standard deviation of the length of life of all light bulbs of this type.
(c) Is there evidence, at the 10% significance level, that the statement on the packaging is overestimating the length of life of this type of light bulb?

Solution 11.4
(a) $\bar{x} = \dfrac{\sum x}{n} = \dfrac{63\,910.4}{64} = 998.6$ hours

(b) $\hat{\sigma}^2 = \dfrac{1}{n-1}\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right) = \dfrac{1}{63}\left(63\,824\,061 - \dfrac{63\,910.4^2}{64}\right) = 49.77\ldots$

$\hat{\sigma} = \sqrt{49.77\ldots} = 7.055$ (3 d.p.)
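Parts (a) and (b) of Solution 11.4 use only the summary totals $n$, $\sum x$ and $\sum x^2$. As a hedged sketch (the function name `unbiased_estimates` is an assumption introduced here for illustration), the same arithmetic can be written in Python:

```python
from math import sqrt


def unbiased_estimates(n, sum_x, sum_x2):
    """Return (sample mean, unbiased variance estimate, its square root)
    from the summary totals n, sum of x and sum of x squared."""
    mean = sum_x / n
    var_hat = (sum_x2 - sum_x ** 2 / n) / (n - 1)
    return mean, var_hat, sqrt(var_hat)


# Summary figures from Example 11.4
mean, var_hat, sd_hat = unbiased_estimates(64, 63910.4, 63824061)
print(round(mean, 1), round(var_hat, 2), round(sd_hat, 3))  # 998.6 49.77 7.055
```

The summary-total formula subtracts two large, nearly equal numbers, so with raw data available it is usually safer to compute $\sum (x - \bar{x})^2$ directly.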

(c) Perform the hypothesis test as follows:

1. Define the variable.
Let $X$ be the lifetime, in hours, of a light bulb.
Let the population mean be $\mu$ and the population standard deviation be $\sigma$.

2. State $H_0$ and $H_1$.
$H_0: \mu = 1000$ (The statement on the packaging is correct)
$H_1: \mu < 1000$ (The statement on the packaging is overestimating the length of life)

3. State the distribution of the test statistic according to $H_0$.
For samples of size $n$, where $n$ is large, by the central limit theorem and using $\hat{\sigma}$ for $\sigma$,

$\bar{X} \sim N\left(\mu, \dfrac{\hat{\sigma}^2}{n}\right)$ with $n = 64$ and $\hat{\sigma} = 7.055$.

If $H_0$ is true, $\mu = 1000$, so $\bar{X} \sim N\left(1000, \dfrac{7.055^2}{64}\right)$.

4. State the level of the test.
Use a one-tailed test, at the 10% level.

5. Decide on your rejection criterion.
The critical z-value for a one-tailed 10% test (lower tail) is $-1.282$ (see page 649).

Reject $H_0$ if $z < -1.282$, where $z = \dfrac{\bar{x} - \mu_0}{\hat{\sigma}/\sqrt{n}} = \dfrac{\bar{x} - 1000}{7.055/\sqrt{64}}$.

6. Perform the required calculation.
From the sample, $\bar{x} = 998.6$.

$\therefore z = \dfrac{998.6 - 1000}{7.055/\sqrt{64}} = -1.587\ldots$

[Figure: standard normal curve with the 10% lower-tail critical region below $z = -1.282$; the test value $z = -1.587$ lies inside it.]

7. Make your conclusion.
Since $z < -1.282$, reject $H_0$ (the mean is 1000 hours) in favour of $H_1$ (the mean is less than 1000 hours).

There is evidence, at the 10% level, that the statement on the packaging overestimates the length of life of this type of bulb.

TYPE I AND TYPE II ERRORS

When you make your decision about whether or not to reject $H_0$ there are two types of error that could be made. These were described in Chapter 10 (page 493) and are called Type I and Type II errors:

- a Type I error is made when you wrongly reject a true hypothesis,
- a Type II error is made when you wrongly accept a false hypothesis.

These can be summarised in a table:

                                    Test decision
                                    Accept H0        Reject H0
Actual situation   H0 is true       correct          Type I error
                   H0 is false      Type II error    correct

A Type I error is made if $H_0$ is rejected when $H_0$ is true.
This is written

P(Type I error) = P($H_0$ is rejected when $H_0$ is true)

If the significance level is $\alpha$% then the probability of making a Type I error is also $\alpha$%, so the significance level of the test and the probability of making a Type I error are both the same.

A Type II error is made if $H_0$ is accepted when $H_0$ is false.
This is written

P(Type II error) = P($H_0$ is accepted when $H_0$ is false)

To calculate the probability of a Type II error, a particular value must be specified in the alternative hypothesis $H_1$.

So P(Type II error) = P($H_0$ is accepted when $H_1$ is true)

POWER OF A TEST

The power of a test = P($H_0$ is rejected when $H_1$ is true)
= 1 − P(Type II error)

Example 11.5
A random variable has a normal distribution with mean $\mu$ and standard deviation 3.
The null hypothesis $\mu = 20$ is to be tested against the alternative hypothesis $\mu > 20$ using a random sample of size 25. It is decided that the null hypothesis will be rejected if the sample mean is greater than 21.4.

(a) Calculate the probability of making a Type I error.
(b) Calculate the probability of making a Type II error, when in fact $\mu = 21$.

Solution 11.5
(a) You are given that $X \sim N(\mu, 3^2)$
and $H_0: \mu = 20$
$H_1: \mu > 20$

For samples of size 25, $\bar{X} \sim N\left(\mu, \dfrac{3^2}{n}\right)$ with $n = 25$.

If the null hypothesis is true, $\mu = 20$ and $\bar{X} \sim N\left(20, \dfrac{9}{25}\right)$.

P(Type I error) = P($H_0$ is rejected when $H_0$ is true)
$= P(\bar{X} > 21.4 \text{ when } \mu = 20)$
$= P\left(Z > \dfrac{21.4 - 20}{3/\sqrt{25}}\right)$
$= P(Z > 2.333)$
$= 1 - \Phi(2.333)$
$= 1 - 0.9902$
$= 0.0098$
$\approx 1\%$

[Figure: distribution of $\bar{X}$ when $\mu = 20$, with the upper tail of about 1% beyond $\bar{x} = 21.4$ ($z = 2.333$).]

The probability of making a Type I error $\approx$ 1%.
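The calculation in part (a) is the probability that the sample mean falls in the rejection region for a given true mean. A minimal sketch in Python, assuming the decision rule of Example 11.5 (reject $H_0$ when $\bar{x} > 21.4$, with $\sigma = 3$ and $n = 25$); the function name `prob_reject` is hypothetical:

```python
from math import sqrt
from statistics import NormalDist


def prob_reject(mu, cutoff=21.4, sigma=3, n=25):
    """P(sample mean > cutoff) when the true population mean is mu."""
    sampling_dist = NormalDist(mu, sigma / sqrt(n))  # distribution of the sample mean
    return 1 - sampling_dist.cdf(cutoff)


# Under H0 (mu = 20) this is the probability of a Type I error
type_one = prob_reject(20)
print(round(type_one, 4))  # 0.0098
```

Evaluating `prob_reject` at a mean specified by the alternative hypothesis gives the power of the test at that mean, and one minus that value is the corresponding probability of a Type II error.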
NOTE: This gives the significance level of the test, so if $H_0$ is rejected for values of the sample mean greater than 21.4, the significance level of the test is 1%.

(b) If, in fact, $\mu = 21$, the hypotheses become

$H_0: \mu = 20$
$H_1: \mu = 21$

P(Type II error) = P(accept $H_0$ when $H_0$ is false)
= P(accept $H_0$ when $H_1$ is true)

You are given that $H_0$ is rejected if $\bar{x} > 21.4$, so $H_0$ will be accepted if $\bar{x} < 21.4$.

If $H_1$ is true, then $\mu = 21$ and $\bar{X} \sim N\left(21, \dfrac{9}{25}\right)$.

so P(Type II error) $= P(\bar{X} < 21.4 \text{ when } \mu = 21)$
$= P\left(Z < \dfrac{21.4 - 21}{3/\sqrt{25}}\right)$
$= P(Z < 0.667)$
$= \Phi(0.667)$
$= 0.7477$
$\approx 75\%$

[Figure: distribution of $\bar{X}$ when $\mu = 21$, with about 75% of the area below $\bar{x} = 21.4$ ($z = 0.667$).]

The probability of making a Type II error is 75%.

Exercise 11a  Testing a mean: normal population or large sample size (Tests 1a, 1b and 1c)

1. For each of the following, $X$ follows a normal distribution with unknown mean $\mu$ and known standard deviation $\sigma$. A random sample of size $n$ is taken from the population of $X$ and the sample mean, $\bar{x}$, is calculated.
Test the hypotheses stated, at the significance level indicated.

       n     x̄       σ      Hypotheses                       Level of significance
(a)    30    15.2    3      H0: μ = 15.8,  H1: μ ≠ 15.8      5%
(b)    10    27      1.2    H0: μ = 26.3,  H1: μ > 26.3      5%
(c)    49    125     4.2    H0: μ = 123.5, H1: μ > 123.5     1%
(d)    100   4.35    0.18   H0: μ = 4.40,  H1: μ < 4.40      2%

2. A machine fills cans with soft drinks so that their contents have a nominal volume of 330 ml. Over a period of time it has been established that the volume of liquid in the cans follows a normal distribution with mean 335 ml and standard deviation 3 ml. A setting on the machine is altered, following which the operator suspects that the mean volume of liquid discharged by the machine into the cans has decreased. He takes a random sample of 50 cans and finds that the mean volume of liquid in these cans is 334.6 ml. Does this confirm his suspicion? Perform a significance test at the 5% level and assume that the standard deviation remains unchanged.

3. In a significance (hypothesis) test of the mean of a population, a null hypothesis $\mu = 103.5$ is tested against an alternative hypothesis $\mu < 103.5$, where $\mu$ is the mean of a normally distributed variable with known variance.
A sample is taken from the population and a standardised normal test statistic $z = -1.35$ is calculated.
What conclusion, at the 5% level of significance, can be reached about the mean of the population?

4. A machine packs flour into bags. A random sample of 11 filled bags was taken and the masses of the bags, to the nearest 0.1 g, were

1506.8, 1506.6, 1506.7, 1507.2, 1506.9,
1506.8, 1506.6, 1507.0, 1507.5, 1506.3, 1506.4

Filled bags are supposed to have a mass of 1506.5 g. Assuming that the mass of a bag has normal distribution with variance 0.16 g², test whether the sample provides significant evidence, at the 5% level, that the machine produces overweight bags. (C)

5. A variable with known variance 32 is thought to have a mean of 55. A random sample of 81 observations of the variable has a mean of 56.2. Does this provide evidence at the 10% level of significance that the mean is not 55?
Explain what part the central limit theorem has played in your calculations.

6. A random sample of 75 eleven-year-olds performed a simple task and the time taken, $t$ minutes, noted for each. The results were summarised as follows:

$\sum t = 1215$, $\sum t^2 = 21\,708$.

Test, at the 1% level, whether there is evidence that the mean time taken to perform the task is greater than 15 minutes.

7. Cassette tapes manufactured by a particular firm are such that the playing time of the tapes can be modelled by a normal distribution with standard deviation 1.8 minutes. The tapes are advertised as having a playing time of 90 minutes but the manufacturer claims that they actually have a mean playing time of 92 minutes. An investigator selected 36 tapes at random and checked the playing times. He calculated the mean playing time of the tapes in the sample and, on the basis of the value obtained, he rejected the manufacturer's claim at the 5% level, saying that the mean time was less than 92 minutes.
(a) What can be said about the value of the sample mean for this decision to be taken?
The mean playing time of the firm's cassettes is in fact 90.8 minutes.
(b) Find the probability of making a Type II error.
(c) Find the probability that a cassette tape, picked at random, has a playing time less than the stated value on the pack of 90 minutes.

8. A sample of 40 observations from a normal distribution $X$ gave $\sum x = 24$, $\sum x^2 = 596$.
Performing a two-tailed test at the 5% level, test whether the mean of the distribution is zero.

9. The masses of components produced at a particular workshop are normally distributed with standard deviation 0.8 g. It is claimed that the mean mass is 6 g.
To test this claim the mean mass of a random sample of 50 components is calculated and a significance test at the 5% level carried out. On the basis of the test, the claim is accepted.
Between what values did the mean mass of the 50 components in the sample lie?

10. For each of the following distributions, $X$, a random sample of size $n$ is taken and the values of $\sum x$, $\sum x^2$ or $\sum (x - \bar{x})^2$ summarised as shown.
Test the hypotheses stated at the significance level indicated.

       n     Σx      Σx²          Σ(x − x̄)²   Hypotheses                     Level of significance
(a)    65    6500    650 842.4                H0: μ = 99.2, H1: μ ≠ 99.2     5%
(b)    65    6500    650 842.4                H0: μ = 99.2, H1: μ > 99.2     5%
(c)    80    6824                 2508.8      H0: μ = 86.2, H1: μ < 86.2     10%
(d)    100   685     4728.25                  H0: μ = 7,    H1: μ ≠ 7        1%

11. A large random sample was taken from a population with mean $\mu$ and known variance. The null hypothesis $\mu = 52$ was tested against the alternative hypothesis $\mu \neq 52$ at the 4% significance level. The calculated value of the standardised test statistic was 2.19.
(a) Carry out a significance test for $\mu$ based on this result, stating your conclusion clearly.
(b) State the probability of making a Type I error.

12. A sample of size 15 is taken from the distribution of $X$ where $X \sim N(\mu, 4)$. If the sample mean is greater than 10.72, the null hypothesis $\mu = 10$ is rejected in favour of the alternative hypothesis $\mu > 10$.
(a) Find the probability of making a Type I error.
(b) Find the probability of making a Type II error if $\mu = 10.5$.

13. An IQ test is developed such that the mean quotient is 100 and standard deviation is 12. It is given to a random sample of 50 children in one area. The average mark was 105. Does this provide evidence, at the 5% level, that children from this area are generally more intelligent?
14. Boxes of a certain breakfast cereal have contents whose masses are normally distributed with mean $\mu$ g and standard deviation 15 g. A test of the null hypothesis $\mu = 375$ against the alternative hypothesis $\mu > 375$ is carried out at the 2.5% significance level using a random sample of 16 boxes.
(a) Show that the alternative hypothesis is accepted when $\bar{x} > 382.35$, where $\bar{x}$ g is the sample mean mass.
(b) Given that the actual value of $\mu$ is 385, find the probability of making a Type II error.
(c) Find the range of values of $\mu$ for which the probability of making a Type II error is less than 0.025.
(d) The test is carried out, independently, on two different occasions. Find the probability that at least one Type I error is made. (C)

Test 1d: Testing the mean $\mu$ when the population $X$ is normal but the variance $\sigma^2$ is unknown and the sample size $n$ is small

In this case, the population is normal, so $X \sim N(\mu, \sigma^2)$. Since $\sigma^2$ is unknown, $\hat{\sigma}^2$ is used instead (as in Test 1c on page 519).

Consider the distribution of the sample mean $\bar{X}$. When the sample size is small and $\sigma^2$ has to be estimated, the standardised form of $\bar{X}$ does not follow a normal distribution. As you saw in Chapter 9 (page 462), the standardised statistic is called $T$ and it follows a t-distribution with $(n - 1)$ degrees of freedom.

When testing the mean of a normal population $X$ with unknown variance $\sigma^2$, for small samples of size $n$, the test statistic is $T$, where $T = \dfrac{\bar{X} - \mu_0}{\hat{\sigma}/\sqrt{n}}$ and $T \sim t(n - 1)$.

When finding the critical t-values, t-distribution tables are needed and these are printed on page 650. You may need to remind yourself how to use them by reading again the notes on page 464.

Example 11.6
Five readings of the resistance $X$, in ohms, of a piece of wire gave the following results:

1.51, 1.49, 1.54, 1.52, 1.54

These are summarised by $\sum x = 7.6$, $\sum x^2 = 11.5538$.

If the wire is pure, the resistance is 1.50 ohms. If the wire is impure, its resistance is higher than 1.50 ohms. Assuming that the resistance can be modelled by a normal variable with mean $\mu$ and standard deviation $\sigma$, calculate
(a) the sample mean, $\bar{x}$,
(b) an unbiased estimate of $\sigma$.
(c) Is there evidence, at the 5% level of significance, that the wire is impure?

Solution 11.6
1. Define the variable.
Let $X$ be the resistance, in ohms, of the wire.
Let the population mean be $\mu$ and the population standard deviation be $\sigma$.
Then $X \sim N(\mu, \sigma^2)$.

(a) $\bar{x} = \dfrac{\sum x}{n} = \dfrac{7.6}{5} = 1.52$ ohms

(b) $\hat{\sigma}^2 = \dfrac{1}{n-1}\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right) = \dfrac{1}{4}\left(11.5538 - \dfrac{7.6^2}{5}\right) = 0.000\,45$

so $\hat{\sigma} = 0.0212$ (3 s.f.)

(c) $H_0: \mu = 1.50$ (the wire is pure silver)
$H_1: \mu > 1.50$ (the wire is impure)

If $H_0$ is true, $\mu = 1.50$. Since $n$ is small and $\sigma^2$ is unknown, the test statistic is $T$,

where $T = \dfrac{\bar{X} - 1.50}{\hat{\sigma}/\sqrt{n}}$ and $T \sim t(n - 1)$,

i.e. $T = \dfrac{\bar{X} - 1.50}{0.0212/\sqrt{5}}$ and $T \sim t(4)$.

Use a one-tailed test (upper tail) at the 5% level.

From the tables on page 650 the critical value for $t$ is found from row $\nu = 4$, $p = 0.95$, giving 2.132.

p        0.75     0.90     0.95     0.975
ν = 1    1.000    3.078    6.314    12.71
    2    0.816    1.886    2.920    4.303
    3    0.765    1.638    2.353    3.182
    4    0.741    1.533    2.132    2.776

Reject $H_0$ if the test value of $t$ is greater than 2.132.

From the sample, $\bar{x} = 1.52$.

$t = \dfrac{1.52 - 1.50}{0.0212/\sqrt{5}} = 2.109$

[Figure: $t(4)$ curve with the 5% upper-tail critical region above $t = 2.132$; the test value 2.109 lies just outside it.]

Since $t < 2.132$, $H_0$ is not rejected.

There is not enough evidence, at the 5% level, to indicate that the wire is impure.

Example 11.7
A machine is supposed to produce paper with a mean thickness of 0.05 mm. Eight random measurements of the paper gave a mean of 0.047 mm with a standard deviation of 0.002 mm. Assuming that the thickness is normally distributed, test, at the 1% level, whether the output from the machine is different from that expected.

Solution 11.7
1. Define the variable.
Let $X$ be the thickness, in millimetres, of paper produced by the machine.
Let the population mean be $\mu$ and the population standard deviation be $\sigma$.
Then $X \sim N(\mu, \sigma^2)$.
Since $\sigma^2$ is unknown, the unbiased estimate $\hat{\sigma}^2$ is used, where

$\hat{\sigma}^2 = \dfrac{n}{n-1} \times s^2$ (where $s^2$ is the sample variance).

$\hat{\sigma}^2 = \dfrac{8}{7} \times 0.002^2 = 4.57\ldots \times 10^{-6}$

$\hat{\sigma} = 0.002\,14$ (3 s.f.)

2. State $H_0$ and $H_1$.
$H_0: \mu = 0.05$ (the thickness is as expected)
$H_1: \mu \neq 0.05$ (the thickness is different from that expected)

3. State the distribution of the test statistic according to $H_0$.
If $H_0$ is true, $\mu = 0.05$.
Since $n$ is small and $\sigma^2$ is unknown, the test statistic is $T$ where

$T = \dfrac{\bar{X} - 0.05}{\hat{\sigma}/\sqrt{n}}$ and $T \sim t(n - 1)$,

i.e. $T = \dfrac{\bar{X} - 0.05}{0.002\,14/\sqrt{8}}$ and $T \sim t(7)$.

4. State the level of the test.
Use a two-tailed test at the 1% level.

5. Decide on your rejection criterion.
The critical value for $t$ is found from row $\nu = 7$, $p = 0.995$ (because you want 0.5% in each tail), giving $\pm 3.499$.

p        0.75     0.90     0.95     0.975    0.99     0.995
ν = 1    1.000    3.078    6.314    12.71    31.82    63.66
    2    0.816    1.886    2.920    4.303    6.965    9.925
    ...
    7    0.711    1.415    1.895    2.365    2.998    3.499

Reject $H_0$ if $t < -3.499$ or $t > 3.499$, i.e. if $|t| > 3.499$.

6. Perform the required calculation.
From the sample, $\bar{x} = 0.047$.

$t = \dfrac{0.047 - 0.05}{0.002\,14/\sqrt{8}} = -3.96\ldots$

[Figure: $t(7)$ curve with 0.5% critical regions below $t = -3.499$ and above $t = 3.499$.]

7. Make your conclusion.
Since $|t| > 3.499$, $H_0$ is rejected.

There is evidence, at the 1% level, that the output from the machine is different from that expected.

Exercise 11b  Testing a mean: normal population, small sample size (Test 1d)

1. For each of the following, $X$ follows a normal distribution with unknown mean $\mu$ and unknown standard deviation $\sigma$. A random sample of size $n$ is taken from the population of $X$ and the sample mean, $\bar{x}$, is calculated.
Test the hypotheses stated, at the significance level indicated.

       n     Σx        Σ(x − x̄)²   Σx²         Hypotheses                        Level
(a)    12    298.8                 7542.42     H0: μ = 24.1,  H1: μ > 24.1       5%
(b)    17    605.2                 23016.92    H0: μ = 40,    H1: μ ≠ 40         5%
(c)    6     9034.8    50.8                    H0: μ = 1503,  H1: μ ≠ 1503       1.0%
(d)    10    1298      97.6                    H0: μ = 133.0, H1: μ < 133.0      1%

2. An athlete finds that her times for running a race are normally distributed with mean 10.6 seconds. She trains intensively for a week and then records her time in the next 6 races. Her times, in seconds, are

10.70, 10.65, 10.75, 10.80, 10.60

Is there evidence, at the 5% level, that training intensively has improved her times?

3. Family packs of bacon slices are sold in 1.5 kg packs. A sample of 12 packs was selected at random and their masses, measured in kilograms, noted. The following results were obtained:

$\sum x = 17.81$, $\sum x^2 = 26.4357$

Assuming that the masses of packs follow a normal distribution with variance $\sigma^2$, test at the 1% level whether the packs are underweight,
(a) if $\sigma^2$ is unknown, (b) if $\sigma^2 = 0.0003$.

4. It is thought that a normal population has mean 1.6. A random sample of 10 observations gives a mean of 1.49 and standard deviation of 0.3. Does this provide evidence, at the 5% level, that the population mean is less than 1.6?

5. A random sample of 8 observations of a normal variable gave

Test, at the 5% level, the hypothesis that the mean of the distribution is 4.3 against the alternative hypothesis that the mean is greater than 4.3.

6. The cholesterol levels of 8 women were measured, with the following results.

3.1, 2.8, 1.5, 1.7, 2.4, 1.9, 3.3, 1.6

Making any necessary assumptions,
(a) test, at the 5% level, whether the sample has been drawn from a distribution with mean cholesterol level 3.1,
(b) calculate a symmetric 95% confidence interval for the mean cholesterol level.

7. A marmalade manufacturer produces thousands of jars of marmalade each week. The mass of marmalade in a jar is an observation from a normal distribution having mean of 455 g and standard deviation 0.8 g.
Following a slight adjustment to the filling machine, a random sample of 10 jars is found to contain the following masses, in grams, of marmalade:

454.8, 453.8, 455.0, 454.4, 455.4,
454.4, 454.4, 455.0, 455.0, 453.6

(a) Assuming that the variance of the distribution is unaltered by the adjustment, test at the 5% significance level the hypothesis that there has been no change in the mean of the distribution.
(b) Assuming that the variance of the distribution may have been altered, obtain an unbiased estimate of the new variance and, using this estimate, test at the 5% level of significance the hypothesis that there has been no change in the mean of this distribution. (C)

8. Six observations of a continuous random variable $X$ gave the following values:

120.3, 122.4, 119.8, 121.0, 122.5, 119.6

State any conditions that are necessary for the valid use of a t-test to test a hypothesis about the mean of $X$.
Assuming that the use of the t-test is valid, test the null hypothesis that the mean of $X$ is 120 against the alternative hypothesis that the mean is not 120, using a 5% significance level.

9. A random sample of 12 independent observations of a normally distributed random variable $X$ is taken from a population and a test statistic, $t = 2.9$, calculated. It is thought that the population mean $\mu$ is 27. Write down suitable null and alternative hypotheses to carry out a two-tailed significance test for $\mu$ and use a t-test to test your hypotheses at the 1% level.
10. A firm of solicitors claims that, on average, interviews with clients last 50 minutes. A random sample of 15 interviews is chosen, and the time taken for each interview, $x$ minutes, is noted. The results are summarised by $\sum x = 746$ and $\sum x^2 = 37\,180$. Assuming that the time for an interview has a normal distribution, use a t-test to determine, at the 5% significance level, whether the firm is overstating the average interview time. Give null and alternative hypotheses, full details of your procedure and a conclusion. (C)

HYPOTHESIS TEST 2: TESTING A BINOMIAL PROPORTION p WHEN n IS LARGE

Consider the situation when independent trials are carried out, each with a probability $p$ of success, where $p$ is constant. If $X$ is the number of successes in $n$ trials, then $X$ follows a binomial distribution, i.e. $X \sim B(n, p)$ (page 279).

In Chapter 10 (hypothesis tests for discrete variables, page 483) you learnt how to carry out a hypothesis test for an unknown binomial proportion $p$. This involved calculating binomial probabilities which are relatively easy to find when $n$ is small.

When $n$ is large, however, the calculations can become very cumbersome and in such cases it is useful to use the normal approximation to the binomial distribution:

If $n$ is sufficiently large such that $np > 5$ and $nq > 5$, then the binomial distribution $X \sim B(n, p)$ can be approximated by a normal distribution $X \sim N(np, npq)$, where $q = 1 - p$.

When performing the hypothesis test, you are able to work in standardised z-values. Since the normal distribution is continuous and the binomial is discrete, you will need to use a continuity correction (see page 383) and this involves amending your test value by adding or subtracting 0.5. Further details are given in the following examples. The stages of the test are the same as in the general procedure outlined on page 513.

When testing the proportion $p$ of a binomial population, the test statistic is the number of successes in $n$ trials, $X$, where $X \sim B(n, p)$.

If $n$ is large such that $np > 5$ and $nq > 5$, $X$ is approximately normal and $X \sim N(np, npq)$.

In standardised form, the test statistic is $Z = \dfrac{X - np}{\sqrt{npq}}$, where $Z \sim N(0, 1)$.

Example 11.8
Caroline was asked to test whether a coin is biased in favour of heads, using a 5% level of significance. She tossed the coin 100 times and obtained 57 heads. What should she have concluded?

Solution 11.8
1. Define the variable.
Let $X$ be the number of heads in 100 tosses and let the probability of obtaining a head be $p$. Then $X \sim B(100, p)$.

2. State $H_0$ and $H_1$.
$H_0: p = 0.5$ (the coin is not biased and heads or tails are equally likely to occur)
$H_1: p > 0.5$ (the coin is biased in favour of heads)

3. State the distribution of the test statistic according to $H_0$.
If $H_0$ is true, then $p = 0.5$, so $X \sim B(100, 0.5)$.
Now $n$ (the number of tosses) is large and $np = 100 \times 0.5 = 50 > 5$, $nq = 50 > 5$.
Since $np > 5$ and $nq > 5$, use the normal approximation,
$X \sim N(np, npq)$ with $np = 50$ and $npq = 100 \times 0.5 \times 0.5 = 25$,
i.e. $X \sim N(50, 25)$.
s.d. $= \sqrt{npq} = \sqrt{25} = 5$

4. State the level of the test.
Use a one-tailed test at the 5% level.

5. Decide on your rejection criterion.
The critical z-value for the 5% critical region (upper tail) is 1.645.
Reject $H_0$ if $z > 1.645$, where $z$ is the sample value when standardised.

6. Perform the required calculation.
When standardising the sample value of 57 heads you have to use a continuity correction. Think of the discrete value of 57 being represented by a rectangle over the continuous values from 56.5 to 57.5. In order to reject $H_0$, the complete rectangle must lie in the critical region. Therefore take as the test value the lower boundary, 56.5.

So $z = \dfrac{56.5 - np}{\sqrt{npq}} = \dfrac{56.5 - 50}{5} = 1.3$

[Figure: normal curve for $X$ centred on 50 with the upper-tail critical region above $z = 1.645$; the test value 56.5 ($z = 1.3$) lies outside it.]

7. Make your conclusion.
Since $z < 1.645$, the sample value of 57 heads is not in the critical region and $H_0$ is not rejected.
On the statistical evidence, Caroline should have concluded that the coin is not biased in favour of heads.

It is interesting to work out how many heads would need to be obtained to conclude that the coin is biased in favour of heads. This can be done as follows:

The standardised test value lies in the critical region if $z > 1.645$.

If the number of heads is $x$, then, applying the continuity correction, you need to consider $x - 0.5$ when standardising the test value.

Therefore $\dfrac{(x - 0.5) - np}{\sqrt{npq}} > 1.645$

i.e. $\dfrac{(x - 0.5) - 50}{5} > 1.645$

$x > 50 + 0.5 + 1.645 \times 5$

$x > 58.725$
I Example 11.10
Since xis an integer, the least value of xis 59. So if Caroline had obtained 59 or more heads
when she tossed the coin 100 times, she would have concluded that the coin was biased in
The random variable X can be modelled b b' . . . . .
favour of heads. and P whose value . I A . . . y a momwl dtstnbutwn wllh parameters n ~ 200
' IS un <nown stgmfic t . f d
NOTE: This result is perhaps surprising. Would you have thought that more heads would be needed?

Example 11.9
A manufacturer claims that a particular brand of seeds has a 90% germination rate. To test this claim, 150 randomly selected seeds are planted and it is noted that 124 germinate. Does this provide evidence, at the 1% significance level, that the manufacturer is overstating the germination rate of the seeds?

Solution 11.9
1. Define the variable.
Let $X$ be the number of seeds that germinate in 150 and let the probability that a seed germinates be $p$. Then $X \sim B(150, p)$.

2. State $H_0$ and $H_1$.
$H_0$: $p = 0.9$ (the germination rate is 90%)
$H_1$: $p < 0.9$ (the germination rate is less than 90% and the manufacturer is overstating the rate)

3. State the distribution of the test statistic according to $H_0$.
If $H_0$ is true, then $p = 0.9$, so $X \sim B(150, 0.9)$.
Since $n$ is large, check whether the normal approximation can be used.
Now $np = 150 \times 0.9 = 135 > 5$ and $nq = 150 \times 0.1 = 15 > 5$.
Since $np > 5$ and $nq > 5$, use the normal approximation $X \sim N(np, npq)$ with $npq = 150 \times 0.9 \times 0.1 = 13.5$, i.e. $X \sim N(135, 13.5)$.

4. State the level of the test.
Use a one-tailed test at the 1% level.

5. Decide on your rejection criterion.
The critical z-value for the critical region (lower tail) is $-2.326$, so $H_0$ is rejected if your test value, when standardised, is less than $-2.326$.
The test value is 124, but when you consider the continuity correction in this lower tail test you want to see whether the complete rectangle for 124 lies in the critical region, so you need to standardise 124.5 and use this as the test value.
Reject $H_0$ if the standardised value of 124.5 is less than $-2.326$.

6. Perform the required calculation.
$z = \dfrac{124.5 - np}{\sqrt{npq}} = \dfrac{124.5 - 135}{\sqrt{13.5}} = -2.857\ldots$

7. Make your conclusion.
Since $z < -2.326$, the sample value is in the critical region and so $H_0$ (the germination rate is 90%) is rejected in favour of $H_1$ (the germination rate is less than 90%).
There is evidence that the manufacturer is overstating the germination rate.

Example 11.10
The random variable $X$ is distributed $B(200, p)$. A significance test is performed, based on a sample value $x$, to test the null hypothesis $p = 0.4$ against the alternative hypothesis $p < 0.4$. The probability of making a Type I error when performing this test is 0.05.
(a) Find the critical region for $x$.
(b) Find the probability of making a Type II error in the case when $p = 0.3$.

Solution 11.10
(a) You are given that $X \sim B(200, p)$.
The hypotheses are
$H_0$: $p = 0.4$
$H_1$: $p < 0.4$

If $H_0$ is true, then $p = 0.4$, so $X \sim B(200, 0.4)$.
Now $np = 200 \times 0.4 = 80$ and $nq = 200 \times 0.6 = 120$.
Since $np > 5$ and $nq > 5$, use the normal approximation $X \sim N(np, npq)$ with $np = 80$, $npq = 200 \times 0.4 \times 0.6 = 48$, so $X \sim N(80, 48)$.

You are given that P(Type I error) = 0.05.
Since P(Type I error) = P($H_0$ is rejected when $H_0$ is true), this is the same as the significance level of the test.
So the significance level of the test is 5%.

Using a one-tailed test at the 5% level, the critical z-value for the critical region (lower tail) is $-1.645$. So reject $H_0$ if the sample value, when standardised, is less than $-1.645$.

To find the critical region for $x$, remember that a continuity correction is needed. Since you are considering values in the lower tail and you want to include the complete rectangle representing $x$, use $x + 0.5$.

$\dfrac{x + 0.5 - 80}{\sqrt{48}} < -1.645$
$x < 80 - 0.5 - 1.645 \times \sqrt{48}$
$x < 68.10\ldots$

Since $x$ is an integer, the critical region is $x \le 68$.

Check:
When $x = 68$, $z = \dfrac{68.5 - 80}{\sqrt{48}} = -1.659\ldots < -1.645$, so 68 is in the critical region.
When $x = 69$, $z = \dfrac{69.5 - 80}{\sqrt{48}} = -1.515\ldots > -1.645$, so 69 is not in the critical region.
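The two calculations asked for in Example 11.10 can be checked numerically. The following is a minimal sketch, assuming Python's standard-library `statistics.NormalDist` in place of printed normal tables; the function names are illustrative and do not come from the text.

```python
from math import ceil, sqrt
from statistics import NormalDist


def lower_tail_critical_region(n, p0, alpha):
    """Largest integer x for which H0: p = p0 is rejected in favour of p < p0.

    Uses the normal approximation N(n*p0, n*p0*(1 - p0)) with the
    continuity correction x + 0.5 for a lower-tail test.
    """
    z_crit = NormalDist().inv_cdf(alpha)              # about -1.645 when alpha = 0.05
    bound = n * p0 - 0.5 + z_crit * sqrt(n * p0 * (1 - p0))
    return ceil(bound) - 1                            # largest integer strictly below the bound


def type_ii_probability(n, p1, x_crit):
    """P(X > x_crit) when the true proportion is p1 (H0 is wrongly accepted)."""
    approx = NormalDist(n * p1, sqrt(n * p1 * (1 - p1)))
    return 1 - approx.cdf(x_crit + 0.5)               # continuity correction


x_crit = lower_tail_critical_region(200, 0.4, 0.05)   # part (a)
beta = type_ii_probability(200, 0.3, x_crit)          # part (b)
print(x_crit, round(beta, 4))
```

Using `ceil(bound) - 1` rather than `floor(bound)` keeps the inequality strict, which matters only in the unlikely case that the bound is itself an integer.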
(b) If $p = 0.3$, the hypotheses become
$H_0$: $p = 0.4$
$H_1$: $p = 0.3$

From part (a), the critical region is $X \le 68$, so $H_0$ is accepted when $X > 68$.

P(Type II error) = P($H_0$ is accepted when $H_1$ is true) = $P(X > 68$ when $p = 0.3)$

When $p = 0.3$, $np = 200 \times 0.3 = 60$ and $npq = 200 \times 0.3 \times 0.7 = 42$.
Therefore $X \sim N(60, 42)$, the distribution given by $H_1$.
Note that the conditions $np > 5$, $nq > 5$ are satisfied so the normal approximation can be applied.

Now $P(X > 68) \to P(X > 68.5)$ (continuity correction).

P(Type II error) = $P(X > 68.5)$ when $X \sim N(60, 42)$
$= P\left(Z > \dfrac{68.5 - 60}{\sqrt{42}}\right)$
$= P(Z > 1.312)$
$= 1 - 0.9052$
$= 0.0948$, i.e. approximately 9.5%

Exercise 11c: Testing a binomial proportion (large n)

1. In the following, $X \sim B(n, p)$ with $n$ as shown. $p$ is unknown and $x$ is the number of successes in the sample.
Test the hypotheses stated at the level of significance indicated.

       n      x      Hypotheses                                  Level
(a)    50     45     $H_0$: $p = 0.8$,  $H_1$: $p > 0.8$         5%
(b)    60     42     $H_0$: $p = 0.55$, $H_1$: $p \ne 0.55$      2%
(c)    120    21     $H_0$: $p = 0.25$, $H_1$: $p \ne 0.25$      5%
(d)    300    213    $H_0$: $p = 0.65$, $H_1$: $p \ne 0.65$      1%
(e)    90     56     $H_0$: $p = 0.76$, $H_1$: $p < 0.76$        1%

2. A manufacturer claims that 8 out of 10 dogs prefer its brand of dog food to any other. In a random sample of 120 dogs, it was found that 85 appeared to prefer that brand. Test, at the 5% level, whether you would accept the manufacturer's claim.

3. In a survey it was found that 3 out of every 10 people supported a particular political party. A month later a party representative claimed that the popularity of the party had increased. Would you accept that the number who supported the party was still 3 out of 10 if a further survey revealed that in a random sample of 100 people, 38 supported the party? Test at the 3% level.

4. A large college claims that it admits equal numbers of men and women. In a random sample of 500 students at the college there were 267 males. Is this evidence, at the 5% level, that the college population is not evenly divided between males and females?

5. A theory predicts that the probability of an event is 0.4. The theory is tested experimentally and in 400 independent trials, the event occurred 140 times. Is the number of occurrences significantly less than that predicted by the theory? Test at the 1% level.

6. It is thought that the proportion of defective items produced by a particular machine is 0.1. A random sample of 100 items is inspected and found to contain 15 defective items. Does this provide evidence, at the 5% level, that the machine is producing more defective items than expected?

7. In an investigation into the ownership of mobile phones amongst school children, 200 randomly chosen school children were interviewed and 142 owned a mobile phone. Test, at the 5% level of significance, the hypothesis that 65% of school children own a mobile phone against the alternative hypothesis that more than 65% own a mobile phone.

8. (a) A gardener sows 150 Special cabbage seeds and knows that the germination rate is 75%. By using a suitable approximation find the probability that:
(i) more than 122 seeds germinate
(ii) fewer than 106 seeds germinate
(b) The gardener also sows 120 Everyday cabbage seeds and finds that 81 germinate. Test whether the Everyday seeds have a germination rate less than 75%. Perform a significance test at the 4% level.

9. A government report states that a third of teenagers in Great Britain belong to a youth organisation. A survey, conducted among a random sample of 1000 teenagers from a certain city, revealed that 370 belonged to a youth organisation. Does this provide evidence, at the 2% level, that the proportion of teenagers in this city who belong to a youth organisation is greater than the national average?

10. A questionnaire was sent to a large number of people, asking for their opinions about a proposal to alter an examination syllabus. Of the 180 replies received, 134 were in favour of the proposal. Stating any necessary assumption,
(a) test, at the 5% level, the hypothesis that the population proportion in favour of the proposal is 0.7 against the alternative that it is more than 0.7,
(b) find a symmetric 95% confidence interval for the population proportion in favour of the proposal. (C)

11. Over a long period of time it has been found that in Enrico's restaurant the ratio of non-vegetarian to vegetarian meals ordered is 3 to 1.
During one particular day at Enrico's restaurant, a random sample of 20 people contained two who ordered a vegetarian meal.
(a) Carry out a significance test to determine whether or not the proportion of vegetarian meals ordered that day is lower than usual. State clearly your hypotheses and use a 10% significance level. Use an exact binomial test.
In Manuel's restaurant, of a random sample of 100 people ordering meals, 31 ordered vegetarian meals.
(b) Set up null and alternative hypotheses and, using a suitable approximation, test whether or not the proportion of people eating vegetarian meals at Manuel's restaurant is different from that at Enrico's restaurant. Use a 5% level of significance. (L)

12. When a drawing pin is dropped on to the floor, the probability that it lands point up is $p$.
(a) A teacher drops a drawing pin 900 times and observes that it lands point up 315 times. Test, at the 1% level, the hypothesis that $p = 0.4$ against the alternative hypothesis $p < 0.4$.
(b) A student drops a drawing pin 600 times and observes that it lands point up 251 times. Using the student's results, find a symmetric 95% confidence interval for $p$.
As part of a statistics investigation, 1500 students carry out similar experiments and they each calculate (correctly) their own symmetric 95% confidence interval for $p$. Find the expected number of these intervals that do not contain the true value of $p$. (C)

13. After carrying out a survey, a market research company asserted that 75% of TV viewers watched a certain programme. Another company interviewed 75 viewers and found that 51 had watched the programme and 24 had not. Does this provide evidence, at the 5% significance level, that the first company's figure of 75% was incorrect?

14. The Paper Engineering Company has traditionally supplied 85% of the retail outlets for origami products. With the onset of increased competition they feared that this proportion might have fallen. They examined a random sample of 500 retail outlets and found that 405 of them sold Paper Engineering Company products. Use a normal approximation to the binomial distribution to carry out a hypothesis test at the 1% significance level to test whether or not their proportion of the retail outlets has fallen. Give suitable null and alternative hypotheses and state your conclusion clearly. (C)
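The large-sample test of a binomial proportion used in Examples 11.9 and 11.10 follows the same steps each time, so it can be sketched as one small function. This is only an illustration under stated assumptions: the function name and the `alternative` labels are invented here, critical values come from `statistics.NormalDist` rather than tables, and the continuity correction of 0.5 is always applied towards the mean $np$.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(x, n, p0, alternative, alpha):
    """Test H0: p = p0 with the normal approximation to B(n, p0).

    alternative is "less", "greater" or "two-sided".
    Returns (z, reject_h0).
    """
    mean = n * p0
    if min(mean, n * (1 - p0)) <= 5:
        raise ValueError("np and nq must both exceed 5 for the approximation")
    sd = sqrt(n * p0 * (1 - p0))
    normal = NormalDist()
    if alternative == "less":
        z = (x + 0.5 - mean) / sd          # include the whole rectangle for x
        return z, z < normal.inv_cdf(alpha)
    if alternative == "greater":
        z = (x - 0.5 - mean) / sd
        return z, z > normal.inv_cdf(1 - alpha)
    if alternative == "two-sided":
        correction = 0.5 if x < mean else -0.5
        z = (x + correction - mean) / sd
        return z, abs(z) > normal.inv_cdf(1 - alpha / 2)
    raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")


# Figures from Example 11.9: 124 of 150 seeds germinate, H0: p = 0.9, 1% level.
z, reject = proportion_z_test(124, 150, 0.9, "less", 0.01)
print(round(z, 3), reject)
```

The function refuses to run when $np$ or $nq$ is 5 or less, mirroring the check made before the approximation is used in the worked solutions.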
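HYPOTHESIS TEST 3: TESTING $\mu_1 - \mu_2$, THE DIFFERENCE BETWEEN MEANS OF TWO NORMAL POPULATIONS

This test is used when you have two normal populations $X_1$ and $X_2$ with unknown means, $\mu_1$ and $\mu_2$, and you want to test the difference between the means of these populations. Consider $X_1 \sim N(\mu_1, \sigma_1^2)$ and $X_2 \sim N(\mu_2, \sigma_2^2)$.

The hypotheses might be:
$H_0$: $\mu_1 - \mu_2 = \ldots$
$H_1$: $\mu_1 - \mu_2 > \ldots$ (or $\mu_1 - \mu_2 < \ldots$ or $\mu_1 - \mu_2 \ne \ldots$)

Often the test involves the null hypothesis that the means are the same, i.e. $\mu_1 = \mu_2$ or $\mu_1 - \mu_2 = 0$, so the null hypothesis would be $H_0$: $\mu_1 - \mu_2 = 0$.

To test the difference between the means, take a random sample of size $n_1$ from $X_1$ and work out its sample mean, $\bar{x}_1$. Also take a random sample of size $n_2$ from $X_2$ and work out its sample mean, $\bar{x}_2$.

The test statistic is $\bar{X}_1 - \bar{X}_2$, and you need to consider the sampling distribution of the difference between means. The mean and variance of this distribution can be found as follows, the two samples being independent:

$E(\bar{X}_1 - \bar{X}_2) = E(\bar{X}_1) - E(\bar{X}_2) = \mu_1 - \mu_2$
$\mathrm{Var}(\bar{X}_1 - \bar{X}_2) = \mathrm{Var}(\bar{X}_1) + \mathrm{Var}(\bar{X}_2) = \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}$

The distribution of $\bar{X}_1 - \bar{X}_2$ depends on various factors and careful analysis of a given situation is required in order to decide which test to use. In each situation described below, the underlying distributions are normal.
Note that, for reference, the 95% confidence intervals for $\mu_1 - \mu_2$ are also given.

Test 3a: The population variances $\sigma_1^2$ and $\sigma_2^2$ are known

If the variances $\sigma_1^2$ and $\sigma_2^2$ are known, the test statistic is $\bar{X}_1 - \bar{X}_2$ where

$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2,\ \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}\right)$

In standardised form,

$Z = \dfrac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$

Note that the 95% confidence limits for $\mu_1 - \mu_2$ are $(\bar{x}_1 - \bar{x}_2) \pm 1.96\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$.

Test 3b: The populations have a common variance, $\sigma^2$, which is known

If there is a common population variance $\sigma^2$, the test statistic is $\bar{X}_1 - \bar{X}_2$ where

$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2,\ \sigma^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)\right)$

In standardised form,

$Z = \dfrac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$

Note that the 95% confidence limits for $\mu_1 - \mu_2$ are $(\bar{x}_1 - \bar{x}_2) \pm 1.96\,\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$.

Test 3c: The populations have a common population variance, $\sigma^2$, which is unknown

If the common population variance $\sigma^2$ is unknown, then an unbiased estimate, $\hat{\sigma}^2$, is used instead. This is sometimes known as a pooled two-sample estimate, where

$\hat{\sigma}^2 = \dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}$ ($s_1^2$ and $s_2^2$ are the sample variances)

An alternative format for $\hat{\sigma}^2$ is

$\hat{\sigma}^2 = \dfrac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2}$

The distribution of $\bar{X}_1 - \bar{X}_2$ depends on whether the samples taken are large or small.

Large samples
For large samples the distribution is approximately normal. The test statistic is $\bar{X}_1 - \bar{X}_2$ where

$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2,\ \hat{\sigma}^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)\right)$

In standardised form,

$Z = \dfrac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\hat{\sigma}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$

Note that the 95% confidence limits for $\mu_1 - \mu_2$ are $(\bar{x}_1 - \bar{x}_2) \pm 1.96\,\hat{\sigma}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$.

Small samples
For small samples the standardised form of the distribution is $T$, where

$T = \dfrac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\hat{\sigma}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$ and $T \sim t(n_1 + n_2 - 2)$

Note that the 95% confidence limits for $\mu_1 - \mu_2$ are $(\bar{x}_1 - \bar{x}_2) \pm t\,\hat{\sigma}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$, where $t$ is such that $P(T < t) = 0.975$ for $t(n_1 + n_2 - 2)$.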
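The three variants of Test 3 differ only in how the standard error of $\bar{X}_1 - \bar{X}_2$ is obtained. The sketch below, in Python, puts the pieces side by side. The function names are illustrative and the summary figures in the usage lines are hypothetical, chosen only to show the calls; they do not come from the text.

```python
from math import sqrt
from statistics import NormalDist


def se_known_variances(var1, n1, var2, n2):
    """Test 3a: standard error when both population variances are known."""
    return sqrt(var1 / n1 + var2 / n2)


def se_common_variance(var, n1, n2):
    """Tests 3b and 3c: standard error for a common variance (known or estimated)."""
    return sqrt(var * (1 / n1 + 1 / n2))


def pooled_variance(n1, s1_sq, n2, s2_sq):
    """Test 3c: unbiased two-sample estimate of the common variance.

    s1_sq and s2_sq are the sample variances calculated with divisor n.
    """
    return (n1 * s1_sq + n2 * s2_sq) / (n1 + n2 - 2)


def confidence_limits(xbar1, xbar2, se, multiplier):
    """Limits (xbar1 - xbar2) -/+ multiplier * se; multiplier is z or t."""
    diff = xbar1 - xbar2
    return diff - multiplier * se, diff + multiplier * se


z_975 = NormalDist().inv_cdf(0.975)            # about 1.96

# Hypothetical figures: known variances 4 and 9, sample sizes 50 and 40.
se = se_known_variances(4.0, 50, 9.0, 40)
lower, upper = confidence_limits(20.0, 18.5, se, z_975)
print(round(se, 4), round(lower, 3), round(upper, 3))
```

For small samples the same `confidence_limits` call applies, with the multiplier read from the $t(n_1 + n_2 - 2)$ table instead of the normal distribution.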

Example 11.11
Due to differences in the environment, the masses of a certain species of small animal are believed to be greater in Region A than in Region B. It is known that the masses in both regions are normally distributed, with masses in Region A having a standard deviation of 0.04 kg and masses in Region B having a standard deviation of 0.09 kg.
To test the theory, random samples are taken: 60 animals from Region A had a mean mass of 3.03 kg and 50 animals from Region B had a mean mass of 3.00 kg.
Does this provide evidence, at the 1% level, that the animals of this species in Region A have a greater mass than those in Region B?

Solution 11.11
1. Define the variables.
Let $X_1$ be the mass, in kilograms, of an animal in Region A and let the population mean be $\mu_1$. Then $X_1 \sim N(\mu_1, 0.04^2)$.
Let $X_2$ be the mass, in kilograms, of an animal in Region B and let the population mean be $\mu_2$. Then $X_2 \sim N(\mu_2, 0.09^2)$.

2. State $H_0$ and $H_1$.
$H_0$: $\mu_1 - \mu_2 = 0$ (there is no difference in the masses between the regions)
$H_1$: $\mu_1 - \mu_2 > 0$ (the animals in Region A have greater mass)

3. State the distribution of the test statistic according to $H_0$.
Consider the distribution of the difference between the means, $\bar{X}_1 - \bar{X}_2$.

$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2,\ \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}\right)$ with $n_1 = 60$, $n_2 = 50$

If $H_0$ is true then $\mu_1 - \mu_2 = 0$, so $\bar{X}_1 - \bar{X}_2 \sim N\left(0,\ \dfrac{0.04^2}{60} + \dfrac{0.09^2}{50}\right)$.

4. State the level of the test.
Use a one-tailed test (upper tail) at the 1% level.

5. Decide on your rejection criterion.
The critical z-value is 2.326, so reject $H_0$ if $z > 2.326$, where

$z = \dfrac{\bar{x}_1 - \bar{x}_2 - 0}{\sqrt{\dfrac{0.04^2}{60} + \dfrac{0.09^2}{50}}}$

6. Perform the required calculation.
$z = \dfrac{3.03 - 3.00}{0.0137\ldots} = 2.184\ldots$

7. Make your conclusion.
Since $z < 2.326$, do not reject $H_0$.
There is no evidence, at the 1% level, that the animals in Region A have a greater mass than those in Region B.

Example 11.12
The same physical fitness test was given to a group of 100 scouts and to a group of 144 guides. The maximum score was 30. The guides obtained a mean score of 26.81 and the scouts obtained a mean score of 27.53. Assuming that the fitness scores are normally distributed with a common population standard deviation of 3.48, test at the 5% level of significance whether the guides did not do as well as the scouts in the fitness test.

Solution 11.12
Let $X_1$ be a guide's score and let the population mean be $\mu_1$. Then $X_1 \sim N(\mu_1, \sigma^2)$ with $\sigma = 3.48$.
Let $X_2$ be a scout's score and let the population mean be $\mu_2$. Then $X_2 \sim N(\mu_2, \sigma^2)$ with $\sigma = 3.48$.

$H_0$: $\mu_1 - \mu_2 = 0$ (there is no difference in the performance)
$H_1$: $\mu_1 - \mu_2 < 0$ (the guides did not perform as well as the scouts)

Consider the distribution of the difference between the means, $\bar{X}_1 - \bar{X}_2$.

$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2,\ \sigma^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)\right)$ with $n_1 = 144$, $n_2 = 100$

If $H_0$ is true then $\mu_1 - \mu_2 = 0$, so $\bar{X}_1 - \bar{X}_2 \sim N\left(0,\ 3.48^2\left(\dfrac{1}{144} + \dfrac{1}{100}\right)\right)$.

Use a one-tailed test (lower tail) at the 5% level.
The critical z-value is $-1.645$, so reject $H_0$ if $z < -1.645$, where

$z = \dfrac{\bar{x}_1 - \bar{x}_2 - 0}{3.48\sqrt{\dfrac{1}{144} + \dfrac{1}{100}}}$

The sample values are $\bar{x}_1 = 26.81$, $\bar{x}_2 = 27.53$.

$z = \dfrac{26.81 - 27.53}{0.452\ldots} = -1.589\ldots$

Since $z > -1.645$, do not reject $H_0$.
There is no evidence, at the 5% level, that the guides did not perform as well as the scouts in the fitness tests.
Example 11.13
An investigation was carried out to assess the effects of adding certain vitamins to the diet. A group of two-week old rats was given a vitamin supplement in their diet for a period of one month, after which time their masses were noted. A control group of rats of the same age was fed on an ordinary diet and their masses were also noted after one month.
The results are summarised in the table:

                               Number in sample    Mean      Standard deviation
With vitamin supplement        64                  89.6 g    12.96 g
Without vitamin supplement     36                  83.5 g    11.41 g

Treating the samples as large samples from normal distributions with the same variance, $\sigma^2$, test at the 5% level whether the results provide evidence that rats given the vitamin supplement have a greater mass, at age six weeks, than those not given the vitamin supplement.

Solution 11.13
1. Define the variables.
Let $X_1$ be the mass of a rat given a vitamin supplement and let the population mean be $\mu_1$. Then $X_1 \sim N(\mu_1, \sigma^2)$ with $\sigma$ unknown.
Let $X_2$ be the mass of a rat in the control group and let the population mean be $\mu_2$. Then $X_2 \sim N(\mu_2, \sigma^2)$ with $\sigma$ unknown.

Since the common population variance $\sigma^2$ is unknown, use $\hat{\sigma}^2$ where

$\hat{\sigma}^2 = \dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2} = \dfrac{64 \times 12.96^2 + 36 \times 11.41^2}{64 + 36 - 2} = 157.5\ldots$

$\hat{\sigma} = 12.55\ldots$

2. State $H_0$ and $H_1$.
$H_0$: $\mu_1 - \mu_2 = 0$ (there is no difference in the masses of the two groups)
$H_1$: $\mu_1 - \mu_2 > 0$ (the rats given vitamin supplements are heavier)

3. State the distribution of the test statistic according to $H_0$.
Consider the distribution of the difference between the means, $\bar{X}_1 - \bar{X}_2$.

$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2,\ \hat{\sigma}^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)\right)$ with $\hat{\sigma} = 12.55$, $n_1 = 64$, $n_2 = 36$

If $H_0$ is true then $\mu_1 - \mu_2 = 0$, so $\bar{X}_1 - \bar{X}_2 \sim N\left(0,\ 12.55^2\left(\dfrac{1}{64} + \dfrac{1}{36}\right)\right)$.

4. State the level of the test.
Use a one-tailed test (upper tail) at the 5% level.

5. Decide on your rejection criterion.
The critical z-value is 1.645, so reject $H_0$ if $z > 1.645$, where

$z = \dfrac{\bar{x}_1 - \bar{x}_2 - 0}{\hat{\sigma}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$

6. Perform the required calculation.
$z = \dfrac{89.6 - 83.5}{12.55\sqrt{\dfrac{1}{64} + \dfrac{1}{36}}} = 2.332\ldots$

7. Make your conclusion.
Since $z > 1.645$, reject $H_0$.
There is evidence, at the 5% level, that the rats given the vitamin supplement have a greater mass than the rats not given the supplement.

Example 11.14
Two statistics teachers, Mr Chalk and Mr Talk, argue about their abilities at golf. Mr Chalk claims that with a number 7 iron he can hit the ball, on average, at least 10 m further than Mr Talk. They conducted an experiment, measuring the distances for several shots.
Denoting the distance Mr Chalk hits the ball by $x$ metres, the following results were obtained:
$n_1 = 40$, $\sum x = 4080$, $\sum (x - \bar{x})^2 = 1132$.
Denoting the distance Mr Talk hits the ball by $y$ metres, the following results were obtained:
$n_2 = 35$, $\sum y = 3325$, $\sum (y - \bar{y})^2 = 1197$.
Assuming that the populations have a common variance, test whether there is evidence, at the 1% level, to support Mr Chalk's claim.

Solution 11.14
1. Define the variables.
Let $X$ be the distance, in metres, for Mr Chalk and let the population mean be $\mu_1$. Then $X \sim N(\mu_1, \sigma^2)$ with $\sigma$ unknown.
Let $Y$ be the distance, in metres, for Mr Talk and let the population mean be $\mu_2$. Then $Y \sim N(\mu_2, \sigma^2)$ with $\sigma$ unknown.

An unbiased estimate for $\sigma^2$ is $\hat{\sigma}^2$ where

$\hat{\sigma}^2 = \dfrac{\sum (x - \bar{x})^2 + \sum (y - \bar{y})^2}{n_1 + n_2 - 2} = \dfrac{1132 + 1197}{40 + 35 - 2} = 31.904\ldots$

$\hat{\sigma} = 5.6483\ldots$

The unbiased estimate of the common population standard deviation is 5.648 (3 d.p.).

2. State $H_0$ and $H_1$.
$H_0$: $\mu_1 - \mu_2 = 10$ (Mr Chalk hits the ball 10 m further than Mr Talk)
Mr Chalk claims that he can hit the ball at least 10 m further than Mr Talk. Mr Talk wants to refute this, so take as alternative hypothesis that Mr Chalk hits the ball less than 10 m further than Mr Talk.
$H_1$: $\mu_1 - \mu_2 < 10$ (Mr Chalk hits the ball less than 10 m further than Mr Talk)

3. State the distribution of the test statistic according to $H_0$.
Consider the distribution of the difference between the means, $\bar{X} - \bar{Y}$.

$\bar{X} - \bar{Y} \sim N\left(\mu_1 - \mu_2,\ \hat{\sigma}^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)\right)$ with $\hat{\sigma} = 5.648$, $n_1 = 40$, $n_2 = 35$

If $H_0$ is true then $\mu_1 - \mu_2 = 10$, so $\bar{X} - \bar{Y} \sim N\left(10,\ 5.648^2\left(\dfrac{1}{40} + \dfrac{1}{35}\right)\right)$.
4. State the level of the test.
Use a one-tailed test (lower tail) at the 1% level.

5. Decide on your rejection criterion.
The critical z-value is $-2.326$, so reject $H_0$ if $z < -2.326$, where

$z = \dfrac{\bar{x} - \bar{y} - 10}{5.648\sqrt{\dfrac{1}{40} + \dfrac{1}{35}}}$

6. Perform the required calculation.
$\bar{x} = \dfrac{\sum x}{n_1} = \dfrac{4080}{40} = 102$, $\bar{y} = \dfrac{\sum y}{n_2} = \dfrac{3325}{35} = 95$

So $z = \dfrac{102 - 95 - 10}{5.648\sqrt{\dfrac{1}{40} + \dfrac{1}{35}}} = -2.29\ldots$

7. Make your conclusion.
Since $z > -2.326$, do not reject $H_0$.
There is not sufficient evidence, at the 1% level, to reject Mr Chalk's claim that he hits the ball, on average, at least 10 m further than Mr Talk.

Example 11.15
An investigation was conducted into the dust content in the flue gases of two types of solid fuel boilers.
Thirteen boilers of type A and 9 boilers of type B were used under identical fuelling and extraction conditions. Over a similar period, the following quantities, in grams, of dust were deposited in similar traps inserted in each of the 22 flues.

Dust deposit (g) in Type A boilers:
73.1, 56.4, 82.1, 67.2, 78.7, 75.1, 48.0, 53.3, 55.5, 61.5, 60.6, 55.2, 63.1
Dust deposit (g) in Type B boilers:
53.0, 39.3, 55.8, 58.8, 41.2, 66.6, 46.0, 56.4, 58.9

(a) Find the mean and variance of each of the samples.
(b) Assuming that these independent samples came from normal populations with the same variances:
(i) use a two-sample t-test at the 5% level of significance to determine whether there is any difference between the two samples as regards the mean dust deposit,
(ii) test at the 5% level of significance whether there is any difference between the two samples as regards the mean dust deposit where this time you should also assume that the population variances are both known to be 196.0.
(c) Explain the apparent contradiction in your results. (AEB)

Solution 11.15
(a) Using a calculator in standard deviation mode, the following values were obtained. Check them using your calculator.

                  Mean     Variance
Type A boiler     63.83    104.32
Type B boiler     52.89    72.07

(b) (i) Let $X_1$ be the mass of dust deposited in a type A boiler and let the population mean be $\mu_1$ and population variance be $\sigma^2$. Then $X_1 \sim N(\mu_1, \sigma^2)$ with $\sigma$ unknown.
Let $X_2$ be the mass of dust deposited in a type B boiler and let the population mean be $\mu_2$ and population variance be $\sigma^2$. Then $X_2 \sim N(\mu_2, \sigma^2)$ with $\sigma$ unknown.

Since $\sigma^2$ is unknown, use $\hat{\sigma}^2$ where

$\hat{\sigma}^2 = \dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2} = \dfrac{13 \times 104.32 + 9 \times 72.07}{13 + 9 - 2} = 100.23\ldots$

$\hat{\sigma} = 10.01$ (2 d.p.)

$H_0$: $\mu_1 - \mu_2 = 0$ (there is no difference in the masses deposited)
$H_1$: $\mu_1 - \mu_2 \ne 0$ (there is a difference in the masses deposited)

Consider the distribution of the difference between the means, $\bar{X}_1 - \bar{X}_2$.
Since the samples are small and the common population variance is unknown, the test statistic is $T$ where $T \sim t(n_1 + n_2 - 2)$ and

$T = \dfrac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\hat{\sigma}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$ with $\hat{\sigma} = 10.01$, $n_1 = 13$, $n_2 = 9$

If $H_0$ is true then $\mu_1 - \mu_2 = 0$, so

$T = \dfrac{\bar{X}_1 - \bar{X}_2 - 0}{10.01\sqrt{\dfrac{1}{13} + \dfrac{1}{9}}}$ and $T \sim t(20)$

Use a two-tailed test at the 5% level.
Because you want 2.5% in each tail, the critical value for $t$ is found from row $\nu = 20$, $p = 0.975$, giving $\pm 2.086$.

Critical values for t (see page 650)

p            0.75     0.90     0.95     0.975
$\nu = 1$    1.000    3.078    6.314    12.71
2            0.816    1.886    2.920    4.303
...
19           0.688    1.328    1.729    2.093
20           0.687    1.325    1.725    2.086

Reject $H_0$ if $t < -2.086$ or $t > 2.086$, i.e. if $|t| > 2.086$.
6. Perform the required calculation.
From the samples, $\bar{x}_1 = 63.83$, $\bar{x}_2 = 52.89$.

$t = \dfrac{63.83 - 52.89}{10.01\sqrt{\dfrac{1}{13} + \dfrac{1}{9}}} = 2.52\ldots$

7. Make your conclusion.
Since $t > 2.086$, reject $H_0$.
There is a difference between the samples with regard to the mean dust deposit.

(b) (ii) This time the population variances are known to be 196 and so a z-test is performed, rather than a t-test.

$X_1 \sim N(\mu_1, \sigma^2)$ with $\sigma = \sqrt{196} = 14$
$X_2 \sim N(\mu_2, \sigma^2)$ with $\sigma = 14$

The hypotheses are as before, but the test statistic, the difference between the means, is distributed

$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2,\ \sigma^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)\right)$ with $\sigma = 14$, $n_1 = 13$, $n_2 = 9$,

i.e. if $H_0$ is true, $\bar{X}_1 - \bar{X}_2 \sim N\left(0,\ 14^2\left(\dfrac{1}{13} + \dfrac{1}{9}\right)\right)$.

A two-tailed test at the 5% level gives critical z-values of $\pm 1.96$ (see page 649).
So reject $H_0$ if $z < -1.96$ or $z > 1.96$, i.e. if $|z| > 1.96$, where

$z = \dfrac{\bar{x}_1 - \bar{x}_2 - 0}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \dfrac{63.83 - 52.89}{14\sqrt{\dfrac{1}{13} + \dfrac{1}{9}}} = 1.802\ldots$

Since $|z| < 1.96$, $H_0$ is not rejected.
There is no evidence of a difference between the samples with regard to the mean dust deposit.

(c) Considering the variances of the samples, it would seem that the common population variance of 196.0 given in part (b)(ii) is suspect. The value of just over 100 given by the unbiased estimate appears more reasonable and so the result in part (b)(i) is likely to be more accurate.

Exercise 11d: Difference between means of two normal populations

1. In each of the following, a random sample of size $n_1$ is taken from population $X$ and a random sample of size $n_2$ is taken from population $Y$.
Use the information given to test the hypotheses stated at the level of significance indicated.

(a) $X \sim N(\mu_1, \sigma_1^2)$, $Y \sim N(\mu_2, \sigma_2^2)$

         $n_1$   $\sum x$   $\sigma_1^2$   $n_2$   $\sum y$   $\sigma_2^2$   Hypotheses                                                Level
(i)      100     4250       30             80      3544       35             $H_0$: $\mu_1 - \mu_2 = 0$, $H_1$: $\mu_1 - \mu_2 \ne 0$   5%
(ii)     20      95         2.3            25      135        2.5            $H_0$: $\mu_1 = \mu_2$, $H_1$: $\mu_1 < \mu_2$             2%
(iii)    50      1545       6.5            50      1480       7.1            $H_0$: $\mu_1 = \mu_2$, $H_1$: $\mu_1 > \mu_2$             1%

(b)
         $n_1$   $\sum x$   $n_2$   $\sum y$   Common population standard deviation   Hypotheses                                          Level
(i)      50      2480       40      1908       4.5                                    $H_0$: $\mu_1 = \mu_2$, $H_1$: $\mu_1 \ne \mu_2$     2%
(ii)     100     12 730     100     12 410     10.9                                   $H_0$: $\mu_1 = \mu_2$, $H_1$: $\mu_1 > \mu_2$       5%
(iii)    30      192        45      315        1.25                                   $H_0$: $\mu_1 = \mu_2$, $H_1$: $\mu_1 < \mu_2$       1%
(iv)     200     18 470     300     27 663     0.86                                   $H_0$: $\mu_1 = \mu_2$, $H_1$: $\mu_1 \ne \mu_2$     10%

         $n_1$   $\sum x$   $\sum (x - \bar{x})^2$   $n_2$   $\sum y$   $\sum (y - \bar{y})^2$   Hypotheses                                                Level
(v)      40      2128       810                      50      2580       772                      $H_0$: $\mu_1 = \mu_2$, $H_1$: $\mu_1 > \mu_2$             5%
(vi)     80      6824       2508                     100     8740       3969                     $H_0$: $\mu_1 = \mu_2$, $H_1$: $\mu_1 \ne \mu_2$           2%
(vii)    65      5369       8886                     80      4672       5026                     $H_0$: $\mu_1 - \mu_2 = 20$, $H_1$: $\mu_1 - \mu_2 > 20$   1%

2. A large group of sunflowers is growing in the shady side of a garden. A random sample of 36 of these sunflowers is measured. The sample mean height is found to be 2.86 m, and the sample standard deviation is found to be 0.60 m.
A second group of sunflowers is growing in the sunny side of the garden. A random sample of 26 of these sunflowers is measured. The sample mean height is found to be 3.29 m and the sample standard deviation is found to be 0.9 m.
Treating the samples as large samples from normal distributions having the same variance but possibly different means, obtain a pooled estimate of the variance and test whether the results provide significant evidence (at the 5% level) that the sunny-side sunflowers grow taller, on average, than the shady-side sunflowers. (C)

3. The lengths, in millimetres, of 9 screws selected at random from a large consignment are found to be:
8.00, 8.02, 8.03, 7.99, 8.00, 8.01, 8.01, 7.99, 8.01.
From a second large consignment, 16 screws are selected at random and their mean length is found to be 7.992 mm.
Assuming that both samples are from normal populations with variance 0.0001, test, at the 5% significance level, the hypothesis that the second population has the same mean as the first population, against the alternative hypothesis that the second population has a smaller mean than the first population. (C)
4. Hischi and Taschi are two makes of video tapes. They are both advertised as having a recording time of 3 hours. A sample of 49 Hischi tapes was tested and, denoting the actual recording time by $h$ minutes, the following results were obtained:
$\sum h = 8673$, $\sum (h - \bar{h})^2 = 12\,720$
A sample of 81 Taschi tapes was also tested. Denoting the actual recording time by $t$ minutes, the results obtained were:
$\sum t = 14\,904$, $\sum (t - \bar{t})^2 = 33\,488$
If the recording times for the two makes are normally distributed and have a common variance, show that the unbiased estimate of this common variance is 361. Test whether there is significant evidence, at the 5% level, of a difference in the mean recording times. Is the difference significant at the 4% level?

5. A large number of tomato plants are grown under controlled conditions. Half of the plants, chosen at random, are treated with a new fertiliser, and the other half of the plants are treated with a standard fertiliser. Random samples of 100 plants are selected from each half, and records are kept of the total crop mass of each plant. For those treated with the new fertiliser, the crop masses (in suitable units) are summarised by the figures
$\sum x = 1030.0$, $\sum x^2 = 11\,045.59$.
The corresponding figures for those plants treated with the standard fertiliser are
$\sum y = 990.0$, $\sum y^2 = 10\,079.19$.
Treating the sample as a large sample from a normal distribution, and assuming that the population variances of both distributions are equal, obtain a two-sample pooled estimate of the common population variance.
Assuming that it is impossible for the new fertiliser to be less efficacious than the old fertiliser and assuming that both distributions are normal, test whether the results provide significant evidence (at the 3% level) that the new fertiliser is associated with a greater mean crop mass, stating clearly your null and alternative hypotheses. (C)

6. Mr Brown and Mr Green work at the same office and live next door to each other. Each day they leave for work together but travel by different routes. Mr Brown maintains that his route is quicker, on average, by at least four minutes. Both men time their journeys in minutes over a period of ten weeks. The results obtained were:
Mr Brown: $n_1 = 50$, $\bar{x}_1 = 21$, $s_1^2 = 10.24$
Mr Green: $n_2 = 50$, $\bar{x}_2 = 24$, $s_2^2 = 7.84$
Assuming that the times are normally distributed and that they have a common population variance, test at the 5% level whether Mr Brown's claim can be accepted.

7. A random sample of size 100 is taken from a normal population with variance $\sigma_1^2 = 40$. The sample mean $\bar{x}_1$ is 38.3. Another random sample, of size 80, is taken from a normal population with variance $\sigma_2^2 = 30$. The sample mean $\bar{x}_2$ is 40.1. Test, at the 5% level, whether there is a significant difference in the population means $\mu_1$ and $\mu_2$.

8. A certain political group maintains that girls reach a higher standard in single-sex classes than in mixed classes. To test this hypothesis 140 girls of similar ability are split into two groups, with 68 attending classes containing only girls and 72 attending classes with boys. All the classes follow the same syllabus and after a specified time the girls are given a test. The test results are summarised thus:
Girls in the mixed classes: $\sum x = 7920$, $\sum x^2 = 879\,912$
Girls in single-sex classes: $\sum y = 7820$, $\sum y^2 = 904\,808$
Treating both samples as large samples from normal distributions having the same variance, obtain a two-sample pooled estimate of the common population variance. Test whether the results provide significant evidence, at the 1% level, that girls reach a higher standard in single-sex classes.

9. The mean height of 50 male students of a college who took an active part in athletic activities was 178 cm with a standard deviation of 5 cm while 50 male students who showed no interest in such activities had a mean height of 176 cm with a standard deviation of 7 cm. Test the hypothesis that male students who take an active part in athletic activities have the same mean height as the other male students.
If both samples had been of size $n$, instead of 50, find the least value of $n$ which would ensure that the observed difference of 2 cm in the mean height would be significant at the 1% level. (Assume that the samples continue to have the same means and standard deviations.) (C)

10. A random sample of 27 individuals from the population of young men aged 18 and of high intelligence have foot lengths (in centimetres, to the nearest centimetre) as summarised below.

Foot length (in cm)             24   25   26   27   28   29   30
Number with this foot length    1    2    3    9    6    5    1

Obtain the sample mean and show that the unbiased estimate of the population variance, based on this sample, is 2.00. Obtain a 96% confidence interval for the mean foot length of this type of person.
A random sample of 48 individuals from the population of young men aged 18 and of moderate intelligence have foot lengths summarised by $\bar{x} = 26.6$, $\sum (x - \bar{x})^2 = 123.20$. A complex genetic theory suggests that persons of high intelligence have a greater foot length than do those of moderate intelligence. The two samples described above may be assumed to have been drawn at random from independent normal distributions having a common variance. Obtain an unbiased two-sample estimate of this common variance. Treating the samples as large samples, test this genetic theory, using a significance test at the 1% significance level and stating clearly the hypotheses under comparison. (C)


1. A random sample of size $n_1$ is taken from population $X \sim N(\mu_1, \sigma^2)$ and a random sample of size $n_2$ is taken from population $Y \sim N(\mu_2, \sigma^2)$.
(a) Obtain an unbiased estimate of $\sigma^2$ by pooling the results from the two samples.
(b) Test the hypotheses stated at the level of significance indicated.

         $n_1$   $\sum x$   $\sum (x - \bar{x})^2$   $n_2$   $\sum y$   $\sum (y - \bar{y})^2$   Hypotheses                                              Level
(i)      6       171        83                       7       164.5      112                      $H_0$: $\mu_1 = \mu_2$, $H_1$: $\mu_1 > \mu_2$           5%
(ii)     5       678.5      562.3                    7       971.6      308.6                    $H_0$: $\mu_1 = \mu_2$, $H_1$: $\mu_1 \ne \mu_2$         5%
(iii)    8       238.4      296                      10      206        145                      $H_0$: $\mu_1 - \mu_2 = 4$, $H_1$: $\mu_1 - \mu_2 > 4$   1%
(iv)     12      116.16     45.1                     18      156.96     72                       $H_0$: $\mu_1 = \mu_2$, $H_1$: $\mu_1 \ne \mu_2$         10%

2. The heights (measured to the nearest centimetre) of a random sample of six policemen from a certain force in Wales were found to be:
176, 180, 179, 181, 183, 179.
The heights (measured to the nearest centimetre) of a random sample of 11 policemen from a certain force in Scotland gave the following data:
$\sum y = 1991$, $\sum (y - \bar{y})^2 = 54$.
Test, at the 5% level, the hypothesis that Welsh policemen are shorter than Scottish policemen. Assume that the heights of policemen in both forces are normally distributed and have a common population variance.

3. An expert golfer wishes to discover whether the average distances travelled by two different brands of golf ball differ significantly. He tests each ball by hitting it with his driver and measuring the distance $X$ (in metres) that it travels. The distribution of $X$ may be assumed to be normal.
His results for a random sample of 9 'Farfly' golf balls were $\bar{x} = 214$ and $\sum (x - \bar{x})^2 = 2048$.
His results for a random sample of 16 'Gofar' golf balls were $\bar{x} = 224$ and $\sum (x - \bar{x})^2 = 2460$.
Assuming that the variance of $X$ is the same for both types of golf ball, obtain a pooled (two-sample) estimate of this variance and test at the 5% level whether his results for 'Gofar' golf balls differ significantly from those for 'Farfly' golf balls. (C)

4. Mr Mean notes the time, in minutes, that it takes him to drive to work in the mornings. The results are:
$n_1 = 8$, $\sum x_1 = 120$, $\sum x_1^2 = 1827$.
For his return journey in the rush hour, Mr Mean notes that:
$n_2 = 10$, $\sum x_2 = 230$, $\sum x_2^2 = 5436$.
He maintains that, on average, it takes him at least ten minutes longer to drive home.
(a) Using the results from the two samples, find an unbiased estimate of the common population variance.
(b) Assuming that the times of all journeys are normally distributed, use the two-sample t-test at the 5% level to test Mr Mean's claim.

5. Random samples of Year 10 pupils at two schools are given the same mathematics test. The results are summarised thus:
School A: $n_1 = 20$, $\bar{x} = 43$, $\sum (x - \bar{x})^2 = 1296$
School B: $n_2 = 17$, $\bar{y} = 36$, $\sum (y - \bar{y})^2 = 1388$
T
I
Assuming that the distributions of marks are normal with a common population variance, test at the 2% level whether there is a significant difference in the mathematical ability of the Year 10 pupils at the two schools.

6. A random sample of size $n_1$ is taken from a population $P_1$ whose mean is $\mu_1$ and variance $\sigma_1^2$ and a random sample of size $n_2$ is taken from population $P_2$ with mean $\mu_2$ and variance $\sigma_2^2$. Under what circumstances is it valid to test the hypothesis $\mu_1 - \mu_2 = 0$ using a two-sample t-test?

A machine fills bags of sugar and a random sample of 20 bags selected from a week's production yielded a mean weight of 499.8 g with standard deviation 0.63 g. A week later a sample of 25 bags yielded a mean weight of 500.2 g with standard deviation 0.48 g.

Assuming that your stated conditions are satisfied, perform a test to determine whether the mean has increased significantly during the second week.
Test whether the mean during the second week could be 500 g. (Use a 5% significance level for both tests.)

7. A liquid product is sold in containers. The containers are filled by a machine. The volumes of liquid (in millilitres) in a random sample of 6 containers were found to be:

497.8, 501.4, 500.2, 500.8, 498.3, 500.0.

After overhaul of the machine, the volumes (in millilitres) in a random sample of 11 containers were found to be

501.1, 499.6, 500.3, 500.9, 498.7, 502.1, 500.4, 499.7, 501.0, 500.1, 499.3.

It is desired to examine whether the average volume of liquid delivered to a container by the machine is the same after overhaul as it was before.

(a) State the assumptions that are necessary for the use of the customary t-test.
(b) State formally the null and alternative hypotheses that are to be tested.
(c) Carry out the t-test, using a 5% level of significance.
(d) Discuss briefly which of the assumptions in (a) is least likely to be valid in practice and why. (MEI)

8. The performances of trainee actors who have passed through a drama school are rated by a panel of experienced actors who assign an overall mark for each trainee. The drama school has recently introduced a new training method which, it is claimed, will lead to better performances.

The marks for a random sample of 6 trainees using the old training method were

243, 228, 220, 206, 230, 198

and the marks for a random sample of 8 using the new method were

235, 259, 227, 242, 238, 253, 221, 217.

Use an appropriate t-test to examine, at the 5% level of significance, whether there is evidence that the new method has led, on average, to higher scores. State carefully the assumptions on which this procedure is based.
Provide a two-sided 95% confidence interval for the true difference in mean scores between the old and new methods. State carefully the interpretation of this interval. (MEI)

Summary

• For stages in a hypothesis test, see page 513
• For critical values and rejection criteria for a z-test see page 513
• Standardised test statistics:

Test 1: Testing an unknown population mean $\mu$. $H_0$: $\mu = \mu_0$.

When $\sigma^2$ is known,

1a  X is normally distributed, $X \sim N(\mu, \sigma^2)$.
For samples of size n (any size),

$$\bar{X} \sim N\left(\mu_0, \frac{\sigma^2}{n}\right)$$

Test statistic $Z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$ where $Z \sim N(0, 1)$.

1b  X is not normally distributed.
For large samples of size n, by the central limit theorem,

$$\bar{X} \sim N\left(\mu_0, \frac{\sigma^2}{n}\right)$$

Test statistic $Z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$ where $Z \sim N(0, 1)$.

When $\sigma^2$ is unknown,

1c  X is normally distributed or not normally distributed. For large n,

Test statistic $Z = \dfrac{\bar{X} - \mu_0}{\hat{\sigma}/\sqrt{n}}$ where $Z \sim N(0, 1)$.

1d  X is normally distributed, $X \sim N(\mu, \sigma^2)$. For small n,

Test statistic $T = \dfrac{\bar{X} - \mu_0}{\hat{\sigma}/\sqrt{n}}$ where $T \sim t(n - 1)$.

Test 2: Testing a binomial proportion p, where $X \sim B(n, p)$.
X is the number of successes in n trials.
If n is large such that $np > 5$ and $nq > 5$, then $X \sim N(np, npq)$.

Test statistic $Z = \dfrac{X - np}{\sqrt{npq}}$ where $Z \sim N(0, 1)$.

Remember to use a continuity correction ($\pm 0.5$).
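The Test 2 statistic above can be sketched in Python as follows. This is a minimal illustration, assuming an upper-tail alternative $H_1$: $p > p_0$; the function name `upper_tail_p_value` and the values n = 40, $p_0$ = 0.5 are chosen here for illustration and are not part of the summary.

```python
from math import sqrt
from statistics import NormalDist

def upper_tail_p_value(x, n, p0):
    """Approximate P(X >= x) for X ~ B(n, p0) using X ~ N(np, npq)."""
    q0 = 1 - p0
    if n * p0 <= 5 or n * q0 <= 5:
        raise ValueError("normal approximation needs np > 5 and nq > 5")
    # Continuity correction: P(X >= x) becomes P(X > x - 0.5).
    z = (x - 0.5 - n * p0) / sqrt(n * p0 * q0)
    return 1 - NormalDist().cdf(z)

# Illustrative values only (hypothetical observed counts 26 and 27).
p_27 = upper_tail_p_value(27, 40, 0.5)
p_26 = upper_tail_p_value(26, 40, 0.5)
print(round(p_27, 4), round(p_26, 4))
```

With these values, an observed count of 27 gives a tail probability just under 2% while a count of 26 does not, which shows how the 0.5 correction shifts the boundary of the rejection region.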
Test 3: Testing $\mu_1 - \mu_2$, the difference between means of two normal distributions

3a  $\sigma_1^2$, $\sigma_2^2$ known

$$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$$

Test statistic $Z = \dfrac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$ where $Z \sim N(0, 1)$.

3b  Common population variance $\sigma^2$ known

$$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2,\ \sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right)$$

Test statistic $Z = \dfrac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$ where $Z \sim N(0, 1)$.

3c  Common population variance $\sigma^2$ unknown

$$\hat{\sigma}^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2} = \frac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2} \qquad (s_1^2,\ s_2^2 \text{ sample variances})$$

When n is large

Test statistic $Z = \dfrac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\hat{\sigma}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$ where $Z \sim N(0, 1)$.

When n is small

Test statistic $T = \dfrac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\hat{\sigma}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$ where $T \sim t(n_1 + n_2 - 2)$.

Miscellaneous worked examples

Example 11.16

An inspector of items from a production line takes, on average, 21.75 seconds to check each item. After the installation of a new lighting system the times, t seconds, to check each of 50 randomly chosen items from the production line are summarised by $\sum t = 1107$, $\sum t^2 = 24\,592.35$.

(a) Calculate an unbiased estimate of the population variance of the time taken to check an item under the new lighting system.
(b) Test at the 5% significance level whether there is evidence that the population mean time has changed from 21.75 seconds.

A technician who carried out the above test concluded with the following incorrect statement. Give a corrected version.

'It is not necessary for the population to be normal since the sample size is large and the central limit theorem states that any sufficiently large sample is normal.' (C)

Solution 11.16

Let T be the time, in seconds, to check an item.

(a)
$$\hat{\sigma}^2 = \frac{1}{n-1}\left(\sum t^2 - \frac{(\sum t)^2}{n}\right) = \frac{1}{49}\left(24\,592.35 - \frac{1107^2}{50}\right) = 1.7014\ldots = 1.70 \text{ (3 s.f.)}$$

(b) Let $\mu$ be the population mean time.

$H_0$: $\mu = 21.75$ (the population mean has not changed)
$H_1$: $\mu \neq 21.75$ (the population mean has changed)

Since n is large, by the central limit theorem, $\bar{T}$ is approximately normal, and

$$\bar{T} \sim N\left(\mu, \frac{\sigma^2}{n}\right).$$

According to $H_0$, $\mu = 21.75$.
Since $\sigma^2$ is unknown, $\hat{\sigma}^2$ is used instead.

$$\bar{T} \sim N\left(21.75, \frac{1.70}{50}\right)$$

Carry out a two-tailed test at the 5% level and reject $H_0$ if $|z| > 1.96$, where $z = \dfrac{\bar{t} - \mu}{\hat{\sigma}/\sqrt{n}}$.

From the sample, $\bar{t} = \dfrac{\sum t}{n} = \dfrac{1107}{50} = 22.14$

$$z = \frac{22.14 - 21.75}{\sqrt{1.70}/\sqrt{50}} = 2.114\ldots$$

Since $|z| > 1.96$, reject $H_0$.

There is evidence, at the 5% level, that the population mean time has changed from 21.75 seconds.

Corrected statement: The central limit theorem states that the distribution of means is approximately normal for large sample size n.

NOTE: the variable in Example 11.16 was given as T. Do not confuse with the standardised statistic in the t-distribution.
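The arithmetic in Solution 11.16 can be checked with a short Python sketch. The helper name `z_test_from_sums` is an assumption made for this illustration; it simply applies the unbiased variance estimate and the large-sample z statistic to the summary totals given in the example.

```python
from math import sqrt
from statistics import NormalDist

def z_test_from_sums(n, sum_x, sum_x2, mu0):
    """Large-sample z-test of H0: mu = mu0 from n, sum of x and sum of x^2."""
    mean = sum_x / n
    # Unbiased estimate of the population variance.
    var_hat = (sum_x2 - sum_x ** 2 / n) / (n - 1)
    z = (mean - mu0) / sqrt(var_hat / n)
    return mean, var_hat, z

mean, var_hat, z = z_test_from_sums(50, 1107, 24592.35, 21.75)
critical = NormalDist().inv_cdf(0.975)  # two-tailed 5% test, about 1.96
reject = abs(z) > critical
print(round(mean, 2), round(var_hat, 4), round(z, 3), reject)
```

The sketch uses the unrounded variance estimate rather than 1.70, so the statistic may differ from a hand calculation in the last digit shown; the conclusion (reject $H_0$) is the same.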

Example 11.17

The random variable X is distributed $N(\mu, 3.5^2)$. A test of the null hypothesis $\mu = 15$ against the alternative hypothesis $\mu > 15$ is required and the probability of a Type I error should be 0.05. A random sample of 30 observations on X is taken.

(a) Find the critical region for the sample mean $\bar{X}$.

The mean of the sample was 16.00.

(b) Find a 95% confidence interval for $\mu$.
(c) Find P(Type II error) for the test in part (a) when $\mu = 17$.

The size of the sample is increased but P(Type I error) is still 0.05.

(d) State what effect this change will have on the critical value for $\bar{X}$ and on P(Type II error). (L)

Solution 11.17

P(Type I error) = P($H_0$ is rejected when $H_0$ is true) = 0.05.
So the significance level of the test is 5%.

$X \sim N(\mu, 3.5^2)$
$H_0$: $\mu = 15$
$H_1$: $\mu > 15$

According to $H_0$, $X \sim N(15, 3.5^2)$.
So, for a sample of size 30,

$$\bar{X} \sim N\left(15, \frac{3.5^2}{30}\right)$$

(a) Using a one-tailed (upper tail) test at the 5% level, reject $H_0$ if $z > 1.645$, where $z = \dfrac{\bar{x} - 15}{3.5/\sqrt{30}}$.

So the critical (rejection) region for $\bar{x}$ is given by

$$\frac{\bar{x} - 15}{3.5/\sqrt{30}} > 1.645$$

i.e. $\bar{x} > 15 + 1.645 \times \dfrac{3.5}{\sqrt{30}}$

$\bar{x} > 16.05$ (2 d.p.)

(b) 95% confidence limits for $\mu$:

$$\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}} = 16.00 \pm 1.96 \times \frac{3.5}{\sqrt{30}} = 16.00 \pm 1.252\ldots$$

95% confidence interval = (16.00 - 1.252..., 16.00 + 1.252...)
= (14.7, 17.3) (1 d.p.).

(c) From part (a), $H_0$ is accepted if $\bar{x} < 16.05$.

P(Type II error) = P($H_0$ is accepted when $H_1$ is true)

$H_1$: $\mu = 17$, so under $H_1$, $\bar{X} \sim N\left(17, \dfrac{3.5^2}{30}\right)$.

$$P(\text{Type II error}) = P\left(\bar{X} < 16.05 \text{ when } \bar{X} \sim N\left(17, \frac{3.5^2}{30}\right)\right)$$
$$= P\left(Z < \frac{16.05 - 17}{3.5/\sqrt{30}}\right)$$
$$= P(Z < -1.487)$$
$$= 1 - 0.9316$$
$$= 0.0684$$

So P(Type II error) ≈ 7%.

(d) If n is increased, but P(Type I error) = 0.05, the critical value for $\bar{X}$ will decrease.
P(Type II error) will also decrease.

This is illustrated in the following diagrams:

When n = 30:
[Diagram: distributions of $\bar{X}$ under $H_0$ (centred at 15) and under $H_1$ (centred at 17), with the critical value 16.05 between them; $H_0$ is accepted to the left of 16.05.]

When n > 30, the curves are more squashed:
[Diagram: narrower distributions centred at 15 and 17; the critical value decreases and P(Type II error) decreases.]
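Parts (a), (c) and (d) of Solution 11.17 can be explored numerically with the sketch below. The function names `critical_value` and `type_two_error` are assumptions introduced for this illustration, and the larger sample size n = 60 is a hypothetical value, not one from the example.

```python
from math import sqrt
from statistics import NormalDist

def critical_value(mu0, sigma, n, alpha):
    """Boundary k of the rejection region (sample mean > k) for H1: mu > mu0."""
    return mu0 + NormalDist().inv_cdf(1 - alpha) * sigma / sqrt(n)

def type_two_error(mu0, mu1, sigma, n, alpha):
    """P(H0 is accepted) when the true mean is mu1."""
    k = critical_value(mu0, sigma, n, alpha)
    # Under H1 the sample mean is N(mu1, sigma^2 / n).
    return NormalDist(mu1, sigma / sqrt(n)).cdf(k)

for n in (30, 60):  # n = 60 is an illustrative larger sample
    k = critical_value(15, 3.5, n, 0.05)
    beta = type_two_error(15, 17, 3.5, n, 0.05)
    print(n, round(k, 2), round(beta, 4))
```

The sketch keeps the unrounded critical value, whereas the solution rounds it to 16.05 before standardising, so the Type II probability for n = 30 agrees with 0.0684 only to about three decimal places. For the larger sample both the critical value and P(Type II error) come out smaller, as stated in part (d).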

Example 11.18

When cars arrive at a certain T-junction they turn either right or left. Part of a study of road usage involved deciding between the following alternatives.

Cars are equally likely to turn right or left.
Cars are more likely to turn right than left.

(a) State suitable null and alternative hypotheses, involving a probability, for a significance test.
(b) Out of a random sample of 40 cars, n turned right. Use a suitable approximation to find the least value of n for which the null hypothesis will be rejected at the 2% significance level.
(c) For the test described in (b), calculate the probability of making a Type II error when, in fact, 80% of all cars arriving at the junction turn right. (C)

Solution 11.18

Let X be the number of cars in 40 that turn right. Then $X \sim B(40, p)$.

(a) $H_0$: $p = 0.5$
$H_1$: $p > 0.5$

(b) According to $H_0$, $X \sim B(40, 0.5)$.
Now n is large and $np = 40 \times 0.5 = 20 > 5$, $nq = 40 \times 0.5 = 20 > 5$.
Since $np > 5$, $nq > 5$, use the normal approximation
$X \sim N(np, npq)$, with $npq = 40 \times 0.5 \times 0.5 = 10$,
i.e. $X \sim N(20, 10)$.

Use a one-tailed (upper tail) test at the 2% level and reject $H_0$ if $P(X \ge n) < 2\%$. With the continuity correction this becomes $P(X > n - 0.5) < 2\%$,

i.e. reject $H_0$ if $P\left(Z > \dfrac{n - 0.5 - 20}{\sqrt{10}}\right) < 0.02$.

From tables, $P(Z > 2.055) = 0.02$,

so $\dfrac{n - 0.5 - 20}{\sqrt{10}} > 2.055$

$n > 20.5 + 2.055 \times \sqrt{10}$
$n > 26.998\ldots$

Since n is an integer, least value of n = 27.
∴ $H_0$ is rejected if $X \ge 27$.

(c) $H_0$: $p = 0.5$
$H_1$: $p = 0.8$
$X \sim B(40, 0.8)$
$np = 40 \times 0.8 = 32 > 5$, $nq = 40 \times 0.2 = 8 > 5$,
so, using the normal approximation, $X \sim N(np, npq)$ with $npq = 40 \times 0.8 \times 0.2 = 6.4$.
$X \sim N(32, 6.4)$

P(Type II error) = P($H_0$ is accepted when $H_1$ is true)

$H_0$ is accepted if $x < 27$, i.e. if $x < 26.5$ (continuity correction),

so P(Type II error) = $P(X < 26.5 \text{ when } X \sim N(32, 6.4))$

$$= P\left(Z < \frac{26.5 - 32}{\sqrt{6.4}}\right)$$
$$= P(Z < -2.174)$$
$$= 1 - 0.9852$$
$$= 0.0148$$
≈ 1.5%

Example 11.19

The volume of paint, in litres, in a randomly chosen two-litre can is denoted by the random variable V. Ten random observations of V are taken, with the following results.

2.12  2.03  2.07  1.99  1.95  2.01  2.00  2.08  1.94  1.99

Assuming that V has a normal distribution, use a single-sample t-test, at the 10% significance level, to test the claim that the mean volume of paint in two-litre cans is not 2 litres.

After installation of a new machine for filling the cans, there were worries that, on average, the new machine was dispensing less paint than the old machine. The volumes of paint dispensed by the new machine in a random sample of 20 cans were measured, and the results are summarised by

$\sum w = 38.1$ and $\sum (w - \bar{w})^2 = 0.060\,40$,

where w is the volume of paint, in litres, in a can. Use a two-sample t-test to test whether there is evidence, at the 10% significance level, that the new machine is dispensing less paint than the old machine.

State one condition required of the two populations for the two-sample t-test to be valid. (C)

Solution 11.19

Let V be the volume (in litres) of paint in a can.
$V \sim N(\mu_1, \sigma^2)$ with $\mu_1$ and $\sigma^2$ unknown.

Sample readings: $n = 10$, $\bar{v} = 2.018$, $s_1^2 = 0.002\,976$ (using calculator)

An unbiased estimate of $\sigma^2$ is $\hat{\sigma}^2 = \dfrac{n}{n-1} s_1^2 = \dfrac{10}{9}(0.002\,976) = 0.0033$,

∴ $\hat{\sigma} = 0.0575$

$H_0$: $\mu_1 = 2$
$H_1$: $\mu_1 \neq 2$

According to $H_0$, the standardised statistic T is such that

$$T = \frac{\bar{V} - 2}{\hat{\sigma}/\sqrt{n}} \quad \text{and} \quad T \sim t(9).$$

Carry out a two-tailed test at the 10% level.
Reject $H_0$ if $|t| > 1.833$ (see tables on page 650, $\nu = 9$, $p = 0.95$).

$$t = \frac{2.018 - 2}{0.0575/\sqrt{10}} = 0.989\ldots$$

Since $|t| < 1.833$, do not reject $H_0$.

There is not enough evidence to say that the mean volume is not two litres.

Let W be the volume, in litres, dispensed by the new machine.
Assume that W has a normal distribution and the two samples have a common population variance $\sigma^2$. This time the two samples are considered to give a two-sample estimate of $\sigma^2$:

$$\hat{\sigma}^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}$$

where $s_1^2 = 0.002\,976$ (from first part of question)

and $s_2^2 = \dfrac{\sum (w - \bar{w})^2}{n_2}$, so that $n_2 s_2^2 = \sum (w - \bar{w})^2 = 0.060\,40$,

so $\hat{\sigma}^2 = \dfrac{10 \times 0.002\,976 + 0.060\,40}{10 + 20 - 2} = 0.003\,22$

$\hat{\sigma} = 0.0567$ (3 s.f.)

$H_0$: $\mu_1 - \mu_2 = 0$ (the machines dispense the same amount)
$H_1$: $\mu_1 - \mu_2 > 0$ (the new machine is dispensing less than the old)

The test statistic is

$$T = \frac{\bar{V} - \bar{W} - 0}{\hat{\sigma}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \quad \text{where } T \sim t(n_1 + n_2 - 2),$$

i.e.

$$T = \frac{\bar{V} - \bar{W}}{0.0567\sqrt{\dfrac{1}{10} + \dfrac{1}{20}}} \quad \text{where } T \sim t(28).$$

Carry out a one-tailed test, at the 10% level. Reject $H_0$ if $t > 1.313$, where

$$t = \frac{\bar{v} - \bar{w}}{0.0567\sqrt{\dfrac{1}{10} + \dfrac{1}{20}}}.$$

From the samples, $\bar{v} = 2.018$ and $\bar{w} = \dfrac{38.1}{20} = 1.905$, so

$$t = \frac{2.018 - 1.905}{0.0567\sqrt{0.15}} \approx 5.15.$$

Since $t > 1.313$, reject $H_0$.

There is evidence at the 10% level that the new machine is dispensing less paint than the old machine.

The condition required: The two populations must be normal with common variance.

Miscellaneous exercise 11e

1. The amount of nicotine, in milligrams, in a cigarette of a certain brand is normally distributed with mean $\mu$ and standard deviation 2.5. A random sample of 10 cigarettes yielded a mean nicotine value of 18.4. Obtain a symmetric 90% confidence interval for $\mu$, giving values to three significant figures.
Give a reason why the value of $\mu$ might not be inside this interval.
Test the null hypothesis $\mu = 17.8$ against the alternative hypothesis $\mu \neq 17.8$ at the 10% significance level. (C)

2. A study is made of the numbers of boys and girls in families. A random sample of families is chosen. The total number of children is 500, of whom 261 are girls. It is desired to test the null hypothesis that boys and girls are equally likely in the population against the alternative hypothesis that they are not equally likely.
(a) State an assumption necessary for these 500 children to be considered as a random sample of the population of all children.
(b) Test at the 10% significance level whether the data indicate that boys and girls are not equally likely in the population. (C)

3. A resident of an urban road claims that the average speed of vehicles using the road is greater than the 30 m.p.h. speed limit. To investigate this claim the police time a randomly selected sample of 25 vehicles over a measured mile on the road. It is assumed that the speeds calculated from their observations come from a normal distribution with mean $\mu$ m.p.h. and standard deviation 12 m.p.h.
(a) State appropriate null and alternative hypotheses for a significance test.
(b) Find a critical region for a 5% significance test in the form, sample mean $\bar{X} > k$, where the value of the constant k is given correct to two decimal places.
(c) State, with a reason, your conclusion for the test when the mean speed calculated from the sample was 35 m.p.h.
(d) Calculate the power of the test when, in fact, $\mu = 40$. (NEAB)

4. A supermarket's statistician reports that, over the past three months, the mean amount spent per customer has been £43 with a standard deviation of £20.
The supermarket carries out a promotion for one week by offering 'buy two ... get one free' on a range of products which it sells. The management hopes that this will increase the mean amount spent per customer; you may assume that the standard deviation remains unchanged.
A random sample of 50 customers visiting the supermarket that week spent a total of £2400.
(a) Write down suitable null and alternative hypotheses in order to test whether or not the promotion has increased the average level of spending per customer.
(b) Explain carefully the use of the central limit theorem in carrying out this hypothesis test.
(c) Carry out the hypothesis test at the 5% significance level, clearly stating your conclusion.
(d) Find a 90% confidence interval for the mean amount spent by customers during the period of the promotion. State, giving a reason, whether this is consistent with your conclusion in (c). (MEI)

5. The process of manufacturing a certain kind of dinner plate results in a proportion 0.13 of faulty plates. An alteration is made to the process which is intended to reduce the proportion of faulty plates. State suitable null and alternative hypotheses for a statistical test of the effectiveness of the alteration.
In order to carry out the test, the quality control department count the number of faulty plates in a random sample of 2500. If 290 or fewer faulty plates are found then it will be accepted that the alteration does result in a reduction in the proportion of faulty plates. Calculate the significance level of this test, using a suitable normal approximation.
Calculate the probability of making a Type II error in the above test, given that the alteration results in a decrease in the proportion of faulty plates to 0.11. (C)

6. Water from a cooling tower at a power station is discharged into a river. In order to test whether the mean temperature of discharged water is greater than the permitted maximum of 65 °C, the temperature (x °C) of 40 randomly selected samples of water will be taken and the sample mean $\bar{x}$ used to test the null hypothesis $\mu = 65$ against the alternative hypothesis $\mu > 65$, where $\mu$ °C is the population mean temperature of discharged water. It may be assumed that the population standard deviation of x is 5.0.
(a) State, in the context of the question, what you understand by
(i) a Type I error,
(ii) a Type II error.
(b) The probability of a Type I error is fixed at 0.1. Show that the range of values of $\bar{x}$ for which the null hypothesis is rejected is given by $\bar{x} > 66.01$, correct to two decimal places.
(c) State the conclusion of the test when $\bar{x} = 65.7$, and the type of error that might be made in this case.
(d) Calculate the probability of making a Type II error when, in fact, $\mu = 68$.
What can be deduced about the probability of making a Type II error when, in fact, $\mu > 68$? (C)

7. The manager of a large supermarket wishes to judge the effect of a new layout on the customers. On the day that the layout was introduced the first 200 customers in the store were asked whether or not they approved of the new layout.
Comment on the manner in which the sample was chosen, and suggest a way of obtaining a more suitable sample.
Out of a suitably chosen sample of 200 customers, 148 approved of the new layout. Calculate an approximate 95% confidence interval for the population percentage of customers who approve of the new layout.
The supermarket manager claims that 80% of customers approve of the new layout. Show that the data provide evidence at the 2½% significance level that the population percentage is less than 80%. (C)

8. The random variable X has a normal distribution with mean $\mu$ (unknown) and variance $\sigma^2$ (known). To test the null hypothesis $H_0$: $\mu = \mu_0$ a random sample of n observations of X is taken and the sample mean is $\bar{x}$. Find, in terms of $\mu_0$, $\sigma$ and n, the set of values of $\bar{x}$ which will result in each of the following:
(a) $H_0$ being rejected in favour of $H_1$: $\mu \neq \mu_0$ at the 5% level of significance,
(b) $H_0$ not being rejected in favour of $H_1$: $\mu < \mu_0$ at the 1% level of significance.

9. The masses of components used in making a model car are being checked. Each of a random sample of 200 components is weighed and the masses, x g, are summarised by

$n = 200$, $\sum x = 1484.2$, $\sum x^2 = 11\,098.19$.

(a) Calculate an unbiased estimate of the population variance.
(b) State what you understand by 'unbiased estimate'.

The components are produced in large batches. It is desired that the mean mass of components in a batch should be at least 7.40 g. In order to decide whether to accept or reject a batch each of a random sample of 50 components from the batch is weighed. The sample data is used to perform a test of the null hypothesis $\mu = 7.40$ against the alternative hypothesis $\mu < 7.40$, where $\mu$ g is the mean mass of components in the batch. For the test the population variance is taken to have the value found in part (a). The batch is rejected if the null hypothesis is rejected using a 2½% significance level. Show that the batch will be rejected if the sample mean mass is less than 7.22 g.
For one such batch the sample data is summarised by $n = 50$, $\sum x = 366.0$.
Determine whether or not this batch is rejected.
Calculate the probability of making a Type II error in carrying out the above test for a batch whose mean mass is actually 7.10 g. (C)

10. A box of dice contains some which are unbiased and some which are biased in such a way that the probability of throwing a six with one of these dice is t. One die is selected at random from the box and, in order to decide whether it is biased, it is thrown 240 times and the number of sixes, N, is counted. The probability of throwing a six with this die is denoted by p. The null hypothesis $p = \frac{1}{6}$ is tested against an appropriate alternative hypothesis at the 5% significance level.
(a) State an appropriate alternative hypothesis.
(b) Find the set of values of N for which it is accepted that the die is biased.
(c) Find the probability of making a Type II error in the test. [$\Phi(3.355) = 0.9996$] (C)

11. A manufacturer makes two grades of squash ball: 'slow' and 'fast'. Slow balls have a 'bounce' (measured under standard conditions) which is known to be a normal variable with mean 10 cm and standard deviation 2 cm. The 'bounce' of fast balls is a normal variable with mean 15 cm and standard deviation 2 cm. A box of balls is unlabelled so that it is not known whether they are all slow or all fast.
Devise a test, based on an observation of the mean bounce of a sample of four balls from the box such that the Type I error is 0.05 and state the magnitude of the Type II error for this test. (C)

12. An ambulance station serves an area which includes more than 10 000 houses. It has been decided that if the mean distance of the houses from the ambulance station is greater than ten miles then a new ambulance station will be necessary. The distance, x miles, from the station of each of a random sample of 200 houses was measured, the results being summarised by:

$\sum x = 2092.0$ and $\sum x^2 = 24\,994.5$.

(a) Calculate, to four significant figures, unbiased estimates of
(i) the population mean distance, $\mu$ miles, of the houses from the station,
(ii) the population variance of the distances of the houses from the station.
State what you understand by the term 'unbiased estimate'.
(b) Using the sample data, a significance test of the null hypothesis $\mu = 10$ against the alternative hypothesis $\mu > 10$ is carried out at the $\alpha$% significance level. In the test, the sample mean is compared with the critical value of 10.65; as the sample mean is less than 10.65 the null hypothesis is not rejected. Calculate the value of $\alpha$.
(c) Give a reason why it is not necessary for the distances to be normally distributed for the test to be valid. (C)

13. A particular investigation concentrated on people recently re-employed following a first period of unemployment. Each of a random sample of 50 such persons was asked the duration, in months, of this period of unemployment. A summary of the results is as follows.

mean = 16.7 months, variance = 193.21 months²

Investigate at the 5% level of significance the claim that, for people re-employed after a first period of unemployment, the mean duration of unemployment is more than 12 months.
Indicate why, in carrying out your test, no assumption regarding the distribution of the duration of the first period of unemployment is necessary. (NEAB)

14. The error in the readings made on a measuring instrument can be modelled by the continuous random variable X which has mean $\mu$ and standard deviation $\sigma$. If the instrument is correctly calibrated then $\mu = 0$.
In order to check the calibration of the instrument, the errors in a random sample of 40 readings were determined. These data are summarised by:

$\sum x = 120$, $\sum x^2 = 3285$.

(a) Estimate $\sigma^2$.
(b) Carry out a hypothesis test, at the 5% level of significance, to test whether the machine is, or is not, correctly calibrated. You should state your hypotheses and conclusions carefully.
(c) Obtain a symmetric 95% confidence interval for $\mu$, explaining why it is only approximate.
(d) Suppose the data from the 40 readings had been such that the estimate of $\sigma^2$ as found in part (a) was larger, but without changing the sample mean. State the effect this would have on the value of the test statistic in part (b). Explain why this might affect the conclusion to part (b). (MEI)

15. A study of the annual rainfall, x cm, to the nearest centimetre, over the last 20 years for a small town gave the following results:

$\sum x = 1325$, $\sum x^2 = 90\,316$.

(a) Find unbiased estimates of the mean and the variance of the annual rainfall for this town.
Archive records show that the annual rainfall for this town, prior to this period, had a mean value of 62.50 cm and a standard deviation of 11.45 cm.
(b) Assuming that the standard deviation remains unchanged at 11.45 cm, test at the 5% level of significance whether or not there is evidence of an increase in mean annual rainfall over the last 20 years. State your hypotheses clearly. (L)

16. In 1978 the Borsetshire County Council tree officer did a survey of a random sample of 64 separate areas, each 1 km square, and found an average of 19.5 diseased trees per square. The following year, to test whether the disease had spread, she took a new random sample of 36 separate areas, also each 1 km square, and found an average of 21.7 diseased trees per square.
(a) Assume that, in both years, the number of diseased trees per 1 km square had a normal distribution with population variance 18 2.
Test, at the 1% significance level, the hypothesis that the mean number of diseased trees per 1 km square in 1979 was the same as in 1978, against the hypothesis that the mean number had increased.
(b) Further evidence suggests that the number of diseased trees is not normally distributed. Say what changes you might have to make, if any, to the test you have carried out, explaining the reasons for your answer. Do not carry out any further tests. (C)

17. When watching games of men's basketball I have noticed that the players are often tall. I am interested to find out whether or not men who play basketball really are taller than men in general.
I know that the heights, in metres, of men in general have the distribution $N(1.73, 0.08^2)$. I make the assumption that the heights, X metres, of male basketball players are also normally distributed, with the same variance as the heights of men in general, but possibly with a larger mean.
(a) Write down the null and alternative hypotheses under test.
I propose to base my test on the heights of eight male basketball players who recently appeared for our local team, and I shall use a 5% level of significance.
(b) Write down the distribution of the sample mean, $\bar{X}$, for samples of size 8 drawn from the distribution of X assuming that the null hypothesis is true.
(c) Determine the critical region for my test, illustrating your answer with a sketch.
(d) Carry out the test, given that the mean height of the eight players is 1.765 m. You should present your conclusions carefully, stating any additional assumption you need to make.
In fact, the distribution of X is $N(1.80, 0.06^2)$.
(e) Find the probability that a test based on a random sample of size 8 and using the critical region in part (c) will lead to the conclusion that male basketball players are not taller than men in general. (MEI)

18. The ingredients for concrete are mixed together to obtain a mean breaking strength of 2000 newtons. If the mean breaking strength drops below 1800 newtons then the composition must be changed. The distribution of the breaking strength is normal with standard deviation 200 newtons.
Samples are taken in order to investigate the hypotheses:

$H_0$: $\mu = 2000$ newtons
$H_1$: $\mu = 1800$ newtons

How many samples must be tested so that P(Type I error) = 0.05 and P(Type II error) = 0.1?
Test llA (z-tests) (b) State ~hat 'a Type II error has occurred'
For th~ last 15 Number One records featuring
means m the context of the playing tim f
tapes. es o male smgers, the data are:
1. Cans of lemonade are filled by a machine which (b) The tar yields in cigarettes of a particular
is set to dispense a mean amount of 330 ml into brand are distributed normally with mean (c) Calculate the probability of making a 1, 1, 2, 2, 2, 3, 4, 2, 1, 2, 3, 5, 1, 2, 3.
each can. The manufacturer suspects that the p. mg and standard deviation 0.8 mg. In Type II error when, in fact, p = 59. 7 . (C) A music industry producer wished to test
machine is tending to over-dispense and, in order order to test H 0 :p= 17.5 against
to test the suspicion, measures the contents, $x$ ml, of a random sample of 30 cans. The results are summarised by:

$\sum x = 9925$, $\sum x^2 = 3\,284\,137$.

(a) Calculate an unbiased estimate of the population variance of the amount dispensed into each can. Give four significant figures in your answer.
(b) Test the manufacturer's suspicion at the 10% significance level.
(c) Indicate where the central limit theorem is used in the test, and state why the use of the central limit theorem is necessary. (C)

2. The proportion of patients who suffer an allergic reaction to a certain drug used to treat a particular medical condition is assumed to be 0.045. When 400 patients were treated, 25 suffered an allergic reaction. Using a normal approximation, test at the 5% significance level whether the quoted figure of 0.045 is an underestimate. (C)

3. (a) A null hypothesis $H_0$ is to be tested against an alternative hypothesis $H_1$. Explain what is meant by:
(i) a Type I error,
(ii) a Type II error.

$H_1: \mu > 17.5$ at the 1% level of significance, a random sample of 10 cigarettes of this brand is to be obtained and the sample mean $\bar{X}$ calculated.
(i) In the case when the yields were recorded, in milligrams, as:
17.1, 18.3, 18.9, 17.8, 16.9, 19.2, 17.8, 18.3, 18.5, 18.2
carry out the required significance test.
(ii) Determine a critical region for the test in the form $\bar{X} > c$, where $c$ is a constant whose value is to be determined.
(iii) Calculate the size of the Type II error for this test when, in fact, $\mu = 18.0$. (NEAB)

4. The random variable $X$ is distributed as $N(\mu, 16)$. A random sample of size 25 is available. The null hypothesis $\mu = 0$ is to be tested against the alternative hypothesis $\mu \neq 0$. The null hypothesis will be accepted if $-1.5 < \bar{X} < 1.5$, where $\bar{X}$ is the value of the sample mean, otherwise it will be rejected. Calculate the probability of a Type I error. Calculate the probability of a Type II error if in fact $\mu = 0.5$; comment on the value of this probability. (MEI)

Test 11B (z-tests)

1. A certain brand of mineral water comes in bottles. The amount of water in a bottle, in millilitres, follows a normal distribution of mean $\mu$ and standard deviation 2. The manufacturer claims that $\mu$ is 125. In order to maintain standards the manufacturer takes a sample of 15 bottles and calculates the mean amount of water per bottle to be 124.2 millilitres. Test, at the 5% level, whether or not there is evidence that the value of $\mu$ is lower than the manufacturer's claim. State your hypotheses clearly. (L)

2. A newspaper headline stated 'Majority would vote for Prime Minister'. The article explained that in a survey of 70 randomly selected people, 38 had said that they would vote for the Prime Minister. A spokesman for the opposition party said that such evidence was inconclusive, and, according to standard statistical techniques, the result was consistent with only 40% of the whole population voting for the Prime Minister. A spokesman for the government stated that the results showed that 40% was too low. Stating the null and alternative hypotheses, test at the 5% level which of the spokesmen was justified in his assertion. (L)

3. The playing times of a particular brand of audio tape are normally distributed with mean $\mu$ minutes and standard deviation 0.24 minutes. The manufacturer states that $\mu = 60$. A large batch of these tapes is delivered to a store and, in order to check the manufacturer's statement, the playing times of a random sample of ten tapes were measured. The null hypothesis $\mu = 60$ is tested against the alternative hypothesis $\mu < 60$ at the 1% significance level.
(a) Find the range of values of the sample mean $\bar{X}$ for which the null hypothesis is rejected, giving two decimal places in your answer.

4. The top 40 chart of the Recorded Music Association has been compiled every week for some years, and the standard deviation of the number of weeks which a record spends at Number One in the chart has been found to be 0.87 weeks. The number of weeks which the last ten Number One records featuring female singers spent in the Number One position are:
3, 1, 1, 2, 1, 2, 3, 2, 1, 1.
whether there was any difference in the time spent at Number One between female and male singers. She assumes that both the distributions from which the two samples are drawn are normal with standard deviation 0.87 weeks.
(a) State the null and alternative hypotheses she must use.
(b) Carry out the test at the 5% level of significance.
(c) Give a reason why her assumption of normality may be invalid. (L)

Test 11C (t-tests)

1. Six cleaning firms were selected at random and asked about their hourly rates of pay, \$x, with the following results:
7.00, 6.80, 6.62, 6.94, 7.48, 7.04
[$\sum x = 41.88$, $\sum x^2 = 292.74$.]
Carry out a t-test at the 1% significance level to establish whether the mean hourly rate of pay paid by cleaning firms falls below a proposed minimum of \$7.40. State clearly an assumption made in applying the t-test in this context. (C)

2. On a certain day in July the maximum temperature, $m$ °C, was recorded at 11 points chosen at random on the island of San Marco. On the same day the maximum temperature, $p$ °C, was recorded at 20 points chosen at random on the island of San Polo. The results are summarised by:
$\bar{m} = 25.30$, $\bar{p} = 26.45$,
$\sum (m - \bar{m})^2 = 16.74$, $\sum (p - \bar{p})^2 = 15.29$.
Test, at the 2.5% significance level, the claim that San Marco was cooler than San Polo on that day, giving your null and alternative hypotheses and stating any assumptions necessary for your test to be valid. (C)

3. The contents of a packet of crisps are marked as 30 g. The manufacturer believes that one of their machines is faulty and is issuing too many crisps per packet. A sample of 10 packets is selected at random from this machine and the masses of their contents were:
31.5, 28.9, 30.5, 32.2, 35.5, 34.2, 31.8, 32.8, 29.1, 32.1.
(a) Calculate an unbiased estimate of the population variance.
(b) Is there evidence at the 10% level that the machine is issuing too many crisps per packet? State any distributional assumptions made.
(c) How would the test procedure in part (b) have differed if the population variance was known? (AQA)

4. The customers of a local branch of a bank are invited to comment on various aspects of the service. Their comments are translated into an overall 'satisfaction score'. This score can be taken as normally distributed over the whole population of customers.
A staff training programme has recently been completed. A random sample of scores before the programme was as follows:
126, 93, 114, 107, 98, 112.
A separate random sample of scores after the programme was as follows:
124, 107, 117, 136, 120, 122.
Test at the 5% level of significance the null hypothesis that the mean score is the same after the training programme as it was before against the alternative that the new mean score is higher, stating your assumption concerning the underlying variances. Provide a two-sided 99% confidence interval for the true difference in mean scores. (MEI)
The $\chi^2$ significance test

In this chapter you will learn
• about the $\chi^2$ distribution
• how to use $\chi^2$ tables and work out the number of degrees of freedom, $\nu$
• how to perform a $\chi^2$ goodness-of-fit test for the following
    Test 1: a uniform distribution
    Test 2: a distribution in a given ratio
    Test 3: a binomial distribution
    Test 4: a Poisson distribution
    Test 5: a normal distribution
• how to perform a $\chi^2$ test for independence between variables, using a contingency table
• about applying Yates' continuity correction when $\nu = 1$

Background knowledge

For the goodness-of-fit tests you will need to know how to calculate probabilities for the uniform, binomial, Poisson and normal distributions.

THE $\chi^2$ SIGNIFICANCE TEST

There are two main situations when a $\chi^2$ significance test is used:

1. A $\chi^2$ goodness-of-fit test.
This is used when you have some practical data and you want to know how well a particular statistical distribution, such as a binomial or a normal, models that data. The null hypothesis $H_0$ is that the particular distribution does provide a model for the data; the alternative hypothesis $H_1$ is that it does not.

2. A $\chi^2$ test for independence (or for association).
This is used when you have some practical data concerning two variables and you want to know whether they are independent or whether there is an association between them. The null hypothesis $H_0$ is that the factors are independent; the alternative hypothesis $H_1$ is that they are not.

Following the pattern for hypothesis tests established in earlier chapters, you assume that the null hypothesis is true and then calculate the frequencies that you would expect to occur based on this assumption. These are denoted by $E$ or $f_e$. These expected frequencies are then compared with the actual (or observed) frequencies, denoted by $O$ or $f_o$.

A test statistic involving $O$ and $E$ is calculated. This is often written $X^2$ and, subject to certain conditions, it can be approximated by a $\chi^2$ distribution. Before looking in detail at how the test statistic $X^2$ is calculated and how to perform the test, consider some of the features of the distribution.

The $\chi^2$ distribution

The $\chi^2$ distribution has one parameter, $\nu$, pronounced 'new', and the shape of the distribution is different for different values of $\nu$. Here are some examples.

[Figure: graphs of $f(x)$ for different values of $\nu$, including $\nu = 1$ and $\nu = 3$]

Some features of the $\chi^2$ distribution are:
(a) It is reverse J-shaped for $\nu = 1$ and $\nu = 2$.
(b) It is positively skewed for $\nu > 2$.
(c) The larger the value of $\nu$, the more symmetric the distribution becomes.
(d) When $\nu$ is large, the distribution is approximately normal.

Degrees of freedom, $\nu$

The parameter $\nu$ is known as the number of degrees of freedom and it is the number of independent variables used in calculating the test statistic. Details of how to find $\nu$ are given in the following text and in the summary table on pages 579 and 590.

Critical values and levels of significance

The $\chi^2$ test is conducted as a one-tailed (upper tail) test. When carrying out the test, you will want to know whether the calculated value of the test statistic lies in the main bulk of the $\chi^2$ distribution or whether it is in the upper tail critical (or rejection) region. The boundary of the critical region is called the critical value.

The critical value depends on the level of significance of the test. Often a 5% or a 1% level of significance is used and the critical values can be found from $\chi^2$ tables. For example, for a 5% level of significance, the critical value is such that 5% of the area is in the upper tail and the critical value is written $\chi^2_{5\%}(\nu)$, for a particular value of $\nu$.
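As a cross-check on values read from printed tables, the critical value for a given upper-tail significance level can also be computed numerically. The sketch below is an illustration only, not the method used in this book: the function names `chi2_cdf` and `chi2_critical` are hypothetical, the lower-tail probability is evaluated with the standard series for the incomplete gamma function, and the critical value is then located by bisection.

```python
import math


def chi2_cdf(x, nu):
    """Lower-tail probability P(X <= x) for a chi-squared distribution with nu degrees of freedom."""
    if x <= 0:
        return 0.0
    a = nu / 2.0
    t = x / 2.0
    # Series for the regularised lower incomplete gamma function P(a, t).
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-15 * abs(total):
        n += 1
        term *= t / (a + n)
        total += term
    return total * math.exp(-t + a * math.log(t) - math.lgamma(a))


def chi2_critical(upper_tail, nu):
    """Value c such that the area to the right of c is upper_tail (found by bisection)."""
    target = 1.0 - upper_tail
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, nu) < target:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, nu) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# 5% level with 3 degrees of freedom, and 10% level with 2 degrees of freedom
print(round(chi2_critical(0.05, 3), 3))
print(round(chi2_critical(0.10, 2), 3))
```

In an examination the printed tables remain the reference; a routine like this is only a way of confirming that a table has been read correctly.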
[Figure: $\chi^2(\nu)$ distribution with the critical region shaded in the upper tail]

If the test value lies in the critical region, then the null hypothesis $H_0$ is rejected in favour of the alternative hypothesis $H_1$.

$\chi^2$ tables

$\chi^2$ tables are usually set in one of two formats and you must make sure that you are familiar with the format that you will be given in an examination. In each case the rows refer to the value of $\nu$.

Format 1: $\chi^2$ tables giving lower-tail probabilities

In this format the column headings indicate the area in the lower tail. For a 5% significance level, since there would be 5% in the upper tail, there would be 95% in the lower tail, so look for the column headed 0.95.

                                                   5% level          1% level
  p       0.01       0.025      0.05      0.9      0.95     0.975    0.99     0.995    0.999
ν = 1   0.0001571  0.0009821  0.003932  2.706    3.841    5.024    6.635    7.879    10.83
    2   0.02010    0.05064    0.1026    4.605    5.991    7.378    9.210    10.60    13.82
    3   0.1148     0.2158     0.3518    6.251    7.815    9.348    11.34    12.84    16.27
    4   0.2971     0.4844     0.7107    7.779    9.488    11.14    13.28    14.86    18.47

Examples (highlighted in the extract)
(a) For a significance level of 5% and 4 degrees of freedom, the critical value, $\chi^2_{5\%}(4) = 9.488$.
(b) For a significance level of 1% and 2 degrees of freedom, the critical value $\chi^2_{1\%}(2) = 9.210$.

[Figure: $\chi^2(\nu)$ distribution; the tables give the lower tail probability]

Format 2: $\chi^2$ tables giving upper-tail probabilities

(This is the format printed on page 651.)
In this format the columns indicate the area in the upper tail, i.e. they give the significance level of the test.
For example, the column headed 0.05 gives the critical value such that 5% of the area lies in the upper tail.

                               10% level  5% level           1% level
  ν     0.990   0.975   0.950   0.100     0.050     0.025    0.010    0.005
  1     0.000   0.001   0.004   2.705     3.841     5.024    6.635    7.879
  2     0.020   0.051   0.103   4.605     5.991     7.378    9.210    10.597
  3     0.115   0.216   0.352   6.251     7.815     9.348    11.345   12.838
  4     0.297   0.484   0.711   7.779     9.488     11.143   13.277   14.860

Examples (highlighted in the extract)
(c) For a significance level of 5% and 3 degrees of freedom, $\chi^2_{5\%}(3) = 7.815$.
(d) For a significance level of 10% and 2 degrees of freedom, $\chi^2_{10\%}(2) = 4.605$.

[Figure: $\chi^2(\nu)$ distribution; the tables give the upper tail probability, with the critical value marked]

The test statistic $X^2$

The test statistic $X^2$ uses the values of the observed ($O$) and expected ($E$) frequencies.

The test statistic is
\[ X^2 = \sum \frac{(O - E)^2}{E} \]

The $\chi^2$ distribution can be used as an approximation for the distribution of $X^2$, provided none of the expected frequencies ($E$) falls below 5.
You would write $X^2 \sim \chi^2(\nu)$.

$X^2$ is calculated as follows:

1. Find the difference, $O - E$, for each set of values.

2. Square each difference to obtain $(O - E)^2$. This gives due weight to any particularly large differences and also means that all the values are positive.

3. Divide $(O - E)^2$ by $E$ for each set of values to obtain $\dfrac{(O - E)^2}{E}$.
This has the effect of standardising that element. In this way, for example, a small difference will be more important when the expected frequency is small than when the expected frequency is large.

4. Finally, find the sum, $\sum \dfrac{(O - E)^2}{E}$. The smaller this quantity is, the better the fit.

The following example shows how to calculate $X^2$ for given data in a goodness-of-fit test.

PERFORMING A $\chi^2$ GOODNESS-OF-FIT TEST

Random numbers consist of lists of the ten digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 and are such that each digit has an equal chance of appearing at any stage. Each digit, therefore, has a probability of 0.1 of occurring, i.e. $P(X = x) = 0.1$. This is the discrete uniform distribution (see page 270).
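Before the worked example, the four steps for calculating $X^2$ can be sketched as a short routine. This is an illustration only: the function names are hypothetical, and the two single-class inputs are made-up numbers chosen to show the remark in step 3, that the same difference counts for more when the expected frequency is small.

```python
def chi_squared_terms(observed, expected):
    """Return the contribution (O - E)^2 / E of each class (steps 1 to 3)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of classes")
    return [(o - e) ** 2 / e for o, e in zip(observed, expected)]


def chi_squared_statistic(observed, expected):
    """Sum the contributions to give the test statistic X^2 (step 4)."""
    return sum(chi_squared_terms(observed, expected))


# Hypothetical illustration: the difference O - E is 3 in both cases.
small_e = chi_squared_terms([8], [5])     # expected frequency 5
large_e = chi_squared_terms([53], [50])   # expected frequency 50
print(small_e, large_e)
```

The contribution for the class with expected frequency 5 is ten times that for the class with expected frequency 50, although the raw difference is the same.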

By pressing the random number key [Ran#] on a calculator it is possible to generate a random three-digit number between 0.000 and 0.999. For example

[Ran#] 0.593    [Ran#] 0.194    [Ran#] 0.106    and so on.

In this case the random digits are 5, 9, 3, 1, 9, 4, 1, 0, 6.

Here are 100 digits generated on a calculator.

4 9 8 3 3 3 7 1 3 9
9 9 1 8 6 1 1 6 3 6
0 7 7 3 3 3 7 3 5 4
3 8 1 4 2 8 8 6 1 9
4 5 3 4 9 4 3 8 5 5
8 6 6 7 5 9 2 6 3 3
3 8 2 4 8 4 1 9 8 4
1 4 2 2 1 7 0 8 2 5
7 5 8 0 4 7 6 9 1 2
9 7 7 5 3 7 4 0 6 6

A $\chi^2$ goodness-of-fit test is used to test whether the numbers generated on the calculator are random enough. To make it easier to analyse the data, arrange the digits in a frequency table:

Digit        0    1    2    3    4    5    6    7    8    9
Frequency    4   10    7   16   12    8   10   11   12   10    Total 100

Make null and alternative hypotheses as follows:

$H_0$: the digits are random
$H_1$: the digits are not random.

Then calculate the frequencies that you would expect if the digits are random.

Expected frequency for each digit is $100 \times 0.1 = 10$.

Add another row to the table so that the observed frequencies ($O$) and the expected frequencies ($E$) can be compared.

Digit                       0    1    2    3    4    5    6    7    8    9
Observed frequency (O)      4   10    7   16   12    8   10   11   12   10    Total 100
Expected frequency (E)     10   10   10   10   10   10   10   10   10   10    Total 100

The frequencies can be illustrated by a vertical line graph:

[Figure: Distribution of 100 random digits generated on a calculator; vertical line graph of frequency against digit, showing expected frequencies and observed frequencies]

At first glance the observed frequency for the digit 3 seems much too high and that for the digit 0 seems much too low.

The $\chi^2$ test compares each observed frequency with the corresponding expected frequency.

For each pair, calculate $\dfrac{(O - E)^2}{E}$, then calculate the sum to give the test statistic
\[ X^2 = \sum \frac{(O - E)^2}{E} \]

If $X^2 = 0$, then there is exact agreement between the observed and expected frequencies.
If $X^2 > 0$, then $O$ and $E$ do not agree exactly; the larger the value of $X^2$ the greater the discrepancy. A low value of $X^2$ implies a good fit, whereas a high value of $X^2$ implies a poor fit.

For the above data,
\[ X^2 = \frac{(4 - 10)^2}{10} + \frac{(10 - 10)^2}{10} + \frac{(7 - 10)^2}{10} + \dots + \frac{(10 - 10)^2}{10} = 9.4 \]

The calculations are usually summarised in a table:

   O           E         (O − E)²/E
   4          10            3.6
  10          10            0
   7          10            0.9
  16          10            3.6
  12          10            0.4
   8          10            0.4
  10          10            0
  11          10            0.1
  12          10            0.4
  10          10            0
ΣO = 100   ΣE = 100         9.4

\[ X^2 = \sum \frac{(O - E)^2}{E} = 9.4 \]

To decide whether the data give a good fit, you need to know whether 9.4 lies in the main body of the distribution or whether it is in the critical (rejection) region in the upper tail. If it lies in the critical region, reject $H_0$.

The boundary of the critical region is found from the appropriate $\chi^2$ distribution which depends on the number of degrees of freedom, $\nu$, the number of independent variables used in calculating the test statistic. It is found as follows:

The number of classes is 10 and there is one restriction (that the total of the expected frequencies is 100), so $\nu = 10 - 1 = 9$. Consider the $\chi^2(9)$ distribution.

Say that the test is to be carried out at the 5% significance level. The critical value, $\chi^2_{5\%}(9)$, is found from tables (see page 651).
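The tallying and the calculation of $X^2$ for the 100 calculator digits can be checked with a short script. This is a sketch only: the variable names are illustrative, and the digits are typed in row by row from the list above (the order within a row does not affect the tally).

```python
from collections import Counter

# The 100 calculator digits, one string per row of the list.
rows = [
    "4983337139", "9918611636", "0773337354", "3814288619", "4534943855",
    "8667592633", "3824841984", "1422170825", "7580476912", "9775374066",
]
digits = [int(ch) for row in rows for ch in row]

# Observed frequencies for the digits 0 to 9.
counts = Counter(digits)
observed = [counts[d] for d in range(10)]

# Under H0 each digit has probability 0.1, so E = 100 x 0.1 = 10 for every class.
expected = [len(digits) * 0.1] * 10

# One row of the summary table per digit: O, E and (O - E)^2 / E.
for o, e in zip(observed, expected):
    print(o, e, round((o - e) ** 2 / e, 1))

x_squared = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
degrees_of_freedom = len(observed) - 1   # 10 classes, one restriction
print(round(x_squared, 1), degrees_of_freedom)
```

The script prints the same rows as the summary table and ends with the test statistic and the number of degrees of freedom used to look up the critical value.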

From tables, $\chi^2_{5\%}(9) = 16.919$.

So $H_0$ will be rejected if $X^2 > 16.919$.

Since the calculated value of the test statistic, 9.4, is less than 16.919, it does not lie in the critical region and $H_0$ is not rejected.

On the evidence obtained, you would accept that the digits are true random digits.

[Figure: $\chi^2(9)$ distribution with the test value 9.4 and the critical region above 16.919 marked]

Summary of the procedure for performing a $\chi^2$ goodness-of-fit test:

For a set of data with observed frequencies $O$:

1. Make the null hypothesis $H_0$ that the data are distributed in a particular way and the alternative hypothesis $H_1$ that they are not.

2. Calculate $E$, the frequencies expected if the distribution follows the one given in $H_0$. Note that when calculating $X^2$, small values of $E$ tend to give a large value of $X^2$. It is advisable to adopt the rule that expected frequencies below 5 should not be used.
If $E < 5$ for any class, combine adjacent classes to form a class that is sufficiently large. Combine corresponding frequencies in the observed data also and make a revised table.

3. Work out the number of degrees of freedom, $\nu$, where
$\nu$ = number of classes − number of restrictions.
Decide on the level of the test and look up the appropriate critical value in tables. For example, for a 5% significance level, look up $\chi^2_{5\%}(\nu)$.
Use it to state the rejection criteria:
If $X^2 > \chi^2_{5\%}(\nu)$ then the test value lies in the critical (rejection) region. The differences between the observed and expected values are considered to be too great and $H_0$ is rejected.
If $X^2 < \chi^2_{5\%}(\nu)$ the test value does not lie in the critical region and $H_0$ is not rejected.

4. Now calculate
\[ X^2 = \sum \frac{(O - E)^2}{E} \]
Note, however, that if $\nu = 1$, it is advisable to use Yates' continuity correction. The formula is
\[ X^2 = \sum \frac{(|O - E| - 0.5)^2}{E} \]

5. Compare the calculated value of $X^2$ with the critical value. Make your conclusion ($H_0$ is rejected or $H_0$ is not rejected) and relate it to the context of the situation being investigated.

Note that when the value of $X^2$ is very small, it is wise to query the reliability of the observed data. This is where the lower tail (left-hand) probabilities might be useful.

For example, suppose that the test involves a $\chi^2(4)$ distribution and that the calculated value of the test statistic is $X^2 = 0.7$.

You can see from the tables on page 651 that $\chi^2_{95\%}(4) = 0.711$, which means that if the null hypothesis is true you would expect a value less than 0.711 from at most 5% of samples, so this would be quite rare. You might wonder whether the observed data have been fiddled.

TEST 1: GOODNESS-OF-FIT TEST FOR A UNIFORM DISTRIBUTION

Example 12.1

The table shows the number of employees absent for just one day during a particular period of time.

Day of the week        Mon   Tues   Wed   Thurs   Fri
Number of absentees    121    87     87    91     114    Total 500

(a) Find the frequencies expected according to the hypothesis that the number of absentees is independent of the day of the week.

(b) Test at the 5% level whether the differences in the observed and expected data are significant.

Solution 12.1

$H_0$: The number of absentees is independent of the day of the week.
$H_1$: The number of absentees is not independent of the day of the week.

If the number of absentees is independent of the day of the week then you would expect the total of 500 to be spread uniformly throughout the week.
Expected number of absentees for any day is 100.

                             Mon   Tues   Wed   Thurs   Fri
Observed frequencies (O)     121    87     87    91     114    ΣO = 500
Expected frequencies (E)     100   100    100   100     100    ΣE = 500

Degrees of freedom $\nu$
There are five classes and there is one restriction ($\sum E = 500$).
Therefore $\nu = 5 - 1 = 4$, so consider the $\chi^2(4)$ distribution.
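The summary procedure can be sketched as one small function and tried on the absentee data of Example 12.1. This is an illustration under stated assumptions: `goodness_of_fit` is a hypothetical name, the critical value 9.488 is the tabulated $\chi^2_{5\%}(4)$ typed in by hand, and the optional `yates` flag applies the continuity correction described in step 4 (intended only for $\nu = 1$).

```python
def goodness_of_fit(observed, expected, restrictions, critical_value, yates=False):
    """Return (X^2, degrees of freedom, reject H0?) for a goodness-of-fit test."""
    x_squared = 0.0
    for o, e in zip(observed, expected):
        diff = abs(o - e)
        if yates:
            diff -= 0.5          # Yates' continuity correction
        x_squared += diff ** 2 / e
    nu = len(observed) - restrictions
    return x_squared, nu, x_squared > critical_value


# Example 12.1: absentees Monday to Friday, total spread uniformly under H0.
observed = [121, 87, 87, 91, 114]
expected = [sum(observed) / len(observed)] * len(observed)

x_squared, nu, reject = goodness_of_fit(observed, expected, restrictions=1,
                                        critical_value=9.488)
print(round(x_squared, 2), nu, reject)
```

The function does not check that every expected frequency is at least 5; combining classes, where needed, has to be done before it is called.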

Perform the test at the 5% level.
From tables $\chi^2_{5\%}(4) = 9.488$, so reject $H_0$ if $X^2 > 9.488$.

   O           E         (O − E)²/E
  121         100           4.41
   87         100           1.69
   87         100           1.69
   91         100           0.81
  114         100           1.96
ΣO = 500   ΣE = 500        10.56

\[ X^2 = \sum \frac{(O - E)^2}{E} = 10.56 \]

Since $X^2 > 9.488$, reject $H_0$. There is evidence, at the 5% level, that the number of absentees on a day is not independent of the day of the week.

Note that the test does not indicate what the relationship might be. The observed frequencies, however, suggest a tendency towards a greater number of absentees on Mondays and Fridays.

TEST 2: GOODNESS-OF-FIT TEST FOR A DISTRIBUTION IN A GIVEN RATIO

Example 12.2

According to a particular genetic theory the number of colour strains, pink, white and blue, in a certain flower should appear in the ratio 3 : 2 : 5. In 100 randomly chosen plants, the corresponding numbers of each colour were 24, 14 and 62. Test at the 1% level whether the differences between the observed and expected frequencies are significant.

Solution 12.2

$H_0$: The colours are in the ratio 3 : 2 : 5.
$H_1$: The colours are not in the ratio 3 : 2 : 5.

According to the null hypothesis, the colours should be in the ratio 3 : 2 : 5, so the expected frequencies are

pink $\frac{3}{10} \times 100 = 30$    white $\frac{2}{10} \times 100 = 20$    blue $\frac{5}{10} \times 100 = 50$

Colour                     Pink   White   Blue
Observed frequency (O)      24     14      62     ΣO = 100
Expected frequency (E)      30     20      50     ΣE = 100

Degrees of freedom $\nu$
There are three classes and there is one restriction ($\sum E = 100$).
Therefore $\nu = 3 - 1 = 2$, so consider the $\chi^2(2)$ distribution.

Perform the test at the 1% level.
From tables $\chi^2_{1\%}(2) = 9.210$, so reject $H_0$ if $X^2 > 9.210$.

   O           E         (O − E)²/E
   24          30           1.2
   14          20           1.8
   62          50           2.88
ΣO = 100   ΣE = 100         5.88

\[ X^2 = \sum \frac{(O - E)^2}{E} = 5.88 \]

Since $X^2 < 9.210$, do not reject $H_0$.
The differences between the observed and expected frequencies are not significant at the 1% level. The colours are in the ratio 3 : 2 : 5.

Exercise

1. A tetrahedral die is thrown 120 times and the number on which it lands is noted.

Number       1    2    3    4
Frequency   35   32   25   28    Total 120

Test, at the 5% level, whether the die is fair.

2. From a list of 500 digits, the occurrence of each digit is noted.

Digit        0    1    2    3    4    5    6    7    8    9
Frequency   40   58   49   53   38   56   61   53   60   32

Test, at the 1% level, whether the sequence is a random sample from a uniform distribution.

3. The outcomes, A, B and C, of a certain experiment are thought to occur in the ratio 1 : 2 : 1. The experiment is performed 200 times and the observed frequencies of A, B and C are 36, 115 and 49 respectively. Is the difference in the observed and expected results significant? Test at the 5% level.

4. According to genetic theory the number of colour strains, red, yellow, blue and white, in a certain flower should appear in proportions 4 : 12 : 5 : 4. Observed frequencies of red, yellow, blue and white strains amongst 800 plants were 110, 410, 150, 130 respectively. Are these differences from the expected frequencies significant at the 5% level? If the number of plants had been 1600 and the observed frequencies 220, 820, 300, 260, would the difference have been significant at the 5% level? (C)

5. It is thought that each of the 8 outcomes of an experiment is equally likely to occur. When the experiment is performed 400 times, the observed frequencies are 45, 42, 55, 53, 40, 62, 47 and 56. Perform a test at the 1% level to investigate the validity of the theory.

6. In a particular subject students are set multiple choice questions each of which contains five alternatives A, B, C, D and E. A teacher suggests that when students do not know the correct answer they are twice as likely to choose one of B, C or D than to choose A or E. For 160 questions where it was known that the student answered without knowing the correct answer A, B, C, D, E were chosen 23, 45, 36, 43 and 13 times respectively. Is there evidence, at the 5% level, to support the teacher's theory?

7. For a given set of data the observed and expected frequencies are shown:

Result                  1    2    3    4    5
Observed frequency     30   31   42   40   57
Expected frequency     38   45   36   36   45

Are the differences between the observed and expected frequencies significant at the 1% level?

8. The random variable $Y$ has a $\chi^2$ distribution with eight degrees of freedom. Find $y$ such that $P(Y > y) = 0.05$. (L)

9. During the course of one year a tutor marked 111 assignments. The grades he awarded and the comparable national proportions are given in the table:

Grade                  A     B     C     D
Number he awarded     86    18     6     1
National proportion   71%   16%    7%    6%

Calculate the expected numbers (to one decimal place) based on the national proportions.
The $\chi^2$ goodness of fit test requires the summation of terms of the form
\[ \frac{(O - E)^2}{E} \]
where $O$ and $E$ are observed and expected frequencies. Suggest reasons why
(a) the difference between $O$ and $E$ is used,
(b) this difference is squared, and
(c) the squared difference is divided by $E$.
Test, at the 5% level, whether there is any difference between the tutor's and the national awarding of grades. State your conclusions clearly. (O)

10. A calibrated instrument is used over a wide range of values. To assess the operator's ability to read the instrument accurately, the final digit in each of 700 readings was noted. The results are tabulated below.

Final digit   Frequency
     0           75
     1           63
     2           50
     3           58
     4           73
     5           95
     6           96
     7           63
     8           46
     9           81

Use an approximate $\chi^2$ statistic to test whether there is any evidence of bias in the operator's reading of the instrument. Use a 5% significance level and state your null and alternative hypotheses. (L)

11. The grades in a statistics examination for a group of students were as follows.

Grade                  A    B    C    D    E
Number of students    14   18   32   20   16

Test the hypothesis that the distribution of grades is uniform. Use a 5% level of significance. (L)

12. An ordinary die is thrown 120 times and each time the number on the uppermost face is noted. The results are as follows:

Number on die    1    2    3    4    5    6
Frequency       14   16   24   22   24   20

Is the die fair? Test at the 10% level.

13. In a certain town an investigation was carried out into accidents in the home to children under 12 years of age. The numbers of reported accidents and the ages of the children concerned are summarised in the table.

Group   Age of child (yrs)   No. of accidents
  A        0 to < 2                42
  B        2 to < 4                52
  C        4 to < 6                28
  D        6 to < 8                20
  E        8 to < 10               18
  F       10 to < 12               16

(a) State the modal class.
(b) Calculate, to the nearest month, the mean age and the standard deviation of the distribution of ages.
(c) Draw a cumulative frequency curve, and from it estimate, to the nearest month, the median and the interquartile range for the ages of all children under 12 years of age concerned in reported accidents in the home. State, giving a reason, whether you consider the mean, the mode or the median best represents the average age for accidents in the home to children under 12 years of age.
(d) An investigator believes that children in the groups A, B, C, D, E, F are likely to have accidents in the home in the ratios 2 : 2 : 1 : 1 : 1 : 1 respectively. Use a $\chi^2$ test at a 5% significance level to decide whether or not this belief is justified. (L)

TEST 3: GOODNESS-OF-FIT TEST FOR A BINOMIAL DISTRIBUTION

Example 12.3

A farmer kept a record of the number of heifer calves born to each cow during the first five years of breeding of the cow. The results are summarised in the table:

Number of heifers    0    1    2    3    4    5
Number of cows       4   19   41   52   26    8

(a) Test, at the 5% level of significance, whether or not the binomial distribution with parameters $n = 5$, $p = 0.5$, is an adequate model for these data.
(b) Explain briefly what changes you would make in your analysis if you were testing whether or not the binomial distribution with $n = 5$ and unspecified $p$ fitted the data. (AEB)

Solution 12.3

(a) Let $X$ be the number of heifer calves born to a cow in the first five years of breeding.

$H_0$: $X \sim B(5, 0.5)$
$H_1$: $X$ is not distributed in this way.

To calculate the binomial probabilities, use cumulative probability tables which give $P(X \le x)$.

Alternatively, calculate the probabilities using
\[ P(X = x) = {}^5C_x (0.5)^{5-x} (0.5)^x = {}^5C_x (0.5)^5 \]

The total number of cows is 150, so the expected frequencies are found by multiplying $P(X = x)$ by 150.

Note on accuracy: When calculating it is often necessary to approximate, say to the nearest integer or to one decimal place. If you have memory facilities on your calculator for retaining several numbers then you may prefer to do so.

Using tables (extract from page 645):

$X \sim B(5, 0.5)$, $n = 5$

  r      P(X ≤ r)
  0      0.0313
  1      0.1875
  2      0.5000
  3      0.8125
  4      0.9688
  5      1.0000

                                              E = 150 × P(X = x)
P(X = 0) = 0.0313                                   4.7
P(X = 1) = 0.1875 − 0.0313 = 0.1562                23.4
P(X = 2) = 0.5 − 0.1875 = 0.3125                   46.9
P(X = 3) = 0.8125 − 0.5 = 0.3125                   46.9
P(X = 4) = 0.9688 − 0.8125 = 0.1563                23.4
P(X = 5) = 1 − 0.9688 = 0.0312                      4.7

Check on size of expected frequencies:
Since the expected frequencies for the first and last classes are less than 5, combine them with the next classes.
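The expected frequencies under $B(5, 0.5)$ can also be obtained directly from the binomial formula rather than from cumulative tables. The following is a minimal sketch (the function name `binomial_expected` is hypothetical) which also flags the classes whose expected frequency falls below 5.

```python
from math import comb


def binomial_expected(n, p, total):
    """Expected frequencies total * P(X = x) for x = 0, 1, ..., n under B(n, p)."""
    return [total * comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]


# Example 12.3: 150 cows, n = 5, p = 0.5
expected = binomial_expected(5, 0.5, 150)
print([round(e, 1) for e in expected])

# Classes that need combining with a neighbour because E < 5
too_small = [x for x, e in enumerate(expected) if e < 5]
print(too_small)
```

Working from the formula avoids the small rounding differences that come from subtracting four-figure table entries, but the two approaches agree to one decimal place here.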

Do a revised table for the expected frequencies and also show the corresponding observed frequencies:

Number of heifers x           0 or 1     2      3     4 or 5
Observed frequencies (O)        23      41     52       34      ΣO = 150
Expected frequencies (E)       28.1    46.9   46.9     28.1     ΣE = 150

There are now four classes and one restriction ($\sum E = 150$), so $\nu = 4 - 1 = 3$; consider the $\chi^2(3)$ distribution. At the 5% level, $\chi^2_{5\%}(3) = 7.815$, so reject $H_0$ if $X^2 > 7.815$.

   O           E         (O − E)²/E
   23         28.1         0.925...
   41         46.9         0.742...
   52         46.9         0.554...
   34         28.1         1.238...
ΣO = 150   ΣE = 150        3.461...

\[ X^2 = \sum \frac{(O - E)^2}{E} = 3.46 \text{ (2 d.p.)} \]

Since $X^2 < 7.815$, do not reject $H_0$.
The binomial distribution with $n = 5$ and $p = 0.5$ is an adequate model for the data.

(b) If you want to test whether the distribution $B(5, p)$ provides an adequate model, with $p$ unspecified, you would need to estimate $p$ from the data using the fact that the mean of a binomial distribution is $np$.

From the data
\[ \bar{x} = \frac{\sum fx}{\sum f} = \frac{401}{150} = 2.673\ldots \]
Since $\bar{x} = np$,
$2.673 = 5p$
$p = 0.535$ (3 d.p.)

So the null hypothesis would be $H_0$: $X \sim B(5, 0.535)$ and the expected frequencies would be calculated using $p = 0.535$.

When working out $\nu$, the number of degrees of freedom, you would take into account that there are now two restrictions, one is that $\sum E = 150$ (as before) and the other is that $p$ is estimated from the sample.

Try working through this test. You should find that $\nu = 5 - 2 = 3$, $X^2 = 0.85$ (depending on degree of approximation in your calculations) and $H_0$ not rejected.

TEST 4: GOODNESS OF FIT FOR A POISSON DISTRIBUTION

Example 12.4

An analysis of the number of goals scored by the local football team gave the following results:

Goals per match (x)     0    1    2    3    4    5    6    7
Number of matches      14   18   29   18   10    7    3    1    Total 100

Carry out a $\chi^2$ goodness of fit test at the 10% significance level to determine whether or not the above distribution can be reasonably modelled by a Poisson distribution with parameter 2.

Solution 12.4

Let $X$ be the number of goals scored in a match.

$H_0$: $X \sim Po(2)$
$H_1$: $X$ is not distributed in this way.

To calculate the Poisson probabilities, use cumulative probability tables which give $P(X \le x)$.

Alternatively, calculate the probabilities using
\[ P(X = x) = e^{-2}\,\frac{2^x}{x!} \]

The total number of matches is 100, so the expected frequencies are found by multiplying $P(X = x)$ by 100.

Using tables (extract from page 647):

$X \sim Po(2)$, $\lambda = 2.0$

  r      P(X ≤ r)
  0      0.1353
  1      0.4060
  2      0.6767
  3      0.8571
  4      0.9473
  5      0.9834
  6      0.9955
  7      0.9989
  8      0.9998
  9      1.0000

                                                   E = 100 P(X = x)
P(X = 0) = 0.1353                                      13.53
P(X = 1) = 0.4060 − 0.1353 = 0.2707                    27.07
P(X = 2) = 0.6767 − 0.4060 = 0.2707                    27.07
P(X = 3) = 0.8571 − 0.6767 = 0.1804                    18.04
P(X = 4) = 0.9473 − 0.8571 = 0.0902                     9.02
P(X = 5) = 0.9834 − 0.9473 = 0.0361                     3.61
P(X = 6) = 0.9955 − 0.9834 = 0.0121                     1.21
P(X = 7 or more) = 1 − 0.9955 = 0.0045                  0.45

Check on size of expected frequencies:
The $\chi^2$ test is not valid for expected frequencies less than 5, so combine the last three classes to give 5 or more goals.

Revised table:

x       0       1       2       3       4     5 or more
O      14      18      29      18      10        11
E     13.53   27.07   27.07   18.04    9.02     5.27
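The expected frequencies for $Po(2)$ and the combining of the sparse upper classes can be sketched in code. This is an illustrative helper, not part of the text: the names `poisson_expected` and `pool_upper_tail` are hypothetical, and the pooling rule shown only merges classes from the top end, which is all that is needed for these data.

```python
from math import exp, factorial


def poisson_expected(lam, total, last):
    """Expected frequencies for x = 0..last-1 and a final class 'last or more'."""
    probs = [exp(-lam) * lam ** x / factorial(x) for x in range(last)]
    probs.append(1.0 - sum(probs))   # remaining probability for the top class
    return [total * p for p in probs]


def pool_upper_tail(observed, expected, minimum=5.0):
    """Merge the last class into its neighbour until the last E is at least minimum."""
    obs, exp_freq = list(observed), list(expected)
    while len(exp_freq) > 1 and exp_freq[-1] < minimum:
        last_e = exp_freq.pop()
        last_o = obs.pop()
        exp_freq[-1] += last_e
        obs[-1] += last_o
    return obs, exp_freq


# Example 12.4: classes 0, 1, ..., 6 and '7 or more'
observed = [14, 18, 29, 18, 10, 7, 3, 1]
expected = poisson_expected(2, 100, 7)

pooled_o, pooled_e = pool_upper_tail(observed, expected)
print(pooled_o)
print([round(e, 2) for e in pooled_e])
```

The pooled lists correspond to the revised table, with the final class reading '5 or more'.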

Degrees of freedom v Using the formula:


There are six classes and there is one restriction (~E ~ 100).
Therefore v ~ 6- 1 ~ 5, so consider the x2 (5) distribution. P(X ~ x) to 4 d.p.
E 100P(X x)
4. State the icvc! Perform the test at the 10% level.
P(X 0) 0.10025 ...
nf dw lest cmd From tables x210 %(5) ~ 9.236, so reject H 0 if X 2 > 9.236 10.03
P(X ~ 1) ~ 0.2306
the rcjccti()n 23.06
P(X ~ 2) ~ 0.2652
crnt:l"JOtl. 26.52
P(X ~ 3) ~ 0.2033
20.33
(0-E)' P(X~4)~0.1169
0 E
E X'~ I(O-E)2 9.53 (2 d.p.) P(X ~ 5) ~ 0.0538
11.69
E 5.38
14 13.53 0.016 ... P(X ~ 6) ~ 0.0206 riw iasl tilrc•c das,cs iil"l'
2.06
18 27.07 3.038 ... P(X ~ 7 or more)~ 0.0099 curnhincd to givz· c1 cl:u;_, with
x2(5) 0.99
29 27.07 0.137 ...
18 18.04 0.000 ... LE ~ 100

~
Revised table:
10 9.02 0.106 ...
11 5.27 6.230 ... X 0 1 2 3
9.236 4 5 or more
LO ~ 100 LE ~ 100 9.529 ... 0 14 18 29 18 10 11
6. rVbb~ you1 Since X 2 > 9 .236, reject H 0 • E 10.03 LO 100
23.06 26.52 20.33
condtls\On. The number of goals per match cannot be modelled by a Poisson distribution 11.69 8.43
Degrees of freedom v:
with parameter 2.
There are six classes.
There are two restrictions:
Example 12.5 ~E~ 100
Can the data of Example 12.4 be modelled by a Poisson distribution having the same mean as Ththefmean,of the Poisson distribution has to be estimated from the data
the observed data? Test at the 10% level. ere ore v ~ 6-2 ~ 4, so consider the x'(4) dt' t 'b . .
s n utwn
Perform the test at the 10% level. .
Solution 12.5 From tables xz (4) 7 7
IO% ~ • 79, so reject Ho if xz > 7. 779 .
~fx 230 Calculating xz
For the observed data, x~-f ~--~ 2.3.
~ 100
0 E (0-E) 2
The null hypothesis is that the distribution is Poisson, with parameter 2.3,
E xz~I(O-EJz
t.e. H 0 : X- Po(2.3) 14 10.03 4.208 (3 d.p.)
1.571 ... E
H 1 : X is not distributed in this way. 18 23.06 1.110 ...
29 26.52
The probabilities are found from cumulative tables or by calculating using 0.231 ... x2 (4)
18 20.33
3 0.267 ...
' - -
l(X-x)-e -2.3 (2. 1' -
--,x-0,1,2, ... 10 11.69 0.244 ... .~.~
x! 11 8.43 0.783 ...
The expected frequencies are given by 100 x P(X ~ x). LO 100 LE ~ 100 4.208 ... 7.779
Since xz 7 77
The < . 9, do not reject Ho.
number of goals per match can b d 1
as the observed data e mo e led by a Poisson distribution with th
· e same mean
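The whole of Solution 12.5 can be sketched in a few lines of code. This is an illustration under stated assumptions, not the book's method: the variable names are hypothetical, the critical value 7.779 is typed in from tables rather than computed, and exact probabilities are used, so the statistic comes out close to, but not identical with, the hand-calculated 4.208.

```python
from math import exp, factorial

observed = [14, 18, 29, 18, 10, 11]   # matches with 0, 1, 2, 3, 4, '5 or more' goals
sum_fx = 230                          # total number of goals, as quoted in the solution
n = sum(observed)                     # 100 matches
mean = sum_fx / n                     # 2.3, estimated from the data

# Poisson probabilities for 0..4 goals, then the open-ended class '5 or more'
probs = [exp(-mean) * mean**x / factorial(x) for x in range(5)]
probs.append(1.0 - sum(probs))
expected = [n * p for p in probs]

# Test statistic: sum of (O - E)^2 / E over the six classes
x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Two restrictions: the totals agree, and the mean was estimated from the data
dof = len(observed) - 2
critical = 7.779                      # chi-squared(4), 10% level, from tables
reject = x2 > critical

print(f"X^2 = {x2:.3f}, degrees of freedom = {dof}, reject H0: {reject}")
```

The small difference from the tabulated working arises because the hand calculation rounds each expected frequency to two decimal places before combining classes.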
TEST 5: GOODNESS-OF-FIT TEST FOR A NORMAL DISTRIBUTION

Example 12.6
The height, in centimetres, gained by a conifer in its first year of planting is denoted by the random variable X. The value of X is measured for a random sample of 86 conifers and the results obtained are summarised in the table:

    X                     <35    35-45    45-55    55-65    >65
    Observed frequency     10      18       28       18      12

(a) Assuming that X is modelled by a N(50, 15²) distribution, calculate the expected frequencies for each of the five classes.
(b) Carry out a χ² goodness of fit analysis to test, at the 5% level, the hypothesis that X can be modelled as in (a). (C)

Solution 12.6
(a) X ~ N(50, 15²)
Standardise each x value,
e.g. when x = 35,   z = (x − μ)/σ = (35 − 50)/15 = −1

    x     35      45       55      65
    z     −1    −0.333    0.333     1

Notice that there is symmetry in the diagram.

    Probabilities                                                      E = probability × 86
    P(X < 35) = P(Z < −1) = 1 − 0.8413 = 0.1587                             13.7
    P(35 < X < 45) = P(−1 < Z < −0.333) = 0.8413 − 0.6304 = 0.2109          18.1
    P(45 < X < 55) = P(−0.333 < Z < 0.333) = 2 × 0.6304 − 1 = 0.2608        22.4
    P(55 < X < 65) = 0.2109 (by symmetry)                                   18.1
    P(X > 65) = 0.1587 (by symmetry)                                        13.7
                                                                         ΣE = 86

Note that the expected frequencies have been given to 1 d.p.

(b) χ² goodness-of-fit test:

1. State H₀ and H₁.
H₀: X ~ N(50, 15²)
H₁: X is not distributed in this way.

2. Calculate E and check that expected frequencies are greater than 5.

    X     <35    35-45    45-55    55-65    >65
    O      10      18       28       18      12     ΣO = 86
    E     13.7    18.1     22.4     18.1    13.7    ΣE = 86

(Note that all expected frequencies are greater than 5 so there is no need to combine classes.)

Degrees of freedom ν
There are five classes and one restriction (ΣE = 86).
Therefore ν = 5 − 1 = 4, so consider the χ²(4) distribution.

4. State the level of the test and the rejection criterion.
Perform the test at the 5% level.
From tables χ²₅%(4) = 9.488, so reject H₀ if X² > 9.488.

       O          E        (O − E)²/E
      10        13.7        0.999...
      18        18.1        0.0005...
      28        22.4        1.4
      18        18.1        0.0005...
      12        13.7        0.210...
    ΣO = 86   ΣE = 86       2.611...

    X² = Σ (O − E)²/E = 2.61 (2 d.p.)

6. Make your conclusion.
Since X² < 9.488, do not reject H₀.
The normal model N(50, 15²) is a suitable model.

Example 12.7
A weaving mill sells lengths of cloth with a nominal length of 70 m. A customer measured 100 lengths and obtained the following frequency distribution:

    Length (m)    61-67    67-69    69-71    71-73    73-75    75-81
    Frequency       1        16       26       19       20       18

Use a χ² test at the 5% level of significance to show that the normal distribution is not an adequate model for the data. (AEB)

Solution 12.7
The null hypothesis is that the distribution is normal, but since neither the mean nor the variance is given, they have to be estimated from the data.

    Mid-interval value x    64    68    70    72    74    78
    Frequency                1    16    26    19    20    18

From the calculator
    μ̂ = x̄ = 72.24 (see page 32)
    σ̂² = 11.578 (3 d.p.) (see page 449)

χ² goodness-of-fit test

1. State H₀ and H₁.
H₀: X ~ N(72.24, 11.578)
H₁: X is not distributed in this way.

Standardise the boundary values of the intervals (to 3 d.p.) using

    z = (x − μ)/σ = (x − 72.24)/√11.578

When x = 61,   z = (61 − 72.24)/√11.578 = −3.303

    x      61       67       69       71      73      75      81
    z    −3.303   −1.540   −0.952   −0.364   0.223   0.811   2.574

NOTE: P(X < 61) = P(Z < −3.303) ≈ 0, so take the first class as X < 67.
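The standardising step used in Solution 12.6 and again here can be sketched in code. The following is an illustration only, with hypothetical helper names, applied to the N(50, 15²) model of Example 12.6. It evaluates the normal distribution function through the error function instead of four-figure tables, so the expected frequencies agree with the tabulated working only to within about 0.1.

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """P(X < x) for X ~ N(mu, sigma^2), written in terms of the error function."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

def class_probabilities(boundaries, mu, sigma):
    """Probabilities of the classes X < b0, b0 < X < b1, ..., X > b_last."""
    cdf = [normal_cdf(b, mu, sigma) for b in boundaries]
    edges = [0.0] + cdf + [1.0]            # open-ended first and last classes
    return [hi - lo for lo, hi in zip(edges, edges[1:])]

# Example 12.6: classes <35, 35-45, 45-55, 55-65, >65 for 86 conifers
probs = class_probabilities([35, 45, 55, 65], mu=50, sigma=15)
expected = [86 * p for p in probs]

for e in expected:
    print(f"{e:.1f}")
```

Treating the first and last classes as open-ended is what makes the expected frequencies add up to the sample size, as the restriction ΣE = 86 requires.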
    Probabilities                                                              E = prob × 100
    P(X < 67) = P(Z < −1.540) = 1 − 0.9382 = 0.0618                                 6.18
    P(67 < X < 69) = P(−1.540 < Z < −0.952) = 0.9382 − 0.8294 = 0.1088             10.88
    P(69 < X < 71) = P(−0.952 < Z < −0.364) = 0.8294 − 0.6421 = 0.1873             18.73
    P(71 < X < 73) = P(−0.364 < Z < 0.223) = 0.6421 + 0.5883 − 1 = 0.2304          23.04
    P(73 < X < 75) = P(0.223 < Z < 0.811) = 0.7913 − 0.5883 = 0.2030               20.30
    P(75 < X < 81) = P(0.811 < Z < 2.574) = 0.9950 − 0.7913 = 0.2037               20.37
    P(X > 81) = P(Z > 2.574) = 1 − 0.9950 = 0.0050                                  0.50

Combine the last two classes to give X ≥ 75.

    X     <67    67-69    69-71    71-73    73-75    75 and over
    O       1      16       26       19       20        18          ΣO = 100
    E     6.18   10.88    18.73    23.04    20.30     20.87         ΣE = 100

Degrees of freedom ν
There are six classes.
There are three restrictions:
    ΣE = 100
    The mean of the normal distribution has been estimated from the data.
    The variance of the normal distribution has been estimated from the data.
Therefore ν = 6 − 3 = 3, so consider the χ²(3) distribution.

4. State the level of the test and the rejection criterion.
Perform the test at the 5% level.
From tables χ²₅%(3) = 7.815, so reject H₀ if X² > 7.815.

       O           E        (O − E)²/E
       1          6.18       4.341...
      16         10.88       2.409...
      26         18.73       2.821...
      19         23.04       0.708...
      20         20.30       0.004...
      18         20.87       0.394...
    ΣO = 100   ΣE = 100     10.680...

    X² = Σ (O − E)²/E = 10.7 (1 d.p.)

6. Make your conclusion.
Since X² > 7.815, reject H₀.
The normal distribution is not an adequate model for the data.

Summary of the number of degrees of freedom for goodness-of-fit tests
(n is the number of classes)

    Distribution                                                        ν
    Uniform                                                             ν = n − 1
    Given ratio                                                         ν = n − 1
    Binomial   (a) if p is known                                        ν = n − 1
               (b) if p is unknown and it is estimated from the
                   observed frequencies using x̄ = np                    ν = n − 2
    Poisson    (a) if λ is known                                        ν = n − 1
               (b) if λ is unknown and it is estimated from the
                   observed frequencies using x̄ = λ                     ν = n − 2
    Normal     (a) if μ and σ² are known                                ν = n − 1
               (b) if μ and σ² are unknown and are estimated
                   from the observed frequencies                        ν = n − 3

Exercise 12b χ² goodness-of-fit tests: binomial, Poisson and normal distributions

1. Perform a χ² test to investigate whether the following is drawn from a binomial distribution with p = 0.3. Use a 5% level of significance.

    x             0     1     2     3     4     5
    Frequency    12    39    27    15     4     3

2. A six-sided die with faces numbered as usual from 1 to 6 was thrown five times and the number of sixes was recorded. The experiment was performed 200 times, with the following results:

    x             0     1     2     3     4     5
    Frequency    66    82    40    10     2     0

On this evidence, would you consider the die to be biased? Fit a suitable distribution to the data and test and comment on the goodness of fit. (MEI)

3. Under what circumstances would you expect a variable, X, to have a binomial distribution? What is the mean of X if it has a binomial distribution with parameters n and p?
A new fly spray is applied to 50 samples each of five flies and the number of living flies counted after one hour. The results were as follows:

    Number living    0     1     2     3     4     5
    Frequency        7    20    12     9     1     1

Calculate the mean number of living flies per sample and hence an estimate for p, the probability of a fly surviving the spray. Using your estimate calculate the expected frequencies (each correct to one decimal place) corresponding to a binomial distribution and perform a χ² goodness-of-fit test using a 5% significance level.

4. Two dice were thrown 216 times, and the number of sixes at each throw were counted. The results were:

    No. of sixes     0      1      2
    Frequency      130     76     10     Total 216

Test the hypothesis that the distribution is binomial with the parameter p = 1/6.
Explain how the test would be modified if the hypothesis to be tested is that the distribution is binomial with the parameter p unknown. (Do not carry out the test.) (O)
5. Smallwoods Ltd. run a weekly football pools competition. One part of this involves a fixed-odds contest where the entrant has to forecast correctly the result of each of five given matches. In the event of a fully correct forecast the entrant is paid out at odds of 100 to 1. During the last two years Miss Fortune has entered this fixed-odds contest 80 times. The table below summarises her results.

    Number of matches correctly      Number of entries with
    forecast per entry (x)           x correct forecasts (f)
              0                               8
              1                              19
              2                              25
              3                              22
              4                               5
              5                               1

(a) Find the frequencies of the number of matches correctly forecast per entry given by a binomial distribution having the same mean and total as the observed distribution.
(b) Use the χ² distribution and a 10% level of significance to test the adequacy of the binomial distribution as a model for these data.
(c) On the evidence before you, and assuming that the point of entering is to win money, would you advise Miss Fortune to continue with this competition and why? (AEB)

6. Samples of size 5 are selected regularly from a production line and tested. During one week 500 samples are taken and the number of defective items in each sample is recorded.

    Number of defectives, x     0      1      2     3    4    5
    Frequency, f              170    180    120    20    8    2

(a) It is suggested that a binomial model, with mean the same as the observed data, can be used. Find the frequencies expected by this model.
(b) Test whether this binomial model is a good one. Use a 5% level of significance.

7. A group of students are performing an experiment where 20 drawing pins are dropped randomly on to the floor and the number landing point down is counted. The procedure is then repeated several times. Describe the assumptions you would need to make in order to be satisfied with modelling this situation by a binomial distribution. The experiment was carried out until the students had 50 observations; their results are given in the table:

    Number landing point down    Frequency
               3                     2
               4                     2
               5                     5
               6                     7
               7                    17
               8                     8
               9                     6
              10                     1
              11                     2

(a) Calculate the mean number landing point down. Hence show that an estimate for the probability of a drawing pin landing point down is 0.35.
(b) What are the parameters of the appropriate binomial distribution for these data? Calculate the probability of exactly eight landing point down, and hence write down, accurate to one decimal place, its expected frequency.
(c) Using appropriate tables, find, making your method clear, the expected number of times five or fewer pins would land point down.
(d) The chi-squared goodness-of-fit test can be used to judge how well data follow a distribution. Group the above data in the following manner and evaluate the missing expected or observed frequencies:

    Number of pins    ≤5    6    7    8
    Expected          8.6   11.8
    Observed          9     7    17

Calculate the value of the chi-squared statistic for this data.
(e) How many degrees of freedom does your test have? By referring to your tables carry out the test and make your findings clear. (O)

8. A local council has records of the number of children and the number of households in its area. It is therefore known that the average number of children per household is 1.40. It is suggested that the number of children per household can be modelled by the Poisson distribution with parameter 1.40. In order to test this, a random sample of 1000 households is taken, giving the following data.

    Number of children        0      1      2     3     4    5+
    Number of households    273    361    263    78    21     4

(a) Find the corresponding expected frequencies obtained from the Poisson distribution with parameter 1.40.
(b) Carry out a χ² test, at the 5% level of significance, to determine whether or not the proposed model should be accepted. State clearly the null and alternative hypotheses being tested and the conclusion which is reached. (MEI)

9. The numbers of cars passing a check-point during 100 intervals, each of time 5 minutes, were noted:

    Number of cars    Frequency
          0               5
          1              23
          2              23
          3              25
          4              14
          5              10
      6 or more           0

Fit a Poisson distribution to these data and test the goodness of fit.

10. During the weaving of cloth the thread sometimes breaks. 147 lengths of thread of equal length were observed during weaving and the table records the number of these threads for which the indicated number of breaks occurred.

    Number of breaks per thread     0     1     2     3    4    5
    Number of threads              48    46    30    12    9    2

Fit a Poisson distribution to the data and examine whether the deviation between theory and experiment is significant. (MEI)

11. A shop that repairs television sets keeps a record of the number of sets brought in for repair each day. The numbers brought in during a random sample of 40 days were as follows.

    4 0 0 0 2 1 1 0 0 0    0 1 1 0 3 0 0 0 1 0
    4 0 0 0 0 0 2 0 1 0    0 0 0 1 1 1 0 2 0 0

Test, at the 5% significance level, the hypothesis that these numbers are observations from a Poisson distribution. (C)

12. The table gives the distribution for the number of heavy rainstorms reported by 330 weather stations in the United States of America over a one-year period.

    Number of          Number of stations (f)
    rainstorms (x)     reporting x rainstorms
          0                    102
          1                    114
          2                     74
          3                     28
          4                     10
          5                      2
     more than 5                 0

(a) Find the expected frequencies of rainstorms given by the Poisson distribution having the same mean and total as the observed distribution.
(b) Use the χ² distribution to test the adequacy of the Poisson distribution as a model for these data. (AEB)

13. Over a period of 50 weeks the numbers of road accidents reported to a police station are shown in the table below.

    No. of accidents     0     1     2    3
    No. of weeks        23    13    10    4

Find the mean number of accidents per week. Use this mean, a 5% level of significance and your table of χ² to test the hypothesis that these data are a random sample from a population with a Poisson distribution. (O&C)

14. (a) The data in the following table are the result of counting radioactive events in five-second intervals:

    Number of events          0     1     2    ≥3
    Number of observations    5    14    13     8

Show that the mean number of events in a five-second interval is 1.7 (taking the group with frequency 8 to have a mean of 3.5).
(b) Write down the probability of 0, 1, 2, ≥3 events for a Poisson distribution with mean 1.7. Hence obtain to one decimal place the expected frequencies.
(c) Use the chi-squared goodness of fit test to assess whether it is reasonable to claim that the data come from a Poisson distribution. Make your method clear and conduct your test at the 10% level.
(d) A student conducting a similar experiment found the chi-squared statistic for his results was 0.015. What conclusions do you draw from this value? (O)
15. For a period of six months 100 similar hamsters were given a new type of feedstuff. The gains in mass are recorded in the table below:

    Gain in mass (g) x    Observed frequency f
    −∞ < x ≤ −10                  3
    −10 < x ≤ −5                  6
     −5 < x ≤ 0                   9
      0 < x ≤ 5                  15
      5 < x ≤ 10                 24
     10 < x ≤ 15                 16
     15 < x ≤ 20                 14
     20 < x ≤ 25                  8
     25 < x ≤ 30                  3
     30 < x < ∞                   2

It is thought that these data follow a normal distribution, with mean 10 and variance 100. Use the χ² distribution at the 5% level of significance to test this hypothesis.
Describe briefly how you would modify this test if the mean and variance were unknown. (AEB)

16. The following data give the heights in centimetres of 100 male students.

    Height (cm)    Frequency
    155-160            5
    161-166           17
    167-172           38
    173-178           25
    179-184            9
    185-190            6

(a) Test, at the 5% level, whether the data follow a normal distribution with mean 173.5 cm and standard deviation 7 cm.
(b) Find the expected frequencies for a normal distribution having the same mean and variance as the data given, and test the goodness of fit, using a 5% level of significance.

17. In a European country registration for military service is compulsory for all eighteen-year-old males. All males must report to a barracks where, after an inspection, some people, including all those less than 1.6 m tall, are excused service. The heights of a sample of 125 eighteen-year-olds measured at the barracks were as follows:

    Height, m    1.2-    1.4-    1.6-    1.8-    2.0-2.2
    Frequency      6      34      31      42       12

(a) Use a χ² test and a 5% significance level to confirm that the normal distribution is not an adequate model for this data.
(b) Show that, if the second and third classes (1.4- and 1.6-) are combined, the normal distribution does appear to fit the data. Comment on this apparent contradiction in the light of the information at the beginning of the question. (AEB)

THE χ² SIGNIFICANCE TEST FOR INDEPENDENCE

Sometimes situations arise when data are classified according to two different factors or attributes and these are often displayed in a table, known as a contingency table, for example

(a) examination grades for Mathematics in three further education colleges

                          College
                 Bradley   Cooper   Dunstan
    Grade   A       27       35       17
            B       52       36       28
            C       63       31       64
            D       31       43       21
            E       16       17       12
            N        5       12        8

This is a 6 by 3 contingency table (6 rows and 3 columns).

(b) age of voter and voting preference

                               Candidate
                                A      B
    Age of voter   18-25       373     62
                   25-40       484    187
                   40-60       167    563
                   Over 60     100    492

This is a 4 by 2 contingency table (4 rows and 2 columns).

You can use a χ² test to investigate whether the two factors are independent or whether there is an association between them. The test follows a similar pattern to the goodness of fit test but this time the null hypothesis H₀ is that the two factors are independent and the alternative hypothesis H₁ is that there is an association between them.

The following example explains how to calculate the expected frequencies for data given in a contingency table.

Example 12.8
The members of a sports team are interested in whether the weather has an effect on their results. They play 50 matches with the following results.

                        Weather
                     Good    Bad    Total
    Result   Win      12      4      16
             Draw      5      8      13
             Lose      7     14      21
             Total    24     26      50

Formulate suitable null and alternative hypotheses, and use a χ² test to test the claim, at the 1% significance level, that the weather has no effect on the team's results. State your conclusion clearly. (C)

Solution 12.8
Note that the factors are the result of the match and the type of weather and they have been linked in a 3 by 2 contingency table.

1. State H₀ and H₁.
The hypotheses are:
H₀: The weather has no effect on the team's results.
H₁: The weather has an effect on the team's results.

2. Calculate E and check that expected frequencies are greater than 5.
When calculating the expected frequencies, the row and column totals must remain the same.
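Under the null hypothesis of independence, each expected frequency in a contingency table is the product of its row total and column total divided by the grand total. A minimal sketch of that rule, using the observed match results from Example 12.8 (the variable names are hypothetical, chosen for illustration):

```python
# Observed results: rows are Win, Draw, Lose; columns are Good, Bad weather
observed = [
    [12, 4],
    [5, 8],
    [7, 14],
]

row_totals = [sum(row) for row in observed]          # 16, 13, 21
col_totals = [sum(col) for col in zip(*observed)]    # 24, 26
grand_total = sum(row_totals)                        # 50

# E = (row total x column total) / grand total for every cell
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

for label, row in zip(["Win", "Draw", "Lose"], expected):
    print(label.ljust(5), "  ".join(f"{e:6.2f}" for e in row))
```

Because every expected value is built from the observed totals, the rows and columns of the expected table automatically add up to the same totals as the observed table.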


Consider the cell linking a win with good weather:

                        Weather
                     Good    Bad    Total
    Result   Win      □              16
             Draw
             Lose
             Total    24             50

Total number of wins = 16, therefore

    P(result is a win) = 16/50

Total number of matches in good weather = 24, therefore

    P(good weather) = 24/50

According to the null hypothesis, the events 'the result is a win' and 'the weather is good' are independent, so, using the multiplication rule for independent events (see page 198)

    P(win and good weather) = P(win) × P(good weather)
                            = 16/50 × 24/50

    Expected number of wins in good weather = 16/50 × 24/50 × 50
                                            = (16 × 24)/50
                                            = 7.68

Note that the calculation (16 × 24)/50 gives a clue to the quick way of working out the expected frequency:

    Expected frequency = (row total × column total) / grand total

So, for example, the expected number of draws in bad weather is calculated as follows:

                        Weather
                     Good    Bad    Total
    Result   Win
             Draw             □      13
             Lose
             Total           26      50

    Expected frequency = (row total × column total) / grand total
                       = (13 × 26)/50
                       = 6.76

The completed table for the expected frequencies is:

                        Weather
                     Good     Bad     Total
    Result   Win      7.68    8.32     16
             Draw     6.24    6.76     13
             Lose    10.08   10.92     21
             Total   24      26        50

Note that all the expected frequencies are greater than five, so cells do not need to be combined.

Degrees of freedom, ν
Notice that in this table once two of the expected frequencies in different rows have been calculated (for example the two calculated above, 7.68 and 6.76), the others are known automatically. This is because the row and column totals must agree with those in the observed data, for example
if expected number of wins in good weather = 7.68,
then expected number of wins in bad weather = 16 − 7.68 = 8.32

Number of degrees of freedom, ν = 2 and the χ²(2) distribution is considered.

Test at the 1% level.
From tables χ²₁%(2) = 9.21, so reject H₀ if X² > 9.21.

       O          E        (O − E)²/E
      12         7.68       2.43
       5         6.24       0.246...
       7        10.08       0.941...
       4         8.32       2.243...
       8         6.76       0.227...
      14        10.92       0.868...
    ΣO = 50   ΣE = 50       6.956...

    X² = Σ (O − E)²/E = 6.96 (2 d.p.)

Since X² < 9.21 do not reject H₀, and conclude that the team's results are independent of the weather.
At the 1% level conclude that the weather has no effect on the team's results.

Finding the number of degrees of freedom, ν, in an h by k contingency table

There is a general rule for calculating ν for data in a contingency table. In each of the tables shown below, it is possible to work out all the expected frequencies once the values indicated with a × have been found.

4 by 3 table
    ν = (4 − 1) × (3 − 1)
      = 3 × 2
      = 6
2 by 4 table
    ν = (2 − 1) × (4 − 1)
      = 1 × 3
      = 3

3 by 2 table
    ν = (3 − 1) × (2 − 1)
      = 2 × 1
      = 2

2 by 2 table
    ν = (2 − 1) × (2 − 1)
      = 1 × 1
      = 1

In general, if there are h rows, then once (h − 1) expected frequencies in a column have been calculated, the last value in the column is known because the column total must agree.
Similarly, if there are k columns, once (k − 1) expected frequencies in a row have been calculated, the last value in the row is known because the row total must agree.

For an h by k contingency table,
number of degrees of freedom ν = (h − 1) × (k − 1).

Yates' correction for a 2 by 2 contingency table

In particular, for a 2 by 2 contingency table, ν = 1 and the χ²(1) distribution is considered. In this case, Yates' correction should be applied when calculating X², where

    X² = Σ (|O − E| − 0.5)² / E

Example 12.9
A driving school examined the results of 100 candidates who took their test for the first time. It was found that out of the 40 men, 28 passed and out of the 60 women, 34 passed. Do these results indicate, at the 5% significance level, a relationship between the sex of candidate and the ability to pass the driving test at the first attempt?

Solution 12.9
Displaying the results in a contingency table:

                Result of driving test
                Pass    Fail    Total
    Male         28      12       40
    Female       34      26       60
    Totals       62      38      100

1. State H₀ and H₁.
The hypotheses are:
H₀: There is no relationship between the sex of a candidate and the ability to pass at the first attempt.
H₁: There is a relationship.

2. Calculate E and check that expected frequencies are greater than 5.
To calculate expected frequencies, use

    Expected frequency = (row total × column total) / grand total

So expected number of males who pass = (40 × 62)/100 = 24.8

Use the fact that row and column totals agree with the observed data to work out the remaining frequencies:

                Result of driving test
                Pass    Fail    Total
    Male        24.8    15.2      40
    Female      37.2    22.8      60
    Totals      62      38       100

Note that there are no expected frequencies that are less than 5.

3. Work out ν.
Degrees of freedom, ν
ν = (2 − 1)(2 − 1) = 1, so use the χ²(1) distribution.

4. State the level of the test and the rejection criterion.
Test at the 5% level.
From tables χ²₅%(1) = 3.841, so reject H₀ if X² > 3.841.
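For a 2 by 2 table the statistic with Yates' correction is quick to compute by machine. The sketch below is an illustration only: the function name `yates_chi_squared` is hypothetical, the critical value 3.841 is typed in from tables, and the data are the driving test results of Example 12.9.

```python
def yates_chi_squared(table):
    """Yates-corrected statistic, sum of (|O - E| - 0.5)^2 / E, for a 2 by 2 table.

    Hypothetical helper: 'table' is a list of two rows, each with two counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (abs(observed - expected) - 0.5) ** 2 / expected
    return statistic

# Rows: male, female; columns: pass, fail
driving_test = [[28, 12], [34, 26]]
x2 = yates_chi_squared(driving_test)
critical = 3.841                      # chi-squared(1), 5% level, from tables
reject = x2 > critical

print(f"X^2 = {x2:.3f}, reject H0: {reject}")
```

In a 2 by 2 table every cell has the same value of |O − E|, so the correction subtracts 0.5 from the same deviation four times; only the divisors E differ.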

Using Yates' correction,

       O           E        (|O − E| − 0.5)²/E
      28         24.8        0.293...
      34         37.2        0.195...
      12         15.2        0.479...
      26         22.8        0.319...
    ΣO = 100   ΣE = 100      1.289...

    X² = Σ (|O − E| − 0.5)²/E = 1.29 (2 d.p.)

6. Make your conclusion.
Since X² < 3.841, do not reject H₀.
The driving test results do not indicate a relationship between the sex of a candidate and the ability to pass the driving test at the first attempt.

Exercise 12c Contingency tables

1. Two schools enter their pupils for a particular public examination and the results obtained are shown below.

                Credit    Pass    Fail
    School A      51       10      19
    School B      39       10      21

By using an approximate χ² statistic, assess at the 5% level whether or not there is a significant difference between the two schools with respect to the proportions of pupils in the three grades. State your null and alternative hypotheses. (L)

2. Students in the Sociology department of a university decided to conduct a survey into the roles of married couples in performing tasks of housework and child care. They designed a questionnaire for this purpose.
They then contacted 240 married couples who were willing to take part in the survey. Each of the participating couples was randomly allocated to one of two groups. In the first group the wife was asked to complete the questionnaire and in the second group the husband was asked to complete it.
Four response categories were available for a question which asked how the work of cleaning the house was shared between husband and wife. The following table shows the numbers of husbands and wives choosing each category.

    Response category                  Husbands    Wives
    Wife does it all                      21         30
    Wife does most of it                  63         58
    Shared half and half                  28         25
    Husband does all or most of it         8          7

Carry out a χ² test to investigate whether there is an association between the sex of the respondent and the respondent's view of how the work is shared.
Comment on any differences revealed by this survey between the opinions of husbands and wives about who does the household cleaning. (NEAB)

3. The following are data on 150 chickens, divided into two groups according to breed, and into three groups according to yield of eggs:

                               Yield
                         High    Medium    Low
    Rhode Island Red      46       29       28
    Leghorn               27       14        6

Are these data consistent with the hypothesis that the yield is not affected by the type of breed?

4. A research worker studying the ages of adults and the number of credit cards they possess obtained the results shown in the table.

    Number of cards possessed
         ≤3      >3
         74      20
         35      50

Use the χ² statistic and a significance test at the 5% level to decide whether or not there is an association between age and number of credit cards possessed. (L)

5. An investigation into colour blindness and the sex of a person gave the following results:

                       Colour blindness
              Colour blind    Not colour blind
    Male           36               964
    Female         19               981

Is there evidence, at the 5% level, of an association between the sex of a person and whether or not they are colour blind?

6. In a small survey 350 car owners from four districts P, Q, R, S were found to have cars in price ranges A, B, C, D, the frequencies of the prices being as shown in the table.

          P     Q     R     S
    A     9    10    12    19
    B    13    20    18    29
    C    24    29    12    25
    D    34    41    18    37

Find the expected frequencies on the hypothesis that there is no association between the district and the price of the car. Use the χ² distribution to test this hypothesis. (AEB)

7. A random sample of 100 shoppers was asked by a market research team whether or not they used Sudsey Soap. 58 said yes and 42 said no. In a second random sample of 80 shoppers, 62 said yes and 18 said no. By considering a suitable 2 × 2 contingency table, test whether these two samples are consistent with each other. (O & C)

8. The table summarises the incidence of cerebral tumours in 141 neurosurgical patients.

                                         Type of tumour
                                  Benign    Malignant    Others
    Site of    Frontal lobes        23          9           6
    tumour     Temporal lobes       21          4           3
               Elsewhere            34         24          17

Find the expected frequencies on the hypothesis that there is no association between the type and site of a tumour. Use the χ² distribution to test this hypothesis. (AEB)

9. In an examination 37 out of 47 boys passed and 2~ out of 41 girls passed. By considering a suitable 2 × 2 contingency table test whether boys and girls differ in their ability in this subject.

10. In an investigation into eye colour and left- or right-handedness the following results were obtained:

                         Handedness
                        Left    Right
    Eye       Blue       15      85
    colour    Brown      20      80

Is there evidence, at the 5% level, of an association between eye colour and left- or right-handedness?

11. In 1988 the number of new cases of insulin-dependent diabetes in children under the age of 15 years was 1495. The table below breaks down this figure according to age and sex.

    Age (yrs)    0-4    5-9    10-14    Total
    Boys         205    248     328      781
    Girls        182    251     281      714
    Total        387    499     609     1495

Perform a suitable test, at the 5% significance level, to determine whether age and sex are independent factors. (C)

12. When analysing the results of a 3 × 2 contingency table it was found that

    Σ (Oᵢ − Eᵢ)²/Eᵢ = 2.38, summed over i = 1 to 6.

Write down the number of degrees of freedom and the critical value appropriate to these data in order to carry out a χ² test of significance at the 5% level. (L)

13. In a college, three different groups of students sit the same examination. The results of the examination are classified as Credit, Pass or Fail. In order to test whether or not there is a difference between the groups with respect to the proportion of students in the three grades the statistic

    Σ (O − E)²/E

is evaluated and found to be equal to 10.25.
(a) Explain why there are four degrees of freedom in this situation.
(b) Using a 5% level of significance, carry out the test and state your conclusions. (L)
14. The personnel manager of a large firm is investigating whether there is any association between the length of service of the employees and the type of training they receive from the firm. A random sample of 200 employee records is taken from the last few years and is classified according to these criteria. Length of service is classified as short (meaning less than 1 year), medium (1-3 years) and long (more than 3 years). Type of training is classified as being merely an initial 'induction course', proper initial on-the-job training but little if any more, and regular and continuous training. The data are as follows:

                                        Length of service
                                     Short    Medium    Long
    Type of     Induction course       14       23       13
    training    Initial on-the-job     12        7       13
                Continuous             28       32       58

Examine at the 5% level of significance whether these data provide evidence of association between length of service and type of training, stating clearly your null and alternative hypotheses.
Discuss your conclusions. (MEI)

15. A market research organisation interviewed a random sample of 120 users of launderettes in London and found that 37 preferred brand X washing powder, 66 preferred brand Y and the remainder preferred brand Z. A similar survey was carried out in Birmingham. In this survey, of 80 people interviewed, 19 preferred brand X, 40 preferred brand Y and the remainder preferred brand Z. Test whether these results provide significant evidence, at the 5% level, of different preferences in the two cities. (C)

16. The results obtained by 200 students in chemistry and biology are shown in the table. Test, at the 5% level, whether the performances in both subjects are related.

                         Chemistry
                        Pass    Fail
    Biology    Pass     102      45
               Fail      21      32

Summary

χ² significance test

• The test statistic is X² where

      X² = Σ (O − E)²/E   and   X² ~ χ²(ν).

  When ν = 1, use Yates' correction, where

      X² = Σ (|O − E| − 0.5)²/E

  Remember to combine classes if E ≤ 5.

• Degrees of freedom, ν

  χ² goodness of fit tests
      ν = number of classes − number of restrictions (see table on page 579)

  χ² tests for independence
      For an h by k contingency table, ν = (h − 1)(k − 1).

Miscellaneous worked examples

Example 12.10
In experiments in pea breeding Gregor Mendel obtained the following data relating to 556 peas.

    Round and    Wrinkled and    Round and    Wrinkled and
    Yellow       Yellow          Green        Green
      315          101             108           32

According to Mendel's theoretical results, the expected figures are in the ratios 9 : 3 : 3 : 1.
Calculate the value of χ² for these data on the assumption that the theory is correct.
Test at the 10% significance level whether the theory is contradicted.
It has been suggested that Mendel's results are suspect in that they are unlikely to have been obtained from random observations. Comment on this suggestion in relation to the value of χ² calculated. (C)

Solution 12.10
H₀: The different types of peas occur in the ratio 9 : 3 : 3 : 1.
H₁: The different types of peas do not occur in this ratio.

Expected frequencies, according to H₀:

    Round and yellow       9/16 × 556 = 312.75
    Wrinkled and yellow    3/16 × 556 = 104.25
    Round and green        3/16 × 556 = 104.25
    Wrinkled and green     1/16 × 556 = 34.75

                    Round and    Wrinkled and    Round and    Wrinkled and
                    yellow       yellow          green        green
    Observed (O)     315           101            108            32
    Expected (E)     312.75        104.25         104.25         34.75

There are four classes and one restriction (ΣE = 556).
Therefore ν = 4 − 1 = 3 and the χ²(3) distribution is considered.

Perform the test at the 10% level.
From tables χ²₁₀%(3) = 6.251, so reject H₀ if X² > 6.251.

       O            E         (O − E)²/E
      315        312.75        0.0161...
      101        104.25        0.101...
      108        104.25        0.134...
       32         34.75        0.217...
    ΣO = 556   ΣE = 556        0.470...

    X² = Σ (O − E)²/E = 0.47 (2 d.p.)
Since X² < 6.251, accept H0 and conclude that the types are in the ratio 9 : 3 : 3 : 1.

The calculated value of X² is very small indeed, suggesting very little discrepancy between the observed and expected frequencies.
From χ² tables, P(X² < 0.352) = 5%, so on only just over 5% of occasions would you expect to have a test value this low. This could suggest that the data are not random observations.

Example 12.11

Mr and Mrs Smith live in a small town with two primary schools A and B. They are trying to decide which school would provide the better learning environment for their children. They have available the results of recent national tests in mathematics, English and science. Each child in the final year took three tests, one in each subject, and they either passed or failed each test. These results are summarised in the table below.

                3 passes    1 or 2 passes    No passes
    School A       15             6               5
    School B       10            14              13

(a) Stating your hypotheses clearly test, at the 5% level of significance, whether or not there is evidence of an association between school and test results.

Mr and Mrs Smith also have available the results of a questionnaire about the annual family income x, in thousands of pounds, of the families of the children taking these tests. The results are summarised in the table below.

                x > 30    20 < x ≤ 30    15 < x ≤ 20    x ≤ 15
    School A       7           5              9            5
    School B       6          13              8           10

A χ² test for association between school and family income using this information gave a test statistic of 3.545. There was no pooling of classes.

(b) Using a 5% level of significance, interpret this statistic stating the critical value used.
(c) In the light of parts (a) and (b) state, giving reasons, which of the two schools Mr and Mrs Smith might choose for their children. (L)

Solution 12.11

(a) H0: The two factors 'school' and 'results' are independent.
    H1: The factors are not independent and there is an association between school and results.

Observed data:

                3 passes    1 or 2 passes    No passes    Totals
    School A       15             6               5          26
    School B       10            14              13          37
    Totals         25            20              18          63

Expected data:
For school A and three passes

    expected frequency = (row total × column total) / grand total
                       = (26 × 25) / 63
                       = 10.32 (2 d.p.)

The complete table is as follows:

                3 passes    1 or 2 passes    No passes    Totals
    School A     10.32          8.25            7.43         26
    School B     14.68         11.75           10.57         37
    Totals       25            20              18            63

The table has 2 rows and 3 columns
so ν = (2 − 1)(3 − 1) = 1 × 2 = 2 and the χ²(2) distribution is considered.
Test at the 5% level.
From tables, χ²_5%(2) = 5.991, so reject H0 if X² > 5.991.

    O        E         (O − E)²/E
    15       10.32     2.122...
    6        8.25      0.613...
    5        7.43      0.794...
    10       14.68     1.491...
    14       11.75     0.430...
    13       10.57     0.558...
    ΣO = 63  ΣE = 63   6.012...

$X^2 = \sum \frac{(O - E)^2}{E} = 6.01$ (2 d.p.)

Since X² > 5.991, reject H0 and conclude that there is evidence of an association between the school and the test results.

(b) The table has 2 rows and 4 columns
    so ν = (2 − 1)(4 − 1) = 1 × 3 = 3 and the χ²(3) distribution is considered.
    H0: The two factors 'school' and 'family income' are independent.
    H1: There is an association between school and family income.
    From tables, χ²_5%(3) = 7.815, so reject H0 if X² > 7.815.
    It is given that X² = 3.545.
    Since X² < 7.815, do not reject H0.
    There is no association between school and family income.

(c) As there is no association between school and family income, Mr and Mrs Smith are likely to base their choice on the results of the national tests. Since 57% of the pupils in school A obtained three passes, as compared with only 27% with three passes in school B, Mr and Mrs Smith might conclude that school A provides the better learning environment.
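The arithmetic in Solutions 12.10 and 12.11(a) can be checked with a short Python sketch. The function and variable names below are illustrative only and do not come from the text; the critical values are copied from the solutions rather than computed. Because the sketch keeps the expected frequencies unrounded, the second statistic comes out at about 6.02 rather than the 6.01 obtained from the table rounded to 2 d.p.; the decision is the same.

```python
def chi_squared_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def expected_from_ratio(ratio, total):
    """Goodness of fit: share `total` between the classes in the given ratio."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]


def expected_from_table(table):
    """Independence: each cell is row total x column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Solution 12.10: Mendel's peas against the ratio 9 : 3 : 3 : 1.
peas_observed = [315, 101, 108, 32]
peas_expected = expected_from_ratio([9, 3, 3, 1], sum(peas_observed))
peas_x2 = chi_squared_statistic(peas_observed, peas_expected)

# Solution 12.11(a): school (rows) against test results (columns).
school_observed = [[15, 6, 5], [10, 14, 13]]
school_expected = expected_from_table(school_observed)
school_x2 = chi_squared_statistic(
    [o for row in school_observed for o in row],
    [e for row in school_expected for e in row],
)

# Critical values taken from the worked solutions (tables), not computed here.
CRITICAL_10_PERCENT_3_DF = 6.251
CRITICAL_5_PERCENT_2_DF = 5.991

print("peas:", round(peas_x2, 2), "reject H0:", peas_x2 > CRITICAL_10_PERCENT_3_DF)
print("schools:", round(school_x2, 2), "reject H0:", school_x2 > CRITICAL_5_PERCENT_2_DF)
```

The flattening of the two tables into lists is only there so that one statistic function serves both kinds of test; the degrees of freedom still have to be worked out separately, as in the solutions.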

Miscellaneous exercise 12d

1. It is suggested that preferences for three proposed routes for a town by-pass are associated with where people live. Each person in a random sample of 150 people, chosen from the inhabitants of the town and surrounding villages, was asked which route he or she preferred. The results are given in the following table.

               Town    Surrounding villages
    Route 1     50             25
    Route 2     28             22
    Route 3     16              9

   State appropriate null and alternative hypotheses, and use a χ² test to test, at the 5% significance level, the suggestion that there is an association between preferred route and where people live. (C)

2. (a) A random sample of supermarkets was sent a questionnaire on which they were asked to report the number of cases of shoplifting they had dealt with in each month of the previous year. The totals for each month were as follows.

    J    F    M    A    M    J    J    A    S    O    N    D
    16   12   10   17   6    18   16   17   10   22   14   16

   Carry out a chi-squared test at an appropriate level of significance to determine whether or not shoplifting is more likely to occur in some months than others. (You may take all months to be of the same length.) Make clear your null and alternative hypotheses, the level of significance you are using, and your conclusion.
   You may wish to use the fact that, when all values of $f_e$ are equal, the usual chi-squared test statistic may be written as
       $\frac{1}{f_e} \sum f_o^2 - \sum f_o$
   (b) Prove the result given at the end of part (a). (MEI)

3. It is thought that there is an association between the colour of a person's eyes and the reaction of the person's skin to ultraviolet light. In order to investigate this, each of a random sample of 120 people was subjected to a standard dose of ultraviolet light. The degree of their reaction was noted, '−' indicating no reaction, '+' indicating slight reaction and '++' indicating strong reaction. The results are shown in the table below.

                           Eye colour
                    Blue    Grey or green    Brown
    Reaction  −       7          8             18
              +      29         10             16
              ++     21          9              2

   Perform an appropriate test at the 5% level, stating your null and alternative hypotheses.

4. …

    Lower value of          Number of
    grouping interval       observations
    −∞                       0
    −2.0                     1
    −1.5                     0
    −1.0                     6
    −0.5                    10
     0.0                    12
     0.5                    15
     1.0                    23
     1.5                    16
     2.0                    13
     2.5                     3
     3.0                     1
     3.5                     0

   … (C)

5. A farmers' co-operative decided to test three new brands of fertiliser, A, B and C, allocating them at random to 75 plots. The yield of the crop was classified as high, medium or low. The results are summarised in the table below.

                     Fertiliser
               A      B      C      Total
    High       12     15     3       30
    Medium      8      8     8       24
    Low         5      7     9       21
    Total      25     30     20      75

   (a) Stating your hypotheses clearly, test at the 5% level of significance whether or not there is any evidence of an association between brand of fertiliser and yield.
   Fertilisers A and B are produced by Quickgrow whereas C is produced by Bumpercrops. The farmers wanted to decide from which company to purchase fertiliser and combined the figures for A and B to give a 3 × 2 table. The statistic
       $\sum \frac{(O - E)^2}{E}$
   for this new table was calculated and gave the value 7.622.
   (b) By carrying out a suitable test at the 5% level of significance, advise the farmers whether or not there is any evidence of an association between the choice of company and yield.
   (c) Giving your reason, advise the farmers which company they should use. (L)

6. A statistician, who is suspected to be suffering from asthma, is asked to record his peak flow measurement four times each day for a period of four weeks.
   He groups by value the 112 recorded measurements into seven classes giving observed frequencies, $o_i$, i = 1, 2, ..., 7. He then calculates correctly corresponding expected frequencies, $e_i$, using a normal distribution having mean and variance estimated from the original measurements.
   The value of the test statistic
       $\sum_{i=1}^{7} \frac{(o_i - e_i)^2}{e_i}$
   is then calculated correctly by the statistician as 5.624.
   (a) Using a 1% level of significance and stating the null hypothesis, complete the test.
   (b) Give the usual requirement made on each of the values $e_i$ prior to calculating the test statistic, and indicate how a failure to meet the requirement may be overcome. (NEAB)

7. A random sample of 100 people was asked for their opinions about the amount of sport shown on TV. Each person had to say whether there was too much sport shown, about the right amount, or not enough. The numbers of men and women making each response are shown in the table.

                         Men    Women
    Too much sport        13      26
    About right           22      22
    Not enough sport      12       5

   The null hypothesis is that a person's opinion about the amount of sport shown on TV is independent of the person's sex.
   (a) Construct a table showing the expected frequencies, assuming that the null hypothesis is true.
   (b) Use a χ² test to test this null hypothesis, using a 5% significance level. Show full details of your method and state your conclusion clearly. (C)

8. The Director of Studies at a College of Further Education believed that there was a connection between candidates' grades in mathematics and physics at A-level. For a set of candidates who had taken both examinations, she recorded the number of candidates in each of four categories, as shown in the table.

                            Mathematics    Mathematics
                            grades A-C     grades D-U
    Physics grades A-C          22              9
    Physics grades D-U           8             15

   (a) Test the Director's belief at the 2.5% level of significance, stating your null and alternative hypotheses.
   Her colleague said that she was losing accuracy by combining the grades A to C in one group, and grades D to U in another. He suggested that she should create a 7 × 7 table showing all possible combinations of grades.
   (b) State why his suggestion might lead to a problem in performing the test. (L)

9. During a working day a machine requires occasional adjustments which appear to be randomly distributed throughout the day. A factory foreman records the number of adjustments made to the machine each day for a period of 200 working days, obtaining the data displayed in the table.

    Number of adjustments    0     1     2     3     4     5
    Number of days           34    78    61    20    5     2

   Previous experience has suggested that the daily number of adjustments to this machine follows a Poisson distribution with mean 1.5.
   (a) Perform a χ² goodness of fit test to decide whether the data in the table can reasonably be considered as conforming to a Poisson distribution with mean 1.5.
   (b) Outline, without detailed calculation, the necessary modifications to your test if the Poisson mean is not assumed to be 1.5.
   (c) The distribution B(5, 0.3) is a very good fit to the data in the table. Without further calculation, explain why, despite this good fit, the binomial model is not appropriate. (NEAB)

10. A department store has five doorways, each for entrance and exit. It is claimed that the proportion of shoppers entering or leaving the store is the same for each of the five doorways. The number of customers entering or leaving the store is counted at each doorway for three randomly selected days with the following results.

    Doorway    Number of customers
    A               601
    B               673
    C               626
    D               618
    E               702

    Test whether or not these data support the claim.
    The same store also records the daily number of sales charged to stolen credit cards. The results for the first four months of 1990 are as follows.

    Number of sales    Number of days
    0                       31
    1                       39
    2                       19
    3                       11
    ≥ 4                      0

    Explain why a Poisson distribution may be appropriate as a model for the daily number of sales charged to stolen credit cards. Test the hypothesis that the daily number of sales does follow a Poisson distribution. (NEAB)

11. In the mathematics department of a college, candidates in an examination are graded A, B, C, D or E. Records from previous years show that examiners have awarded a grade A to 15% of candidates, B to 20%, C to 35%, D to 25% and E to 5%. A new syllabus is examined by a new board of examiners who award the grades to 200 candidates as follows:
        A, 33; B, 37; C, 81; D, 36; E, 13
    (a) Stating clearly your hypotheses and using a 5% level of significance investigate whether or not the new board of examiners awards grades in the same proportions as the previous one.
    In addition to being classified by examination grade, these 200 students are classified as male or female and the results summarised in a contingency table. Assuming all expected values are 5 or more, the statistic
        $\sum_{i=1}^{10} \frac{(O_i - E_i)^2}{E_i}$
    was 14.27.
    (b) Stating your hypotheses and using a 1% significance level, investigate whether or not sex and grade are associated. (L)

12. A factory operates four production lines. Maintenance records show that the daily number of stoppages due to mechanical failure were as shown in the table below (it is possible for a production line to break down more than once on the same day). You may assume that Σf = 1400, Σfx = 1036.

    Number of stoppages, x    0      1      2      3     4     5     6 or more
    Number of days, f         728    447    138    48    26    13    0

    (a) Use a χ² distribution and a 1% significance level to determine whether the Poisson distribution is an adequate model for the data.
    (b) The maintenance engineer claims that breakdowns occur at random and that the mean rate has remained constant throughout the period. State, giving a reason, whether your answer to (a) is consistent with this claim.
    (c) Of the 1036 breakdowns which occurred 230 were on production line A, 303 on B, 270 on C and 233 on D. Test at the 5% significance level whether these data are consistent with breakdowns occurring at an equal rate on each production line. (AEB)

13. A group of students studying A-level statistics was set a paper, to be attempted under examination conditions, containing four questions requiring the use of the χ² distribution. The following table shows the type of question and the number of students who obtained good (14 or more out of 20) and bad (fewer than 14 out of 20) marks.

                              Type of question
                 Contingency    Binomial    Normal    Poisson
                 table          fit         fit       fit
    Good mark       25            12          12        11
    Bad mark         4            11           3        12

    (a) Test at the 5% significance level whether the mark obtained (by the students who attempted the question) is associated with the type of question.
    (b) Under some circumstances it is necessary to combine classes in order to carry out a test. If it had been necessary to combine the binomial fit question with another question, which question would you have combined it with and why?
    (c) Given that a total of 30 students … test at the 5% significance level whether the number of students attempting a particular question is associated with the type of question.
    (d) Compare the difficulty and popularity of the different types of question in the light of your answers to (a) and (c). (AEB)

14. (a) The number of books borrowed from a library during a certain week were 518 on Monday, 431 on Tuesday, 485 on Wednesday, 443 on Thursday and 523 on Friday.
    Is there any evidence that the number of books borrowed varies between the five days of the week? Use a 1% level of significance. Interpret fully your conclusions.
    (b) Analysis of the rate of turnover of employees by a personnel manager produced the following table showing the length of stay of 200 people who left the company for other employment.

                     Length of employment (years)
                     0-2      2-5      > 5
    Managerial         4       11        6
    Skilled           32       28       21
    Unskilled         25       23       50

    Using a …% level of significance, analyse this information and state fully the conclusions from your analysis. (AEB)

15. Over a long period of time, a research team monitored the number of car accidents which occurred in a particular county. Each accident was classified as being trivial (minor damage and no personal injuries), serious (damage to vehicles and passengers, but no deaths) or fatal (damage to vehicles and loss of life). The colour of the car which, in the opinion of the research team, caused the accident was also recorded, together with the day of the week on which the accident occurred. The following data were collected.

    Colour     Trivial    Serious    Fatal
    White        50         25        16
    Black        35         39        18
    Green        28         23        13
    Red          25         17        11
    Yellow       17         20        16
    Blue         24         33        10

    Analyse these data for evidence of association between the colour of the car and the type of accident.
    State the condition which sometimes necessitates the amalgamation of rows or columns in contingency tables. Explain why amalgamation might not be appropriate for this table.
    The following table summarises the data relating to the day of the week on which the accident occurred.

    Day          Number of accidents
    Monday             60
    Tuesday            54
    Wednesday          48
    Thursday           53
    Friday             53
    Saturday           75
    Sunday             77

    Investigate the hypothesis that these data are a random sample from a uniform distribution.

16. (a) The number of accidents per day on a stretch of motorway was recorded for 100 days and the following results obtained.

    Number of accidents    Frequency
    0                         44
    1                         32
    2                          9
    3                         10
    4                          5
    5 or more                  0

    Examine whether or not a Poisson model is suitable to represent the number of accidents per day on this stretch of road. Use a 1% level of significance.
    (b) The results of a survey to establish the attitude of individuals to a particular political proposal showed that three-quarters of those interviewed were house owners. Of the 44 interviewed only 6 of the 35 in favour of the proposal were not house owners.
    Does the survey indicate that a person's opinion on the proposal is independent of house ownership? Use a 1% level of significance. (AEB)

Mixed test 12A

1. The number of telephone calls received per day over a period of 150 days is shown in the table below.

    Number of calls    0     1     2     3
    Number of days     50    54    36    10

   (a) Estimate the mean number of calls per day.
   (b) What must be assumed for a Poisson model to be appropriate in this case?
   (c) Carry out a χ² goodness of fit analysis to test the null hypothesis that the number of telephone calls received per day has a Poisson distribution with mean …, at the 5% significance level. Give full details of your method. (C)

2. A University Sociology Department believes that students with a good grade in A level General Studies tend to do well on sociology degree courses. To check this, it collected information on a random sample of 100 students who had just graduated and had also taken General Studies at A level. The students' performance in General Studies was divided into two categories, those with grade A or B and 'others'. Their degree classes were recorded as Class I, Class II, Class III, Fail. The data is given in the table below.

                                      Class of Degree
                             Class I   Class II   Class III   Fail   Total
    General   Grade A or B     11         22          6         1      40
    Studies   Grade Others      4         28         24         4      60
              Total            15         50         30         5     100

   Use this data to test, at the 1% significance level, the hypothesis that degree class is independent of General Studies A level performance. State your conclusion clearly. (C)

3. The heights (x) of … in a particular year are summarised in the following table. The mean and standard deviation of the original data are 180 cm and 3 cm respectively.

    Height (cm)        Frequency
    x < 175                 2
    175 ≤ x < 177          15
    177 ≤ x < 179          29
    179 ≤ x < 181          25
    181 ≤ x < 183          12
    183 ≤ x < 185          10
    185 ≤ x                 7

   Fit an appropriate normal distribution to the above data, and test the goodness of fit at the 5% level. (C)

4. A survey in a college was commissioned to investigate whether or not there was any association between gender and passing a driving test. A group of 50 male and 50 female students was asked whether they passed or failed their driving test at the first attempt. All the students asked had taken the test. The results were as follows.

              Pass    Fail
    Male       23      27
    Female     32      18

   Stating your hypotheses clearly test, at the 10% level, whether or not there is any evidence of an association between gender and passing a driving test at the first attempt. (L)


Mixed test 12B

1. It is claimed that when homing pigeons are disorientated harmlessly they will exhibit no particular preference for any direction of flight after take-off. To test this, 128 pigeons, from lofts in a particular region, were disorientated harmlessly and then all released from a position 100 miles south of the region. The direction of flight of each pigeon was recorded with the following results.

    Flight direction     0°-90°    90°-180°    180°-270°    270°-360°
    Number of pigeons      30         35           36           27

   Use the χ² goodness of fit test to determine whether or not these data can be used to discredit the claim. (NEAB)

2. An increasing number of people are spending their working hours in front of a visual display unit (VDU). Sixty-five workers using non-adjustable screens and 66 workers using adjustable screens were asked if they experienced annoying reflections from the screens. The resulting responses are given in the table below.

                        Annoying reflection
                          No      Yes
    Non-adjustable        15       50
    Adjustable            28       38

   Test the claim that there is no association between screen type and a worker's experience of annoying reflections. (NEAB)

3. A six-sided die is believed to be biased in the following way:
   the probabilities of throwing a one, a two, a three or a four are equal;
   the probability of throwing a five is twice the probability of throwing a one;
   the probability of throwing a six is three times the probability of throwing a one.
   The die is thrown 150 times, and the results are recorded in the table below.

    Score        1     2     3     4     5     6
    Frequency    18    15    19    20    39    39

   Test, at the 5% significance level, the belief that the die is biased in the way described. (C)

4. A student of botany believed that multifolium uniflorum plants grow in random positions in grassy meadowland. He recorded the number of plants in one square metre of grassy meadow, and repeated the procedure to obtain the 148 results in the table.

    Number of plants    0    1     2     3     4     5     6    7 or greater
    Frequency           9    24    43    34    21    15    2    0

   (a) Show that, to two decimal places, the mean number of plants in one square metre is 2.59.
   (b) Give a reason why the Poisson distribution might be an appropriate model for these data.
   Using the Poisson model with mean 2.59, expected frequencies corresponding to the given frequencies were calculated, to two decimal places, and are shown in the table below.

    Number of plants    Expected frequencies
    0                        11.10
    1                        28.76
    2                        s
    3                        32.15
    4                        20.82
    5                        10.78
    6                         4.65
    7 or greater             t

   (c) Find the values of s and t to two decimal places.
   (d) Stating clearly your hypotheses, test at the 5% level of significance, whether or not this Poisson model is supported by these data. (L)

Significance tests for correlation coefficients

In this chapter you will learn about
• a significance test for r, the product-moment correlation coefficient
• a significance test for r_s, Spearman's coefficient of rank correlation

Background knowledge
You need to be familiar with the ideas associated with correlation (see Chapter 2 page 119) and the methods for calculating the product-moment correlation coefficient (page 139) and Spearman's coefficient of rank correlation (page 146).


SIGNIFICANCE TESTS FOR CORRELATION COEFFICIENTS

Before tackling this section you need to review the work covered in Chapter 2 on the product-moment correlation coefficient and Spearman's rank correlation coefficient.

When a correlation coefficient has been calculated it is usual to make an assessment of the degree of correlation. You might say, for example, that there is good positive correlation between the variables or that there is weak negative correlation. There is a significance test that allows you to decide whether there is a correlation between the variables, backed by statistical theory rather than just a suspicion.


TEST FOR THE PRODUCT-MOMENT CORRELATION COEFFICIENT, r

In Chapter 2 (page 139) you learnt how to calculate r, the product-moment correlation coefficient between two sets of data X and Y.

Using small s format:

    $r = \frac{s_{xy}}{s_x s_y}$

    where $s_{xy} = \frac{1}{n}\sum xy - \bar{x}\bar{y} = \frac{\sum xy}{n} - \bar{x}\bar{y}$

          $s_x = \sqrt{\frac{1}{n}\sum x^2 - \bar{x}^2} = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2}$

          $s_y = \sqrt{\frac{1}{n}\sum y^2 - \bar{y}^2} = \sqrt{\frac{\sum y^2}{n} - \bar{y}^2}$

Using big S format:

    $r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$

    where $S_{xy} = \sum xy - \frac{\sum x \sum y}{n}$, $S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}$, $S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n}$

Remember that r is such that −1 ≤ r ≤ 1, where
    r = −1 indicates perfect negative correlation
    r = 0 indicates no correlation
    r = 1 indicates perfect positive correlation.

If r is very close to zero, then you would probably say that the two variables X and Y are not related at all. If r is very close to 1, for example r = 0.992, then you would probably say that there is a strong positive linear correlation between X and Y. But what about a value for r of 0.694? Would you be able to claim that this indicates positive correlation? What about a value of −0.5? Does this indicate negative correlation between the variables? A significance test is needed!

In order to carry out a significance test, assume that X and Y are jointly normally distributed with correlation coefficient ρ, referred to as the population correlation coefficient. Data must be collected so that they constitute a random sample from the whole population values of X and Y.

The null hypothesis, H0

The null hypothesis is always that the correlation coefficient is zero, i.e. there is no correlation between the variables. This is written H0: ρ = 0.

The alternative hypothesis, H1

The alternative hypothesis depends on whether the test is one-tailed or two-tailed.

One-tailed tests
If you think there is a positive correlation between the variables X and Y, the alternative hypothesis is H1: ρ > 0 (there is a positive correlation between the variables).
If you think there is a negative correlation between the variables X and Y, the alternative hypothesis is H1: ρ < 0 (there is a negative correlation between the variables).

Two-tailed tests
If you are looking for a correlation but not specifying whether it is positive or negative, then the alternative hypothesis is H1: ρ ≠ 0 (there is some correlation between the variables).

The calculated value of r, the product-moment correlation coefficient, is compared with the critical value which is found from tables. An extract is given below and the tables are printed on page 652.
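The small s and big S formats for r give the same value, since S_xy = n s_xy, S_xx = n s_x² and S_yy = n s_y². A minimal Python sketch of the two calculations follows; the function names and the five data pairs are made up for illustration and are not taken from the text.

```python
from math import sqrt


def pmcc_small_s(xs, ys):
    """r = s_xy / (s_x * s_y), using means and divisor n."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
    s_x = sqrt(sum(x * x for x in xs) / n - mean_x ** 2)
    s_y = sqrt(sum(y * y for y in ys) / n - mean_y ** 2)
    return s_xy / (s_x * s_y)


def pmcc_big_s(xs, ys):
    """r = S_xy / sqrt(S_xx * S_yy), using sums only."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    S_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    S_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    S_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    return S_xy / sqrt(S_xx * S_yy)


# Hypothetical data, chosen only so the sums are easy to check by hand:
# S_xy = 6, S_xx = 10, S_yy = 6, so r = 6 / sqrt(60).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

print(pmcc_small_s(xs, ys), pmcc_big_s(xs, ys))
```

Either function can be used to obtain the value of r that is then compared with the critical value from tables.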

Critical values for product-moment correlation coefficient

                              Level                               Sample
            0.10      0.05      0.025     0.01      0.005         size
            0.8000    0.9000    0.9500    0.9800    0.9900          4
            0.6870    0.8054    0.8783    0.9343    0.9587          5
    (i)     0.6084   [0.7293]   0.8114    0.8822    0.9172          6
            0.5509    0.6694    0.7545    0.8329    0.8745          7
    (ii)    0.5067    0.6215    0.7067   [0.7887]   0.8343          8
            0.4716    0.5822    0.6664    0.7498    0.7977          9
    (iii)   0.4428    0.5494   [0.6319]   0.7155    0.7645         10

The tables are easy to use. The highlighted values (shown in square brackets) are referred to in the following illustrations:

(i) Consider hypotheses
        H0: ρ = 0 (there is no correlation between the variables)
        H1: ρ > 0 (there is a positive correlation between the variables).
    This is a one-tailed (upper tail) test. At the 5% level, the critical value is found under column 0.05. If r has been calculated from, say, six pairs of data, i.e. sample size 6, the critical value is 0.7293.
    This means that in random samples from a distribution in which ρ = 0, only 5% of these samples will give a value of r greater than 0.7293. So, at the 5% level of significance, you would reject H0 (that there is no correlation) in favour of H1 (that there is positive correlation) if r > 0.7293.

(ii) The same tables are used when testing for a negative correlation. Consider hypotheses
        H0: ρ = 0 (there is no correlation between the variables)
        H1: ρ < 0 (there is a negative correlation between the variables).
    This test is one-tailed (lower tail). At the 1% level, look up the value in the column headed 0.01. For a sample size of eight pairs of data, the value given in the table is 0.7887, indicating that the critical value is −0.7887. At the 1% level, you would reject H0 if r < −0.7887.

(iii) Now consider hypotheses
        H0: ρ = 0 (there is no correlation between the variables)
        H1: ρ ≠ 0 (there is some correlation between the variables).
    This test is two-tailed. At the 5% level of significance, you want critical values that give 2.5% in each tail, so look under the column headed 0.025. For a sample size of 10, the critical value given in the table is 0.6319. This means that you would reject H0 in favour of H1 if r > 0.6319 or r < −0.6319, i.e. if |r| > 0.6319.


Example 13.1

The scatter diagram illustrating ten pairs of values (x, y) is shown below.

[Scatter diagram: y plotted against x, with both axes running from 0 to 30.]

(a) Comment on the diagram.
(b) Calculate the value of r, the product-moment correlation coefficient for the pairs of data shown in the diagram.
(c) Assuming that X and Y are jointly normally distributed with correlation coefficient ρ, and the data constitutes a random sample, test, at the 5% level, whether there is a positive correlation between X and Y.
(d) Would your conclusion be the same at the 1% level?

Solution 13.1

(a) From the scatter diagram there appears to be some positive linear correlation but it does not appear to be very strong.

(b) In the diagram, the data points are

    x    5    8    12    15    15    17    20    21    25    27
    y    3    11    9     6    15    13    25    15    13    20

    Using the calculator in LR mode, it can be shown that r = 0.6954 (4 d.p.).
    (See page 140 if you need to review how to calculate r with or without a calculator.)

(c) The significance test is carried out as follows:
        H0: ρ = 0 (there is no correlation between X and Y)
        H1: ρ > 0 (there is positive correlation between X and Y)
    Perform a one-tailed (upper tail) test at the 5% level.
    The sample size is 10.
    From tables, the critical value is 0.5494, so reject H0 if r > 0.5494.
    From the calculations in (b), r = 0.6954.
    Since r > 0.5494, H0 is rejected in favour of H1.
    There is evidence of positive correlation between X and Y.
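Parts (b) and (c) can be reproduced with a short Python sketch, and the same function can be reused for the 1% level considered in part (d). The names `pmcc`, `reject_h0` and `CRITICAL_VALUES` are illustrative and not from the text; the dictionary simply copies the extract of the table of critical values, keyed by sample size and by the tail percentage at the head of each column.

```python
from math import sqrt

# Extract of the critical values table: sample size -> {tail % : critical value}.
CRITICAL_VALUES = {
    4: {10: 0.8000, 5: 0.9000, 2.5: 0.9500, 1: 0.9800, 0.5: 0.9900},
    5: {10: 0.6870, 5: 0.8054, 2.5: 0.8783, 1: 0.9343, 0.5: 0.9587},
    6: {10: 0.6084, 5: 0.7293, 2.5: 0.8114, 1: 0.8822, 0.5: 0.9172},
    7: {10: 0.5509, 5: 0.6694, 2.5: 0.7545, 1: 0.8329, 0.5: 0.8745},
    8: {10: 0.5067, 5: 0.6215, 2.5: 0.7067, 1: 0.7887, 0.5: 0.8343},
    9: {10: 0.4716, 5: 0.5822, 2.5: 0.6664, 1: 0.7498, 0.5: 0.7977},
    10: {10: 0.4428, 5: 0.5494, 2.5: 0.6319, 1: 0.7155, 0.5: 0.7645},
}


def pmcc(xs, ys):
    """Product-moment correlation coefficient, r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    S_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    S_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    S_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    return S_xy / sqrt(S_xx * S_yy)


def reject_h0(r, n, level_percent, alternative):
    """Decide whether H0: rho = 0 is rejected.

    alternative is 'greater' (H1: rho > 0), 'less' (H1: rho < 0)
    or 'two-sided' (H1: rho != 0, half the level in each tail).
    """
    if alternative == "two-sided":
        return abs(r) > CRITICAL_VALUES[n][level_percent / 2]
    critical = CRITICAL_VALUES[n][level_percent]
    if alternative == "greater":
        return r > critical
    if alternative == "less":
        return r < -critical
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")


# The ten data pairs read from the scatter diagram in Example 13.1.
xs = [5, 8, 12, 15, 15, 17, 20, 21, 25, 27]
ys = [3, 11, 9, 6, 15, 13, 25, 15, 13, 20]

r = pmcc(xs, ys)
print(round(r, 4))
print(reject_h0(r, len(xs), 5, "greater"))  # 5% level, upper tail
print(reject_h0(r, len(xs), 1, "greater"))  # 1% level, upper tail
```

A lookup outside the extract (for example a sample size above 10) raises a `KeyError`; the full tables would be needed in that case.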
(d) For a test at the 1% level, the critical value is 0.7155, so H0 is rejected if r > 0.7155.
Since r = 0.6954 < 0.7155, do not reject H0.
At the 1% level, there is not enough evidence to say that there is positive correlation between X and Y.

Exercise 13a

1. In each of the following significance tests for the product-moment correlation coefficient the calculated value of r is as shown. Use tables of critical values to decide whether H0 is rejected or not.

       n    r         Hypotheses               Level of significance
 (a)   7    0.893     H0: ρ = 0, H1: ρ ≠ 0     2%
 (b)   14   0.499     H0: ρ = 0, H1: ρ > 0     1%
 (c)   28   0.324     H0: ρ = 0, H1: ρ ≠ 0     10%
 (d)   28   0.324     H0: ρ = 0, H1: ρ > 0     1%
 (e)   16   -0.419    H0: ρ = 0, H1: ρ < 0     5%
 (f)   12   -0.689    H0: ρ = 0, H1: ρ ≠ 0     10%
 (g)   12   0.689     H0: ρ = 0, H1: ρ > 0     1%
 (h)   10   0.733     H0: ρ = 0, H1: ρ > 0     1%

2. A small bus company provides a service for a small town and some neighbouring villages. In a study of their service a random sample of 20 journeys was taken and the distances x, in kilometres, and journey times t, in minutes, were recorded. The average distance was 4.535 km and the average journey time was 15.15 minutes.
(a) Using Σx² = 493.77, Σt² = 4897, Σxt = 1433.8, calculate the product-moment correlation coefficient for these data.
(b) Stating your hypotheses clearly, test, at the 5% level, whether or not there is evidence of a positive correlation between journey time and distance.
(c) State any assumptions that have to be made to justify the test in (b). (L)

3. In order to investigate the strength of the correlation between the value of a house and the value of the householder's car, a random sample of householders was questioned. The resulting data are shown in the table, the units being thousands of pounds.

 Value of house, x    Value of car, y
 110                  12
 106                  9.5
 51                   2.4
 94                   4.2
 66                   4.1
 26                   0.3
 72                   3.2
 51                   6.0
 53                   7.8
 133                  15

Σx = 762, Σx² = 68 088, Σy = 64.5, Σy² = 606.63, Σxy = 6067.4
(a) Represent the data graphically.
(b) Calculate the product-moment correlation coefficient.
(c) Carry out a hypothesis test, at a suitable level of significance, to determine whether or not it is reasonable to suppose that the value of a house is positively correlated with the value of the householder's car.
(d) A student argues that when two variables are correlated one must be the cause of the other. Briefly discuss this view with regard to the data in this question. (L)

4. For the sets of data given, test the hypotheses indicated. Then draw a scatter diagram and comment on whether this reinforces your conclusion.
[ρ is the population product-moment correlation coefficient.]
(a)
 x   7   12   13   17   23   25   30   20
 y   23  22   18   15   7    13   8    27
H0: ρ = 0, H1: ρ < 0; 5% significance level
(b)
 x      y
 5.1    5.3
 5.4    10.2
 5.5    15.7
 10     5
 10.2   10.9
 10.4   15.1
 15     5.3
 15.4   10.9
 15.6   15.3
 30     25.1
 20.2   20
H0: ρ = 0, H1: ρ > 0; 1% significance level

SPEARMAN'S COEFFICIENT OF RANK CORRELATION, r_s

Spearman's coefficient of rank correlation is calculated using the ranks of the data. As you saw on page 146, for n data points, if d is the difference between the ranks for a data point, then

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

Remember that -1 ≤ r_s ≤ 1, with r_s = 1 indicating perfect agreement between the rankings, r_s = -1 indicating that the rankings are in exact reverse order (complete disagreement) and r_s = 0 indicating no correlation between the rankings.

Writing ρ_s for the population rank correlation coefficient, the null hypothesis is always

H0: ρ_s = 0 (there is no correlation between the rankings)

The alternative hypothesis is either

H1: ρ_s > 0 and there is positive correlation (agreement) between the rankings (one-tailed (upper tail) test)
or H1: ρ_s < 0 and there is negative correlation (disagreement) between the rankings (one-tailed (lower tail) test)
or H1: ρ_s ≠ 0 and there is correlation between the rankings (two-tailed test).

Note that the test for Spearman's coefficient of rank correlation does not make any assumptions about the population parameters. It is known as a non-parametric test.

The critical values for Spearman's rank correlation coefficient are found from tables which are very similar in format to those for the product-moment correlation coefficient. An extract is shown below and the tables are printed on page 652.

Critical values for Spearman's rank correlation coefficient

 Sample         Level
 size     0.05      0.025     0.01
 4        1.0000    -         -
 5        0.9000    1.0000    1.0000
 6        0.8286    0.8857    0.9429
 7        0.7143    0.7857    0.8929
 8        0.6429    0.7381    0.8333
 9        0.6000    0.7000    0.7833
 10       0.5636    0.6485    0.7455
 11       0.5364    0.6182    0.7091

For a one-tailed test at the 5% level, sample size 7, look under column 0.05. This gives the value 0.7143 and means that

• for H1: ρ_s > 0, H0 is rejected if r_s > 0.7143
[Diagram: number line from -1 to 1 with the critical region r_s > 0.7143]

• for H1: ρ_s < 0, H0 is rejected if r_s < -0.7143.
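The table look-up described in this section can be sketched in code. This is only an illustration, assuming the extract of critical values printed above (sample sizes 4 to 11); the names `LEVELS`, `TABLE`, `critical_value` and `reject_h0` are hypothetical. For a two-tailed test the sketch looks under the column for half the significance level.

```python
import math

# Column headings and rows copied from the extract of critical values above.
# None marks an entry that the extract leaves blank.
LEVELS = (0.05, 0.025, 0.01)
TABLE = {
    4: (1.0000, None, None),
    5: (0.9000, 1.0000, 1.0000),
    6: (0.8286, 0.8857, 0.9429),
    7: (0.7143, 0.7857, 0.8929),
    8: (0.6429, 0.7381, 0.8333),
    9: (0.6000, 0.7000, 0.7833),
    10: (0.5636, 0.6485, 0.7455),
    11: (0.5364, 0.6182, 0.7091),
}


def critical_value(n, level, two_tailed=False):
    """Look up the critical value for sample size n at the given level."""
    column = level / 2 if two_tailed else level
    for heading, value in zip(LEVELS, TABLE[n]):
        if math.isclose(heading, column) and value is not None:
            return value
    raise ValueError("no entry in the extract for this size and level")


def reject_h0(r_s, n, level, alternative):
    """Return True if H0: rho_s = 0 is rejected.

    alternative is 'greater' (H1: rho_s > 0), 'less' (H1: rho_s < 0)
    or 'two-sided' (H1: rho_s != 0).
    """
    if alternative == "greater":
        return r_s > critical_value(n, level)
    if alternative == "less":
        return r_s < -critical_value(n, level)
    if alternative == "two-sided":
        return abs(r_s) > critical_value(n, level, two_tailed=True)
    raise ValueError("unknown alternative")


print(critical_value(7, 0.05))                  # 0.7143
print(reject_h0(0.75, 7, 0.05, "greater"))      # True
```

The full tables on page 652 would replace `TABLE` in any real use; the extract is enough to reproduce the look-ups quoted in the text.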
[Diagram: number line from -1 to 1 with the critical region r_s < -0.7143]

For a two-tailed test at the 5% level, sample size 9, look under column 0.025 (half of 5%). This gives the value 0.7000 and means that

• for H1: ρ_s ≠ 0, H0 is rejected if r_s > 0.7000 or r_s < -0.7000, i.e. if |r_s| > 0.7000.
[Diagram: number line from -1 to 1 with the critical regions r_s < -0.7000 and r_s > 0.7000]

Example 13.2

A teacher selects one boy and one girl at random from her class, and asks them to arrange 11 types of food in order of preference. The food types are labelled A to K and the results are given below.

 Boy's order:    E  K  F  C  B  I  D  A  G  J  H
 Girl's order:   F  K  E  C  B  I  H  D  A  J  G

(a) Calculate Spearman's rank correlation coefficient for these data.
(b) Stating your hypotheses clearly test, at the 1% level of significance, whether or not there is evidence of a positive correlation.
(c) Interpret your conclusion to the test in part (b). (L)

Solution 13.2

(a)
 Food type         A    B    C    D    E    F    G    H    I    J    K
 Boy's order, x    8    5    4    7    1    3    9    11   6    10   2
 Girl's order, y   9    5    4    8    3    1    11   7    6    10   2
 d                 -1   0    0    -1   -2   2    -2   4    0    0    0
 d²                1    0    0    1    4    4    4    16   0    0    0

Σd² = 30 and n = 11, so

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 30}{11 \times 120} = 0.8636\ldots$$

(b) The significance test is carried out as follows:
H0: ρ_s = 0 (there is no correlation)
H1: ρ_s > 0 (there is positive correlation between the boy's and girl's preferences)
Perform a one-tailed (upper tail) test at the 1% level.
The sample size is 11.
From tables (page 652), the critical value is 0.7091, so reject H0 if r_s > 0.7091.
From part (a), r_s = 0.8636...
Since r_s > 0.7091, H0 is rejected in favour of H1.
There is evidence of positive correlation between the boy's and girl's preferences.

(c) The boy and girl agree in their preferences.

Exercise 13b Significance test for Spearman's coefficient of rank correlation

1. In each of the following significance tests for Spearman's rank correlation coefficient, the value of Σd² obtained when calculating r_s is as shown. Use tables of critical values to decide whether H0 is rejected or not.

       n    Σd²    Hypotheses                   Level of significance
 (a)   9    212    H0: ρ_s = 0, H1: ρ_s < 0     1%
 (b)   8    30     H0: ρ_s = 0, H1: ρ_s > 0     5%
 (c)   8    30     H0: ρ_s = 0, H1: ρ_s ≠ 0     5%
 (d)   10   78     H0: ρ_s = 0, H1: ρ_s > 0     5%
 (e)   10   252    H0: ρ_s = 0, H1: ρ_s < 0     5%
 (f)   10   274    H0: ρ_s = 0, H1: ρ_s ≠ 0     5%
 (g)   7    18     H0: ρ_s = 0, H1: ρ_s ≠ 0     10%
 (h)   7    106    H0: ρ_s = 0, H1: ρ_s < 0     1%
 (i)   7    14     H0: ρ_s = 0, H1: ρ_s ≠ 0     5%

2. An expert on porcelain is asked to place 7 china bowls in date order of manufacture assigning the rank 1 to the oldest bowl. The actual dates of manufacture and the order given by the expert are shown.

 Bowl   Date of manufacture   Order given by expert
 A      1920                  7
 B      1857                  3
 C      1710                  4
 D      1896                  6
 E      1810                  2
 F      1690                  1
 G      1780                  5

Find, to three decimal places, the Spearman rank correlation coefficient between the order of manufacture and the order given by the expert. Refer to tables of critical values to comment on the significance of your result. State clearly the null hypothesis which is being tested. (L)

3. Applicants for a job with a company are interviewed by two of the personnel staff. After the interviews each applicant is awarded a mark by each of the interviewers. The marks are given below.

 Candidate       A    B    C    D    E    F    G    H
 Interviewer 1   22   27   24   17   20   22   16   13
 Interviewer 2   28   23   25   14   26   17   20   15

(a) Calculate, to two decimal places, the Spearman rank correlation coefficient between these two sets of marks.
(b) Stating your hypotheses and using a 5% level of significance, interpret your result. (L)
4. Ten architects each produced a design for a new building and two judges, A and B, independently awarded marks, x and y respectively, to the 10 designs, as given in the table below.

 Design   Judge A (x)   Judge B (y)
 1        50            46
 2        35            26
 3        55            48
 4        60            44
 5        85            62
 6        25            28
 7        65            30
 8        90            60
 9        45            34
 10       40            42

Calculate Spearman's rank correlation coefficient for the data and test, at the 5% level, the hypothesis that there is no correlation between the marks awarded by the two judges. (C)

5. In a ski-jumping contest each competitor made two jumps. The orders of merit for the 10 competitors who completed both jumps are shown in the table.

 Ski jumper   First jump   Second jump
 A            2            4
 B            9            10
 C            7            5
 D            4            1
 E            10           8
 F            8            9
 G            6            2
 H            5            7
 I            1            3
 J            3            6

(a) Calculate, to two decimal places, a rank correlation coefficient for the performances of the ski-jumpers in the two jumps.
(b) Using a 5% level of significance and quoting from tables of critical values, interpret your result. State clearly your null and alternative hypotheses. (L)

6. The positions in a league of 8 hockey clubs at the end of a season are shown in the table. Shown also are the average attendances (in hundreds) at home matches during that season.
Calculate a coefficient of rank correlation between position in the league and average home attendance.

 Club   Position   Average attendance
 A      1          30
 B      2          32
 C      3          12
 D      4          19
 E      5          27
 F      6          18
 G      7          15
 H      8          25

Refer to the appropriate table of critical values to comment on the significance of your result, stating clearly the null hypothesis being tested. (L)

Summary

• Significance test for the product-moment correlation coefficient, r
The assumptions are that X and Y are jointly normally distributed and the sample must constitute a random sample from the whole populations of X and Y.
1. State H0: ρ = 0 (there is no correlation between X and Y)
   State H1 as follows
   H1: ρ > 0 (there is positive correlation between X and Y)
   H1: ρ < 0 (there is negative correlation between X and Y)
   H1: ρ ≠ 0 (there is correlation between X and Y)
2. State the level and type of test, e.g. one-tailed test at the 5% level.
3. State the rejection criterion, obtaining the critical value from tables.
   For H1: ρ > 0, reject H0 if r > critical value
   For H1: ρ < 0, reject H0 if r < -critical value
   For H1: ρ ≠ 0, reject H0 if |r| > critical value
4. Calculate r and compare with the critical value.
5. Make your conclusion.

• Significance test for Spearman's rank correlation coefficient, r_s
Note that this is a non-parametric test.
1. State H0: ρ_s = 0 (there is no correlation between the ranks of X and Y)
   State H1 as follows
   H1: ρ_s > 0 (there is agreement between the ranks of X and Y)
   H1: ρ_s < 0 (there is disagreement between the ranks of X and Y)
   H1: ρ_s ≠ 0 (there is correlation between the ranks of X and Y)
2. State the level and type of test, e.g. one-tailed test at the 5% level.
3. State the rejection criterion, obtaining the critical value from tables.
   For H1: ρ_s > 0, reject H0 if r_s > critical value
   For H1: ρ_s < 0, reject H0 if r_s < -critical value
   For H1: ρ_s ≠ 0, reject H0 if |r_s| > critical value
4. Calculate r_s and compare with the critical value.
5. Make your conclusion.

Miscellaneous worked example

Example 13.3

During the lambing season 8 ewes and the lambs they bore were weighed at the time of birth with the following results:

 Ewe                     A     B     C     D     E     F     G     H
 Weight of ewe, x kg     44    41    43    40    41    37    38    35
 Weight of lamb, y kg    3.5   2.8   3.2   2.7   2.9   2.5   2.8   2.6

You may assume Σx = 319, Σy = 23.0, Σx² = 12 785, Σy² = 66.88, Σxy = 923.2.
Calculate the product-moment correlation coefficient between X and Y.
Making any necessary assumptions, test whether the data could have come from a population with correlation coefficient ρ = 0. Use a 5% level of significance. (AEB)
Solution 13.3

Using small s formula to calculate r:

$$s_{xy} = \frac{\sum xy}{n} - \bar{x}\,\bar{y} = \frac{923.2}{8} - \frac{319}{8} \times \frac{23.0}{8} = 0.759\ldots$$

$$s_{xx} = \frac{\sum x^2}{n} - \bar{x}^2 = \frac{12\,785}{8} - \left(\frac{319}{8}\right)^2 = 8.1093\ldots$$

$$s_{yy} = \frac{\sum y^2}{n} - \bar{y}^2 = \frac{66.88}{8} - \left(\frac{23.0}{8}\right)^2 = 0.0943\ldots$$

$$r = \frac{s_{xy}}{\sqrt{s_{xx}}\,\sqrt{s_{yy}}} = \frac{0.759\ldots}{\sqrt{8.1093\ldots}\,\sqrt{0.0943\ldots}} = 0.868 \text{ (3 s.f.)}$$

Using big S formula to calculate r:

$$S_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 923.2 - \frac{319 \times 23.0}{8} = 6.075$$

$$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 12\,785 - \frac{319^2}{8} = 64.875$$

$$S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 66.88 - \frac{23.0^2}{8} = 0.755$$

$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{6.075}{\sqrt{64.875 \times 0.755}} = 0.868 \text{ (3 s.f.)}$$

The product-moment correlation coefficient between the weight of a ewe and the weight of its lamb is 0.868.

Assume that X and Y are jointly normally distributed with product-moment correlation coefficient ρ and the data form a random sample from the populations of X and Y. The significance test is carried out as follows:

1. State H0 and H1.
   H0: ρ = 0 (there is no correlation between the weight of a ewe and its lamb)
   H1: ρ ≠ 0 (there is correlation between the weight of a ewe and its lamb)
2. State level and type of test.
   Perform a two-tailed test at the 5% level.
3. State the rejection criterion.
   The sample size is 8. From tables, the critical value for a two-tailed test at the 5% level is 0.7067 (page 652, row n = 8, column 0.025).
   H0 is rejected if |r| > 0.7067.
4. Calculate r.
   For the data, r = 0.868.
5. Make conclusion.
   Since |r| > 0.7067, H0 is rejected in favour of H1. There is evidence of correlation between the weight of a ewe and its lamb.
   It is unlikely that the data came from a population with correlation coefficient ρ = 0.

Note that the conclusion would have been the same if you had chosen to carry out a one-tailed test. In this case H1 is ρ > 0, the critical value of r is 0.6215 and H0 is rejected since r > 0.6215.

Example 13.4

The coursework grades, A highest to G lowest, and examination marks of 8 candidates are given below.

 Coursework grade   Examination mark
 A                  92
 C                  75
 D                  63
 B                  54
 F                  48
 C                  45
 G                  34
 E                  18

(a) Calculate the value of an appropriate measure of correlation between these two sets of data.
(b) Test whether this value indicates evidence of correlation between coursework grades and examination grades at a 5% significance level.
(c) Give a practical interpretation of your value. (AQA)

Solution 13.4

(a) Calculating Spearman's rank correlation coefficient r_s:

 Coursework grade        A     C      D     B     F     C      G     E
 Examination mark        92    75     63    54    48    45     34    18
 Coursework rank         1     3.5    5     2     7     3.5    8     6
 Examination mark rank   1     2      3     4     5     6      7     8
 |d|                     0     1.5    2     2     2     2.5    1     2
 d²                      0     2.25   4     4     4     6.25   1     4

Σd² = 25.5, n = 8, therefore

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 25.5}{8 \times 63} = 0.696 \text{ (3 s.f.)}$$

(b) H0: ρ_s = 0 (there is no correlation)
H1: ρ_s ≠ 0 (there is evidence of correlation)
Perform a two-tailed test at 5% level.
From tables (page 651), critical value is 0.7381 (n = 8, column 0.025).
Reject H0 if |r_s| > 0.7381.
Since r_s = 0.696 < 0.7381, do not reject H0. There is no evidence of correlation.

(c) Performance in the examination does not reflect on performance in coursework.
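The arithmetic in Solutions 13.3 and 13.4 can be checked with a short script. This is a minimal sketch, not part of the original text: the function names `pmcc_from_sums`, `average_ranks` and `spearman` are hypothetical, tied grades are given the mean of the ranks they share (as in the solution's 3.5 for the two C grades), and the same Σd² formula is applied even though ties are present.

```python
import math


def pmcc_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Product-moment correlation coefficient from summary totals (big S formula)."""
    s_xy = sum_xy - sum_x * sum_y / n
    s_xx = sum_x2 - sum_x ** 2 / n
    s_yy = sum_y2 - sum_y ** 2 / n
    return s_xy / math.sqrt(s_xx * s_yy)


def average_ranks(values, reverse=False):
    """Rank values from 1 (smallest, or largest if reverse=True).

    Tied values each receive the mean of the ranks they occupy.
    """
    ordered = sorted(values, reverse=reverse)
    ranks = []
    for value in values:
        first = ordered.index(value) + 1   # first rank occupied by this value
        count = ordered.count(value)       # how many items share it
        ranks.append(first + (count - 1) / 2)
    return ranks


def spearman(rank_x, rank_y):
    """Spearman's r_s from two lists of ranks, using 1 - 6*sum(d^2)/(n(n^2 - 1))."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


# Example 13.3: totals quoted in the question.
r = pmcc_from_sums(8, 319, 23.0, 12785, 66.88, 923.2)

# Example 13.4: grade A is highest, so alphabetical order gives rank 1 to A.
grades = ["A", "C", "D", "B", "F", "C", "G", "E"]
marks = [92, 75, 63, 54, 48, 45, 34, 18]
grade_ranks = average_ranks(grades)
mark_ranks = average_ranks(marks, reverse=True)   # highest mark gets rank 1
r_s = spearman(grade_ranks, mark_ranks)

print(round(r, 3), round(r_s, 3))   # 0.868 0.696
```

Both values agree with the worked solutions to three significant figures.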
Miscellaneous exercise 13c

1. To test the belief that milder winters are followed by warmer summers, meteorological records are obtained for a random sample of 10 years. For each year the mean temperatures are found for January and July. The data, in degrees Celsius, are given below.

 Jan    July
 8.3    16.2
 7.1    13.1
 9.0    16.7
 1.8    11.2
 3.5    14.9
 4.7    15.1
 5.8    17.7
 6.0    17.3
 2.7    12.3
 2.1    13.4

(a) Rank the data and calculate Spearman's rank correlation coefficient.
(b) Test, at the 2.5% level of significance, the belief that milder winters are followed by warmer summers. State clearly the null and alternative hypotheses under test.
(c) Would it be more appropriate, less appropriate or equally appropriate to use the product-moment correlation coefficient to analyse these data? Briefly explain why. (MEI)

2. Bird abundance may be assessed in several ways. In one long-term study in a nature reserve, two independent surveys (A and B) are carried out. The data show the number of wren territories recorded (Survey A) and the numbers of adult wrens trapped in a fine mesh net (Survey B) over a number of years.

 Survey A   Survey B
 16         11
 19         12
 27         15
 50         18
 60         22
 70         35
 79         35
 79         71
 84         46
 85         53
 97         52

(a) Plot a scatter diagram to compare results for the two surveys.
(b) Calculate Spearman's coefficient of rank correlation.
(c) Perform a significance test, at the 5% level, to determine whether there is any association between the results of the two surveys. Explain what your conclusion means in practical terms.
(d) Would it be more appropriate, less appropriate or equally appropriate to use the product-moment correlation coefficient to analyse these data? Briefly explain why. (MEI)

3. The data below show the height above sea level, x metres, and the temperature, y °C, at 7.00 a.m., on the same day in summer at nine places in Europe.

 Height, x   Temperature, y
 1400        6
 400         15
 280         18
 790         10
 390         16
 590         14
 540         13
 1250        7
 680         13

(a) Plot these data on a scatter diagram.
(b) Calculate the product-moment correlation coefficient between x and y.
(Use Σx² = 5 639 200, Σy² = 1524, Σxy = 66 450)
(c) Give an interpretation of your coefficient.
On the same day the number of hours of sunshine was recorded and Spearman's rank correlation between hours of sunshine and temperature, based on Σd² = 28, was 0.767.
(d) Stating clearly your hypotheses and using a 5% two-tailed test, interpret this rank correlation coefficient. (L)

4. At the end of a season a league of eight ice hockey clubs produced the following table showing the position of each club in the league and the average attendances (in hundreds) at home matches.

 Club       A    B    C    D    E    F    G    H
 Position   1    2    3    4    5    6    7    8
 Average    37   38   19   27   34   26   22   32

(a) Calculate the Spearman rank correlation coefficient between position in the league and average home attendance.
(b) Stating clearly your hypotheses and using a 5% two-tailed test, interpret your rank correlation coefficient.
Many sets of data include tied ranks.
(c) Explain briefly how tied ranks can be dealt with. (L)

5. At an agricultural show ten Shetland sheep were ranked by a qualified judge and by a trainee judge. Their rankings are shown in the table.
[Table of rankings missing]
Calculate a rank correlation coefficient for these data.
Using one of the tables provided and a 5% significance level, state your conclusions as to whether there is some degree of agreement between the two sets of ranks. (L)

6. A teacher recorded the following data which refer to the marks gained by 13 children in an aptitude test and a statistics examination.

 Child   Aptitude Test, x   Statistics Examination, y
 A       54                 84
 B       52                 68
 C       42                 71
 D       31                 37
 E       43                 79
 F       23                 58
 G       32                 33
 H       49                 60
 I       37                 47
 J       13                 60
 K       13                 44
 L       36                 64
 M       39                 49

(a) Draw a scatter diagram to represent these two sets of marks.
(b) Calculate, to three decimal places, the product-moment correlation coefficient between the test mark and the examination mark.
(You may use Σx² = 18 672, Σy² = 46 626, Σxy = 28 234)
(c) Comment on your result.
(d) The teacher decided that, on the basis of the scatter diagram, children F, J and K performed differently from the rest of the group. Suggest why the teacher might have come to that decision.
(e) The teacher decides to analyse the data ignoring these three children. Calculate, to three decimal places, the Spearman rank correlation coefficient between the other ten pairs of observations.
(f) Using a 5% level of significance and quoting from tables of critical values, interpret the rank correlation coefficient. Use a one-tailed test. State clearly the null and alternative hypotheses. (L)

7. The yield (per hectare) of a crop, c, is believed to depend on the May rainfall, m. For 9 regions records are kept of the average values of c and m, and these are recorded below.

 c      m
 8.3    14.7
 10.1   10.4
 15.2   18.8
 6.4    13.1
 11.8   14.9
 12.2   13.8
 13.4   16.8
 11.9   11.8
 9.9    12.2

(Σc = 99.2, Σm = 126.5, Σc² = 1150.16, Σm² = 1832.07, Σmc = 1427.15)
(a) Find the equation of the appropriate regression line.
(b) Find r, the linear (product-moment) correlation coefficient between c and m.
(c) In a tenth region the average May rainfall was 14.6. Estimate the average yield of the crop for that region, giving your answer correct to one decimal place.
(d) Calculate the value of r_s, Spearman's rank correlation coefficient, for the above data and determine whether it is significantly greater than zero at the 5% level.
(e) State, with a reason, which of r and r_s you regard as being more appropriate for these data. (C)

8. In a random sample of 8 areas, residents were asked to express their approval or disapproval of the services provided by the local authority. A score of 0 represented complete dissatisfaction and 10 represented complete satisfaction. The
table below shows the mean score for each local authority together with the authority's level of community charge.

 Authority   Community Charge (£)   Approval rating
 A           485                    3.0
 B           490                    4.4
 C           378                    5.0
 D           451                    4.6
 E           384                    4.1
 F           352                    5.5
 G           420                    5.8
 H           212                    6.1

Calculate Spearman's rank correlation coefficient for these data.
Carry out a significance test at the 5% level using the value of the correlation coefficient which you have calculated. State carefully the null and alternative hypotheses under test and the conclusion to be drawn. (MEI)

9. A local historian was studying the number of births in a town and found the following figures relating to the years 1925 to 1934.

 Male births, x   Female births, y
 223              219
 218              205
 223              209
 223              239
 242              252
 278              256
 299              254
 256              257
 255              259
 292              323

(a) Draw a scatter diagram to illustrate this information.
The historian calculated the following summary statistics from the data:
S_xx = 8276.9, S_yy = 10230.1, S_xy = 7206.3.
(b) Calculate the product-moment correlation coefficient.
The historian believed these data gave strong evidence of a positive correlation between male and female births.
(c) Stating your hypotheses clearly, test at the 1% level of significance whether or not there is evidence to support the historian's belief.
(d) State an assumption required for the validity of the test in part (c) and comment on whether or not you consider it to be met.
In 1924 there were 249 male and 177 female births.
(e) Without carrying out any further calculations state, giving a reason, what effect the inclusion of these figures would have on the value of the product-moment correlation coefficient. (L)

10. Seven rock samples taken from a particular locality were analysed. The percentages, C and M, of two oxides contained in each sample were recorded. The results are shown in the table.

 Sample   C      M
 1        0.60   1.06
 2        0.42   0.72
 3        0.51   0.94
 4        0.56   1.04
 5        0.31   0.84
 6        1.04   1.16
 7        0.80   1.24

Given that ΣCM = 4.459, ΣC² = 2.9278, ΣM² = 7.196, find, to three decimal places, the product-moment correlation coefficient of the percentages of the two oxides. Calculate also, to three decimal places, a rank correlation coefficient. Using tables state any conclusions which you draw from the value of your rank correlation coefficient. State clearly the null hypothesis being tested. (L)

11. In the table below, x is the average weekly household income in £ and y the infant mortality per 1000 live births in 11 regions of the UK in 1985.

 Region   x       y
 A        170.4   8.4
 B        183.2   9.4
 C        172.9   10.3
 D        187.1   10.5
 E        203.2   8.3
 F        204.8   9.4
 G        208.8   8.5
 H        248.0   9.0
 I        198.3   9.4
 J        187.1   9.8
 K        179.1   9.6

It is hypothesised that a high value of x will be associated with a low value of y. Calculate a rank correlation coefficient and test its significance.
It appears that region A is exceptional. What would your findings be if this region were omitted from the analysis?

Mixed test 13A Correlation Coefficients

1. The bivariate sample illustrated in the scatter diagram below shows the heights, x cm, and masses, y kg, of a random sample of 20 students.
[Scatter diagram of the sample; horizontal axis: Height (x cm)]
You are given that Σx = 3358, Σx² = 567 190, Σy = 1225, Σy² = 76 357, Σxy = 206 680.
(a) Calculate the product-moment correlation coefficient.
(b) Carry out a hypothesis test at the 5% level of significance, to determine whether or not there is evidence that the height of a student is positively correlated with his or her mass. What feature of the scatter diagram suggests that this test is appropriate?
(c) A statistics student suggests that a positive correlation between height and mass implies that 'the taller a student is the heavier he or she will be'. Comment on this statement with reference to your conclusions in part (b). (MEI)

2. Two judges give marks for artistic impression (out of a maximum 6.0) to 10 ice skaters.

 Skater   Judge 1   Judge 2
 A        5.3       5.4
 B        4.9       5.0
 C        5.6       5.8
 D        5.2       5.6
 E        5.7       5.2
 F        4.8       4.5
 G        5.2       4.7
 H        4.6       4.8
 I        5.1       5.3
 J        4.9       4.9

(a) Calculate the value of Spearman's rank correlation coefficient for the marks of the two judges.
(b) Use your answer to (a) to test at the 5% level of significance, whether it appears that there is some overall agreement between the judges. State your hypotheses and your conclusions carefully.
(c) For these marks the product-moment correlation coefficient is 0.6705. Use this to test, at the 5% level, whether there is any correlation between assessments of the two judges.
(d) Comment on which is the more appropriate test to use in this situation. (MEI)

3. It is hypothesised that there is a positive correlation between the population of a country and its area. The following table gives a random sample of 13 countries with their area x, in thousands of square kilometres, and population y, in millions.

 Country   x      y
 1         2.5    0.5
 2         28     5
 3         30     2
 4         72     4
 5         98     42
 6         121    21
 7         128    16
 8         176    3
 9         239    14
 10        313    37
 11        407    6
 12        435    17
 13        538    22

Plot a scatter diagram and comment on its implication for the hypothesis.
Calculate a suitable correlation coefficient and test its significance at the 5% level. (C)

4. A psychologist was studying the relationship between short term memory and ability in mathematics. A sample of 8 students was shown a tray of objects for [...] seconds and the students were asked to recall as many of the objects as they could. The number of objects recalled correctly was recorded and compared with their mark (percentage) in a recent mathematics examination. The results are given below.

 Student          A    B    C    D    E    F    G    H
 No. of objects   3    5    12   8    7    11   4    9
 Maths %          56   64   75   69   48   63   52   84

(a) Calculate Spearman's rank correlation coefficient for these data.
(b) Using a 5% level of significance, and stating your hypotheses clearly, interpret your result.
(c) Give a reason why it may be more appropriate to use Spearman's rank correlation coefficient for the hypothesis test than the product-moment correlation coefficient. (L)

ICT STATISTICS SUPPLEMENT

Contents

Using a Spreadsheet
Using Internet Resources and Word
Using Autograph
Ch.1 Representation and summary of data
Discrete/continuous, frequency density
Ch.2 Correlation and regression
Ch.3 Probability
Ch.4, 5 Discrete random variables
Ch.6, 7, 8 The normal distribution
Ch.9 Sampling and estimation
Ch.10, 11 Hypothesis tests

INTRODUCTION

The intention of this supplement is to explore the use of ICT in the teaching and learning of
Plot a scatter diagram and c?mment on tts statistics and probability. It is well established now that dynamic and interactive computer
implication for the hypothesis. ff . t d
Calculate a suitable correlation coe Jcten an (C) images can bring subjects to life in a way that was impossible to imagine before. The principle
test its significance at the 5% level. benefit is that lessons can now have variety. The same topic can be presented in the traditional
way on the board, explored in a practical simulation, investigated using a spreadsheet or
illustrated using a graph plotter. There is also quite likely to be a useful JAVA applet or some
interesting real data on it from the internet.

Furthermore students can now present their findings electronically, and teachers can store,
share and continually refine their lesson plans. And if you add to all this the obvious benefit of
the computer's ability to carry out calculations without effort, ICT methods are almost
guaranteed to enhance the enjoyment of those teaching and studying this subject.

USING A SPREADSHEET
A spreadsheet, such as Excel, has enormous power; it can effortlessly analyse huge data sets and
conduct simple simulations. Getting familiar with all this can come only with practice, and
there are plenty of spreadsheet tutorials around, both in print and on the net. For starters,
here is a summary of some of the more important features that relate to statistics, and in each
of the following pages, features that relate to specific topics are listed.

Entering formulae and functions

Enter '=' or click on the '=' button to the left of the cross and tick to find the function you want. If you know the formula, use it, then click on '=' to get a helpful entry box, explaining what each parameter is. Details of appropriate formulae are given under the chapter headings.

Array

This word is used for a continuous selection of cells in a spreadsheet. Arrays are referred to by the top-left and bottom-right coordinates, e.g. A2:B16

The edit menu

To insert a data set from some other source it is sometimes useful to try
Edit ⇒ Paste Special ⇒ select "text"
To generate a series, e.g. 1, 2, 3, ... 1000: First enter the start value in a cell and highlight it.
Edit ⇒ Fill ⇒ Series

The format menu

To control the appearance of the numbers:
Format ⇒ cells ⇒ Format Cells ⇒ Number tab ⇒ Number ⇒ Decimal places
To control the appearance of a selected array:
Format ⇒ column ⇒ autofit
Format ⇒ autoformat (fancy styles)
To control format conditionally:
Format ⇒ conditional formatting
e.g. if = 0 then put a frame round it.

The tools menu

To solve equations:
Tools ⇒ solver: this finds the value of a cell that makes the target cell a max, a min or a value, e.g. to solve x^3 - 2 = 0, set A1 = 1 and B1 = A1^3 - 2. Select B1 then Tools ⇒ Solver. B1 is the 'target cell', "Equal to Value of" (0), by changing cell A1. Solve!
To show formulae instead of results:
Tools ⇒ options ⇒ View ⇒ Window options
TICK 'show formulas' (or use Ctrl-`)
To hide the grid lines:
Tools ⇒ options ⇒ View ⇒ Window options
UNTICK 'show gridlines'
To ensure auto-recalculation on pressing F9:
Tools ⇒ options ⇒ Calculations ⇒ Automatic
For iterative solving, e.g. x = g(x):
Tools ⇒ options ⇒ Iteration
⇒ Max iterations (100) ⇒ Max change 0.001
e.g. Enter A1 = 1
Select A1
Enter =COS(A1)
Press F9
For a cell formula referring to itself without iteration:
Tools ⇒ options ⇒ Iteration
⇒ Max iterations (1) (leave Max change)
(e.g. D1 = D1 + 1 to increment by 1)
To draw a histogram (applicable only to equal classes):
Tools ⇒ Data Analysis ⇒ Histogram (see Chapter 1)
For random number generation (see Chapter 9):
Tools ⇒ Data Analysis ⇒ Random Number generation
Continuous Uniform (a, b), normal (m, s), Bernoulli (p), binomial (n, p), Poisson (m),
Discrete, in 2 columns: x, P(X = x)
For sampling n items from an array:
Tools ⇒ Data Analysis ⇒ Sampling
For t-tests:
Tools ⇒ Data Analysis
⇒ t-test (paired 2 samples for means)
2-sample assuming equal variances
2-sample assuming unequal variances
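The two spreadsheet examples in the tools menu, solving x^3 - 2 = 0 and iterating x = cos(x), can also be tried outside a spreadsheet. The sketch below is an illustration only: bisection is swapped in as a simple stand-in and is not a description of how Solver works internally, the function names `bisect` and `fixed_point` are hypothetical, and the defaults of 100 iterations and a maximum change of 0.001 simply mirror the Iteration settings quoted above.

```python
import math


def bisect(f, lo, hi, tol=1e-9):
    """Find a root of f between lo and hi, assuming f changes sign on [lo, hi]."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid   # root lies in the upper half
        else:
            hi = mid                # root lies in the lower half
    return (lo + hi) / 2


def fixed_point(g, x, max_iterations=100, max_change=0.001):
    """Repeat x = g(x) until successive values differ by less than max_change."""
    for _ in range(max_iterations):
        new_x = g(x)
        if abs(new_x - x) < max_change:
            return new_x
        x = new_x
    return x


root = bisect(lambda x: x ** 3 - 2, 1, 2)   # the cube root of 2
x = fixed_point(math.cos, 1.0)              # a solution of x = cos(x)

print(round(root, 4))   # 1.2599
print(round(x, 2))      # 0.74
```

Starting the iteration from 1.0 corresponds to entering A1 = 1 in the spreadsheet example.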
T
I
For the forms toolbar: http:/I lottery.merseyworld.com!
Tools ::::::} Customize=::} Forms The UK Lottery Web Site - includes statistics from all tbe draws.
Label, Group-box, check-box, option button, list-box, combo-box, scroll-bar, spinner
http:/lwww.fa-premier.com/results!
UK Premier Football results and statistics

The data menu ~t~ :II ~wu;.stats. ox.ac. uk!links!schoenfield. htm


c oen Ield s LISt of Data Archives (Oxford University)
Click first on the top left hand corner of the data.
~;:!I sunsite.uncedu! lunar bin/ worldpop
Sorting and filtering: mographlC statistics (mcluding up to th e mmute
. world population)
Data ~sort (up to 3 columns deep)
Data ~filter~ auto-filter http:/lwww.nist.gov/itl!div898/ strd!
US Statistics Reference Database
This is useful if a dataset has pasted incorrectly into a single column. Converting text into
columns: http:/lwww.statistics.gov.uk
Data ~Text to Columns UK National Statistics

For re-interpreting multiple data sets: http://www.un.org/Pubs!CyberScho 0 lB . . .


Data from the UN by us/mfonatton/e_mfonation htm
Data ~Pivot Table report , country ·

Charts Teaching statistics


Excel was never written as an educational tool, so it is worth getting to know the charts http:!lwww.rss.org.uk
option to sort out what is going to be useful and what is purely decorative. Of the chart Royal Statistical Society
options available 'Column', 'Bar', 'Line', and 'XY (Scatter)' all work well, but Excel is not
good at Histograms, or Time Series Moving Averages. http :II science.ntu.ac. uk!rsscse/TS!
Teachmg Statistics magazine- horne page

http:/lwww.stats.gla.ac.uk:80! ctil
USING INTERNET RESOURCES AND WORD CTI Statistics (changing to LTSN-CMSOR)

There is an ever-growing amount of useful information on the net for the study of statistics http:/lwww.kuleuven.ac. be!ucs!java!index.htm .
and probability. Mixed in with all that is the less useful, and the task of sifting out quality JAVA StatiStiCs- some fascinating 'applets' from Belgium
resources is getting harder. Listed here are some authenticated sites, but the pace of change
being what it is there is no guarantee that they will still be there and still be useful by the time
~~~::I{ur~stati newcastle. edu.au!surfstat!main/surfs tat. html
a Ia on me text. An mtroductory course by Annett D b I
this is read! http·!/ o son eta ., Newcastle University
· cast.massey.ac.nz
CAST: Computer Assisted Statistics Teachin [r . . .
Doug Stirling, Massey University Palmersto g NegishtraNtwn reqmred] -a complete course, by
Data sets ' n ort , ew Zealand
http:/119 3.61.1 07. 61/volume
http:!!lib.stat.cmu.edu!DASLI DISCUSS statistics teaching resources
DASL: The Data and Story Library (USA), categorised by topic. An Important contribution to the understandin of ..
Coventry University. g probal)!hty and statistics from the team at
http:!!www.maths.uq.edu.au/ -gks! data
OzDASL (University of Queensland, Australia). Australian version of the above. http·//www ·
p . .mzs.coventry.ac.uk! -styrrell!resource.htm
http :II forum.swarthmore. edu! workshops! sum96/data. collections!datalibrary! ersonal selectiOn of statistical web resources from .
co-author of DISCUSS Sidney Tyrrell, Coventry University,
The Data Library (from the Math Forum, USA)
http:!!www.ni.com.au!mercury!mathguys/mercindx.htm
Chance and Data (from Tasmania, Australia)
1
I
Getting all this into Word

Any text or graphics can be copied straight into Word. Simply:

1. Mark any text you want, or hover over any graphic you want.
2. Right-click 'Copy' (or Ctrl-C)
3. Click where you want to insert it in Word
4. Right-click 'Paste' (or Ctrl-V)

It is often better to paste into a text box, so you have more control over the layout and
positioning. Note: any internet links (underlined in blue) will be copied too.

Copying a data set:

If data is presented on the web page in columns, it should copy and paste in TAB-separated
(.tsv) or COMMA-separated (.csv) format.

Pasting into Word should therefore also put it in columns. You may need to adjust the TABS
settings to suit the data once it is in.
Copying into EXCEL can be less successful, with all the data often ending up in the first
column. If this happens, use the 'text to columns' feature in the 'DATA' menu, and try 'Tab',
'Comma' or 'Space' until it works. Alternatively try
Edit ⇒ Paste special and select 'Text'.

USING AUTOGRAPH
Autograph is a dynamic graphing package that operates in both bivariate and single variable
modes. In the bivariate mode, as well as a full range of equations and coordinate geometry
operations, data sets can be represented as scatter diagrams. In the single-variable mode, data
can be displayed in all the usual diagrams, and probability distributions can be drawn. A
variety of on-screen calculations are available.
Many of these operations can also be created very effectively on a spreadsheet, and
throughout this supplement both approaches are explored.

Bivariate data
In Autograph, the word 'cursor' is used to describe a coordinate point that is added by the
user, either by 'point and click' or entering coordinates directly.
Most operations are available on the button bar, or through the right-click menu. This is
dependent on the selection of objects that has been made, and standard rules for object
selection are used.

A bivariate data set can be created in various ways:
(a) By adding 'cursors', perhaps in a pattern that will help make a particular teaching point
(e.g. mostly in a well-correlated line, but with one outlier). Cursors can of course be
moved around at will subsequently.
Use the right-click options: 'Select all cursors' and 'Convert to data set'.
This will change the cursors into a single object, though you can still move an
individual cursor around if you hold down Ctrl.
You can double-click on any one cursor in the data set to open the 'Edit data set' dialogue
box.
(b) By using the Edit Data Set dialogue box. Here data can be entered directly in pairs,
imported by loading a CSV file (comma-separated), or pasted in from two columns in a
spreadsheet.
Data can then be sorted (by x or by y), scaled by any formula, or swapped over. Tick
'Show Statistics' to create a dynamically linked set of results, which change if any points
are dragged around (while holding down Ctrl).
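The import of two-column data described here can also be sketched outside any particular package. The following Python fragment is only an illustration: the function name parse_pairs and the sample values are assumptions, not part of Autograph or Excel. It tries tab, comma and space as separators in turn, then shows the 'sort by x', 'sort by y' and 'swap' operations on the resulting pairs.

```python
import csv
import io


def parse_pairs(text):
    """Parse pasted two-column text into a list of (x, y) float pairs.

    Tries tab, comma, then space as the delimiter, mirroring the
    'Tab', 'Comma' or 'Space' choices tried in a spreadsheet import.
    """
    for delimiter in ("\t", ",", " "):
        rows = [row for row in csv.reader(io.StringIO(text), delimiter=delimiter) if row]
        if rows and all(len(row) == 2 for row in rows):
            return [(float(a), float(b)) for a, b in rows]
    raise ValueError("could not split the text into two columns")


# Hypothetical data, pasted as comma-separated text.
pasted = "3,7\n1,4\n2,9\n"
pairs = parse_pairs(pasted)

by_x = sorted(pairs)                        # sort by x
by_y = sorted(pairs, key=lambda p: p[1])    # sort by y
swapped = [(y, x) for x, y in pairs]        # swap the two variables
```

A header row or a non-numeric entry would raise an error in this sketch; real data usually needs that case handled first.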
Single variable statistics

A raw data set can be entered directly by typing in the values, imported or pasted from a
spreadsheet, or created by sampling from a probability distribution.

CHAPTER 1 REPRESENTATION AND SUMMARY OF DATA

Data sets come in all shapes and sizes these days. Computers can make light work of
presenting data in a digestible form, but users need to take care to use the right tool for the
job.

Using a spreadsheet
The following functions are relevant:
Σx                      SUM(array)
Σfx                     SUMPRODUCT(array, array)
Σfx²                    SUMPRODUCT(array, array, array)
m = (1/n) Σx            AVERAGE(array)
(1/n) Σ(x²) − m²        STDEV(array)
n                       COUNT(array)
                        COUNTIF(array, test)
kth smallest            SMALL(array, k)
kth largest             LARGE(array, k)
Minimum                 MIN(array)
Maximum                 MAX(array)
Mode                    MODE(array)
Median                  MEDIAN(array)
Quartiles               QUARTILE(array, q)
                        q = 0: Min, q = 1: LQ, q = 2: Median,
                        q = 3: UQ, q = 4: Max
NOTE: STDEV uses the divisor n − 1. STDEVP(array) uses the divisor n, and returns the
square root of the expression shown.

{= FREQUENCY(array1, array2)}
To get this to work you need:
(a) array1 with all the data in a single column
(b) array2 called the 'bin' array, listing the right hand ends of the classes, e.g. 20, 40, 60, 80
(c) array3 (empty) marked where you want the frequencies to go.
This operation then returns an array of frequencies for ≤ 20, ≤ 40, ≤ 60, ≤ 80 and also > 80
(each count covers the values above the previous limit), but you need to have marked an
array first ready to receive this information. (Note: it is one cell longer than the 'bin' array,
array2. This last cell is optional.)
NOTE: this formula is generating an array. Excel requires that you press SHIFT-CTRL-ENTER
when you have finished editing the formula: this puts curly brackets round the formula.
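The counting rule behind a FREQUENCY-style array can be checked with a short Python sketch. The function name frequency and the data values are assumptions chosen for illustration; only the class limits 20, 40, 60, 80 come from the text. The two standard deviation calls show the difference between the divisor n and the divisor n − 1.

```python
from bisect import bisect_left
from statistics import mean, pstdev, stdev


def frequency(data, bins):
    """Count values per class, in the style of a spreadsheet FREQUENCY array.

    bins lists the right-hand ends of the classes in ascending order.
    The result has one more entry than bins: the last one counts the
    values above the final class limit.
    """
    counts = [0] * (len(bins) + 1)
    for value in data:
        # bisect_left gives the first class whose upper limit is >= value
        counts[bisect_left(bins, value)] += 1
    return counts


# Hypothetical raw data, with the class limits used in the text.
data = [5, 20, 21, 38, 40, 41, 59, 75, 80, 93]
bins = [20, 40, 60, 80]

counts = frequency(data, bins)   # [<=20, <=40, <=60, <=80, >80]
m = mean(data)
sd_n = pstdev(data)              # divisor n
sd_n1 = stdev(data)              # divisor n - 1
```

A value equal to a class limit, such as 20 or 40 here, is counted in the class that ends at that limit.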

To draw a histogram in Excel

This is related to the frequencies function above, but can also run without it.
This is not really a histogram as it only works if the classes are of equal width.

Tools ⇒ Data Analysis ⇒ Histogram
Input Range = raw data array (which can run over several columns)
Bin Range = array of upper class interval limits
Output Range = array to place the resulting frequency column
Tick 'Chart Output' to draw the histogram.
Double-click on the shaded histogram section for 'Format Data Series' ⇒ Options ⇒ Set Gap
width to zero.

[Figure: Excel histogram, frequency against class]

Using Autograph

With a grouped data set entered (with or without underlying raw data) you can draw:

• Histogram (see next page)
• Cumulative frequency diagram (frequency or percentile scale)

[Figure: cumulative frequency diagram]
On-screen measurements enable quartiles and the median to be measured.

• Box and whisker diagram
• Dot plot

[Figure: box and whisker diagram and dot plot]
The dot plot is useful for showing where the raw data points actually are, especially when
drawn at the same time as another diagram, e.g. a box and whisker or a histogram.

• Numerical statistics can be generated as text:
(a) summary statistics (mean, mode, quartiles, SD, range, etc., for raw data and for grouped)
(b) tabulated results, including mid-interval value
(c) stem and leaf diagram (really only works for discrete integer data)

An example of a stem and leaf diagram, generated as text in the 'Results Box':
 0:
10: 1 3 5 5 6
20: 2 2 4 4 4 5 5 5 5 6 6 9 9 9 9
30: 1 1 1 1 1 2 4 4 4 4 4 4 4 4 4 4 4 4 5 5 6
40: 0 0 1 2 4 5 5 6 6 8 8 8 8 8 9 9 9 9 9 9
50: 0 0 1 2 3 5
60:
70: 1 2 5 6 6
80:
90: 2

Illustrating discrete and continuous data on a histogram

The x-axis on most statistical diagrams is a continuous scale. Therefore it is important that
discrete data is represented correctly.
The grouped data entry box in Autograph requires that the data is represented as continuous
or discrete. The 'unit' should be set = 1 for integers, or 0.1 if data is 'to the nearest 0.1', etc.
A discrete data item, e.g. 53 (unit = 1) will be represented on the histogram as a region from
52.5-53.5. Similarly a class interval 20-29 is represented by the region 19.5-29.5, i.e. 20-30
shifted to the left by 0.5 (half the unit).

[Figure: histogram with each class shifted to the left by half a unit]

Cumulative frequency, frequency polygon and box and whisker diagrams are similarly
displaced.

In the table of values option, the mid-interval value is used to calculate the mean and SD, and
will take account of the nature of the data.

Continuous:

Class       Mid-interval   Class    Frequency   Cumulative
Interval    Value (x)      Width    (f)         Frequency
0-20        10.0           20       0           0
20-30       25.0           10       10          10
30-40       35.0           10       130         140
40-45       42.5           5        90          230
45-50       47.5           5        55          285
50-100      75.0           50       75          360

Σf = 360    Σfx = 16 863    Σfx² = 874 031
Mean = 46.84    SD = 15.29

Discrete:

Class       Mid-interval   Class    Frequency   Cumulative
Interval    Value (x)      Width    (f)         Frequency
0-19        9.5            20       0           0
20-29       24.5           10       10          10
30-39       34.5           10       130         140
40-44       42.0           5        90          230
45-49       47.0           5        55          285
50-99       74.5           50       75          360

Σf = 360    Σfx = 16 683    Σfx² = 857 259
Mean = 46.34 (= Continuous mean − 0.5)    SD = 15.29 (unchanged)

Illustrating frequency and frequency density on a histogram

This can be a difficult concept to get across, and a visual approach can be very effective.

Example: Consider a grouped data set entered into Autograph defined by these variable-width
class intervals:
0, 20, 30, 40, 45, 50, 100
and the following associated frequencies:
0, 10, 130, 90, 55, 75
If you select to draw a histogram from this data, the dialogue box asks you to choose
'frequency' or 'frequency density'.

Frequency:

[Figure: histogram of the data plotted against frequency]

Here, a data set is clearly being mis-represented: the mode is wrong and there is undue weight
to the final class.

Frequency Density:

[Figure: histogram of the data plotted against frequency density]

When selecting frequency density, you need also to specify the 'per' unit. The default value = 1
so that the area under the histogram is a direct measurement of the frequency.
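The mean, SD and frequency density figures for this grouped data set can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the function names grouped_summary and frequency_density and the unit and per arguments are invented for illustration and are not Autograph's own interface. Only the class boundaries and frequencies come from the example.

```python
from math import sqrt

# Class boundaries and frequencies from the example in the text.
boundaries = [0, 20, 30, 40, 45, 50, 100]
freqs = [0, 10, 130, 90, 55, 75]


def grouped_summary(boundaries, freqs, unit=0.0):
    """Mean and SD estimated from mid-interval values.

    unit = 0 treats the data as continuous; unit = 1 treats the classes
    as discrete integers (0-19, 20-29, ...), which moves every class,
    and so every mid-interval value, left by half a unit.
    """
    mids = [(lo + hi) / 2 - unit / 2 for lo, hi in zip(boundaries, boundaries[1:])]
    n = sum(freqs)
    sum_fx = sum(f * x for f, x in zip(freqs, mids))
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, mids))
    mean = sum_fx / n
    sd = sqrt(sum_fx2 / n - mean ** 2)
    return mean, sd


def frequency_density(boundaries, freqs, per=1.0):
    """Frequency per 'per' units of x, so that bar area represents frequency."""
    return [f / (hi - lo) * per for f, lo, hi in zip(freqs, boundaries, boundaries[1:])]


mean_c, sd_c = grouped_summary(boundaries, freqs)           # continuous
mean_d, sd_d = grouped_summary(boundaries, freqs, unit=1)   # discrete
density = frequency_density(boundaries, freqs)
```

On the density scale the tallest bar is the 40-45 class, whereas on the raw frequency scale the 30-40 class looks like the mode.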
Example: Enter the following discrete raw data:
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 23, 24, 24, 25, 25, 27, 31 and draw a histogram using the
unequal classes 0, 10, 40 (i.e. 0-9 and 10-39). There are nine items in each class, so plotting
'frequency' gives two equal frequencies. Here is the same data plotted as a frequency density.

[Figure: the same data plotted as a frequency density histogram]

CHAPTER 2 CORRELATION AND REGRESSION

Using a spreadsheet
The following functions are relevant to bivariate statistics:

PEARSON(array-y, array-x)
returns correlation coefficient, r [PMCC]
CORREL(array-y, array-x)
also returns the PMCC, r. Excel has no built-in function for Spearman's rank correlation
coefficient; it can be obtained by applying CORREL to the ranks of the data.
FORECAST(x, array-y, array-x)
estimates y(x) given the 2 arrays
INTERCEPT(array-y, array-x)
returns c in y-on-x regression line y = mx + c
SLOPE(array-y, array-x)
returns m in y-on-x regression line y = mx + c
RSQ(array-y, array-x)
returns R², the square of the PMCC
DEVSQ(array)
returns the sum of the squares of deviations from the sample mean.
There is also a sophisticated facility to create a full regression analysis, including residuals and
normal probability plots. Use: Tools ⇒ Data Analysis ⇒ Regression

The diagram chart option

If you select 'XY (Scatter)' and create the diagram, there are a number of useful options
available:
Double-click on either axis to reformat that axis
Double-click a data point to open the 'Format Data Series' dialogue box
Click on a data-point (they should all turn yellow), then right-click ⇒ 'Add Trend Line'
Choose Type: linear, and
Options: Forecast forward/back, display equation, display R²

[Figure: Excel scatter chart 'BRAIN SIZE - IQ', with IQ on the x-axis and the trend line
y = 1244.3x + 770610, R² = 0.1496]

An example of a data set copied off an internet page into Excel. There were many columns of
data: to select two for this plot, press Ctrl as you select the second (non-adjacent) column.

Using Autograph

With a bivariate data set in place, there are a number of options to help illustrate its properties.

The statistics box (tick the option in the Edit Data Set dialogue box) gives all the standard
results, and these will change dynamically if a point in the data set is altered (hold down Ctrl
and drag).
If the 'Junior' option is chosen (from View ⇒ Preferences), this box is simplified, and gives
only the means and the 'line of best fit'.
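The quantities named in this chapter (the PMCC, the y-on-x regression line, a forecast, R² and Spearman's coefficient) can be computed directly from their definitions. The Python sketch below is illustrative only: the function names and the five data points are assumptions, and the simple ranking helper assumes there are no tied values.

```python
from math import sqrt


def pmcc_and_line(xs, ys):
    """Return (r, m, c): the PMCC and the y-on-x line y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    r = sxy / sqrt(sxx * syy)
    m = sxy / sxx
    c = mean_y - m * mean_x      # the line passes through the centroid
    return r, m, c


def ranks(values):
    """Ranks 1..n for data with no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


# Hypothetical bivariate data with no ties.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 6]

r, m, c = pmcc_and_line(xs, ys)
forecast = m * 6 + c             # estimate of y at x = 6
r_squared = r ** 2
# Spearman's coefficient is the PMCC of the ranks.
spearman, _, _ = pmcc_and_line(ranks(xs), ranks(ys))
```

Computing Spearman's coefficient as the PMCC of the ranks is the same route as ranking the data in a spreadsheet and then applying a correlation function to the two rank columns.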

[Figure: scatter diagram with a rogue point, the centroid and the line of best fit]
Here a set of data has a rogue point (which can be moved around by pressing Ctrl and
dragging with the mouse), and the centroid and line of best fit are also shown.

[Figure: scatter diagram with squares drawn from each point to a line through the centroid]
Here the principle of least squares regression is being illustrated, relating to a variable straight
line through the centroid.

CHAPTER 3 PROBABILITY

Using a spreadsheet for probability
There are various ways to create random numbers in Excel:
RAND()                  Random number 0 ≤ x < 1
INT(6*RAND())+1         Random integer 1 ≤ x ≤ 6
RANDBETWEEN(a, b)       Random integer a ≤ x ≤ b
NOTE: If RANDBETWEEN does not work, go to Tools ⇒ Add-Ins and tick 'Analysis ToolPak'.

Example: Simulation of the sum of two dice:

      A         B      C
1     DICE 1           = INT(6*RAND())+1
2     DICE 2           = INT(6*RAND())+1
3     SUM              = C1 + C2
4               'x'    'f'
5     Score     2      = C5 + (B5 = $C$3)
6     Score     3
7     Score     ...
15    Score     12

Fill down, mark B4:C15 and plot.
The logical statement in cell C5, '(B5 = $C$3)' equals 1 if TRUE, otherwise equals 0. This is a
simple way to add 1 to a total if a condition is met.
However, a cell formula referring to itself is called a 'circular reference' and you need to set
the following to make it work in this instance:
Tools ⇒ Options ⇒ Calculation tab ⇒ Tick Iteration
⇒ Max iterations = 1 (leave Max change)
Then hold down F9 to run the simulation.

[Figure: column chart 'SUM of 2-DICE', frequency against score 2 to 12]

To make the x-axis work properly in this chart, proceed as follows:
Choose the 'Column' chart type and observe that it is plotting 'x' and 'f' against the row
number.
Click 'next' ⇒ 'Chart Source Data' ⇒ 'Series'.
'f' is OK.
Select 'x' (which is plotting on the wrong axis). Copy and paste its array into the Category X
axis labels slot, then click 'Remove' to take it off the y-axis list.
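The same two-dice simulation can be written as a short Python sketch. The function name simulate_dice_sums, the number of throws and the seed are all assumptions for illustration. Each throw uses the same idea as INT(6*RAND())+1, and the tally plays the part of the 'f' column.

```python
import random
from collections import Counter


def simulate_dice_sums(throws, seed=None):
    """Throw two fair dice 'throws' times and tally the sums 2..12."""
    rng = random.Random(seed)
    tally = Counter({score: 0 for score in range(2, 13)})
    for _ in range(throws):
        die1 = int(6 * rng.random()) + 1     # same idea as INT(6*RAND())+1
        die2 = int(6 * rng.random()) + 1
        tally[die1 + die2] += 1
    return tally


tally = simulate_dice_sums(3600, seed=1)
for score in range(2, 13):
    expected = 3600 * (6 - abs(score - 7)) / 36   # theoretical frequency
    print(score, tally[score], expected)
```

Printing the theoretical frequencies alongside the tally makes it easy to see the triangular shape that the simulated column chart should approach.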
Internet resources for probability

1. The DISCUSS site
This is a growing set of teaching resources from Coventry University, covering many aspects
of school level probability and statistics.
Simulations available include one on Buffon's needle.

2. The Chance and Data site from Tasmania
This has an excellent probability section which links probability theory to stories in the
newspapers.

[Figure: 'Chance and Data in the News' main index page]

From the Autograph 'extras'

[Figure: simulation plot with the vertical axis running from 3.10 to 3.18]

2. Dice throwing
This is similar to the spreadsheet example above, but more automatic to use. Options include
(a) the sum of 2 dice
(b) the difference between 2 dice
(c) throwing one die
(d) throwing n dice

CHAPTERS 4 AND 5 DISCRETE RANDOM VARIABLES

Using a spreadsheet

The following formulae are available for calculating probabilities of discrete random
variables in Excel:
BINOMDIST(r, n, p, T)
e.g. X ~ B(10, 0.5)
T = 0: BINOMDIST(2, 10, 0.5, 0) = P(X = 2)
T = 1 (cumulative): BINOMDIST(2, 10, 0.5, 1) = P(X ≤ 2)
POISSON(x, m, T)
e.g. X ~ Po(4)
T = 0: POISSON(2, 4, 0) = P(X = 2)
T = 1 (cumulative): POISSON(2, 4, 1) = P(X ≤ 2)

Example: To produce the distribution and the cumulative distribution for X ~ Bin(10, p)
Name A2 'n'
Name B2 'p'
Formula in D2 = binomdist(x, n, p, 0)
or for cumulative: = binomdist(x, n, p, 1)
Note 'x' is the column heading C1 and this can be used in the formula.
Enter C2 = 0
To create 0-10 in C2-C12, use Edit ⇒ Fill ⇒ Series
Fill down D2-D12 (double-click on the D2 cell dot)

[Figure: column chart of P(X = x) for x = 0 to 10]
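The binomial and Poisson probabilities returned by these spreadsheet formulae follow directly from the standard definitions, as the Python sketch below shows. The function names are assumptions chosen for illustration, and Po(2.5) is used only as an example parameter. The table builder uses the recurrence P(X = r + 1) = P(X = r) × λ/(r + 1).

```python
from math import comb, exp, factorial


def binom_pmf(r, n, p):
    """P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)


def binom_cdf(r, n, p):
    """P(X <= r) for X ~ B(n, p)."""
    return sum(binom_pmf(k, n, p) for k in range(r + 1))


def poisson_pmf(r, lam):
    """P(X = r) for X ~ Po(lam)."""
    return lam ** r * exp(-lam) / factorial(r)


def poisson_table(lam, r_max):
    """Rows (r, P(X = r), P(X <= r)) built with the recurrence
    P(X = r + 1) = P(X = r) * lam / (r + 1)."""
    rows, p, cumulative = [], exp(-lam), 0.0
    for r in range(r_max + 1):
        cumulative += p
        rows.append((r, p, cumulative))
        p *= lam / (r + 1)
    return rows


print(binom_pmf(2, 10, 0.5))    # like BINOMDIST(2, 10, 0.5, 0)
print(binom_cdf(2, 10, 0.5))    # like BINOMDIST(2, 10, 0.5, 1)
for r, p, c in poisson_table(2.5, 8):
    print(r, round(p, 5), round(c, 4))
```

The recurrence avoids recomputing factorials for every row, which is also how such a table is conveniently filled down in a spreadsheet.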

[Figure: column chart of the cumulative distribution]

To put in a slider to control p:
Right-click over tool bar ⇒ Forms ⇒ Scroll Bar
Drag it into position. Right-click: Format control.
Unfortunately this slider only works with integers, so you need the dummy cell, B10: set the
slider to vary from 0-100, and set p = B10/100.

Using Autograph

The following discrete probability distributions are available in Autograph:

• Rectangular: X ~ R(a, b)      r = a, ..., b
  P(X = r) = 1/(b − a + 1)
  Mean, μ = (a + b)/2
  Variance, σ² = (b − a)(b − a + 2)/12
• Binomial: X ~ B(n, p)         r = 0, 1, 2, ..., n
  P(X = r) = nCr · p^r · q^(n−r)
  Mean, μ = np
  Variance, σ² = npq
• Poisson: X ~ Po(λ)            r = 0, 1, 2, ...
  P(X = r) = λ^r · e^(−λ)/r!
  Mean, μ = λ
  Variance, σ² = λ
  P(X = r + 1) = P(X = r) · λ/(r + 1)
  also the approximation Po(np) ≈ B(n, p), for large n and small p
• Geometric: X ~ G(p)           r = 1, 2, 3, ...
  P(X = r) = q^(r−1) · p
  Mean, μ = 1/p
  Variance, σ² = q/p²
  P(X = r + 1) = P(X = r) · q
• User defined:
  Mean, μ = Σ r·P(X = r)
  Variance, σ² = Σ r²·P(X = r) − μ²

[Figure: probability distributions B(1000, 0.5), B(25, 0.1) and Po(2.5)]

Table of Values of Po(2.5): μ = 2.5, σ² = 2.5

r    P(X = r)    P(X ≤ r)    P(X ≥ r)
0    0.08208     0.08208     1
1    0.2052      0.2873      0.9179
2    0.2565      0.5438      0.7127
3    0.2138      0.7576      0.4562
4    0.1336      0.8912      0.2424
5    0.0668      0.958       0.1088
6    0.02783     0.9858      0.04202
7    0.009941    0.9958      0.01419
8    0.003106    0.9989      0.004247

CHAPTERS 6, 7 AND 8 CONTINUOUS DISTRIBUTIONS AND THE
NORMAL DISTRIBUTION

Using a spreadsheet

The following normal distribution formulae are available in Excel:
NORMDIST(x, m, s, T)
For X ~ N(m, s²):
With T = 0 this returns the value of the pdf
With T = 1 this returns P(X < x)

NORMINV(p, m, s)
For X ~ N(m, s²)
this returns x such that P(X < x) = p
NORMSDIST(z)
For Z ~ N(0, 1), this returns P(Z < z)
where Z = (X − m)/s
NORMSINV(p)
returns z such that P(Z < z) = p
STANDARDIZE(x, m, s)
returns z = (x − m)/s
NOTE: the parameter used in the normal distribution formula is standard deviation (s) and
not variance (s²).

[Figure: worksheet with cells named m, s, x and y, calculating z = (x − m)/s,
P(X < x) = NORMDIST(x, m, s, 1) and P(X > x) = 1 − P(X < x), and the same for y]

Using Autograph

The following continuous probability distributions are available in Autograph:
• Uniform: X ~ U(a, b)      a < x < b
  f(x) = 1/(b − a)
  Mean, μ = (a + b)/2
  Variance, σ² = (b − a)²/12
• Normal: X ~ N(μ, σ²)
  z = (x − μ)/σ      f(x) = 1/(σ√(2π)) · e^(−z²/2)
• User defined:
  Mean, μ = ∫ x·f(x) dx
  Variance, σ² = ∫ x²·f(x) dx − μ²

Example: A continuous function f(x) = x², −2 < x < 2

The important principle to appreciate is that the total area must = 1. Therefore the function to
be plotted must be f(x) = kx², where k = 1/∫ x² dx over the range (here k = 3/16).
Autograph automatically converts any f(x) entry to k.f(x), and areas can be measured on-
screen by entering limits. By dragging the limits around, it can be seen that the total area = 1,
and so areas represent probability.

[Figure: graph of f(x) = kx²]

Example: X ~ N(500, 100²): here areas between limits and inverse calculations are possible.
The parameters μ and σ can also be varied dynamically.

[Figure: normal curve N(500, 100²)]
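Areas between limits and inverse calculations for a normal distribution can also be checked with the NormalDist class in Python's standard statistics module. The sketch below takes N(500, 100²) from the example; the limits 450 and 600 and the probability 0.9 are assumptions chosen for illustration. As with the spreadsheet formulae, the second parameter is the standard deviation, not the variance.

```python
from statistics import NormalDist

X = NormalDist(mu=500, sigma=100)      # sigma is the SD, not the variance

p_below = X.cdf(600)                   # P(X < 600), like NORMDIST(600, 500, 100, 1)
p_between = X.cdf(600) - X.cdf(450)    # area between two limits
density = X.pdf(500)                   # height of the curve, like T = 0
x_90 = X.inv_cdf(0.9)                  # x with P(X < x) = 0.9, like NORMINV

Z = NormalDist()                       # standard normal N(0, 1)
z = (600 - X.mean) / X.stdev           # like STANDARDIZE(600, 500, 100)
```

Standardising first and using Z.cdf(z) gives the same probability as X.cdf(600), which is the link between the general and the standard normal formulae.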
CHAPTER 9 SAMPLING AND ESTIMATION

Using a spreadsheet
Excel includes a feature which can generate a sample of random data from a number of
distributions. The choice of distributions is:

Uniform (a, b) [equivalent to RANDBETWEEN(a, b)]
Normal (m, s)
Bernoulli (p)
Binomial (n, p)
Poisson (m)
User-defined Discrete [2 cols: x, P(X = x)]

Example: To create a large sample data set from B(10, 0.5), and take samples of size 5 from it.
To achieve the above, use Tools ⇒ Data Analysis ⇒ Random Number Generation. Leave
'Number of Variables' (= columns) and 'Number of Random Numbers' (= rows) blank.
Instead click on 'Output Range' and drag over the array you want to fill with random
numbers. Choose 'Binomial', and enter 'p Value' and 'Number of trials'. Leave 'Random Seed'
blank (this is used for creating exactly the same set of random numbers again if required).
Click 'OK'. Then to create a frequency chart first set up a 'bin array' as a column of figures
0-10, and use the formula {= FREQUENCY(data array, bin array)} on a new array next to the
bin array. Don't forget SHIFT-CTRL-ENTER! Select the bin array and the frequency array
and draw a bar chart.
To create a random sample from this sample, use Tools ⇒ Data Analysis ⇒ Sampling. The
'Input range' is the data. Use 'Random' with n = 5, and mark the output range. Unfortunately
there is no easy way to create many such samples.
After five such samples it is useful to compare the mean and SD of the sample means with μ
and σ/√n calculated from the original data set.

Using Autograph

Use New Statistics Page ⇒ Add Grouped Data ⇒ Use Raw Data ⇒ Edit Raw Data ⇒ Select
Distribution. There is the option to create a set of random data from the following probability
distributions:

Rectangular (a, b) - discrete or continuous
Binomial (n, p)
Poisson (λ)
Geometric (p)
Normal (μ, σ²)
User defined continuous f(x)

['User defined discrete' is not yet implemented]

Use 'Edit Distrib.' to enter the parameters
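The spreadsheet exercise of drawing samples of size 5 from a large B(10, 0.5) data set is easy to repeat many times in code. The Python sketch below is an illustration under stated assumptions: the seed, the parent size of 5000 and the 2000 repeated samples are arbitrary choices, and the helper binomial_draw is not a library function.

```python
import random
from statistics import mean, pstdev

rng = random.Random(2001)                 # fixed seed so the run is repeatable


def binomial_draw(n, p):
    """One observation from B(n, p), as the number of successes in n trials."""
    return sum(rng.random() < p for _ in range(n))


# Parent data set: a large sample from B(10, 0.5).
parent = [binomial_draw(10, 0.5) for _ in range(5000)]
mu, sigma = mean(parent), pstdev(parent)

# Many random samples of size 5 drawn from the parent data set.
size = 5
sample_means = [mean(rng.sample(parent, size)) for _ in range(2000)]

print("mean of sample means:", mean(sample_means), "compare with", mu)
print("SD of sample means:  ", pstdev(sample_means), "compare with", sigma / size ** 0.5)
```

With many samples, the mean of the sample means settles close to the parent mean and their SD close to the parent SD divided by √5.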
Enter the sample size N, and press 'Create Sample'. Click 'OK', then 'Suggest Intervals'
(amend if necessary), then click 'Continuous' or 'Discrete' as appropriate, then 'OK'.
With a data set in place, select 'histogram' then 'autoscale'. Then choose 'Sample Means'.
In the 'Edit Sample Means' dialogue box, enter the sample size (e.g. n = 5). You can then
(a) take samples one at a time, in which case the actual samples are indicated on the diagram
together with their mean.
(b) take many samples (e.g. 100), in which case a dot plot is created.
The Central Limit Theorem is very effectively demonstrated with almost any parent
population.

CHAPTERS 10 AND 11 HYPOTHESIS TESTS

Using spreadsheets
Using Excel, the following formulae are useful when investigating hypothesis tests:

BINOMDIST(r, n, p, T), e.g. for X ~ B(10, 0.5)
T = 0: BINOMDIST(2, 10, 0.5, 0) = P(X = 2)
T = 1: BINOMDIST(2, 10, 0.5, 1) = P(X ≤ 2)
CRITBINOM(n, p, test), e.g. for X ~ B(n, p)
This finds the smallest x such that P(X ≤ x) ≥ test
POISSON(x, m, T), e.g. for X ~ Po(4)
T = 0: POISSON(2, 4, 0) = P(X = 2)
T = 1: POISSON(2, 4, 1) = P(X ≤ 2)
NORMDIST(x, m, s, T), e.g. for X ~ N(m, s²)
T = 0 ⇒ value of the pdf (for plotting the curve)
T = 1 ⇒ P(X < x)
NORMINV(p, m, s), e.g. for X ~ N(m, s²)
This returns the value x such that P(X < x) = p
NORMSDIST(z)
returns probability P(Z ≤ z)
NORMSINV(p)
returns z such that P(Z ≤ z) = p
STANDARDIZE(x, m, s) ⇒ z = (x − m)/s

Example: A tough driving examiner claims to pass only 20% of his candidates. After 25 tests,
what is the smallest number of passes required to refute this claim at 5%?
Answer: critbinom(25, 0.2, 0.05) = 2
This means P(X ≤ 2) > 5%
whereas P(X ≤ 1) < 5%

Example: CONFIDENCE(α, s, n)
This returns the half-width of the confidence interval for the mean, for a sample of size n
from a population with SD = s, at a significance level = α. Unfortunately, α is a measure of
the probability outside the interval. So for 95% confidence, α = 1 − 95/100.
e.g. CONFIDENCE(1 − 95/100, 2.5, 50) = 0.69
CI = sample mean ± 0.69 at 95%

Example: CHITEST(array-1, array-2)
array-1 = actual frequencies
array-2 = expected frequencies
returns the probability (p-value) from the χ² test for the two arrays
This enables a set of actual data (frequencies) to be tested against frequencies calculated from
an underlying probability distribution.

Using Autograph

Example: Hypothesis testing on discrete probability distributions

[Figure: X ~ B(25, 0.2), with P(X ≥ 8) = 10.9% and P(X ≥ 9) = 4.7% marked]

Here H0 is p = 0.2 under B(25, 0.2) and H1 is p > 0.2. If x ≥ 9, H0 is rejected. The boundary
limits can be dragged up and down the x-axis.

Example: Hypothesis testing on continuous probability distributions

[Figure: two normal curves with a common boundary at 18.82]
H0: N(23.68, 8.73)    P(X ≤ 18.82) = 0.05 = 5%        TYPE 1 error
H1: N(18.17, 4.16)    P(X ≥ 18.82) = 0.375 = 37.5%    TYPE 2 error
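The critical value and confidence interval calculations in this chapter can be reproduced from first principles. In the Python sketch below, the functions critbinom and confidence are written to follow the rules stated in the text (smallest x with P(X ≤ x) ≥ test, and half-width z·s/√n). They are illustrations under that assumption, not the spreadsheet's own implementation.

```python
from math import comb, sqrt
from statistics import NormalDist


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ B(n, p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(x + 1))


def critbinom(n, p, alpha):
    """Smallest x with P(X <= x) >= alpha."""
    for x in range(n + 1):
        if binom_cdf(x, n, p) >= alpha:
            return x
    return n


def confidence(alpha, sd, n):
    """Half-width z * sd / sqrt(n) of a two-sided interval for the mean."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * sd / sqrt(n)


lower = critbinom(25, 0.2, 0.05)               # lower-tail boundary for B(25, 0.2)
upper_tail = 1 - binom_cdf(8, 25, 0.2)         # P(X >= 9)
half_width = confidence(1 - 95 / 100, 2.5, 50)
```

For B(25, 0.2) the lower-tail boundary comes out at 2, and the upper-tail probability P(X ≥ 9) falls just under 5%, matching the rejection region x ≥ 9 used in the Autograph example.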
Example: Fitting a probability distribution to a data set.
Autograph will find the best parameters to fit a binomial, Poisson or normal distribution to a
data set.
First draw a histogram, then a normal distribution (any parameters), then choose 'fit to data'.
The probability distribution will only appear to be a good fit if the frequency density scale
(unit = 1) is used - this way the total area on both diagrams = 1.

[Figure: histogram with a fitted normal curve]

Appendix

CUMULATIVE BINOMIAL PROBABILITIES
The tabulated value is P(X ≤ r) where X ~ B(n, p)
p = 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50
n = 2 r = 0 0.9025 0.8100 0.7225 0.6400 0.5625 0.4900 0.4225 0.3600 0.3025 0.2500
1 0.9975 0.9900 0.9775 0.9600 0.9375 0.9100 0.8775 0.8400 0.7975 0.7500
2 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
n = 3 r = 0 0.8574 0.7290 0.6141 0.5120 0.4219 0.3430 0.2746 0.2160 0.1664 0.1250
1 0.9928 0.9720 0.9393 0.8960 0.8438 0.7840 0.7183 0.6480 0.5748 0.5000
2 0.9999 0.9990 0.9966 0.9920 0.9844 0.9730 0.9571 0.9360 0.9089 0.8750
3 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
n = 4 r = 0 0.8145 0.6561 0.5220 0.4096 0.3164 0.2401 0.1785 0.1296 0.0915 0.0625
1 0.9860 0.9477 0.8905 0.8192 0.7383 0.6517 0.5630 0.4752 0.3910 0.3125
2 0.9995 0.9963 0.9880 0.9728 0.9492 0.9163 0.8735 0.8208 0.7585 0.6875
3 1.0000 0.9999 0.9995 0.9984 0.9961 0.9919 0.9850 0.9744 0.9590 0.9375
4 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
n = 5 r = 0 0.7738 0.5905 0.4437 0.3277 0.2373 0.1681 0.1160 0.0778 0.0503 0.0313
1 0.9774 0.9185 0.8352 0.7373 0.6328 0.5282 0.4284 0.3370 0.2562 0.1875
2 0.9988 0.9914 0.9734 0.9421 0.8965 0.8369 0.7648 0.6826 0.5931 0.5000
3 1.0000 0.9995 0.9978 0.9933 0.9844 0.9692 0.9460 0.9130 0.8688 0.8125
4 1.0000 0.9999 0.9997 0.9990 0.9976 0.9947 0.9898 0.9815 0.9688
5 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
n = 6 r = 0 0.7351 0.5314 0.3771 0.2621 0.1780 0.1176 0.0754 0.0467 0.0277 0.0156
1 0.9672 0.8857 0.7765 0.6554 0.5339 0.4202 0.3191 0.2333 0.1636 0.1094
2 0.9978 0.9842 0.9527 0.9011 0.8306 0.7443 0.6471 0.5443 0.4415 0.3438
3 0.9999 0.9987 0.9941 0.9830 0.9624 0.9295 0.8826 0.8208 0.7447 0.6563
4 1.0000 0.9999 0.9996 0.9984 0.9954 0.9891 0.9777 0.9590 0.9308 0.8906
5 1.0000 1.0000 0.9999 0.9998 0.9993 0.9982 0.9959 0.9917 0.9844
6 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
n = 7 r = 0 0.6983 0.4783 0.3206 0.2097 0.1335 0.0824 0.0490 0.0280 0.0152 0.0078
1 0.9556 0.8503 0.7166 0.5767 0.4449 0.3294 0.2338 0.1586 0.1024 0.0625
2 0.9962 0.9743 0.9262 0.8520 0.7564 0.6471 0.5323 0.4199 0.3164 0.2266
3 0.9998 0.9973 0.9879 0.9667 0.9294 0.8740 0.8002 0.7102 0.6083 0.5000
4 1.0000 0.9998 0.9988 0.9953 0.9871 0.9712 0.9444 0.9037 0.8471 0.7734
5 1.0000 0.9999 0.9996 0.9987 0.9962 0.9910 0.9812 0.9643 0.9375
6 1.0000 1.0000 0.9999 0.9998 0.9994 0.9984 0.9963 0.9922
7 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
n = 8 r = 0 0.6634 0.4305 0.2725 0.1678 0.1001 0.0576 0.0319 0.0168 0.0084 0.0039
1 0.9428 0.8131 0.6572 0.5033 0.3671 0.2553 0.1691 0.1064 0.0632 0.0352
2 0.9942 0.9619 0.8948 0.7969 0.6785 0.5518 0.4278 0.3154 0.2201 0.1445
3 0.9996 0.9950 0.9786 0.9437 0.8862 0.8059 0.7064 0.5941 0.4770 0.3633
4 1.0000 0.9996 0.9971 0.9896 0.9727 0.9420 0.8939 0.8263 0.7396 0.6367
5 1.0000 0.9998 0.9988 0.9958 0.9887 0.9747 0.9502 0.9115 0.8555
6 1.0000 0.9999 0.9996 0.9987 0.9964 0.9915 0.9819 0.9648
7 1.0000 1.0000 0.9999 0.9998 0.9993 0.9983 0.9961
8 1.0000 1.0000 1.0000 1.0000 1.0000

CUMULATIVE BINOMIAL PROBABILITIES CUMULATIVE POISSON PROBABILITIES


The tabulated value is P(X.;; r) where X -B(n, p)
The tabulated value is P(X.;; r) where X -Po(;l)
p~ 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 A.~ 0.2 0.4 0.5
n~9 r~ 0 0.6302 0.3874 0.2316 0.1342 0.0751 0.0404 0.0207 0.0101 0.0046 0.0020 0.6 0.8 1.0
r 0 0.8187 1.2 1.4 1.5
1 0.9288 0.7748 0.5995 0.4362 0.3003 0.1960 0.1211 0.0705 0.0385 0.0195 0.6703 0.6065
1 0.9825 0.5488 0.4493 0.3679
2 0.9916 0.9470 0.8591 0.7382 0.6007 0.4628 0.3373 0.2318 0.1495 0.0898 0.9384 0.9098 0.8781 0.3012 0.2466 0.2231
3 0.9994 0.9917 0.9661 0.9144 0.8343 0.7297 0.6089 0.4826 0.3614 0.2539 2 0.9989 0.9921 0.8088 0.7358 0.6626
0.9856 0.9769 0.9526 0.5918 0.5578
4 1.0000 0.9991 0.9944 0.9804 0.9511 0.9012 0.8283 0.7334 0.6214 0.5000 3 0.9999 0.9992 0.9197 0.8795
0.9982 0.9966 0.9909 0.8335 0.8088
5 0.9999 0.9994 0.9969 0.9900 0.9747 0.9464 0.9006 0.8342 0.7461 4 1.0000 0.9999 0.9810 0.9662
0.9998 0.9996 0.9986 0.9463 0.9344
6 1.0000 1.0000 0.9997 0.9987 0.9957 0.9888 0.9750 0.9502 0.9102 5 1.0000 0.9963 0.9923
7 1.0000 0.9999 0.9996 0.9986 0.9962 0.9909 0.9805 1.0000 1.0000 0.9998 0.9857 0.9814
6 0.9994 0.9985
8 1.0000 1.0000 0.9999 0.9997 0.9992 0.9980 0.9968 0.9955
7 1.0000 0.9999 0.9997
9 1.0000 1.0000 1.0000 1.0000 0.9994 0.9991
8 1.0000 1.0000 0.9999 0.9998
n~10 r~O 0.5987 0.3487 0.1969 0.1074 0.0563 0.0282 0.0135 0.0060 0.0025 0.0010
0.0233 1.0000 1.0000
1 0.9139 0.7361 0.5443 0.3758 0.2440 0.1493 0.0860 0.0464 0.0107
2 0.9885 0.9298 0.8202 0.6778 0.5256 0.3828 0.2616 0.1673 0.0996 0.0547
A.~
3 0.9990 0.9872 0.9500 0.8791 0.7759 0.6496 0.5138 0.3823 0.2660 0.1719 1.6 1.8 2.0
4 0.9999 0.9984 0.9901 0.9672 0.9219 0.8497 0.7515 0.6331 0.5044 0.3770 2.2 2.4 2.5
r 0 0.2019 2.6 2.8 3.0
5 1.0000 0.9999 0.9986 0.9936 0.9803 0.9527 0.9051 0.8338 0.7384 0.6230 0.1653 0.1353 0.1108
1 0.5249 0.0907 0.0821 0.0743
6 1.0000 0.9999 0.9991 0.9965 0.9894 0.9740 0.9452 0.8980 0.8281 0.4628 0.4060 0.3546 0.0608 0.0498
7 1.0000 0.9999 0.9996 0.9984 0.9952 0.9877 0.9726 0.9453 2 0.7834 0.3084 0.2873 0.2674
0.7306 0.6767 0.6227 0.2311 0.1991
8 1.0000 1.0000 0.9999 0.9995 0.9983 0.9955 0.9893 3 0.9212 0.5697 0.5438 0.5184
0.8913 0.8571 0.8194 0.4695 0.4232
9 1.0000 1.0000 0.9999 0.9997 0.9990 4 0.9763 0.7787 0.7576 0.7360
0.9636 0.9473 0.9275 0.6919 0.6472
10 1.0000 1.0000 1.0000 5 0.9940 0.9041 0.8912 0.8774
0.9896 0.9834 0.9751 0.8477 0.8153
n ~ 15 r~o 0.4633 0.2059 0.0874 0.0352 0.0134 0.0047 0.0016 0.0005 0.0001 0.0000 6 0.9987 0.9974 0.9643 0.9580 0.9510
0.9955 0.9925 0.9349 0.9161
1 0.8290 0.5490 0.3186 0.1671 0.0802 0.0353 0.0142 0.0052 0.0017 0.0005 7 0.9997 0.9994 0.9884 0.9858 0.9828
0.9989 0.9980 0.9756 0.9665
2 0.9638 0.8159 0.6042 0.3980 0.2361 0.1268 0.0617 0.0271 0.0107 0.0037 8 1.0000 0.9967 0.9958 0.9947
0.9999 0.9998 0.9995 0.9919 0.9881
3 0.9945 0.9444 0.8227 0.6482 0.4613 0.2969 0.1727 0.0905 0.0424 0.0176 9 0.9991 0.9989 0.9985
1.0000 1.0000 0.9999 0.9976 0.9962
4 0.9994 0.9873 0.9383 0.8358 0.6865 0.5155 0.3519 0.2173 0.1204 0.0592 10 0.9998 0.9997 0.9996
5 0.9999 0.9978 0.9832 0.9389 0.8516 0.7216 0.5643 0.4032 0.2608 0.1509 1.0000 0.9993 0.9989
11 1.0000 0.9999 0.9999
6 1.0000 0.9997 0.9964 0.9819 0.9434 0.8689 0.7548 0.6098 0.4522 0.3036 0.9998 0.9997
12 1.0000 1.0000 1.0000
7 1.0000 0.9994 0.9958 0.9827 0.9500 0.8868 0.7869 0.6535 0.5000 0.9999
8 0.9999 0.9992 0.9958 0.9848 0.9578 0.9050 0.8182 0.6964 1.0000
9 1.0000 0.9999 0.9992 0.9963 0.9876 0.9662 0.9231 0.8491
10 1.0000 0.9999 0.9993 0.9972 0.9907 0.9745 0.9408 ;!~
3.2
11 1.0000 0.9999 0.9995 0.9981 0.9937 0.9824 3.4 3.5 3.6 3.8 4.0 4.5
12 1.0000 0.9999 0.9997 0.9989 0.9963 r 0 0.0408 5.0 5.5
0.0334 0.0302 0.0273
13 1.0000 1.0000 0.9999 0.9995 1 0.1712 0.0224 0.0183 0.0111
0.1468 0.1359 0.1257 0.0067 0.0041
14 1.0000 1.0000 2 0.3799 0.1074 0.0916 0.0611
0.3397 0.3208 0.3027 0.0404 0.0266
n~20 r~ 0 0.3585 0.1216 0.0388 0.0115 0.0032 0.0008 0.0002 0.0000 0.0000 0.0000 3 0.6025 0.2689 0.2381 0.1736
0.5584 0.5366 0.5152 0.1247 0.0884
1 0.7358 0.3917 0.1756 0.0692 0.0243 0.0076 0.0021 0.0005 0.0001 0.0000 4 0.7806 0.4735 0.4335 0.3423
0.7442 0.7254 0.7064 0.2650 0.2017
2 0.9245 0.6769 0.4049 0.2061 0.0913 0.0355 0.0121 0.0036 0.0009 0.0002 5 0.8946 0.6678 0.6288 0.5321
0.8705 0.8576 0.8441 0.4405 0.3575
3 0.9841 0.8670 0.6477 0.4114 0.2252 0.1071 0.0444 0.0160 0.0049 0.0013 6 0.9554 0.8156 0.7851 0.7029
0.0059 0.9421 0.9347 0.9267 0.6160 0.5289
4 0.9974 0.9568 0.8298 0.6296 0.4148 0.2375 0.1182 0.0510 0.0189 7 0.9832 0.9091 0.8893
0.0207 0.9769 0.9733 0.8311 0.7622 0.6860
5 0.9997 0.9887 0.9327 0.8042 0.6172 0.4164 0.2454 0.1256 0.0553 8 0.9692 0.9599 0.9489
0.1299 0.0577 0.9943 0.9917 0.9901 0.9134 0.8666
6 1.0000 0.9976 0.9781 0.9133 0.7858 0.6080 0.4166 0.2500 9 0.9883 0.9840 0.8095
0.1316 0.9982 0.9973 0.9786 0.9597 0.9319
7 0.9996 0.9941 0.9679 0.8982 0.7723 0.6010 0.4159 0.2520 0.9967 0.9960 0.9942 0.8944
0.5956 0.4143 0.2517 10 0.9995 0.9992 0.9919 0.9829 0.9682
8 0.9999 0.9987 0.9900 0.9591 0.8867 0.7624 0.9990 0.9987 0.9981 0.9462
9 1.0000 0.9998 0.9974 0.9861 0.9520 0.8782 0.7553 0.5914 0.4119 11 0.9999 0.9998 0.9972 0.9933
0.9997 0.9996 0.9994 0.9863 0.9747
10 1.0000 0.9994 0.9961 0.9829 0.9468 0.8725 0.7507 0.5881 12 1.0000 0.9999 0.9991 0.9976
0.9999 0.9999 0.9998 0.9945 0.9890
11 0.9999 0.9991 0.9949 0.9804 0.9435 0.8692 0.7483 13 1.0000 0.9997 0.9992
0.8684 1.0000 1.0000 0.9980 0.9955
12 1.0000 0.9998 0.9987 0.9940 0.9790 0.9420 14 1.0000 0.9999 0.9997
0.9423 0.9993 0.9983
13 1.0000 0.9997 0.9985 0.9935 0.9786 15 1.0000 0.9999
0.9984 0.9936 0.9793 0.9998 0.9994
14 1.0000 0.9997 16 1.0000
15 1.0000 0.9997 0.9985 0.9941 0.9999 0.9998
0.9987 17
16 1.0000 0.9997 1.0000 0.9999
0.9998 18
17 1.0000 1.0000
18 1.0000
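
The cumulative binomial and Poisson entries tabulated above can be reproduced from the probability functions themselves. The following is a minimal sketch, assuming the tables give P(X ≤ r) for X ~ B(n, p) and for X ~ Po(λ); the function names binomial_cdf and poisson_cdf are illustrative and are not taken from the text.

```python
from math import comb, exp, factorial


def binomial_cdf(r, n, p):
    """P(X <= r) for X ~ B(n, p), summing the individual binomial probabilities."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(r + 1))


def poisson_cdf(r, lam):
    """P(X <= r) for X ~ Po(lam), summing the individual Poisson probabilities."""
    return sum(exp(-lam) * lam**k / factorial(k) for k in range(r + 1))


if __name__ == "__main__":
    # n = 20, p = 0.10, r = 2 and lambda = 5.0, r = 5, printed to 4 decimal places
    print(f"{binomial_cdf(2, 20, 0.10):.4f}")
    print(f"{poisson_cdf(5, 5.0):.4f}")
```

Rounded to four decimal places, these two calls should agree with the tabulated entries 0.6769 and 0.6160.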

CUMULATIVE POISSON PROBABILITIES

The tabulated value is P(X ≤ r) where X ~ Po(λ).

λ         6.0    6.5    7.0    7.5    8.0    8.5    9.0    9.5   10.0
r = 0  0.0025 0.0015 0.0009 0.0006 0.0003 0.0002 0.0001 0.0001 0.0000
    1  0.0174 0.0113 0.0073 0.0047 0.0030 0.0019 0.0012 0.0008 0.0005
    2  0.0620 0.0430 0.0296 0.0203 0.0138 0.0093 0.0062 0.0042 0.0028
    3  0.1512 0.1118 0.0818 0.0591 0.0424 0.0301 0.0212 0.0149 0.0103
    4  0.2851 0.2237 0.1730 0.1321 0.0996 0.0744 0.0550 0.0403 0.0293
    5  0.4457 0.3690 0.3007 0.2414 0.1912 0.1496 0.1157 0.0885 0.0671
    6  0.6063 0.5265 0.4497 0.3782 0.3134 0.2562 0.2068 0.1649 0.1301
    7  0.7440 0.6728 0.5987 0.5246 0.4530 0.3856 0.3239 0.2687 0.2202
    8  0.8472 0.7916 0.7291 0.6620 0.5925 0.5231 0.4557 0.3918 0.3328
    9  0.9161 0.8774 0.8305 0.7764 0.7166 0.6530 0.5874 0.5218 0.4579
   10  0.9574 0.9332 0.9015 0.8622 0.8159 0.7634 0.7060 0.6453 0.5830
   11  0.9799 0.9661 0.9467 0.9208 0.8881 0.8487 0.8030 0.7520 0.6968
   12  0.9912 0.9840 0.9730 0.9573 0.9362 0.9091 0.8758 0.8364 0.7916
   13  0.9964 0.9929 0.9872 0.9784 0.9658 0.9486 0.9261 0.8981 0.8645
   14  0.9986 0.9970 0.9943 0.9897 0.9827 0.9726 0.9585 0.9400 0.9165
   15  0.9995 0.9988 0.9976 0.9954 0.9918 0.9862 0.9780 0.9665 0.9513
   16  0.9998 0.9996 0.9990 0.9980 0.9963 0.9934 0.9889 0.9823 0.9730
   17  0.9999 0.9998 0.9996 0.9992 0.9984 0.9970 0.9947 0.9911 0.9857
   18  1.0000 0.9999 0.9999 0.9997 0.9993 0.9987 0.9976 0.9957 0.9928
   19         1.0000 1.0000 0.9999 0.9997 0.9995 0.9989 0.9980 0.9965
   20                       1.0000 0.9999 0.9998 0.9996 0.9991 0.9984
   21                              1.0000 0.9999 0.9998 0.9996 0.9993
   22                                     1.0000 0.9999 0.9999 0.9997
   23                                            1.0000 0.9999 0.9999
   24                                                   1.0000 1.0000

THE STANDARD NORMAL DISTRIBUTION FUNCTION

If Z has a normal distribution with mean 0 and variance 1 then, for each value of z, the table gives the value of Φ(z) where

Φ(z) = P(Z ≤ z).

For negative values of z use Φ(-z) = 1 - Φ(z).

z         0      1      2      3      4      5      6      7      8      9     1  2  3  4  5  6  7  8  9  (ADD)
0.0  0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359     4  8 12 16 20 24 28 32 36
0.1  0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753     4  8 12 16 20 24 28 32 36
0.2  0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141     4  8 12 15 19 23 27 31 35
0.3  0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517     4  7 11 15 19 22 26 30 34
0.4  0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879     4  7 11 14 18 22 25 29 32
0.5  0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224     3  7 10 14 17 20 24 27 31
0.6  0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549     3  7 10 13 16 19 23 26 29
0.7  0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852     3  6  9 12 15 18 21 24 27
0.8  0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133     3  5  8 11 14 16 19 22 25
0.9  0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389     3  5  8 10 13 15 18 20 23
1.0  0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621     2  5  7  9 12 14 16 19 21
1.1  0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830     2  4  6  8 10 12 14 16 18
1.2  0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015     2  4  6  7  9 11 13 15 17
1.3  0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177     2  3  5  6  8 10 11 13 14
1.4  0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319     1  3  4  6  7  8 10 11 13
1.5  0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441     1  2  4  5  6  7  8 10 11
1.6  0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545     1  2  3  4  5  6  7  8  9
1.7  0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633     1  2  3  4  4  5  6  7  8
1.8  0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706     1  1  2  3  4  4  5  6  6
1.9  0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767     1  1  2  2  3  4  4  5  5
2.0  0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817     0  1  1  2  2  3  3  4  4
2.1  0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857     0  1  1  2  2  2  3  3  4
2.2  0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890     0  1  1  1  2  2  2  3  3
2.3  0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916     0  1  1  1  1  2  2  2  2
2.4  0.9918 0.9920 0.9922 0.9924 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936     0  0  1  1  1  1  1  2  2
2.5  0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952     0  0  0  1  1  1  1  1  1
2.6  0.9953 0.9955 0.9956 0.9957 0.9958 0.9960 0.9961 0.9962 0.9963 0.9964     0  0  0  0  1  1  1  1  1
2.7  0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974     0  0  0  0  0  1  1  1  1
2.8  0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981     0  0  0  0  0  0  0  1  1
2.9  0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986     0  0  0  0  0  0  0  0  0
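
The tabulated values of the standard normal distribution function can be checked numerically. The sketch below assumes the standard identity Φ(z) = ½[1 + erf(z/√2)], which is not stated in the text; the function name phi is illustrative.

```python
from math import erf, sqrt


def phi(z):
    """Standard normal distribution function: P(Z <= z) for Z ~ N(0, 1)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


if __name__ == "__main__":
    z = 1.25
    print(f"phi({z}) = {phi(z):.4f}")
    # Symmetry rule used with the table for negative values of z
    print(f"phi({-z}) = {phi(-z):.4f}   1 - phi({z}) = {1 - phi(z):.4f}")
```

For z = 1.25 the first line should match the table entry in row 1.2, column 5, and the last two values illustrate the rule for negative z.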

CRITICAL VALUES FOR THE NORMAL DISTRIBUTION


The table gives the value of z such that P(Z ≤ z) = p, where Z ~ N(0, 1).

p 0.75 0.90 0.95 0.975 0.99 0.995 0.9975 0.999 0.9995


z 0.674 1.282 1.645 1.960 2.326 2.576 2.807 3.090 3.291
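
The critical values above can be recovered by inverting the standard normal distribution function. This is a minimal sketch using Python's standard-library statistics.NormalDist; the helper name critical_value and the list P_VALUES are illustrative choices, not part of the text.

```python
from statistics import NormalDist

# Probabilities p taken from the header row of the table
P_VALUES = [0.75, 0.90, 0.95, 0.975, 0.99, 0.995, 0.9975, 0.999, 0.9995]


def critical_value(p):
    """Return z such that P(Z <= z) = p for Z ~ N(0, 1)."""
    return NormalDist(mu=0.0, sigma=1.0).inv_cdf(p)


if __name__ == "__main__":
    for p in P_VALUES:
        print(f"p = {p:<6}  z = {critical_value(p):.3f}")
```

Printed to three decimal places, the values should agree with the z row of the table.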
CRITICAL VALUES FOR THE t-DISTRIBUTION

If T has a t-distribution with ν degrees of freedom then, for each pair of values of p and ν, the table gives the value of t such that P(T ≤ t) = p.

p       0.75   0.90   0.95  0.975   0.99  0.995 0.9975  0.999 0.9995
ν = 1  1.000  3.078  6.314  12.71  31.82  63.66  127.3  318.3  636.6
    2  0.816  1.886  2.920  4.303  6.965  9.925  14.09  22.33  31.60
    3  0.765  1.638  2.353  3.182  4.541  5.841  7.453  10.21  12.92
    4  0.741  1.533  2.132  2.776  3.747  4.604  5.598  7.173  8.610
    5  0.727  1.476  2.015  2.571  3.365  4.032  4.773  5.893  6.869
    6  0.718  1.440  1.943  2.447  3.143  3.707  4.317  5.208  5.959
    7  0.711  1.415  1.895  2.365  2.998  3.499  4.029  4.785  5.408
    8  0.706  1.397  1.860  2.306  2.896  3.355  3.833  4.501  5.041
    9  0.703  1.383  1.833  2.262  2.821  3.250  3.690  4.297  4.781
   10  0.700  1.372  1.812  2.228  2.764  3.169  3.581  4.144  4.587
   11  0.697  1.363  1.796  2.201  2.718  3.106  3.497  4.025  4.437
   12  0.695  1.356  1.782  2.179  2.681  3.055  3.428  3.930  4.318
   13  0.694  1.350  1.771  2.160  2.650  3.012  3.372  3.852  4.221
   14  0.692  1.345  1.761  2.145  2.624  2.977  3.326  3.787  4.140
   15  0.691  1.341  1.753  2.131  2.602  2.947  3.286  3.733  4.073
   16  0.690  1.337  1.746  2.120  2.583  2.921  3.252  3.686  4.015
   17  0.689  1.333  1.740  2.110  2.567  2.898  3.222  3.646  3.965
   18  0.688  1.330  1.734  2.101  2.552  2.878  3.197  3.610  3.922
   19  0.688  1.328  1.729  2.093  2.539  2.861  3.174  3.579  3.883
   20  0.687  1.325  1.725  2.086  2.528  2.845  3.153  3.552  3.850
   21  0.686  1.323  1.721  2.080  2.518  2.831  3.135  3.527  3.819
   22  0.686  1.321  1.717  2.074  2.508  2.819  3.119  3.505  3.792
   23  0.685  1.319  1.714  2.069  2.500  2.807  3.104  3.485  3.768
   24  0.685  1.318  1.711  2.064  2.492  2.797  3.091  3.467  3.745
   25  0.684  1.316  1.708  2.060  2.485  2.787  3.078  3.450  3.725
   26  0.684  1.315  1.706  2.056  2.479  2.779  3.067  3.435  3.707
   27  0.684  1.314  1.703  2.052  2.473  2.771  3.057  3.421  3.690
   28  0.683  1.313  1.701  2.048  2.467  2.763  3.047  3.408  3.674
   29  0.683  1.311  1.699  2.045  2.462  2.756  3.038  3.396  3.659
   30  0.683  1.310  1.697  2.042  2.457  2.750  3.030  3.385  3.646
   40  0.681  1.303  1.684  2.021  2.423  2.704  2.971  3.307  3.551
   60  0.679  1.296  1.671  2.000  2.390  2.660  2.915  3.232  3.460
  120  0.677  1.289  1.658  1.980  2.358  2.617  2.860  3.160  3.373
    ∞  0.674  1.282  1.645  1.960  2.326  2.576  2.807  3.090  3.291

CRITICAL VALUES FOR THE χ² DISTRIBUTION

If X has a χ² distribution with ν degrees of freedom then, for each pair of values of p and ν, the table gives the value of x such that P(X ≥ x) = p.

p       0.990   0.975   0.950   0.100   0.050   0.025   0.010   0.005
ν = 1   0.000   0.001   0.004   2.705   3.841   5.024   6.635   7.879
    2   0.020   0.051   0.103   4.605   5.991   7.378   9.210  10.597
    3   0.115   0.216   0.352   6.251   7.815   9.348  11.345  12.838
    4   0.297   0.484   0.711   7.779   9.488  11.143  13.277  14.860
    5   0.554   0.831   1.145   9.236  11.070  12.832  15.086  16.750
    6   0.872   1.237   1.635  10.645  12.592  14.449  16.812  18.548
    7   1.239   1.690   2.167  12.017  14.067  16.013  18.475  20.278
    8   1.646   2.180   2.733  13.362  15.507  17.535  20.090  21.955
    9   2.088   2.700   3.325  14.684  16.919  19.023  21.666  23.589
   10   2.558   3.247   3.940  15.987  18.307  20.483  23.209  25.188
   11   3.053   3.816   4.575  17.275  19.675  21.920  24.725  26.757
   12   3.571   4.404   5.226  18.549  21.026  23.337  26.217  28.300
   13   4.107   5.009   5.892  19.812  22.362  24.736  27.688  29.819
   14   4.660   5.629   6.571  21.064  23.685  26.119  29.141  31.319
   15   5.229   6.262   7.261  22.307  24.996  27.488  30.578  32.801
   16   5.812   6.908   7.962  23.542  26.296  28.845  32.000  34.267
   17   6.408   7.564   8.672  24.769  27.587  30.191  33.409  35.718
   18   7.015   8.231   9.390  25.989  28.869  31.526  34.805  37.156
   19   7.633   8.907  10.117  27.204  30.144  32.852  36.191  38.582
   20   8.260   9.591  10.851  28.412  31.410  34.170  37.566  39.997
   21   8.897  10.283  11.591  29.615  32.671  35.479  38.932  41.401
   22   9.542  10.982  12.338  30.813  33.924  36.781  40.289  42.796
   23  10.196  11.689  13.091  32.007  35.172  38.076  41.638  44.181
   24  10.856  12.401  13.848  33.196  36.415  39.364  42.980  45.558
   25  11.524  13.120  14.611  34.382  37.652  40.646  44.314  46.928
   26  12.198  13.844  15.379  35.563  38.885  41.923  45.642  48.290
   27  12.879  14.573  16.151  36.741  40.113  43.194  46.963  49.645
   28  13.565  15.308  16.928  37.916  41.337  44.461  48.278  50.993
   29  14.256  16.047  17.708  39.088  42.557  45.722  49.588  52.336
   30  14.953  16.791  18.493  40.256  43.773  46.979  50.892  53.672
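
Some entries of the χ² table can be checked without special software. The sketch below is restricted to an even number of degrees of freedom ν = 2k, for which the upper tail has the closed form P(X ≥ x) = e^(-x/2) Σ (x/2)^j / j! summed over j = 0, …, k-1; this identity is a standard result assumed here rather than taken from the text. The critical value is then found by bisection. The function names and the search bound hi are illustrative.

```python
from math import exp


def chi2_upper_tail(x, nu):
    """P(X >= x) for X ~ chi-squared with an EVEN number nu of degrees of freedom.

    For nu = 2k the tail equals exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!.
    """
    if nu <= 0 or nu % 2 != 0:
        raise ValueError("this sketch only handles even, positive nu")
    half = x / 2.0
    term, total = 1.0, 0.0
    for j in range(nu // 2):
        total += term           # add (x/2)^j / j!
        term *= half / (j + 1)  # build the next term of the series
    return exp(-half) * total


def chi2_critical_value(p, nu, hi=200.0):
    """Bisect for x with P(X >= x) = p; the tail probability decreases as x grows."""
    lo = 0.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_upper_tail(mid, nu) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    for p, nu in [(0.05, 2), (0.05, 4), (0.01, 10)]:
        print(f"nu = {nu:2d}, p = {p}: x = {chi2_critical_value(p, nu):.3f}")
```

Odd values of ν, and the t-distribution in general, need numerical integration or a statistics library and are outside the scope of this sketch.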

RANDOM NUMBERS
65 23 68 00 77 82 58 14 10 85 11 85 57 11 73 74 45 25 50 46
09 56 76 51 04 73 94 30 16 74 69 59 04 38 83 98 30 20 87 85
55 99 98 60 01 33 06 93 85 13 23 17 25 51 92 04 52 31 38 70
72 82 45 44 09 53 04 83 03 83 98 41 67 41 01 38
66 83 11 99
04 21 28 72 73 25 02 74 35 81 78 49 52 67 61 40 60 50 47 50
87 01 80 59 89 36 41 59 60 27 64 89 47 45 18 21 69 84 76 06

CRITICAL VALUES FOR CORRELATION COEFFICIENTS

These tables concern tests of the hypothesis that a population correlation coefficient ρ is 0. The values in the tables are the minimum values which need to be reached by a sample correlation coefficient in order to be significant at the level shown, on a one-tailed test.

Spearman's Coefficient
31 62 46 53 84 40 56 31 74 76 52 23 72 95 96 06 56 83 85 22
Product-moment Coefficient Level 29 81 57 94 35 91 90 70 94 24 19 35 50 22 23 72 87 34 83 15
Sample O.Dl 39 98 74 22 77 19 12 81 29 42 04 50 62 34 36 81 43 07 97 92
Level 0.05 0.025
0.005 size 56 14 80 10 76 52 38 54 84 13 99 90 22 55 41 04 72 37 89 33
0.025 O.Dl
0.10 0.05 1.0000 29 56 62 74 12 67 09 35 89 33 04 28 44 75 01 57 87 45 52 21
0.9900 4 1.0000
0.9500 0.9800 0.9000 1.0000 93 32 57 38 39 36 87 42 72 55 73 97 98 36 57 41 76 09 11 68
0.8000 0.9000 0.9587 5 95 69 51 54 43 19 20 49 57 25 90 55 26 20 70 98 43 73
0.9343 0.9429 56 45
0.8054 0.8783 0.8857 65 71 32 43 64 67
0.6870 6 0.8286 22 55 65 65 48 86 10 88 20 12 40 18 49 25
0.8822 0.9172 0.7857 0.8929 90 27 33 43 97 84 20 57 49 91 41 20 17 64 29 60 66 87 55 97
0.7293 0.8114 7 0.7143
0.6084 0.8329 0.8745 0.7381 0.8333 90 29 42 45 61 34
0.7545 0.6429 30 13 30 39 21 52 59 28 64 98 08 76 09 27
0.5509 0.6694 0.8343 8 0.7833
0.7067 0.7887 0.6000 0.7000 99 74 06 29 20 55 72 70 11 43 95 82 75 37 90 24 77 43 63 21
0.5067 0.6215 0.7977 9 0.7455 87 87 66 91 16 97 51 50 61 36 96 47 76 68 49 11 50 56 51 06
0.6664 0.7498 0.5636 0.6485
0.4716 0.5822 0.7646 10 46 24 17 74 97 37 39 03 54 83 34 00 74 61 77 51 43 63 15 67
0.6319 0.7155 0.7091
0.4428 0.5494 0.6182 66 79 81 43 40 92 84 72 88 32 83 24 67 01 41 34 70 19 26 93
11 0.5364
0.6851 0.7348 0.5874 0.6783 36 42 94 58 83 30 92 39 18 40 03 00 12 90 32 37 91 65 48 15
0.5214 0.6021 12 0.5035
0.4187 0.6581 0.7079 0.5604 0.6484 07 66 25 08 99 27 69 48 85 32 16 46 19 31 85 02 86 36 22 96
0.4973 0.5760 13 0.4835
0.3981 0.6339 0.6835 0.5385 0.6264 93 10 05 72 18 26 36 67 68 48 31 69 68 58 93 49 45 86 99 29
0.4762 0.5529 14 0.4637 49 50 63 99 26 71 47 94 32 71 72 91 34 18 74 06 32 14 40 80
0.3802 0.6120 0.6614 0.5214 0.6036
0.4575 0.5324 15 0.4464 20 75 58 89 39 04 42 73 37 93 11 07 28 77 91 36 60 47 82 62
0.3646 0.5923 0.6411
0.4409 0.5140 0.5029 0.5824 02 40 62 09 00 71 09 37 80 44 50 37 32 70 20 38 71 86 75 34
0.3507 16 0.4294
0.5742 0.6226 0.4877 0.5662 59 87 21 38 29 78 72 67 42 83 65 21 54 79 66 42 47 86 31 15
0.4259 0.4973 17 0.4142 48 08 99 66 43 38 28 13 50 25 47 93 11 15 07 84 28 30 19 07
0.3383 0.5577 0.6055 0.4716 0.5501
0.4124 0.4821 18 0.4014 54 26 86 75 44 15 20 39 20 03 58 54 80 29 62 53 06 97 71 51
0.3271 0.5425 0.5897 0.4596 0.5351
0.4000 0.4683 19 0.3912 35 35 58 45 23 58 63 66 09 62 80 92 14 55 81 41 21 48 87 34
0.3170 0.5285 0.5751 0.4466 0.5218
0.3887 0.4555 20 0.3805 73 84 90 49 01 21 90 29 57 06 68 73 51 10 51 95 63 08 57 99
0.3077 0.5155 0.5614
0.3783 0.4438 0.4364 0.5091 34 64 78 00 92 59 67 74 58 48 92 09 42 20 40 37 63 80 58 93
0.2992 21 0.3701 68 56 87 47 63 06 24 71 41 98 79 06 0718 58 29 16 49 67 37
0.5034 0.5487 0.4252 0.4975
0.3687 0.4329 22 0.3608 72 47 05 42 88 07 27 55 58 74 82 08 42 28 26 48 25 32 00 31
0.2914 0.4921 0.5368 0.4160 0.4862
0.3598 0.4227 23 0.3528 44 44 96 75 89 57 12 60 42 38 77 36 45 69 21 68 32 70 04 96
0.2841 0.4815 0.5256 0.4070 0.4757
0.3515 0.4133
0.5151 24 0.3443
0.4662
28 11 57 47 61 57 89 88 62 18 93 67 57 32 9672 21 17 13 54
0.2774 0.4716 0.3977 87 22 38 88 91 99 16 08 17 76 52 14
0.3438 0.4044 25 0.3369 27 47 98 86 35 68 23 85
0.2711 0.4622 0.5052 44 93 14 59 67 40 24 10 11 63 40 47 07 56 14 22 62 74 93 39
0.3365 0.3961 0.3901 0.4571
0.2653 26 0.3306 81 84 37 25 90 43 56 62 94 58 49 03 84 22 57 22 47 98 86 37
0.4534 0.4958 0.3828 0.4487
0.3882 27 0.3242 09 75 35 21 04 47 54 08 98 44 08 16 44 86 69 71 20 52 64 94
0.2598 0.3297 0.4869 0.4401
0.3809 0.4451 0.3180 0.3755 77 65 05 04 22 18 20 10 81 87 05 69 43 70 96 76 42 05 21 10
0.2546 0.3233 0.4785 28 0.4325
0.3739 0.4372 0.3118 0.3685 19 06 51 61 34 03 61 55 98 58 83 50 01 48 99 85 08 67 15 91
0.2497 0.3172 0.4705 29 0.4251 19 62 32 28 04 91
0.3673 0.4297 0.3624 52 91 87 07 42 48 65 24 86 09 87 68 55 51
0.3115 30 0.3063
0.2451 0.4226 0.4629 52 47 25 14 93 91 75 51 49 26 49 41 20 83 30 30 43 22 69 08
0.3061 0.3610 0.3128 0.3681
0.2407 0.2640 52 67 87 40 63 41 91 86 10 47 80 70 56 87 25 86 89 94 21 42
0.4026 40 0.3293
0.3120 0.3665 0.2353 0.2791 66 25 71 73 78 60 50 62 91 04 95 97 64 16 71 31 32 80 19 61
0.2070 0.2638 0.3610 50 0.3005
0.2787 0.3281 0.2144 0.2545 29 97 56 42 56 90 16 75 74 95 99 26 01 63 25 16 54 18 54 46
0.1843 0.2353 0.3301 60 0.2782 15 25 03 68 92 45 53 00 06 29 46 43 46 66 27 12 85 05 22 44
0.2542 0.2997 0.1982 0.2354
0.1678 0.2144 0.3060 70 0.2602 82 08 65 67 64 13 51 14 38 28 24 30 39 62 20 35 23 90 57 36
0.2352 0.2776 0.1852 0.2201
0.1550 0.1982 0.2864 80 81 35 03 25 87 24 83 59 04 67 51 52 26 21 69 75 87 28 61 50
0.2199 0.2597 0.2453
0.1448 0.1852 0.1745 0.2074
0.2702 90 0.2327
0.2072 0.2449 0.1654 0.1967
0.1364 0.1745 0.2565 100
0.1966 0.2324
0.1292 0.1654

Each digit in this table is an independent sample from a population where each of the digits 0 to 9 has a probability of occurrence of 0.1. It should be noted that these digits have been computer generated, and are therefore 'pseudo' random numbers.
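
A table of this kind can be generated with any pseudo-random number generator. The following is a minimal sketch using Python's standard random module; the function name random_digit_table, the layout of 20 two-digit groups per row and the seed value are illustrative assumptions, and the output will not reproduce the digits printed in this table.

```python
import random


def random_digit_table(rows, pairs_per_row=20, seed=None):
    """Build rows of two-digit groups, each digit drawn uniformly from 0 to 9."""
    rng = random.Random(seed)  # a fixed seed makes the pseudo-random table repeatable
    table = []
    for _ in range(rows):
        pairs = [f"{rng.randrange(10)}{rng.randrange(10)}" for _ in range(pairs_per_row)]
        table.append(" ".join(pairs))
    return table


if __name__ == "__main__":
    for line in random_digit_table(rows=5, seed=1):
        print(line)
```

Because the generator is deterministic once seeded, the same seed always gives the same table, which is the sense in which such digits are 'pseudo' random.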
ANSWERS

Chapter 1 7. (a) Before After


8 4
Exercise 1a Stemplots (page 8) 73110 5
NOTE: There are alternative formats 99664 6 9
1. 1•1 50 2 953300 7 0 5 5 7 7
55 2 2 4 1 8 0 0 1 4 4 6
60 1 3 4 4 3333100 9 5 6 7
65 0 2 2 3 3 3 5 5 10 4 4 4 6 8 9
700112344 1 1 0 11 7
75 0 1 1 2 4 4 12 5
80 1 3 13 0 0 1 7 7
85 1 J Key: 85/1 means 86 ) 14 3 5
(b) 68 kg
j Key: 8[4 means 84 /
2. 3 2 2 2 2
Rate much faster after exercise
3 0 1 1 1
(b)
School A School B
2 9 9
987533 2 359
2 6 6 6 6 7 7 7
j Key: 2/7 means 27 99977743311 3 46688
2 4 5 5 J
88886655500 4 012234556779
3. 4 4 4 4 9443311 5 002244666788999
1 777777 1 6 0
1 8
/Key: 9[5 means 59) /Key: 5/9 means 59)
2 0 0 0 0 1 1 1 1
2 223333 Older teaching staff in School B
2 4 4 4 (c) Boys Girls
2 6 6 / Key: 2/1 means 0.21 seconds ) 2 4 5 5
3 3 3 2 2 2 2 2 2 2 2 2
4. 3 9
1 0 2 1 1
4
9 9 8 8 1 8 8 99 9
5 3 4 5 5
6 6 6 6 1 6 6 7 7
6 1 1 5 7 8
5 54 1
7 0013456689
8 012248 1
9 2 6 1 1
10 0 1 J Key: 5 J 3 means 5.3 em J
9 0

5. 12 5 9 /Key: 8/1 means 0.18 s-) J Key: 1f8 means 0.18 s)


11 1 3 6 Boys have faster reaction time,
10 4 Girls' reaction times more consistent.
9 7 8
8 3 4
Exercise 1b Histograms and frequency
7 035568 polygons (page 21)
6 1 2 8
5 6 6 8 1. Boundary points 5, 10, 20, 25, 40,45
4 3 8 f.d. 0.4, 1.2, 1.4, 1, 0.4
3
(J !
2 4 6 2 iii I
;I
1 6 f. d.
0 0 2 6 8 /Key: 7/3 means 7.3 hours )

0-~~
6. (a) 7.4 hours, 0.5 hrs
(b) 0.074 g, 0.005 g

0 5 10 15 20 25 30 35 40 45
time (s)

. (a) 5.
Mass (g) Frequency f. d. f.d.
Speed 0 20- 24- 30- 32- 38- 48- 60- 0.3 J 6
ii! iii'! . !)
85-89 4 0.8 tti
90-94 6 1.2 frequency 20 24 24 16 12 10 6 0 i~. f. d.
!~L

95-99 7 1.4
0.2
p,
100-104 13 2.6 6. Boundary points 176.5, 186.5, 191.5, 196.5, 201.5, 206.5, 4
105-109 10 2 216.5
110-114 5 1 f.d. 1.2, 1.6, 1.6, 1.8, 1.4, 0.6
0.1
.i!
115-119 5 1
2
iLl ;I' .illi'
(b)
I!' d 1. f.d.
0 jill!_
2
w H 50 100 150 200 250 300
score
0
·r

+' 4.5 14.5 24.5


f.d. ~
In general Jack's scores were higher than Lucy's scores.

111111111
height (em)
0 30
186.5 196.5 206.5 216.5 f. d. (c) 12.9 m (3 d.)
176.5
height (em) 8. (a) Boundary points 9.5, 19.5, 24.5, 29.5, 30.5, 34.5,
H~­ !4. 20 39.5, 59.5

1:8111
! 7. Plot polygon at (0.75, 2), (2.25, 41), (4.5, 7!), (9, 31), f.d. 2, 4, 3, 14, 4, 2, 0.5
(13.5,2), (18, 1). (b) 28 seconds.
84.5 94.5 104.5 114.5 8.
modal class is 100-104 mass (g)
Exercise 1e Weighted means (page 36)
Number of occurrences of c Frequency Width f.d.
(c) 8 6 6 7 8 1. 10.4
1
9 222233 0-2 1 3 2. Class teacher, 1.65%
9 5 6 6 7 8 9 9 3-5 5 3 1i' 45 50 55 60 65
3.
4.
40.6
4
10 0 0 0 1 1 1 1 1 2 2 3 3 4 height (em)
6-8 6 3 2 5. 5, 65.8
10 5 5 5 6 6 7 7 8 8 9
9-11 3 3 1 The maize seedlings showed a tendency to grow taller with
11 013 3 4 ]Key:10I3means103] the stronger solution.
1166788 . 12-14 5 3 1' Exercise lf Mean and standard deviation
15. (a) a=20,b=26,c=12
mode= 101 g
15-17 4 3 1'
' (b) 88
(page 44)
3. Boundary points 0, 25, 60, 80, 150, 300
f.d. 2.48, 2, 4.4, 4, 0.2
' 1. (a) 5, 2 (b) 8.5, 1.80 (c) 18.8, 6.46 (d) 10l, 4.10
Plot boundaries at -0.5, 2.5, 5.5, 8.5, 11. 5 , 14 ·S, 175 Exercise lc Pie charts (page 26) (e) 3.42, 1.91 (f) 205,3.16
or at 0, 3, 6, 9, 12, 15, 18 2. (a) f.d. 0.2, 0.32, 0.62S, 1.04, 0.08
1. (a) 1540,26°,64°, 116°
9. Plot polygon at (18, 17.5), (22.5, 94), (27.5, 107), (c) 5.51 em h ' ~.
(32.5, 56), (40, 11.8). . !I' ' '
2. 208°,460,38°,36°, 32°; 5.25 em
Modal class 25-30, skewed with a tail to the right.
(Other answers possible) S
3. 66°, 156°,24°,42°, 72°; 5.5 em, 6 em; 50° f.d. w u II'! :;
10. Boundary points -0.5, 9.5, 14.5, 19.5, 29.5, 39.5, 9 ·5
(say)
or o, 10, 15, 20, 30, 40, 60 (say)
4. (a) £120 000 (b) 68 000 (c) 90", 2JO, 9", 30"; 7.5 em
S. (a) 42 (b) 40o (c) 91; 420,30.0 em
6. (a) 860, 38", 32", 20", 168", 16" (b) 5.5 em
0
200 300 400
i' Iii
500
7. (a) £2000,£8000 (b) £400 (c) 2JO (d) 80"
2 f.d. 0.5, 1.6, 6.4, 4.1, 1.6, (0.1). 69 5 wage(£)
8. 28.8", 72", 115.2", 144"; 180
11. Boundary points 9.5, 29.5, 39.5, 49.5, 59.5, 64.5, . , (b) £338.25,£59.60
9. (a) £4500 (b) 1550, 1650 (c) 132", 24"; 8 em
1 84.5 85 3. 69.3, 1.7
or 10, 30, 40, 50, 60, 65, 70, 4. 11S.8, 7.58
or 9,29,39,49,59,64,69,84
Exercise 1d The mean (page 34)
5. 16.6 seconds, 2.63 seconds
f. d. 1.1, 1.8, 2.2, 2.4, 2.8, 2.4, 1.6 1. (a) 9.7 (b) 154.8 (c) 51.375 (d) 1775j 6. 6.8, 1.11
50 100 150 200 250 300 (e) 0.908 (3d.) (f) 4 (g) 29.54 (h) 122.82 7. (a) 2 min 38 sec, 1 min 54 sec
12. 6, 8, 8, 6, 4, 10 50 300
time (mins) 13 Take boundary points 50, 100, 1SO, 200,2 •' 2. 49.3
{b) Histogram f. d. 6, 10, 15, 2.5, 0.8
. Lucy' Plot (75, 0.12), (125, 0.28), (175, 0.2), (225, 0.12), 3. 45 (2d.)
4. Boundary points 40.5, 50.5, 55.5, 60.5, 70.5, 75.5 Frequency polygon: plot (0.5, 6), (1.5, 10), (2.5, 15),
(275, 0.08) 25 0 32) 4. {a) Boundary points 0, 5, 10, 15, 20,40 (4, 2.5), (7.5, 0.8)
f.d. 2.1, 12.4, 11, 5, 2.4 Jack: Plot (75, 0.04), (125, 0.12), (175, 0.2), (2 ' . ' f.d. 2.4, 7.6, 8.4, 4, 0.4 8. 29, 5.9
(275, 0.12) (b) £11.92 9. 5.10
5. Boundary points 0, 15, 30, SO, 70, 100 10. 5
f.d. 3.6, 5.2, 6, 4.4, 2 11. (a) 10 (b) 11.7
10 43.35 years
12. (a) 121,6.19 (b) 14, 1703.8 (c) 1716,3.59
6. 21.4 em (d) 1026,58 770
f.d.
7. (a} There should not be gaps between the bars. Heights 13. {a) Frequency=5+18+22+28+22+18+5=118
5 should be adjusted so that area= frequency (Area= f. d. x width)
(b) Boundary points 4.S, 9.S, 12.5, 15.S, 18.5, 28.5 (b) Symmetric. Midpoints of intervals have been taken to
f.d. 2.8, 6, 5, 1j-, 0.8 represent the interval.
(c) 3.5 mm (2 s.f.)
0 14. 28.15, 3.84
40.5 50.5 60.5 70.5 15. 5.3
mass (kg) 16. 30.0 mph, 5.85 mph
(b) 82%
xercise lg Mean and standard deviation 6. (a) numberofgoals 0 ~1 "2 .;;:3 .;;;4 (.5 .;;:6 (c) 6.5, median
8. r;::.::::-;:(~---;-,...--c----.
trme mlns) cumulative frequency
lage 50) (d) Histogram to show pH value
cumulative 0 1 4 6 11 19 25 <39.5 0
l. 19
L 8
frequency .•
.£ 4
u
<44.5
<49.5
8
30
3. 7 (b) <54.5 64
4. 3.74 25 II'
';,'! hH'u i>
~ 3 <59.5 94
5. a=6,b=4
6. 15.6, 7.66
~

g 20
~
,, l <64.5 120

7. 12 f 15 For the curve, plot {39.5 0) (44 5 8) (49 5 30)


{54 5 64) 59 ' ,- . , ' . ' ,
8. 15,7 . · ' '{ ·5, 94), (64.5, 120) and join points with a
9. 25.9, 1.99 ~ 10 T smooth curve.

r distance (km)
(a_)~9:n:Iin~s~(=b~)~A~p~p~ro:x:·~1~1°:Yo~;~56~m~io~"~--
~
0.
1.
2.
3.
2.3, 1.41
11.7%, 2.2%
(a) 4.6, 2 (b) 4.56, 2.04
25 1 2 4 4
B 5
0
0 2
:iii.
3 4
iii
5 6
li 9.

0
cumulative frequency

0
30 0 1 1 2 2 2 3 3 3 4 number of goals <4 1
35 0 1 2 3 3 3 3 4 <10 3
(c) 5 <20
40 0 2 2 4 9
(d) 2
45 0 4 3. <35 28
7. (a) 2 (b) 3 (c) 2.47 (d) 1.94 mass (g) cumulative frequency
so 2 8. (a) 2,3 (b) 2
<60 40
55
60 1
I
Key: 45! 4 means 49[ (c) It only considers the middle 50% and does not take 3
5
< 100 50
account of large families.
Features: modal class 30-34, skewed to the right, 61
extreme value (outlier), 36.87, 35.59
l4. £195.45,£14.12
Exercise 1k Cumulative frequency, median and
10
22
Cumulative frequency curve to show distances travelled
50
lfi'" I''"'
,,
quartiles- grouped data (page 81) 32
l5. 11.87, 0.80 38 !!
Some answers are approximate and depend on the curve drawn 40
l6. 4.44 40

Exercise lh Scaling sets of data (page 55) 1· {a) r-m--as-s~(k~g~)-----c-um--u~la~ti-vo-f~r,-q-u-,n-c-y' Plot (50, 3), (54, 5), (58, 10), (62, 22) (66 32) (70 38)
(74, 40) ' ' ' , ,
1. (a) 6, 2.14 (b) 516,2.14 (c) 78,27.8 .;;: 39.5 0 Median mass= 61.3 g
2. 50, 12 ..:;;44.5 3 4 · (a} time (minutes)
3. (a) p + k, a (b) Pf.l, pa; 3,u + 5, 3o 5 cumulative frequency
<49.5
4. (a) 2 (b) 200 (c) 2.02 (d) -4, -1, 2, 5, 8, 11,14 <54.5 12 <S 2
5. (a) a~~. b ~ 22 (b) 70 (c) 76 <59.6 30 < 10 4 10
6. (a) 38, 8.99 (b) 34, 77 <64.5 48 <15
7. a=l.6,b=10 <69.5 51 <20
7
13
!n
8.
9.
a= 0.8, b = -5; 6.25
(b) Take mark intervals 0 < mark < 10, 10 < mark < 20,
<74.5 52 <25 25 0 "' !m W 1

.;;;30 41 0 20 40 60 80
Plot !00
etc. 05 47 o, M
f.d. 0.1, 0.8, 1.9, 2.8, 2.5, 1.7, 0.7, 0.3, 0.1.
(39.5, 0), (44.5, 3), (49.5, 5), (54.5, 12), (59.5, 30),
<40 so "" distance (km)

r
(a"-)c-;;32;;:=:km~·'i(~b)C..::A:"p'::pr~o::'x:_.3:'0~k"'".'n~lcC)I_A'.'pPip!"r".'ox";. 54%
(64.5, 48), {69.5, 51), (74.5, 52). Join with a smooth
boundaries 0, 10, 20, 30, 40, 50, 60, 70, 80
curve. (b) 24 (c) 26 (d) 23 {c) 25 mrns (f) 4.5 mins 10.
(c) midpoints 5, 15, 25, etc, 40.4, 15.4; price (£x} cumulative frequency
(b) 21 (c) 14 (d) 62 kg (o) 58.4 kg (f) 7.2 kg 5. (a) 687.5 hours (b) 13.2 hours
(d) a~ 24, b ~ 0.65 (2 d.)
10. (a) 12.5 (b) 20; 80, 5. 2. (a) I ii(j !!''(\ 6. (a) 80.75 g (b) 215 0
50 7. (a) Cumulative frequency cwve to show maximum temperatures 6
Exercise 1i Coding (page 58) H'ti 16
28
1. (a) 313.76,5.19 (b) 431,132 (c) 0.0171,0.00818 40 "110 41
2. 51.235,0.927 < 120 48

~~~~~1
3. 89.3275 53
4. 31.7mins.
5. 71.2, 3.82 Plot (75, 0), (95, 6), (100, 16), (105 28) (110 41)
(120, 48), {135, 53) ' ' ' '
6. 46~ sees.
(a) £104 (b) £13 (c) 47
11. x=25,y=17
Exercise lj Cumulative frequency, median and Hili 12. Plot (405, 0), (415, 4), (425, 7) (435 13) (445 23)
quartiles - ungrouped data (page 73) (455,28), (465,30}. ' , ' ' '
10 437, 412.5, 453.
1.
2.
(a)
4
9 (b) 207 (c) 1896 (d) 0.55
:::: 13. Plot (80, 0), (85, 6), (90, 18), (95, 40) (100 71) (105 86)
3. (a) 61 (b) 52 (c) 73 (d) 21 5 0 5 10 15 20 25 30 (!1 O, 93),(115, 97), (120, 99), (125, lOO) ' ' ' '
!
4. (a) 46,35 (b) 1.8, 1.2 (c) 20.5, 11.5 0 temperature "C (a) 97 mms (b) 10 mins (c) 62
5. (a) 7, 2 (b) 14, 3 4.4 4.8 5.2 5.6 6.0 6.4 6.8 7.2 7.6 8.0 8.4 14. Plot (165, 0), (170, 18), (175, 55), (180 115) (185 180)
pH value (b) 12"'C (c) 80 (d) Approx. 10% (190,228), (195,250) ' ' ' ,
(a) 180.5 em (b) 175.5 em (c) 187 em (d) 189.5 em
15 · (a) 5? mins (b) 71.5 mins (c) 32%.

(c) u.c.b. 0, 5, 10, 15, 20, 25, 35; c.f. 0, 1, 6, 9, 11, 12, 12. (a) Stem Leaf
16. Plot (69.5, 0), (74.5, 8), (79.5, 28), (84.5, 53), (89.5, 84),
13; Q, ~ 7.25, Q, ~ 10.8, Q, ~ 16.875; 6.075, 3.55; 4 1234467788 (d) a= 0.85, b = 1 (dependent on values in (c))
(94.5, 94), (99.5, 100). (c) yes
positively skewed 5 0222346788
9.3 sees, 22,75.5 sees. (d) u.c.b. 0, 5, 10, 15, 20, 25, 30; c.f. 0, 5, 20, 45, 90, 6. (a) (i) 49.66 (ii) 433.97 (iii) 20.83
6 023366778
17. SOp, £4.96, £5.96. Large amounts affect the mean but not (b) c.f. 3, 9, 18, 28, 40, 58, 72, 83, 88
140, 160; Q, ~ 14, Q, ~ 18.9, Q, ~ 23; 4.1, 4.9; 7 00224467888
the median (c) Plot (0, 0), (10, 3), (20, 9), (30, 18), (40, 28), (50, 40)
negatively skewed 8 01255667 r~~------~
18. Histogram: frequency densities 0.2, 0.5, 0.9, 0.8, 0.1
5. X= 63.9, s = 29.5, outliers would be less than 4.9 mins, 9 3 3.4 I
Key: 412 means 42/ (60, 58), (70, 72), (80, 83), (90, 88)
(d) (i) 52 (ii) 32
,
thickness (mm) 0 <20 <30 <40 <50 <60 greater than 122.8 mins, outliers are 133, 144. (b) Q2 = 66 mtles, Q 1 =52 miles, Q 3 = 78 miles
(c) (t) 11
6. Compare median, quartiles, range, skewness
cumulative number 0 2 7 16 24 25 7. c.£. 11, 39, 77, 111, 138, 150
7. December: Q 1 = 0.3, Q 2 = 1.8, Q 3 = 2.7;
of strata July; Q, ~ 4.1, Q, ~ 6.5, Q, ~ 9.8 Take as boundaries 0.90, 1.15, 1.30 etc. or 0.91, 1.16,
40 50 60 70 80 90 1.31, etc. or 0.905, 1.155, etc. Median ""£1 30
Plot (0, 0), (20, 2), (30, 7), (40, 16), (50, 24), (60, 25) 8. (a) Stem(£) Leaf (p) · ·
36 mm, 15 mm, 0.24. Distance (miles)
(d) (i) Gives a visual impression of the data whilst 3 40 60 75 95
Exercise 11 Skewness (page 90) keeping the details. 4 20 50 75
July (ii) Gives an immediate impression of an 5 @ 75
1. (a) 0.535 (b) -0.674 a~proximately symmetrical distribution with the
0 5 10 6 45 60
2. -2.4 Hours of sunshine mtddle 50% lying between 52 and 78 miles. 7 25
3. 2 8 75
4. {a) Frequency densities: 0.8, 3, 5, 1.8, 1.2, 0.47, 0.2 8. (a) 0 1 2 2 5 9 Miscellaneous exercise ln (page 110)
1 0 0 2 3 5 7 9 9 9 60
(b) Positively skewed 10
2 25999 1. (o) X= 5.42, s = 0.33; range= 1.79, g = 5.46 ,
5. Vertical line graph, 2, 3, 3.53, 1.985, 0.801, 0.771 2
Ql = 5.295, Q 3 = 5.615, outlier= 4.07 11
6. -0.482 3 0 1
7. (a) B (b) A (c) C 4 5 7 8 IKey: 2!5 means 9.25 a.m. \ (b) (i) 5.465
(ii) 5.47 outlier
12 25
Q 2 = £5.20, range= £8.85
8. (a) (i) 0.75 (ii) 0.28 5 3
(iii) 0.22
*---~ (b) x~£6,,~£2.47
(b) Frequency densities: 0.2, 1, 1.2, 1.8, 2.8, 0.6, 1.2, 1, (b) 9.19 a.m.
(c) A: X= £6.30, s = £2.47
0.4, 0.2 (c) 9.10 a.m., 29~ minutes past 9.
4.00 5.00 6.00 B: X= £6.30, s = £2.59
9. (a) 9.6 ruins, 1 min (b) 0.33 (d)
specific gravity (d) mean.remains the same; lower paid workers do not
(c) (4.65 mins, 14.61 ruins) benefit under scheme B.
2. (a} Boundary points for histogram
(d) (4.3 ruins, 15.27 mins) 9. (a} 8, 6, 4t, 3
9.00 9.10 9.20 9.30 9.40 9.50 689.5, 709.5, 719.5, 729.5, 739.5, 744.5, 749.5
10. (a) 0.143 (Q, ~ 17, Q, ~ 26, Q, ~ 38)
(b) 0.0668(Q, ~ 11.9, Q, ~ 16.1, Q, ~20.9) Time of delivery 754.5, 759.5, 769.5, 789.5 , 10
(c) 0.333 (Q, ~ 9, Q, ~ 11, Q, d5). First interval l.c.b. 689.5, u.c.b. 709.5 f.d.
f.d. 0.15, 0.7, 1.5, 3.8, 8.2, 7, 4.2, 3.2, 1.4, 0.5
(b) Plot (689.5, 0), (709.5, 3), (719.5, 10), (729.5, 25), 8
Exercise 1m Box plots (page 99)
0 2 4 X (739.5, 63), (744.5, 104), (749.5, 139), (754.5, 160),
1. (a) Plot (0, 0), (1, 8), (2, 19), (3, 36), (5, 44), (10, 50) (759.5, 176), (769.5, 190), (789.5, 200). 6 f---
(b) 2.35 mins, 1.4 mins, 3.4 mins (c) 744.24, 14.86
(c) Positively skewed (d) 744.01, 736.08, 752.12
(t) 0.046 4
LJ_j-----------;10 (f) 0.011
2345678x
Length of call (mins) (g) In b?x plot, draw whiskers from 689.5 to 789.5, with 2
10. (a) 6, 5
medtan and quartiles as in (d).
2. Group 1: Q 1 = 0.17; Q 2 = 0.21, Q 3 = 0.23; times from (b) More than 3 standard deviations from the mean 3. 16, 6 (e) 5.86 (b) 15, 7
0.14 to 0.26 (c) (i) older brother or sister also attended 4. 35 yrs 1 month, 11 yrs 3 months. 0~~----~r--r--~
Group 2: Q 1 = 0.16, Q 2 = 0.19, Q 3 = 0.22; times from {ii) a mistake had been made 0 10 20
(a) median= 33 yrs 9 months, IQR = 17 yrs 11 months 40 60 80 100
(d) 5.5, 5 (b) 61.8% length (mm)
0.09 to 0.25
(t) decrease 5, (e) 44.5
Group 1 (c) Approx. 2.5 mm (modal class is 0.;;; x < 5)
(f) positive, less (b) 51.75 (d) (i) 39.9 mm

Group 2
11. (a) height gain (grams)

36 0 9 9
(c) >.
~ 20 0 i! !,, "i (ii) 35 nun
10. (a) 275
37 6
l" :Iii! i (c) Comparative bar chart
11. 57, (a) it becomes 39
38 !;c;
0.08 0.26 39 1 7 7 9 ~ 150 (b) x = 3x- 141 does not have an integer solution.
Reaction time (sees) 40 2 3 7 3 !j 12. 10~.7.mm, .0.4 mm; machine B nearer 100 on average, less
3. Q1 = 22, Q2 = 35, Q3 = 51; whiskers from 16 to 97. 41 0 0 variation with machine A.
Boundary for outliers 94.5; outlier 97 42 0 5 7 G 100
43 0 4 H~
44 5
45 !
Key: 39\7 means 397] 50 w::
10 50 90 97 46 2
[Box plot axis: Length of line (mm)]
(b) Draw plots - New corn: whiskers from 360 to 462, Q1 = 397, Q2 = 405, Q3 = 426; Standard corn: whiskers from 321 to 423; Q1 = 353, Q2 = 368.5, Q3 = 383
4. (a) u.c.b. 0, 20, 30, 40, 50; c.f. 0, 20, 40, 65, 69; Q1 = 17.5, Q2 = 27.5, Q3 = 35; 7.5, 10; negatively skewed
(b) u.c.b. 0, 20, 40, 80, 100; c.f. 0, 4, 10, 34, 44; Q1 = 41.7, Q2 = 60, Q3 = 78.3; 18.3, 18.3; negatively skewed, zero quartile skewness
[Cumulative frequency graph axis: mark] Ql =40, Q.1 =64
(a) Histogram to show times to complete half-marathon Mixed test 1B (page 116) Data set 2
13. (a) 1, could be 1 or 2
(b) Positive skew, possible outlier 3 1. (a) 1.15 (b) 1 (c) 1.09 (a) y=90.31-1.78x
(c) 2, 1.7; more than 3 standard deviations from the 2. (a) (i) Easier to see the spread (b) x = 37.80 - 0.39y
mean (ii) 1 1 2 2 3 4 4 4
(d) (A) a mistake. 2 1 56779
(B) could be correct. 2 1 1 1 2
2 5 5 7
,(,,~)~1~·:88~,;1~.4;8~~~~~--~~~~--~~ 3 1 2 2
14.
Cost (£1000)   ≤50    ≤60    ≤70    ≤100   ≤150
c.f.           540    1690   3010   3870   4320
Plot (20 000, 0), (50 000, 540), (60 000, 1690), (70 000, 3010), (100 000, 3870), (150 000, 4320).
Q2 ≈ £63 000, IQR: a value between £18 000 and £23 000 is acceptable
3 5 5 9
4 1 3
4 4 5
Key: 1|5 is 15 cm
(b) 24.6 cm
(c) 21 cm
(d) Median better; distribution not symmetrical
[Histogram axis: time (mins)]
(b) 96.15 mins
3. (a) «% .
15. f. d. 0.93, 2.4, 1.4, 1.6, 0.9 2. {a) 7, 6, 4, 8 (b) 33' 0 -f'llJ.L..'.CfLi'.li!.J.+.ill.IJ.illjJJ-'+.
Histogram to show age distribution (b) 6.55, 5.7, 8.1 4. (a) Median same for both. 0 5 10
(c) ~has 3 outliers; ignoring these, B's average waiting Good negative correlation
time would be lower. 2. (a)
2 B's times are less variable than A's.
5.0 6.0 7.0 8.0 9.0 10.0
.,; A's times are positively skewed, B's are negatively
Blood glucose level (mmoi/C) 6
1 skewed.
(d) Positive skew. (b) (i) If outliers are not the Post Office's fault, choose B
3. (a) 4.5 (b) 1.5 for quicker service,
0 (c) No change to mean, standard deviation is increased. (ii) If outliers are the Post Office's fault then the
20 30 40 50 60
age (years) 4. (a) Pie chart, bar chart situation could happen again and there could be a
(b) Children in school, sample not representative. long wait. A avoids long waits.
(b) 35! yrs. 5. (a) 006788 2
10 0 0 1 2 2 3 3 3 4 4
10 5 6 6 6 6 7 7 7 8 8 9 9
20 0 2 3
20 7
(b) Q2 = 15.5 mins, Q1 = 12 mins, Q3 = 18 mins.
(c) [Box plot: Times (mins)]
(a) 40.15
(c) f.d. 3.6, 6.4, 4.4, 1.4, 0.7
16.
Time (mins)     Frequency   Frequency density
0 ≤ x ≤ 1       20          20
1 < x ≤ 2       47          47
2 < x ≤ 2.5     51          102
2.5 < x ≤ 3     59          118
3 < x ≤ 5       138         69
5 < x ≤ 10      85          17
[Scatter diagram axis: temperature]
(b) y = 0.614 + 0.0207x
3. y = -2.59 + 0.65x; 36.5
4. F = -6.33 + 0.90!, F = 20.8
5. y = 3.8 + 1.6x, x = -2.06 + 0.59y
6. (a) y = 15.83 + 0.72x (b) 66 (c) 59
6. (a) (i) 3 hrs 3 mins 7. (a) y
(ii) Q1 = 2 hrs 42 mins, Q3 = 3 hrs 42 mins.
(b) 40, (200), 200, 60
(c) (i) 3 hrs 20 mins (ii) 54 mins
[Histogram axis: words per sentence]
NB: boundary points could be 0.5, 5.5, 10.5, 15.5, 25.5, 45.5
(d) 13.8, 10.2
(e) 9.11
5. (a) Histogram
(b) Individual values are not known and mid points have been taken as representatives of the intervals.
3⅓ mins, divides area in half.
Chapter 2
Exercise 2a Equations of least squares regression lines (page 136)
1. Data set 1
(a) y = 4.50 + 0.64x
17. (a) 8, 9.5 mins
(b) Boundaries 0, 5, 10, 15, 20, 25, 30; (c) 69.5, 7.6
(b) x=4.42+0.75y
20
i!
f.d. 8, 11.2, 5.6, 4, 2.4, 0.8 (d) Median- no effect, IQR- no effect,
(c) 10 mean- increased. YT:rrrr:c;n'"'"''"'"''' ,, i
(d) A False, B True, C False, D True 7, 15, 35, 20, 13, 10 ii!
9, 5.43, 14.5 0
20 0 20 40 60 80
Mixed test 1A (page 114)
1.
t               f    f.d.
65 < t ≤ 85     25   1.25
85 < t ≤ 95     28   2.8
95 < t ≤ 105    20   2
105 < t ≤ 115   17   1.7
115 < t ≤ 155   10   0.25
[Box plots: Male employees, Female employees; axis: time (years)]
[Scatter diagram axis: units of output (1000's)]
(b) 20.7 + 0.96x
(c) 31 000-33 000 units (d) Break-even point.
8. y = 1.8 + 1.3x
9. y = -8 + 1.2x
10. c = 15, d = -5
Good positive correlation
8. -0.036, no agreement.
9. 0.84, strong positive correlation between number of years smoking and extent of lung damage.
11. (a) [Scatter diagram]
3. (a) -0.558
(b) Low unemployment appears to be linked to high wage inflation, so suggestion justified.
4. 0.79
5. 0.73, y = -25.4 + 0.53x, x = 94.4 + 1.01y w ,•• 11.5 H:i
15000 6, 0.60, w ~ -76 + 0.89 h
§ 7. 0.77
11.0

"
ro 8. -0.415 10.5
ir
"" 10000

'L
9. (a) 0.954 (b) 2, 3 10.0
ro
0 10, (a) y X X
Iii
c
c liiliilr (c) -0,92 (d) -0.9 9.5
ro L~,&
"
5
u
y~;:
5000 170 9.0
~ 160
!! j TiD !1'
0
E
-;;; 150
11. (a)
...•. .... . .. . 8.5
0
iii: i'i i;

0 20 40 60 80 •
~
• •••• : • • • • 0
0 40 45 50 55 '
140 (b) y~23.0-0.267x
(b) y d710 + 192x
(c) Appears reasonably satisfactory apart from Band~ 130 (c) 7500, There isn't a wide degree of scatter, so estimate
X X
who have earned substantially more than the equatton could be reliable, but in general it is unwise to
120 0,60, 0.60
suggests. extrapolate outside the range of data.
12. (a) 0.7, good agreement between judges.
(d) (i) y ~ 4210 + 192x 110 No. The points do not lie in a line.
(b)
p
(ii) y=4010+207x ~ X
0
(iii) y = 4160 + 100x
(c) It would contain a term for employees who work
away from home e.g. y =a+ bx + c, where c"" £3000
for employees who work away from home and zero
25 30 35 40 45
body mass (g) 'IL_
.: ·:_: YL·
..
.
(b) y ~ 48.35 + 2.75x
otherwise. (c) 0.787
X X
12. 0.3, 0.6
13. (a) [Scatter diagram; x-axis: % additive]
(b) y = 127 + 1.17x
(c) [Scatter diagram]
Exercise 2c Spearman's coefficient of rank correlation (page 151)
1. 0.26
2. (a) 0.43
(b) Some agreement between average attendance ranking and position in league, high position in league correlating with high attendance.
4. 0.033, little or no correlation.
5. -0.62, some agreement between the scores.
13. (a) (i) -0.976 (ii) -0.292 (or 0.292)
(b) The transport manager's order is more profitable for the seller, saleswoman is unlikely to try to dissuade.
(c) (i) No, maximum value is 1
(ii) Yes, higher performing cars generally do less mileage to the gallon.
(iii) No, the higher the engine capacity, the dearer the car.
(d) When only rankings are known; when relationship is non-linear.
14. 0.84; very good agreement between the rankings indicating strong positive correlation between the marks in English and the marks in History; E.
(b) 0.935
(c) (b) indicates strong positive linear correlation and diagram confirms this is appropriate.
(d) p = 2.58 + 0.88T; 15
6. (a) Em\ page 121 diagram 3
(b) y = 7.77 - 0.005x
(c) 5.77; treat with caution as outside range of data.
(d) The lower the percentage moisture content, the greater the heat output.
7. (a) -0.901, strong negative correlation, the greater the number of items finished, the lower the mean quality score.
Miscellaneous exercise 2d (page 160)
1. (a) y = 3.07 + 1.17x
(b) When the y variable is the controlled or independent variable.
2. (a) t on w is required; t = 18.8 - 0.853w
(b) (i) -13.6°F (ii) -28.1°F
(c) -0.946, points lie close to the regression line.
(d) Good estimate for w = 38, since strong correlation. Estimate for w = 55 needs to be treated with care since extrapolation (outside range of data) is unreliable.
3. (a) Strong negative correlation
(b) (2.275, 38.375)
(c) Ranking both p and d from lowest to highest gives ~~9
(d) In general the population density is greater nearer the centre of the town and less on the outskirts of the town.
(e) H, low population density and distance from centre of town.
0.3, 0.5, 0.7
Mrs Brown and John; 1) Headrests 2) Heated rear window 3) Anti-rust treatment.
[Scatter diagram axis: temperature]
(d) Argument invalid since relationship between yield and additive is not linear; yield declines above 4.5% additive; suggest additive 4.5%, temperature 90°.
[Scatter diagram]
(b) y = 6.85 - 0.0072x
(c) pH = 6.85 at t = 0°C; for an increase of 10°C, pH drops by 0.07
(d) 6.71, reliable; 6.17, unreliable, outside range of data
(e) 48.6°C
[Scatter diagram]
(b) Amend; possibly negative trend but not strong correlation, (32, 3.7) is an outlier
(c) Ignore outlier; weak negative correlation between number of items and quality score.
Exercise 2b Product-moment correlation coefficient (page 145)
1. (a) 0.930, strong positive correlation
(b) -0.828, strong negative correlation
(c) 0.867, strong positive correlation
(d) 0.742, positive correlation.
2. 0.82

16. {a) 0.98 (b) y = -7.42 + 1.11Sx, x = 6.82 + 0.862y 3. 0.714


8. y = 0.65 + 0.0157x; 8. sa) ~ (b) ~
(c) 8.20 tons per acre (d) 13.9 em 4 · (a) -0.9?5; points lie close to a straight line with .
Rate of about 1 hour per mile distance; 3 days 19 hours;
17. (a) p41.79+1.55x (b) 51 gradtent (strong negative correlation}. negative 9. IS
out of range of data, travel across water required; 0.942, 10. 0.27 (2 d.p.)
strong positive correlation, points close to regression line. (c) 43, but treat with caution as outside range of data. (b) -1, complete disagreement in the rank· .
9. (a) y = 12.033- 0.009x 18. (a) -0.9 (b) 0 (c) 0.9 (0.6 without outlier) IcI (... )d f
111 ' ata allow a non-linear relation.
mg,, 11. (a} (i) ~~ (ii) ~ (b) t
12. 0.52
(b) 8.6 P" 1000 19. (o) y
100 Mixed test 28 (page 167) 13. (a) -fs (b) ~ (c) ~ (d) 1
(c) Decreasing number of members of population per
'7ii 90 14. (a) (i) ~/6 (ii) i~ (iii} 0 (b) t=6 or 12
doctor not effective in reducing infant mortality rate. -0.9_87, points lie close to a line with negative
10. (a) Spearman 0.613; grades given
(b) Product-moment, 0.95; numerical data given § ~~ gradtent.
(b) Yon x, )' = 7.22- 0.69x; 4.45
Exercise 3b (Probability) _combined
(page 181) events
(c) Students performed at a similar standard in the tl 60 (c) J?epreciation of £700 per year.
written and listening tests, but not in the oral test. 8
c
50 (d) (1) No, outside range of data. 1. (o) (b) (c) l
Standard in oral test related more to listening 0 40 2. ~
performance than written performance. g
~
30 (ii) Yes, since X is controlled. Usc x = y-a
b .
3. (a) ~~ (b) #r {c) -f.J (d) 1
J.,
11. (o) p e 20 2. 4. ~
~
)y 5. (a) 0.5 (b) 0.4 (c) 0.2 (d) 0.1
80 6. (a) 1il
j (ii) ~ {iii} ~ {b) 0.2
7. (a) ;1 (b) 0 (c) i
8. 0.6
60 (b) Diagram suggests a linear relationship 9. 0.7
fs
"' -6:
(c) y ~ 61.1 ~ 0.966 (x- 42.4) 10. (a) (b) ~ (c) (d)
(d) y~20.1+0.966x 11. ~
40 (c) Initial costs arc approx. £20 000, cost increases by 12. (a)
" (b) :M (c) ~
Yes "


approx. £1 per item 0 10 13.
20. (a) y 0.~15; 8.8 hours; regression line gives average value 15. At least one tail is obtained; both coins show tails

u ~Jifi4t~llw~~ul,~rrwj~~~J
20 15 pomtsnotthatdosetolineasr=051 16. (a) . ·
h . . . · -5,·L111;2 mJtumJse
• . '. d
Frmt tree Other tree Total
w ere m; IS vert1cal d1stance from point tor Birds nest
i ijJii! li 3. (a} 1ne. 2 4 6

~lOlillliil!ll!!li!il!!lt
No nest
0
0 20 40 60 80" • y 5 9 14

(b) p ~-0.54+ 1.2 n; £17


(c) 0.998; the points will be close to a line with positive
15
40 0
n, (b) 0.45
Total
(c)
7 13 20
2
gradient. 0 300
0 5 10 15 20 25 30 35 40 X
12. (a) 0.96
time (mins)
Exercise 3c Combined events (page 192)
(b) points lie close to a straight line with positive gradient
(b) p -0.142 + 0.389x 1. (a) j (b) 0
(strong positive correlation). 200
(c) 23.2°C, outside range 2. (a) 0.05 (b) 0.5
(c) Equal to 0.986 since rankings will not change.
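The remark that the coefficient is unchanged "since rankings will not change" can be illustrated with the standard formula for Spearman's coefficient, 1 - 6Σd²/(n(n² - 1)). The sketch below is illustrative only: the marks are hypothetical (not taken from the book), the helper names `ranks` and `spearman` are invented for this example, and tied values are assumed not to occur.

```python
def ranks(values):
    """Rank from 1 (smallest) upward; assumes no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def spearman(x, y):
    """Spearman's coefficient: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical marks for five candidates (illustrative data only).
x = [12, 15, 9, 20, 17]
y = [30, 41, 35, 50, 38]

r = spearman(x, y)

# Squaring positive marks changes the values but not their order,
# so the ranks, and hence the coefficient, stay the same.
r_transformed = spearman([v * v for v in x], y)
print(r, r_transformed)  # 0.8 0.8
```

Because the coefficient depends on the data only through the ranks, any change that preserves the order of the values leaves it unchanged, while a smaller Σd² gives a larger coefficient.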
3. (a) 0.15 (b) 0.65; no
13. (a) 0.0705 (b) y ~ 0.34 + 0.0085x
fo b
(c) 0.477
14. (a)
(d) unreliable since outside range of data Mixed test 2A (page 166)
1. (a) y ~ 3.667 + 0.038x (3 d.p.)
100
iii
'""'' 4. (a)
5. ~
6. {a} ~ (b) %
(b) 1

(b) Mathematically a= 3.667 would indicate a yield of 7. 1:


3.667 tonnes with no water at all; in practice this 0 ,! 8. (<>) 0.,5 (b) 0.35 (c) 0.375 (d) 0.4
would be nonsense, b = 0.038 indicates yield increases
Q 20 40 60 80 100 120 X 9. (a) 2704 (b) ft (c) .1. (d) 2S
10
by 0.038 tonnes for every extra centimeter of water.
(c) 4.7 tonnes (only just outside range, probably reliable), 9.3 tonnes (well outside range, unreliable).
2. (a) y = 14.55 + 1.02t
(b) Initial temperature of milk.
(c) 19.14°C, 34.95°C
(d) First; second outside range of data.
(e) [Graph]
Mean (4, 16.7).
(b) average decrease of 1.80°C per month
(c) y = 23.9 - 1.80x
(d) 23.9°C; regression line is valid only within range of data.
15. [Scatter diagram]
(b) y = -107 + 3.21x
(c) 214, overestimate (data fit a curve); 375, underestimate (outside range, also non-linear relationship), unreliable
(d) no, better to use a curve.
4. (a) 0.714
(b) Same, since there is no change in the rankings.
(c) d² would decrease, therefore 1 - 6Σd²/(n(n² - 1)) would increase.
10. (a)
                B     G     Totals
Passed          16    8     24
Taken, failed   7     6     13
Learning        10    8     18
Too young       2     3     5
Totals          35    25    60
(b) 60 (c) 1s (d) 3us (e) 1 ~., (f) !-~~
11. (a) Independent; obtaining a head when a coin is tossed
(b) Mutually exclusive, 0.
%2.50 12. i
13. 0.5
0
2.00 Chapter 3 14. (a) j (b) {{ (c) fs
1.50 15. (a) 0.2 (b) 0.03 (c) 0.32
Exercise 3a Elementary probability (page 173 ) 16. (a) ft (b) H (c) 1
1.00
1. * {b) 1 (c) ~ 17. (a) (i) i~ (ii) _1_4 (iii) ..1.. (b) (') Jl. ,.,.
(a) 18 I I o . 169 s1 1 221 110
0.50 2. (a) 0.375 (b) 0.625 (c) 0.75 (d) 0 (e) 0.8 · a .5 (b) (1) 5p (i;) 4p (c) -'-
0 -Pii'flJ"-'f-LilljJ-'4""1~ 3. (a) 0.3 (b) 0.75 19. (o) 0.1 (b) 0.3 (c) 0.45 '"
04050607080 4. (a) 0.4625 (b) 800 20. (a) ~ (b) i
body mass (f) Temperature would stabilise at room temperature. 5. 0.73 21. (a) ·A (b) ~- (c) fi
{g) Points appear to lie on a curve, reaching a .limit at ~: ~:J (il -h i
(iiJ 6 (iiil ;~ (bJ H 22. (a) 0.15 (b) fs (c} y',
0.91, strong positive correlation.
room temperature.

Exercise 3d Tree diagrams (page 200) 3. (a) 4! 9! (b)* Mixed test 3A (page 231) 2. , - - - y - - ; : - - - - - -
4. ir 1. (a) fs (b) -b x 12 13 14.
Section A 5. ,16 2. (a) P(X~x) 12k 13k 14k
'k=-l9
6. (a) 8! (b) -1:8
1. 1,1 0.0025 (b) 0.095 12! b I 3 0.1
2. (a) f.r (b)~ 7. (a) (2!)4 ( ) 66
lo !
4 (a) (b) {c) 0 (d)~
3. (a) 0.24 (b) 0.42
s. N3
IP(:~x) I ;
5
4. (a) (i) 2~ (ii) ~ (iii)
(b) (i) fr
(ii) ¥s (iii) *
:jy
9. t.f:i
10. f.&
(a)
1
2
1

6. (a)
7. t6
*
5. 0.00599,0.987
(b) ij 11. (a) 210
12. 12
(b) fs (c) -}o (b)
X 0 1
'
2

3
s. H 13. (a) 'i~ (b) ~ (c) -io P(X~x)
' l l t
9. ft
10. 0.35
14. (a) 65 268 (b) 4263
15. 510 6. "
X 0 1 2 3
11. 0.825 16. H
12. (a) 0.5 (b) 0.5 (c) 0.375 17. 4608 P(X~x)
' ~ ,• -c- /?'
First Second
"'
'
Third
13. 0.788 18. (a) 1260 (b) 2520 draw draw
19. (a) 420 (b) B 252, G 462 (c) 120 (d) ,\\
draw 7. (a) ' ' (b) PIR ~,)
14. (a) 0.02 (b) 0.64 (b) Is
(c) ~ {d) ±
15. (a) fr (b) H (c) ;(4 20. (,) 5040 (b) 1680 (c) 672
3. (a) 0.4 (b) 0.2 (c) :M I
16.
17.
(a) 0.34 (b) 0.063 (c) 0.19 (d) 0.97; 3 wh;te
0.624
21.
22.
5005,720,72
5040 (a) 144 (b) 1440 4. {aJ M tbl m tcJ 135~ tdJ 3o 2
(e) The probability that a female employee is weekly
18. (a) l (b) ! (c) -&, {d) ! (e) i 23. (a) 2.5 x 10-' (b) 3 193 344
paid. (f) 0.5
24. (a) ! (b) ! 5 · (a) ft {b) ft {c) /:i {d) ii
Section B 25. 130
26. (a) 360 (b) 6 (c) 12 (d) 1170
1. (a) -& (b) t 27. (a) 64 (b) 18 (c) H Mixed test 38 (page 232) r
2. {a) ~ (b) ji (c) ~ 28. (a) 9! (b) f. (c) 1260 (d) l 1. (a) 0.64 (b) 0.75
(c) f
3. (a) P(A occurs, given that B occurs)
(i) mutually exclusive (ii) independent
29. (a) 75 (c) m
(d) (i) 6! (ii) 72
2. (a) q-0.25 2
(c) 11
(b)---
~· f,\x ~_x_:l_~_:o.:.:.1::_,l-x_~.::'o':C:1::_,2:C'.::'":.:·•:-_9_ _ _ _ _ __
1
30. 70, (a) 55 3(4q-1) " X 0 1 2 3
(b) 0.88, 0.05 3. (a) 0.857 (b) 0.135 (c) 0.13917
(b) 30 (d) 0.973
4. {a) 0.33 (b) fr 4. (a) 0.1792 (b) 0.1686 (c) 0.203 P(X- x) 0.216 0.432 0.288 0.064
(c) 65
5. (a) ~ (b) i {c) ~ 5, (a)
(d) j (b) 0.648
6. tbl H (e) ~ ~R 10
i
(ii) -l5
R~R~S
7. (a) fg (b) (i) X -5 5 15
(f) ~
8. (a) fs (b) t·~ (c) ~ P(X~x) ;
M -k·

< ~.·.·.'s~.··.·:
9. (a) 0.096 (b) 0.156; f, Tii
10. (a) 0.7, 0.68 (b) 0.28 (c) 0.65625
Miscellaneous exercise 3g (page 228) 11
X 1 2 3 4 5 6
11. (a) 3 ~ 3 (b) H~ (c) !
{d) ! 1. (a) 0.36 (b) 0.48 (c) 0.01024 (d) 0.98976
(i) Yes, no (ii) No, yes
12. (a) 0.000877 (b) 0.421 (c) 0.65 (d) 0.642
2. (a) C, C' (b) C, D (c) C, E
~R
P(X-x) fz ii n
8
n' ~ u
n
3. (a) 0.0902 (b) unsatisfactory test
13. (a) 0.042875 (b) 0.142 (c) 0.1215
(d) 0.189 (c) 0.334125; 0.642
4. 0.32, 0.467
5. (a) 0.325 (b) i~10 (c) -&
s~R~s X 7 8 9 10 11 12
14. (a) (i) :fi (ii) l~f (iii) ft (iv) ~
~s~R ' '7~
P(X-x) .£ u
6. (a) 0.28 (b) (i) 0.157 (;;) 0,363 (iii) 0.163 II '71. n
Ti n
ibl u1 o.o3o3 u;l oA5o liiil o.o348 (c) 0.0728 (d) 0.404 u
(c) (i) 0.36 (ii) 0.848 7. 0.166, 0.580 ~s w; Equally hkely outcomes
12. ~or x = 8, draw verticalline·to 0.2; for x = 9, draw vertical
1s. H.~ 8. 5040 (a) 720 (b) 1440 1997 1998 1999 hne to 0.3; symmetrical distribution.
16. (bl i2 (cl Ns tdl # 9. (a) !
3 3 (b) ~ (c) ~.g (d) 3 ~ 3 (e) ~ (f) 6 (b) 0.372
17. (a) 0.36 (b) 0.6875 10. (a) H (b) M (d) ~(c) No (c) j!J4 Exercise 4b Expectation (page 244)
11. (a) (;) 0.005 (i;) 0.0955 (b) 0.999 (c) 0.136 (d) 8
Exercise 3e Useful methods (page 206) 12. (a) (i) j (ii) ~ (iii) i (b) fs (c) fo 1. 2.25

1. (a) 0.763 (b) 14


13. 5005, 1960, 315 (a) (b) ~ ft 2. 7

2. (,) 5 (b) 6
14. (a) 792 (b) 210 (c) i'f, (d) 120 (e) 0.1 (f) 0.1 Chapter 4 3.
4.
(a) 0.3
1
(b) 2.9
15. (a) 40 320 (b) (i) 1440 (ii) 5760
3. 0.5, 6
(c) (i) ~ (ii) ~ (d) 576 (e) is 5. H
4. 0.999 Exercise 4a Probability distributions 6. 0.75p
5. 22 16. (a) ~ (b) -fi
(c) independent (page 236) 7' ,--x--,~1~0--2-o-

1
6. fr (d) j, P(AICI • P(A) (c) {,
1 (a) 0,1 P(X = x)
7. 1; 8
P(X ~ x) 0.4 0.6
8. 0.5 (a) ~ (b) N6 (c) -#Nr;; fr o.4--/it'ii•i·~i++il.f.tci·>ilLI+i
9. (a) (i) l (ii) lz (iii) j (b) -b 8, (a) 0.3 (b) 0.2
9. (a) 0.2 (b) 2.08
Exercise 3f Arrangements, permutations, 10. 2.75
combinations (page 219) 11. ~~--.-:-----,---,--------
X 4 6 8 9 · n 14
1. 9!, --h
P[X- x) 0.16 0.32 0.16
2. (a) 6! (b) t 012345X 0.16 0.16 0.04
(b) Iii 0.85 (ii) 0.55 u;;) 0.5 (iv) 3 Loss of £1.20.
~··- ~-- ·- '''""'2.~""'"-"<"---·
----~·~


2x-1 12.
12. £~(7+x) {a) 5 (b) Loss of £3.75 5. {a) {b) l {c) P(X=x)=- -,x=l, 2 , 3 {d)!!• X 2 3 4 5 6 7 8 9 10 11 12 Exercise 5b The binomial distribution
9
13. {a) 24
is (page 285)
{c)
6. {a) (b) ~ {c) P(X = x) = \, x =1, 2, 3 {d) 0.816 P{X=x) 2~ -b '
IT 21- z1- tJ IT' fi is E
l
X 0 2 3 4 7. {b) 1. {a) 0.267 {b) 0.850
1 2 3 4
P{X-x) i ' t 0 ,,' X
l 13.
7.2,£75
{a) -:k (b) :& (c) U; --b,, 7
2. {a) 0.234 (b) 0.000107 (0.0001 from tables}

{d) 1
' '' " 14. (a) i,
-f4 {b) 2.78 {c) 0.260
3.
4.
{a) 0.279 {b) 0.983 {c) 0.594
{a) 0.00549 {b) 0.157 {c) 0.503
(c) 2 1_l4 , 0.547 (d) ! 15. {a) 0.8 {b) -0.24p {c) 3.34p'
14. ¥ 8 . {a) 0.9900 {b) 0.1746 {c) 0.5886 16. {a) 1.7, 1.18 {b) 4.76
5. 0.00200
15. 2 6. {a) 0.318 {b) 0.671 {c) 0.647 {d) 0.0324
{d) 0.5565 {c) 0.9785 17. (a) 1, ~ (b) ¥, f~ {c) 11.2, 7.28 7. 0.344
Exercise 4c Expectation and variance t 0 1 2 3 8. 0.5
Exercise 4e Combinations of random variables 4
9. 0.3456
(page 251) (page 261) ,,
P{T=t) ls is l
"" is 10. {a) {i) 0.0424 (ii) 0.623 {b) 12
1. {a) 2.3 {b) 5.9 {c) 0.61 11. 0.0963, improve with practice
1. {a) 26 {b) 15 {c) 17 {d) 59 {c) 59
2. {a) 0.35 {b) 4.2 18. P(X =x)=1;, x = 1, 2, 3, 4, 5; P(X = 6) = 0; P(X =x)= __I__ 12. {a) 0.0105 {b) 0.988 {c) 0.358
2 (a) 0 or 12 or -12 (b) 294 36
3. {a) 1.45 {b) 2.45 {c) 12.15 " X= 7, 8, ... , 12; 4A-, f, '
3: {a) 1 {b) -1 {c) 34 {d) 14 {c) 14 {f) 30 13. {a) 0.329 {b) 0.461
4. {a) 3.5 {b) 15) {c) 14.5 {d) 2n 14.
4. {a) 1.3, 1 1 01 0 8 X P{X=x)
5. {a) 2.56 Mixed test 4A (page 269)
6. {a) 3.5 {b) 14 {c) 5.5 {d) 84 {c) 1.75 {b) 0 1 2
x+y 1. {a) 0.2 {b) 8 0 0.0156
{c) 11.6
7. {a) 2 {b) 3 m -3 0.32 2. {b) 1 0.0938
P(X+ Y=x+y) 0.12 0.14
8. (a) 312 , 1, 1~~ X 0 1 2 3 4 2 0.2344
9. {b) 6 8 10 12 5 . 3 0.3125
l
''
4 4 P{X=x) 1~
l
X x+y 3
' '
N 4 0.2343
P{X=x)
"' P(X+Y=x+y) 0.2 0.18 0.04 (c)
3. {a)
1t (e) 0, ~ 5
6
0.0938
0.0156
{c) 4 {c) 0
X 0 1 2 3 4 5
x-y 2 1
10. {a) 4.2 {b) 7\ {c) 3.67 P(X = x)
P{X=x) g fs ~ ! I is
11. {a) TO (b) 3i (c) 15fo (d) 2~
' (e) 47H- P{X- Y=x-y) 0.12 0.14 0.32

]li!~l :.
12. {a) 1' {b) 3j (c) ~ {b) 117 (c) ~
13.
'
1 2 x-y 1 2 3 '" 0.3
X 0
0.04
Mixed test 48 (page 269)
P{X- Y=x-y) 0.2 0.18
j
P{X=x) j '' 5. {a) 2.6, 0.24 {b) 5.2, 0.48 {c) 7.8, 0.72
1. {a) 0.4 {b) 0.8 {c) 2.6 {d) 1.44 {c) 15.6 0.2
(a) ~ (b) ¥- (c) ~ (d) ~H;l 2. {a)
X 4 5 6 8 9 12
14. {a) 5 {b) 2.5 {c) 10 {d) 10 6. 29! f 3
7. {a) 0.1 {b) 3 {c) 1 {d) 0.2 {c) 12 I)
I
'' '
15. 144 P{X=x) ;I -k
16. {a) j {b) 0.639
Miscellaneous exercise 4f (page 266) (b) 6i, 43fi, 3-H- (c) Loss of £1 (d) J\
10-x .312211· 3. {a)
17. P(X=x)=~,x=1, 2 , ... , 9, 3' · ' • 1 2 3 4
1. 0.1825,£1.75 6 12
P(X = x) = (~y-1(!), x = 1, 2, .. . 2. (a) -b (b) 2, H X
P{S=s) t l l
'
18. {a) t',
{b) 0 {c) 6 {d) 2.45 3.
4.
{a) 0.01 {b) 3.54, 0.4684
v9o> 2.57
{c) 14.7, 11.71
{b) 4.5 {c) 11
' ' ' '' 15.
symmetrical
9
19. {a) 0.04 {b) 5 {c) 4 {d) 7 {c) 16
20. (a) Loss £3 (b) (i) p = 0.12, q = 0.08 (ii) 645, 8
21. (a) £2 (b) (i) 4 (ii) 17 (iii) 1
5. -fs, 3 5 1 25 12, 20
6. (a) 5
16. 68i not strictly binomial as p is not constant, but model can be used if there are a large number of bulbs in the box.
X 1 2 4
Chapter 5 17. (a) 0.000416 (tables give 0.0004) (h) 0.0197
Exercise 4d Cumulative distribution function P{X=x) A i2 j ! 18. 5
Exercise 5a The uniform and geometric 19 · ' x_ _ _P:c{c:Xc-=-x-c)'
(page 255) {b I 2 3 4 5 6 7 8 9 10 distributions (page 276)
y
1.

I PlY< y)
y 0.1
0.05 .
. 0.2
0.3
0.3

0.6
0.4
0.75
0.5
1
P{Y=y) 1~4 n
;
.
,'' Tii' "" ~
'' '' "'
1.
2.
{a)
{a)
0.2 {b) 8 {c) 0.4
0.096 {b) 0.179 {c) 0.725 {d) 2.86
0
1
2
0.0000
0.0001
0.0011
3. {a) 0.9744 {b) 0.01024 {c) 1 {d) 1j {c) 2.5
4. {a) 1 {b) 0.7599 3 0.0109
2. {a) 0 41 {b) 0 87 {c) 0.46 {d) 0.13 {c) 2.58 4 0.0617
{c) 1.25 5. {a) 0.0226 {b) 0.00374
3. Ia) 0 1 2 5 0.2096
X 6. (a) (i} 0.6 (ii) 0.3 (iii) 4.5 {iv) 2.87
ibi iii 0.0531 Iii) 1 {iii) 10 6 0.3960
F{x) ~ "'" 1
9. {a) 0.1248 {b) 2.8352, 236
7. {a)
8. {a)
1 {b) 2 {c) 1.41
0.128 (b) X ~ Geo(0.2) (c) 0.512 20. 4
7 0.3206
b) X 1 2 3 4 5 6
9. (a) P(X = 4) = 0.7³ × 0.3 = 0.1029
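The value 0.1029 in answer 9(a) is a geometric probability: three failures, each with probability 0.7, followed by a success with probability 0.3. A minimal sketch of that calculation follows; the function names are illustrative, and the success probability 0.3 is read off the answer rather than from the question, which is not reproduced here.

```python
def geometric_pmf(k, p):
    """P(X = k): the first success occurs on trial k, with success probability p."""
    return (1 - p) ** (k - 1) * p


def geometric_tail(n, p):
    """P(X > n): the first n trials are all failures."""
    return (1 - p) ** n


p = 0.3  # success probability implied by the factor 0.3 in the answer
print(round(geometric_pmf(4, p), 4))  # 0.1029
```

The probabilities `geometric_pmf(k, p)` for k = 1, ..., n together with `geometric_tail(n, p)` add up to 1, which is a quick check on both functions.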

{c)
F{x)
*
~ '
t ~
""
1
10. {b)

P{B=b)
b 0

''
1

l
2

~
3
~

"
.
4

"
(b) The first success is at the nth attempt.
(c) There are at least n attempts before the first success is obtained.
21. Experiment 1 - no, 3 outcomes; Experiment 2 - yes, constant probability of obtaining black (or white), independent trials; Experiment 3 - no, trials not independent.
X 0 1 2 3
z (c) 1t~ (c) M 10. 0.7225
F{x) ! ' 1 11. 0.00026
Exercise 5c Expectation, variance and mode of
' ' 11.
y 0 1 2 3 12. 2
the binomial distribution (page 290)
4.
I P{X
X
x) 0.01
3
0.22
4
0.41
5
0.22
6 7
0.14
; 0 .9724 P{Y=y)
{a) 1.22
0.3
{b) 1.0916
0.34
{c) 0.36
0.2 0.16 13.
14.
15.
{a) 0.0864 {b) 2.5 {c) 1.94 {d) 1 {c) 0.028
£1.75
0.0047, December 22nd
1. 2.5, 1.5
2. {a) 1.38 {b) 4
16. tal i tbl N6 (cJ m
tdJ 1 (eJ 6 ttl 11 3. 8, 1.30
4. {a) 0.2 {b) 0.00551

8. (o) 0.223 (b) 0.116 (c) 9.28, 2.86 (d) 18.9 3. (o) 0.25
5. (o) 0.25 (b) 2.5 (c) 0.282 4
(e) Part {c) gives 223, part (d) gives 227, increase 11. (b) 2, 4 - -
6. 0.1, 0.23 9. {a) Large number of balls (b) 0.799
(b) f s
f(x)L l(x):~ 1 - 0.25x ln 3
7. (a) 10 (b) 0.000390 12. a=2,k=0.75;
10. 0.790, calls occur randomly f(x)
8. (a) 3 (b) 3 (c) 0.633
11. (a) 0.104 (b) 0.283 (c) 0.00113 (d) 9 0.5 ""
9 (a) 0.994 (b) 2 0.75 ~=0.75x(2-xl
12. 0.632, 0.069, 0.154
10. 0, 0, 3, 13, 30, 36,18 13. (a) (i) X- B (28, 0.004) (ii) 0.00545 (b) 0.785 0 3 X
11. 2500 (c) 0.66
(c) independence 0 1 2 X
12. 0.06; 293, 94, 12, 1, 0, 0
14. (o) 0.311 (b) 0.959; 3.6, 1.2 4. (a) 5k (b) ~~, ~ ; 0.2
l3. (o) 0.68 (b) 8, 1.6 5. c=1,k=4
15. (a) 0.253 (b) 3.6, 1.59 13. 0.6, 0.2
14. (o) 0.25 (b) 1.5 6. (a) 0.125 (b)
16. (a) (i) 0.201 (ii) 0.00637 (b) 2 (c) 5, 2 (d) 14
15. 1, 0.894 (a) 5 (b) 0.2 l(x)j £fix)~ 0.125x
17. (a) 0.203 (b) (ii) 0.136 (c) 0.316
(d) Assume p constant; very unlikely in First World War 0.5~
r::~~i~~~~ Cumulative distribution function
Exercise 5d The Poisson distribution l" F(x)
1
(page 297)
1. (a) 0.180 (b) 0.0527 (c) 0.195 (d) 0.670
18. P(X = x) = e^(-λ) λ^x / x!; λ, λ
(a) 0.082 (b) 0.242; 6.15 (c) 0.328
0 4 X
1. (a) F(x)~~~' 0<x<2 ,J r--
2.
3.
(a) 0.983 (b) 0.184 (c) 0.199
(a) 0.0821 (b) 0.560 (c) 0.0631
19. (a) 0.908 (b) 9
20. (o) 3, 7 (b) 20,20
7. (o) 0.25
(b)
1 x>2 JL._
f{x) (b) 1.59 0 2 X
4. (a) 0.603 (b) 0.616 (c) 0.00246 Reason for (a) E(Y - X) ≠ Var(Y - X)
F(x)=~~(Sx-x2-7)
0.75
5. (a) 0.0821 (b) 0.242 (c) 0.759 Reason for (b) 2 Y + 10 could not take values less than 10. 2 {a) 1 <x.;;; 3
(d) 0.0486 (e) 0.125 21. 600 m, Po(2.5), 0.0821, 0.109, 0.779, 0.207 (b) _:1
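The first two probabilities quoted for Po(2.5) in answer 21 can be reproduced from the Poisson probability function P(X = x) = e^(-λ) λ^x / x!. The sketch below is a check under an assumption: the question is not reproduced here, so reading 0.0821 as P(X = 0) and 0.109 as P(X ≥ 5) is an inference from the numbers, and the function names are illustrative.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) for X ~ Po(lam)."""
    return exp(-lam) * lam ** x / factorial(x)


def poisson_cdf(x, lam):
    """P(X <= x), obtained by summing the probability function from 0 to x."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))


lam = 2.5
print(round(poisson_pmf(0, lam), 4))      # 0.0821
print(round(1 - poisson_cdf(4, lam), 3))  # 0.109
```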
0.5
6. (a) 0.191 (b) 0.0498 (c) 2.45 22. (a) (ii) 1.5 (b) 0.577 (c) 0.0249 1 X# 3

(b)F(x)~~~(x-1)
7. 0.371 23. 0.407, 0.366, 0.165, 0.0629, 0.816, 0.0518 0.25-P---J
8. (o) 0.0382 (b) 0.122 24. (a) 22 (b) 19; 39 3.(a) 1<x<6 (c) 2 (d) 2.5
9. 0.677 25. (a) 0.135 (b) 0.323; 0.81
10. (o) 3 (b) 0.145 26. (a) 0.387 (b) 0.929 (c) 0.893 0 3 X 1 x>6
11. (a) 90, 72, 29, 8, 1, 0 (b) 44, 44, 22, 8, 2 (d) 0.205 (e) 0.816; 0.0290 (c) 0.25 (d) 0.3125 (e) 0.3475 X O.;;;x<l

~
12. Random events; 0.5, 0.481; 31, 16, 4, 1, 0 27. (a) 0.0902 (b) 0.0613; 4
l3. (a) 0.261 (b) 6 Exercise 6b Expectation E(X) {page 323 ) 4. (a) F(x) ; 2
(b) 2
:(x -3x+4) 2 <x< 3
Exercise 5e The Poisson approximation to the
binomial (page 300)
Mixed test 5A (page 312)
1. (a) 0.159 (b) 0.766;
Query independence: friends may have joint engagements.
1. (a)
2. (a)
{6 (b) 1 (c) 2 (d) 1.6 (e) 2l4
fix)
5. (o) 0.1215
I (b) 0.841
x>3
(c) O.SSO

~ ~->
1
1. (a) (i) 0.0476 (ii) 0.0498 (b) (i) 0.225 (;i) 0.224 2. (a) 0.152 (b) 0.567 (c) 0.285
0
<x 0
kl/
.""
(c) (i) 0.171 (ii) 0.168 3. (a) X- B(150, iJ), A= 1.875, p < 0.1, n >50 6. (a) 1.5 (b) 0.75 (c) F(x)
2. (a) (i) 0.184 (ii) 0.0190 (b) 0.271 (c) 0.0498 (b) 0.559 (c) 369
x>3
3. (a) 0.287 (b) 0.191 4. (a) X- Po(0.6), X is number of boxes in a square km.
0 1 4 (d) 0.4 (e) 0.2
4. (a) -k (b) 0.713 (b) 0.549 (c) 0.0231 2 3 X

5. (a) 0.647 (b) 0.185 (d) Probably not suitable; different scatter of telephone (b) j (c) 2 F{x)
3. 3

1~
6. 0.109, 185 boxes in the city.
7. (o) 0.677 (b) 0.017; 1498 S. (o) 4.8, 0.98 (c) 0.737 (d) 0.388 4. 6m
8. (a) 0.468 (b) 0.703 5. (a) fs (b) -w (c) 0.48, money bond
9. 0.0150 Mixed test 5B (page 313) 6. 2, 0.124
10. (a) 0.47 (b) 0.041 7. 2.5, 0.803, 0.456
Poisson applies since p < 0.1 and n =50. Events may not 1. J (a) 5(1- p)p' (b) 10(1- f>)'p' 8. (a) 2.875 kg (b) £4.75, ?
3 X
6
be independent. After mis-dialling, you are likely to be 2. (a) Y - Geoi!l (b) 30 (c) 0.233 9. (a) 0.4 (b) 2.6 (c) 1.5

'*!L_
1
l'
more careful. 3. (a) Binomial (b) Poisson (c) e-
11. Random sample, 0.305 6 Exercise 6c Standard deviation and variance
Exercise 5f Sums of Poisson variables
(d) 1- e-1 ( 1 +A+~} 0.013, 0.014, 0.182 {page 333)
1 (a) 1.5 (b) 2.4 (c) 0.15 (d) 0.387
(page 303) 4. (a) 0.221 (b) 0.987 2· (a) 0.5 (b) 2i (c) 2-fl (d) 1 44 0 1 2 X
5. (a) 0.249 (b) 0.929 (c) 0.508; 0.542 (a) 1t (b) 3~ (c) j--1;- (d) 0 )53
F(x)=~~(x'-1)
1. 0.121 3.
2.
3.
(a) 0.189
(a) 0.323
(b) 0.308
(b) 0.119
(c) 0.184 4· (a) ;-134 (b) 1jf (c) (d) 0.545 m (b) 0.272 (c) 1 <x<2 (d) 1.65
5 (a) s (b) ~ {c) -f5 {d) 0.163
4. (a) 0.301 (b) 0.080 (c) 0.251 Chapter 6 6· (a) ~~ (b) 4-b (c) *~~ (d) 0.912
x>2

Miscellaneous exercise 5g (page 307) Exercise 6a Calculating probabilities 7· (a) 18 {b) ~64 (c) ?;;~10 (d) 0.672 3 19 ~~x-__!__x3 0 <x< 2
(page 319) 8 (a) ~ (b) 3, ~ {c) z_ 8. 4' 80' F(x) ~ 41 16 , 0.007
1. 0.752, 0.537 9. (a) 1 (b) 1 (c) l '(d) JJ ( ) 1 x>2
2. (a) 3 (b) 0.223 (c) 0.988 1. (a) i (b) ~ (c) H 10
.
I I
a f{x)
6 .n e
9. (a) j,!
3. (o) 0.733 (b) 0.0703 2. (a) f(x)
4. (o) (i) 0.434 (ii) 0.378 (iii) 0.148 (;v) 0.0401
(b) (i) 45 (ii) 111;N>20 k t {flxl~x 1
x' 2x 2
---+-
6 3 3
2 <x< 3

5. 0.507
6. (a) (i) 0.130 (ii) 0.271 (iii) 0.276; 65,0.0159
3./ (b) F(x) =
X

3
5
6
3 <x< 5
(b) 90,3 -2 0 3 X 0 5 X
x'
7. (a) 0.270 (b) 0.350 (c) 0.182 (b) 3fi {c) 12~-~ (d) 1.008
2x---5 5 < x<6
(b) 0.2 (c) 0.74 6
(d) 0.124 (e) £45
1 x>6
(c) j (d) -14

3. (a) 0.25 (b) f(x) d - 0.5x, 0 <x G 5. (b) f~)

F(x)~l :;:+ 2x') 1


10. (a) f(x) O.;;;x.;;;1

4. :~) ~5~:) ;~ )~\!0: ::::: f~


0.5
2
3 16. (b) 1 <x < 2 (c) 1.55 (d) 0.89
1 17
3 1 x>2
17. A= t f{x)
0 1 2 3 X otherw1se 0 1 2 X -1 0 I
1 1 1
1 (x.;;;3
(c) l (d) 0.553 1 2
f(x)~ ~x+ 12 x'-4 X,;;; -1

\
(b) 2\ (c) (d) 2.16 ~(x-1) 1 <x< 3
- 12x 3
1

11. (a} 5 (b) i


1
(c) I~i; 543 tonnes
x>3
5. (a) f(x) ~ 1
7 x) 3 .;;;x.;;; 7
1
(c) F(x)= z+2x-ux
1 1 3
-1 .;;;x.;;; 1 (d) 0, H
4
12 ( -
1
2 ~ f(x)
0 2 3 4 X
0 otherwise 1-12x3 x#1
2;72> ~~
f(xl 6. (a) -0.1875 (b) 0.2375 (d) 2

l
a -1 <x<O
16~ 1
3
7. (b) 2,2 (c) 1.71 (d) 0.264 (e) 0.3645
8. (b) 0.5 (d) 0.36
18. (a) j (b) f(x) = 2a 0 .;;;x < 1

0 1 X 9. (b) 30 hrs (d) M (c) 0.0390 0 x#1,x<-1


(f) The model does not allow for lifetimes over 90 hours. (c) i (d) 0.553 (c) {~
10. (a) 3.8 hrs, 0.36 hrs 2 (b) 4 hrs (c) Approx 60% 19. (a) 2.1, 1.29 (b) 1, 0.5

\
~x 0 <x< 1 0 1 3 7 X
11. (a) 0 (b) 0.15625 (c) (i) symmetry \ii) 0.05

~+ Mixed test 6A (page 358)


(b) 3j (c) 1% (d) 3.45 (e) 0,595 (d) The player might make a similar mistake each time,
12. F(x)= x
4 1.565, 0.821 6. (a) r~sulting in more hits above the line than below, or
1 .;;;x.;;; 2 1. (a) f.d. 0.85, 0,76, 1.15, 0.8, 1, 0.9, 0.75, 0.36
5 20 VICe versa.
(e) The range would reduce. Histogram to show incomes
1 X): 2
1 1 12. (b) 75 hours (c) -Ji
13. (b) 1, 2 (c) 0 (d) ..f2,- ..f2 (d) The model does not allow for P(X > 2.5) > 0, since
P(X>2)~0 ~
14. (a) f(x) 1 0 2 3 X (c) Change to exponential
model for x > 1.8, say

1:,l_t:c) (:)2;(x) ~ (2\


8k

7. ::: xJ 0 (x.;;; 3
2 Income(£)

X
1 X ): 3 (b) ft (c) 120
8 9 (d) From original data, 106 have income in this range. In
(c) f(x}=~x 1 ,0.;;;x.;;;3
0.0125x 2 0.;;; x.;;; 8 8. (a) 2 (b) {(x)~2,0<x<0.5 (c) 0.25 (d) 0.144 the model, f(x) = 3k, 0.;;; x.;;; 4 gives too high an esti-
(c) F(x) = o.2x- 0.8 8 .;;;x.;;; 9 mate; perhaps f(x) = 2.5k, 0.;;; x < 4 would be better,

-it l~~s, o.~s;_; (25- 4w) o .; ; w.;;; 5


{
1 x#9 Exercise 6f Uniform distribution (page 349) 2. 4,
(d) 0.55
15. (a) 0.75 (b) 0.2
1. (a) \ (b) 4.5 (c) 0.75 (d) i 0 2 3 4 X 3. (a) F(w) ~ 5
2. (a) 0.5 (b) -3.5 (c) 0.866
(
0.75x 2 ~0.25x 3 0 .;;;x.;;; 2 1 w): 5
(c) F(x) ~ 1 x >2 (d) 0.288 3. (a) 5 (b) 0.325 (c) 3 (d) 1j ..!_x 1 0 .;;;x.;;; 1

~x-~x 1 -2_
4. 0.4 {b) 0.650 (c) 0.794 (d) 3.75 (f) Negatively skewed
16. (a) 0.455, 3 (b) 3.64, 4.95 5. 0.577 (c) F(x)= 1 .;;;x.;;; 4
1
-lnx
(c) F(x) ~ 1n 9
- 1.;;;x.;;;9 6.
7.
{a) 4.5 (b) 2-b
(a) ad, b ~ 11 (b) 0.125 l1
3 12 3
X ): 4
Mixed test 68 (page :359)
1, (a)
1 l

\
~(x-3)

'[L
1 x>9 3 <x< 11 (d) £283.33 (c)
(c) F(x) ~ 8 14. 8, ~, 39litres
F(x)~g-0.0 1 (x- 1 0)' O.;;;x<10

:
1 X): 11 15. (a) 2.93 (b)
8. (a) f(x) ~ 0.2, -2 <; x <3 (b) 1.44 (c) 2 ..1 (d) -1 X) 10
F(xl
0 1 9 X
Miscellaneous exercise 6g (page 355) 0 1 2 X

1. (a) 1_!_ (b) 6i (b) 1.6 (c) 0.327 (d) F(m) ~ 0.5, Fit<!< 0,5
Exercise 6e Obtaining f(x) from F(x) '
2. (a) 2.4 (b) 20_,,
.
0.178 ' 2. (b) fs (c) 0.577
m >p

f(x)~~~x1
(page 343)
1. (a} ((x)=!,2.;;;x.;;;6
0 .;;;x < 1
3. {b) 1.25(1-~)=0.5,m=11
0 10 X
3. (a) '3 (b)
(b)t~ 1 .;;;x.;;; 3 1.25

+L
1--x (c) f(x)=!-s'ox,O.;;;x.;;;10 (c) 0.495 (d) f(x) ~-, , 1 < x <5
3 X
0 otherwise f(x)

1L ~~
0 2 4 6 X (c) 1.27; 0.875
(c) 4 (d) 2
4. 0.8, 0.16,£8
0.25 0 10 X
2. (a) 0.794 (b) 0.75

12. (a) Weevils are randomly scattered in the grain, the grain 5. (a) 0.5 (b) 0.8849 (c) 0.2779
9. 39.5, 5.32
is selected at random.
Chapter 7 10. 53.87, 16.48
(b) (i) 0.950 (ii) 0.105 (c) 0.158
6.
7.
(a)
(a)
0.0207 (b) (i) 0.0289 (ii) 0.0200 (iii) 0.6252
0.1247 (b) 0.6957
11. 0.203 13. (a) 0.953 (b) 0.745 (c) 0.19
Exercise 7a Finding probabilities, where 12. 92.7%, 1.32, 1.7% 8. (a)0.6298 (b) 0.1056
14. (b) 0.133 (c) 11 (d) 0.7119
Z ~ N(0, 1) (page 367) 13. 4.299 g 9.
10.
(a)0.1728 (b) 0.6127 (c) 0.5
0.2575
14. 4.46
1. (a) 0.8089 (b) 0.8089 (c) 0.1911 (d) 0.1911
15. 2080, 236
Miscellaneous exercise 7h (page 398) 11. 0.1103,0.753
2. (a) 0.0359 (b) 0.2578 (c) 0.9931 (d) 0.9131 16. (a) 0.4875 (b) 281, 5.00 12. 9.6, 0.522; (a) 1.8% (b) 22.2%
1. (a) 46.5% (b) 0.532 m (c) 1.00 m
(e) 0.0049 (f) 0.9911 (g) 0.9686 (h) 0.2343 13. (a) (94.4, 105.6) (b) 92.55% (c) 22.14%
17. 5.2007, 0.00346; 0.0269 2. (b) 0.0693 (c) 0.0746
(i) 0.0312 (j) 0.9484 (k) 0.9803 (l) 0.0021 18. (a) 0.1587 (b) 128.4 (c) 1.31 3. (b) 11.5% 14. (a) 0.0787 (b) 3.02 x 1o-'
3. (a) 0.05 (b) 0.05 (c) 0.0999 (d) 0.025 (e) 0.005
19. 0.0401 (a) 0.459 (b) 0.003 4. 50.154,4
(f) 0.01 (g) 0.0025 (h) 0.075 Exercise 8b Multiples of normal variables
20. 490 g, 12.2 g 5. (a) (i) 0.0062 (ii) 0.5598 (b) 7.49 m (c) 0.27
4. (a) 0.044 (b) 0.8185 (c) 0.1336 (d) 0.3023 21. (a) 19.50 (b) not symmetrical (c) 32 (page 413)
(d) Brian, since P(X ≥ 8) = 0.0207 whereas for Alan
5. (a) 0.1703 (b) 0.5481 (c) 0.3639 (d) 0.4582
P(X > 8) = 0.0062. 1. (a) 0.8962 (b) 0.9386
(e) 0.4798 (f) 0.9624 (g) 0.0337 (h) 0.9082
Exercise 7e Continuity corrections (page 386) 6. (a) 0.886 2. (a) 0.2398 (b) 0.2523
(i) 0.2729 (j) 0.030 (k) 0.925 (l) 0.4508 (b) Data not symmetric but showing a positive skew.
(m) 0.9 (n) 0.02 1. P(2.5 <X< 9.5) 3. (a) 0.244 (b) 0.659 (c) 0.409
7. (a) 1.2 (b) 53.6 (c) 54.2; 0.066
4. (a) 6, -fi (b) 0.2074 (c) 0.7601 (d) 0.5143
6. 50% 2. P(3.5 <X< 8.5)
8. (a) (i) 4.95% (ii) 0, 1, II 5. 0.2762
7. (a) 0.9 (b) 0.7 3. P(10.5 <X< 24.5)
(b) (i) 105.3 (ii) 106.45; 106.45 6. (a) 0.3446 (b) 0.6915; 0.0033,0.304
8. (a) 0.55 (b) 0.15 4. P(1.5 <X< 7.5)
(c) (i) 103.3, 3.98
9. (a) 0.9 (b) 0.1 5. P(X > 54.5)
(ii) needs overhaul, standard deviation too high.
6. P(X> 75.5)
9. (a) 14.25 p (b) 736 g (c) 462 g
Miscellaneous exercise 8c (page 417)
Exercise 7b Finding probabilities using 7. P(45.5 <X <66.5)
10. (a) (i) 0.250 (ii) 0.758 (iii) 0.00240 (b) 0.0433 1. (a) 0.60 (b) 0.20 (c) 0.95 (d) 0.5
X ~ N(μ, σ²) (page 370) 8. P(X < 108.5)
11. (a) (i) 0.197 (ii) 0.820 (b) (ii) 19 (c) 0.2142 2. (a) 0.051 (b) 0.00155 (c) 0.9782
9. P(X < 45.5) 3. 1000,172,3000,298,0.16,0.02
1. (a) 0.0668 (b) 0.4013 (c) 0.1747 12. 0.360, 0. 734
10. P(55.5 <X <56.5) 4. (a) 0.0888 (b) 0.6611
2. (a) 0.7054 (b) 0.0618 (c) 0.4621 (d) 0.00456 13. (a) 0.653 (b) 0.2224
11. P(400.5 <X< 560.5) 5. 0.0625, 0.2574, 0.5, 0.7123
3. (a) 0.0548 (b) 0.1448 (c) 0.9544 14. (a) (i) 104 (ii) 33 (iii) 33 (b) 1000,200
12. P(66.5<X<67.5) 6. (a) 0.0139 (b) 0.1587 (c) 0.9332
4. (a) 0.0106 (b) 0.9857 15. (a) 0.3154 (b) 0.3068; worse, 0.5245
13. P(X > 59.5) 16. 979.27, 17.27, 133 7. (a) 0.159 (c) 0.584
5. (a) 0.3015 (b) 0.5231 (c) 0.3792 14. P(99.5 < X < 100.5)
17. (a) random events, mean ≈ variance 8. 12 kg, 57.0 g, 3.97%, 765 g
6. 740 15. P(33.5 <X< 42.5)
(b) 0.224 (c) 0.586 (e) 0.6201 9. (a) (i) 0.1056 (ii) 0.8882 (b) 1028 g (c) 0.0537
7. 0.00003844 16. P(6.5 <X< 7.5) 18. (a) 0.988 (b) 0.855 10. (a) (i) 0.1056 (ii) 0.144 (b) 0.0188
8. (a) 0.6554 (b) 8 17. P(X > 508.5)
9. (a) 0.0478 (b) 0.000817 (c) 0.783 (Poisson), 0.784 (binomial) 11. (a) 0.1416 (b) 0.5999 (c) 14.96 m (d) 0.3043
18. P(X < 6.5) 12. (a) 0.798 (b) 0.323 (c) 0.132 (d) 0.228
10. (a) 0.9544 (b) 0.5784 (c) 0.0435 19. (a) 0.649 (b) 0.965 (c) 0.371
19. P(26.5 <X <28.5) 13. (a) 0.252 (b) 0.0581 (c) 0.104
11. (a) 0.1056 (b) 0.7734 (c) 0.6678 20. (a) 0.988 (b) 0.624 (c) 0.828
20. P(52.5 <X <53.5)
12. 0.159, 0.775, 0.067,£37.56 21. (a) np > 5, nq > 5, X~ N(np, npq)
13. 0.785, 0.397 (b) p < 0.1, n > 50, X ~ Po(np); 0.859 Mixed test 8A (page 419)
Exercise 7f The normal approximation to the (c) 0.204 (d) 0.034
14. 0.957 1. (a) S ~ N(600, 105.8), 0.0724 (b) 0.8392
binomial (page 389)
(c) 0.1606 (d) 30.54 g
Exercise 7c Using the standard normal tables 1. 0.1958 Mixed test 7A (page 401) 2. (a) 0.733 (b) 0.984
in reverse (page 376) 2. (a) np > 5, nq > 5 (b) 0.0197 (c) 0.0968 3. (b) 0.0802 (c) 0.6729
1. (a) 29% (b) 402.62 ng/ml
3. (a) 0.0154 (b) 0.8145 (c) 0.02
2. (a) 25 (b) 0.673
1. (a) 0.015 (b) 0.796 (c) -1.887 (d) -0.454 4. (a) 0.657 (b) 0.2142
(e) -0.562 (f) 1.019 (g) 0.842 3. (b) 0.0113 (c) 0.86 Mixed test 8B (page 420)
5. (a) 0.0318 (b) 0.8345 4. (a) 0.0548 (c) 0.356
2. (a) 1.94 (b) -0.695 (c) -0.915 (d) 0.722 6. (a) 0.9474 (b) 0.6325 (c) 0.5914 (d) 0.0111 1. (a) 0.127 (b) (i) 0.0016 (ii) 0 (c) 0.1003
3. (a) 0.91 (b) 1.66 (c) 0.674 (d) 2.05 7. (a) 0.4502 (b) 0.0996 (c) 0.484 2. (a) 0.8413 (b) 0.5 (c) 0.4207; 0.9938
4. 0.674, -0.674; 0.524 Mixed test 7B (page 402) 3. 0.84
8. 20, 16,0.00436
5. (a) 70 (b) 4.65 (c) 190.742 (d) 1.468 9. P(R = r) = ⁿCᵣ(1 - p)ⁿ⁻ʳpʳ, np, np(1 - p) 1. Luxibrite, 0.936
6. (458.92, 546.52) (a) 0.2304 (b) 0.9222; 0.8531 2. (a) 0.1056 (b) 0.8641 (c) 815.68
7. (a) 0.6247 (b) 629.52 g (c) 3 10. 0.1432 3. (a) (i) 0.8944 (ii) 0.4931 Chapter 9
8 8 1.158, (6.10, 9.90) (b) only able to stay for a maximum of 60 minutes
9. (a) (384.32, 415.68) (b) (394.608, 405.392) 11. 0.6886
12. np > 5, nq > 5 (a) 0.1853 (b) 0.1838 (c)
0 81"'10
(c) mean + 3σ gives 6.55 pm Exercise 9a Sampling methods (page 430)
10. (a) 0.9332 (b) 0.383; 106.6,137 4. (a) 7.5 (b) randomly scattered (d) 0.901 (e) 0.2627 2. (a) 6, 6, 6, 6, 6, 5, 5
11. (a) 0.0548 (b) 26 (c) 67.4 (d) 2183 Exercise 7g The normal approximation to the 5. (a) (i) 0.0808 (ii) 0.1935 4. (b) large: medium: small= 15:25:20
12. (a) 37.8% (b) (125.5, 194.5) (c) 0.405 (b) 0.295 (c) 0.0598
Poisson distribution (page 390)
Exercise 7d Finding μ or σ or both, where 1. (a) 0.6201 (b) 0.39 (c) 0.5406 Exercise 9b Simulating random samples from
2. (a) 0.3998 (b) 0.2004 (c) 0.3661 (d) 0.0637 Chapter 8 given distributions (page 435)
X ~ N(μ, σ²) (page 381)
3. (a) 0.313 (b) 0.5078 (c) 0.8335 (d) 0.1101 Some answers depend on the random numbers used and on the
1. 30 4. (a) 0.2614 (b) 0.2343 (c) 0.0558 Exercise 8a Sums and differences of normal method of allocation. These are possible answers.
2. 10.7 5. 0.8901
3. 8.31, 35.9%
variables (page 409) 10. (a) 1, 1, 1, 0, 3 (b) 4
6. 0.6887,4 11. 33.134, 33.193,28.712
4. 35.5 7. (a) 0.4574 (b) 0.173 (c) 0.8312 1. (a) 210,625 (b) X- N(210, 625)
12. (a) 3, 5 (b) 1, 5 (c) 1007.2, 1016.8
5. 1.75 8. (a) 0.4594 (b) 0.5363 ... 4 (c) 0.6554 (d) 0.7698
09 13. 1.52
6. 52.73, 11.96 9. (a) (i) 0.9815 (ii) 0.3486 (m) 0.9244 (b) 0 ·0 2. (a) 0.1319 (b) 0.0127
14. means of sample means ≈ distribution mean; variance of
7. 2.74, 2.78 3. (b) 0.9324
10. (a) 0.199 (b) 0.185; 0.870
4. 0.0745 sample means "" f variance of distribution
8. (a) 6.99, 0.324 (b) 0.0105 J 1. (a) 0.927 (b) 0.0102; 0.297 15. (a) 4 (b) 6.1826
3. (8.07, 9.13) 2. (0.35, 0.49), 0.14 7. (a) 2, 1.18 (b) 0.302
Exercise 9c The distribution of the sample mean, X̄ (page 443)
4. (32.08, 33.22), 380 3. (a) (x̄ - 38.64/√n, x̄ + 38.64/√n) (c) H0: p = 0.2, H1: p < 0.2, not reduced
5. (a) 5.13 (b) 0.588 (c) (4.70, 5.56)
(b) 6000
(d) reduced
1. 0.0176 6. (14.98 g, 15.78 g) 4. (a) (244.2 g, 250.2 g) (b) 6.0 g (c) ,malh 8. (a) 0.430 (b) 0.962 (c) 0.00459
2. (a) 0.6234 (b) Approx. 4 7. (9.804, 9.808) (d) H0: p = 0.9, H1: p < 0.9, looking for a decrease
3. (a) 0.1056 (b) 0.3092 (e) No evidence that service has deteriorated.
Exercise 9g Confidence intervals for p (page 471) Chapter 10 (f) x ≤ 12; P(X ≤ 12) < 0.05, whereas P(X ≤ 13) > 0.05
4. (a) X̄ ~ N(4.8, 2.88/50) (b) 0.7975
9. (a) Defects occur randomly and independently, with no
5. 1'1 8 (b) no
Exercise 10a Testing p in a binomial two defects at the same spot.
1. (a) (0.622, 0.738)
6. (a) 0.2399 (b) 0.0787 (c) 0.0127 (d) n ~ 109 (b) The normal approximation to the binomial has been
distribution (small samples) (page 494) (b) (i) 0.209 (ii) 0.221
(c) 0.140
7. 0.9212 used in the underlying distribution. 1. H0: p = 0.7, H1: p > 0.7; no evidence
8. 62 2. (a) (0.293, 0.427) (b) (0.273, 0.447) 2. (a) H0: p = 1/6, H1: p > 1/6
(d) H0: λ = 2.4, H1: λ > 2.4, evidence that number of
defects has increased.
9. (a) 42 (b) 60 3. (a) (0.238, 0.362) (b) 90 (b) There is no evidence that die is biased in favour of 4.
10. (a) (i) 0.181 (ii) 0.999 (b) O.QJ8
10. 5 4. (a) 0.28 (b) (0.176, 0.384) 3. (a) Do not reject H0 (b) Reject H0
11. 20,3 (c) No evidence of decrease.
5. (0.156, 0.344) 4. (a) Evidence to suggest decrease.
12. (a) 12 (b) 20 11. 0.0057, 9 mins, not significant
6. (a) (0.223, 0.352) (b) wide; (b) No evidence to suggest decrease.
13. 205od, 1768; no 7. (a) Random sample (0.244, 0.283} (ii) 90 approximately 5, (a) x;;>S
14. 0.332, 0.0587, 0.009 (b) (i) 0.26 (b) The probability that H 0 is rejected when it is in fact
Mixed test lOA (Binomial) (page 506)
15. 0.4948, 0.4944, 0.1211 8. (a) (0.351, 0.369) (b) 5277 true. (c) 0.1 1. 0, 1, 9, 10
16. (a) P(X=0)=~,P(X=1)=j,P(X=2)=i 9. (0.509, 0.547) 6. (a) Accept H0 (b) Reject H0 (c) Reject H0 2. (a) H0: p = 0.15, H1: p < 0.15, evidence that new
(b) l (c) 0.159 (d) Accept H 0 (e) Accept H 0 (f) Reject H 0 procedure has been successful.
Miscellaneous exercise 9h (page 478) (g) Reject H 0 (h) Accept H 0 (b) Staff making an effort during the first week, take
Exercise 9d Distribution of sample proportions 7. (a) Driving instructor is over-estimating pass rate. sample over a longer period of time.
1. (124.34, 125.60), 4 x> 3
(large samples) (page 447) 2. (£93.59, £101.48)
(b) 3. No evidence to support gardener's claim.
8. She could have been guessing.
1. (a) 0.0745 (b) 0.0037 3. 1.13, 0.0603, (£1.07, £1.19) 9. (a) x < 2 (b) 0.803
2. (a) 0.0057 (b) 0.527 (c) 0.1265 4. 9.71, (172.3, 173.3)
Mixed test lOB (Poisson) (page 506)
10. (a) 15% (b) 0.15(09) (c) 28% (2 d.)
3. 0.0471 5. (a) 3, (2.04, 3.96} (b) 30%, (25.2%, 34.8%) 1. (a) Poisson, 2.1
11. (a) 7.5% (2 d.)
4. (a) 0.0648 (b) 0.0970 6. 0.059, 0.61 (b) (i) 0.650 (ii) 0.222
(b) same as significance level
5. 0.7181 7. (a) Lifetime of bulb follows a normal distribution; the (c) 66% (2 d.)
(c) Evidence suggests higher rate.
6. (a) 0.0648 (b) 0.0851 (c) 0.3068 items in the box constitute a random sample. 2. (a) (i) 0.138 (ii) 0.847
7. (a) 0.22 (b) (1774 hours, 1798 hours) (b) H0: λ = 7.5, H1: λ < 7.5, does not provide significant
8. (a) 268 (b) smaller, critical z value less
Exercise 10b Testing λ in a Poisson evidence.
Exercise 9e Point estimates and confidence 9. (0.139, 0.315); there is a 1% chance that the interval has distribution (page 500) 3. (a) Nominally 5% (between 4.26% and 8.39%)
1. Increased (b) 76% (2 d.).
intervals for μ (page 460) not trapped μ.
10. (a) 26.525,1.24 (b) (26.20, 26.85) (c) justified 2. (a) Not increased (b) Decreased
1. 236, 7 ..18 (d) n large, use Central Limit theorem 3. H0: λ = 9, H1: λ > 9, not increased
2. (a) 48.875, 6.98 (b) 1.69, 8 x 10-'(1 d.) 11. (a) (28.98 cm, 29.42 cm) (b) Large sample 4. (a) 0.0424 (b) 0.849 Chapter 11
(c) 22.79, 1.81 (c) X normally distributed, random sample 5. (a) Accept H0 (b) Accept H0 (c) Accept H0
(d) 15, 43.14 (e) 10, 3.11 (f) 9.71, 621..12 Exercise 11a z-tests for a normal population or
(d) (26.78 cm, 31.62 cm) (d) Reject H0 (e) Accept H0 (f) Reject H0
3. 0.5, 1.428 (e) no; 30.5 out of range of 95% confidence interval for μ 6. H0: λ = 3.5, H1: λ > 3.5, not increased large sample size (page 522)
4. 205.16, 9.223 12. (92.32, 99.68)
5. (a) (139.16, 140.5) (b) random sample 1. (a) z = -1.095, accept H0 (b) z = 1.845, reject H0
6. (a) (10.7.1, 14.1.1) (b) 3.4
13. (a) (202.4, 207.4) (b) 0.2, (0.057, 0.343) Miscellaneous exercise 10c Binomial and (c) z = 2.5, reject H0 (d) z = -2.778, reject H0
14. (0.123,0.392), (170.84, 178.16), (165.57, 186.83) Poisson tests (page 504) 2. z = -0.943, no
7. (a) (448.7, 467.3) 15. 25.35, 0.13, (25.15, 25.6), valid
(b) The probability that this interval includes μ is 0.99. 1. (a) 0.028 (b) 0.131 3. It could be 103.5
16. (a) (0.303, 0.357) 4. z = 2.487, yes
(c) No, z value less. (c) 10% probability that interval did not trap p; people (c) 0.261; H0: p = 0.6, H1: p > 0.6; teacher is not
8. (a) (79.19, 84.81) (b) (78.89, 85.11) underestimating 5. z = 1.909, distribution of the sample mean is approximately normal.
changed their minds at the last minute
(c) No, the central limit theorem can be used, since n is 2. (a) 0.552 (b) 6, 0.296
17. (£35.60,£130.80) 6. z= 1.987, no evidence
large. 18. (35.03 mg, 35.31 mg) (c) The probability he scores a penalty kick remains
9. (68.0, 70.0), random sample, central limit theorem can be constant at 0.7. 7. (a) X < 91.5065 minutes (b) 0.0093 (c) 0.3286
19. (13.10 mm, 14.72 mm) 8. z = 0.983, accept mean is zero
applied. 20. (47.02 cm, 51.38 cm) (d) H0: p = 0.7, H1: p > 0.7
10. (a) 3.612 (b) (747.3 g, 748.7 g) (e) No evidence of improvement (f) strengthened 9. 5.778 <X< 6.222
21. (0.0825 mm, 0.242 mm) 10. (a) z = 1.778, accept H0 (b) z = 1.778, reject H0
(c) random sample, central limit theorem can be applied. 3. Manufacturer's claim is not accepted; discrete distribution,
11. (a) (1011, 1114) (b) 36 P(X < 12) ~ 3.6%, P(X < 13) ~ 17.1 %. (c) Z = -1.428, reject H 0 (d) z = -2.487 accept H 0
Mixed test 9A (page 481) 11. (a) Reject H 0 and conclude mean is not 52. (b) 0.04
12. 28 4. (a) H 0 : p = 0.2, H 1 : p > 0.2
13. (a) 5.06 g (b) 89% 1. (a) 0.391 (b) 93% (b) X ~ B(25, 0.2) (c) 9 12. (a) 0.0817 (b) 0.665
14. Histogram: frequency densities 1.2, 3.6, 6.4, 11.4, 20.4, 2. 14 5. (a) (i) 0.0278 (ii) 0.0384 (iii) 0.0768 13. z = 2.946, yes
10.2, 5, 1.8; 91.32, 7.42, 0.43, (90.5, 92.2) 3. (0.23, 0.35); the normal approximation to the binomial (b) H0: p = 0.5, H1: p ≠ 0.5, no indication of whether 14. (b) 0.24 (c) p>389.7 (d) 0.0494
15. 25.3, 3.6, (24.9, 25.8) has been used in the underlying theory; only cars in the car looking for evidence of more males or more females.
16. Histogram: frequency densities 0.8, 0.48, 0.3, 0.18, 0.1, park were sampled which may not constitute a random (c) Evidence of more males than females, x ≥ 13 Exercise 11b t-tests for a normal population,
0.05, 0.04, 0.03, 0.02; 194,176, (173.5,214.5) sample. 6. (a) 37% (b) 42% small sample size (page 527)
4. (18.51, 19.49) (c) (i) The consumer group has used a high value for the 1. (a) t = 0.909, accept H0 (b) t = -1.89, accept H0
Exercise 9f Confidence intervals - small significance.
(c) t = 2.15, reject H0 (d) t = -3.07, accept H0
samples (t-distribution) (page 468) Mixed test 9B (page 482) (ii) Choose 5% or 10% significance level to maintain 2. t = 2.828, evidence of improved times
1. (b) (92.01, 93.19) credibility.
1. (a) (177.21 cm, 182.12 cm) (b) 4.91 cm 3. (a) t = -3.54, underweight (b) z = -3.2, underweight
2. (a) (3.59,4.68) (b) 0.146 (c) Central Limit theorem can be applied.
2. t = -1.13, no evidence that Welsh policemen are shorter
4. t = -1.1, no 17. (a) H0: μ = 1.73, H1: μ > 1.73 (b) X ~ N(1.73, 0.0008)
5. t = 2.284, mean greater than 4.3 than Scottish policemen. (c) X > 1.777 13. (a) modal class 2 to <4
6. (a) t = -3.23, no (b) (1.69, 2.88) 3. 196, z = -1.714, do not differ significantly (b) 4 years 8 months, 3 years 2 months
(d) men who play basketball are not taller (e) 0.14
7. (a) z = -1.66, no change in mean 4. (a) 10.8125 (b) t = -1.282, accept claim 18. 9 (c) For cumulative frequency curve plot (0, 0), (2, 42),
(b) 0.324, t = -2.33, change in mean 5. t = 2.423, not significant difference (4, 94), (6, 122), (8, 142), (10, 160), (12, 176) 3 ,
8. X is normally distributed, t = 1.80, accept null hypothesis 6. Normal populations with common variance, t = 2.36, 9 months, 4 years 9 months ' ' ; years
Test 11A (z-tests) (page 558)
9. H0: μ = 27, H1: μ ≠ 27, t = 2.9, mean is 27 evidence that mean has increased; t = 2.041, the mean
1. (a) 21.25
(d) X² = 5.73, ν = 5, justified
10. H0: μ = 50, H1: μ < 50, t = -0.435, not overstating could be 500 g.
7. (a) Normal populations with common variance (b) z = 0.99, no evidence to support manufacturer's Exercise 12b Goodness of fit tests - binomial,
(b) H0: μ1 = μ2, H1: μ1 ≠ μ2 (c) t = -0.942, same suspicion
Exercise 11c Testing a binomial proportion Poisson and normal distributions (page 579)
large n (page 532) 8. t = 1.868, evidence that new method has led to higher (c) obtaining distribution of X, distribution of X not
scores; (-2.60, 33.9) known 1. Combine last three classes, X² = 4.09, ν = 3,
X ~ B(5, 0.3), accept
1. (a) z = 1.59, accept H0 (b) z = 2.206, accept H0 2. z = 1.567, not sufficient evidence to say that the quoted
(c) z = -1.79, accept H0 (d) z = 2.118, accept H0 Miscellaneous exercise 11e (page 554) figure is an underestimate 2. X ~ B(5, 1/6), E = 80.5, 80.5, 32, 7 (last 3 classes combined),
(e) z = -2.937, reject H0 3. (a) (i) P(reject H0 when H0 is true) X² = 8.21, ν = 3, biased; x̄ = 1, p = 0.2, X ~ B(5, 0.2),
2. z = -2.40, do not accept claim as there is evidence that 1. (17.1, 19.7), there is a 10% chance that it hasn't trapped (ii) P(accept H0 when H1 is true) E = 66, 82, 41, 11 (last three classes combined), X² is very
proportion is less. μ; z = 0.759, μ could be 17.8 (b) (i) z = 2.372, mean is greater than 17.5 small, ν = 2, too good a fit, query data.
3. z = 1.637, yes 2. (a) Children within families selected are representative of (ii) X > 18.09 3. np, 1.6, 0.32, E = 7.3, 17.1,
16.1, 7.5, 1.8, 0.2 (combine
4. z = 1.476, no all children. (iii) 0.639 last 3 classes) X² = 1.79, ν = 2, good fit
5. z = -1.990, no (b) z = 0.939, data do not indicate that boys and girls are 4. 0.0606 (≈ 6%), 0.1118 4. X ~ B(2, 1/6), E = 150, 60, 6, X² = 9.6, ν = 2, reject; use
6. z = 1.5, no not equally likely x̄ = 0.444, p = 0.222, find E, ν = 1
7. z = 1.705, evidence that more than 65% own a mobile 3. (a) H0: μ = 30, H1: μ > 30 (b) X > 33.95 Test 11B (z-tests) (page 558) 5. (a) x̄ = 2, p = 0.4, E = 6, 21, 28, 18, 7 (combine last
phone. (c) Evidence that mean speed is greater than 39 mph (X is 2 classes)
1. H0: μ = 125, H1: μ < 125, z = -1.549, no evidence that μ is
8. (a) (i) 0.0297 (ii) 0.0934 in critical region). (b) X² = 2.21, ν = 3 (c) yes, binomial adequate
(d) 0.9941 lower
(b) z = -1.792, germination rate less than 75% (only just 6. (a) X ~ B(5, 0.2088), E = 155, 205, 108, 28, 4, 0
-do further tests) 4. (a) H0: μ = 43, H1: μ > 43 2. z = 2.318, government spokesman (combine last 3 classes)
(b) Since n is large, the distribution of the sample means is 3. (a) X < 59.82
9. z = 2.43, yes (b) X² = 5.959, ν = 2, binomial (but only just)
10. Replies were representative of the population. approximately normal (b) It is accepted that the mean is 60 when in fact it is an 7. (a) 7
(a) z = 1.220, no evidence to suggest proportion in favour (c) z = 1.768, mean amount has increased alternative value (less than 60). (b) n = 20, p = 0.35; 0.16135, 8.1
(d) (43.35, 52.65), consistent, 43 out of range of (c) 0.057 (c) 12.3
is more than 0.7.
4. (a) H0: μ1 - μ2 = 0, H1: μ1 - μ2 ≠ 0
(b) (0.681, 0.808) confidence interval (d) E = 12.3, 8.6, 9.2, 8.1, 11.8, O = 9, 7, 17, 8, 9
11. (a) Evidence that proportion is lower 5. H0: p = 0.13, H1: p < 0.13, 2%, 0.161 (b) z = 1.6, no difference X² = 8.46 (e) 3, not good fit at 5%
level
(b) No different 6. (a) (i) P(X > critical value | μ = 65) (c) Distribution likely to be skewed rather than symmetric 8. (a) E = 246.6, 345.2, 241.7, 112.8, 39.5, 14.2
12. (a) z = -3.03, evidence that p < 0.4 (ii) P(X < critical value | μ is value specified by the (b) X² = 32.2, ν = 5, not accepted
(b) (0.379, 0.458); 75 alternative hypothesis) Test 11C (t-tests) (page 559) 9. x̄ = 2.5, E = 8, 21, 26, 21, 13, 11 (combine end classes),
13. z = -1.267, no (c) Accept H0, Type II X² = 2.59, ν = 4, good
1. t ~ -.3.5~0, evidence that mean falls below $7.40; normal
14. z = -2.44, evidence that proportion has fallen (d) 0.0059, Type II error would be less and tends to zero distribution 10. x̄ = 1.28, E = 41, 52, 3~, ~~· 6 (combine end classes),
as μ increases 2. t = -2.915, San Marco cooler X² = 6.81, ν = 3, not significant
Exercise 11d Testing the difference between 7. Not representative as it excludes people at work, school, 11. x̄ = 0.65, E = 20.88, 13.57, 5.55 (combine end classes),
3. (a) 4.238 (b) Normal distribution, t = 2.857, yes
etc; better to take random samples at random times during X² = 1.85, ν = 1, accept
means of two normal populations (c) Perform z-test not t-test
the day for a spread of days, (68%, 80%). z = -2.03, data 12. (a) x̄ = 1.2, E = 99, 119, 72, 29, 9, 2 (combine end classes)
4. t = -2.046, new score higher; (-6.948, 32.282) or
Section A: z-tests (page 543) provide significant evidence (-32.282, 6.948) (b) X² = 0.48, ν = 3, very good fit
1. (a) (i) z = -2.096, reject H0 (ii) z = -1.402, accept H0 8. (a) X̄ > μ0 + 1.96 σ/√n or X̄ < μ0 - 1.96 σ/√n
13. x̄ = 0.9, E = 21, 18, 11 (combine last 3 classes), X² = 1.80,
(iii) z = 2.493, reject H0 ν = 1, yes, consistent
(b) (i) z = 1.99, accept H0 (ii) z = 2.076, reject H0 Chapter 12 14. (b) E = 7.3, 12.4, 10.6, 9.7
(iii) z = -2.036, accept H0 (iv) z = 1.783, reject H0
(b) X̄ > μ0 - 2.326 σ/√n
(c) X² = 1.77, ν = 2, reasonable
There will be variation in answers, depending on the degree of (d) very low, suspicious
(v) z = 1.779, reject H0 (vi) z = -2.321, accept H0 9. (a) 0.422 accuracy used in various stages of the working.
15. E = 6.68, 9.19, 14.98, 19.15, 19.15, 14.98, 9.19, 6.68,
(only just) (vii) z = 2.55, reject H0 (b) E(unbiased estimate) = true value; batch not rejected,
X² = 3.197, ν = 7, accept.
2. 0.567, z = -2.219, flowers on sunny side grow taller 9.6% Exercise 12a Goodness of fit test - uniform If μ, σ² unknown, ν = 5
3. z = 3.52, second population has smaller mean than first 10. (a) H 1:p>-!- (b) N>SO (c) 0.059 ≈ 6% and given ratio (page 569)
4. z = 2.036, significant at 5% level, not significant at 4% 11. Accept as slow if mean bounce < 11.645, 0.0004
16. (a) E = 3, 13, 28, 32, 18, 6 (combine first 2 classes),
1. X² = 1.93, ν = 3, die is fair X² = 11.9, ν = 4, reject
level 12. (a) (i) 10.46 (ii) 15.64, E(unbiased estimate)= true value
2. X² = 18.16, ν = 9, uniform distribution
(b) x̄ = 171.54, σ̂ = 7.11, E = 6, 18, 32, 28, 13, 3 (combine
5. 4.41, (9.87, 10.73), 3.61, z = 1.49, not significant evidence (b) 1
6. z = -1.646, reject Mr Brown's claim (only just) (c) Central Limit theorem holds when n is large 3. X² = 6.19, ν = 2, yes last 2 classes), X² = 1.73, ν = 2, accept normal
z = 3.367, accept claim that mean duration is more than 4. X² = 4.95, ν = 3, no; X² = 9.90, ν = 3, yes 17. (a) x̄ = 1.732, σ̂ = 0.216 (3 d.p.), E = 7.78, 26.05, 44.12,
7. z = -2.04, evidence of difference 13.
33.64, 13.41, X² = 8.96, ν = 2
8. 1.15, z = -2.913, significant evidence 12 months; n large, use Central Limit theorem 5. X² = 8.24, ν = 7, accept theory
(b) X² = 2.42, ν = 1
9.
10.
z = 1.627, accept; 124
27.33 (26.77, 27.89), 2.4, z= 1.97, those of higher
14. (a) 75
(b) z = 2.19, machine is not correctly calibrated ~: ~~: ~o:~s~;=\~~~
intelligence do not have greater foot length. (c) Unbiased estimate of standard deviation used, 8. 15.5 Exercise 12c Contingency tables (page 588)
distribution of sample mean approximately normal; 9. 7~.81, 17.8, 7.8, 6-? X² = 5.92, ν = 3, no difference 1. E = 48, 10.67, 21.33, 42, 9.33, 18.67, X² = 1.037, ν = 2,
(0.316, 5.684) 10. X² = 38.2, ν = 9, evidence of bias no difference
Section B: t-tests (page 545) 11. X² = 10, ν = 4, not uniform
(d) Smaller, might lead to result that machine is correctly 2. E = 25.5, 25.5, 60.5, 60.5, 26.5, 26.5, 7.5, 7.5, X² = 2.03,
1. (i) (a) 17.73 (b) t = 2.135, reject H0 calibrated. 12. X² = 4.4, ν = 5, die is fair
ν = 3, independent
(ii) (a) 87.09 (b) t = -0.567, accept H0 15. (a) 66.25, 133.40 (b) H0: μ = 62.5, H1: μ > 62.5, 3. E = 50.1, 29.5, 23.4, 22.9, 13.5, 10.6, X² = 4.00, ν = 2, yes
(iii) (a) 27.5625 (b) t = 2.088, accept H0 z = 1.465, no evidence of increase 4. E = 65.1, 28.9, 58.9, 26.1, X² = 7.43, ν = 1, yes
(iv) (a) 4.182 (b) t = 1.260, accept H0 16. (a) z = 2.475, mean has increased 5. E = 27.5, 972.5, 27.5, 972.5, X² = 4.79, ν = 1, yes

11. (a) H 0 : Grades are in the ratio 15:20:35:25: 5, H 1:


6. E = 11.4, 14.3, 8.6, 15.7, 18.3, 22.9, 13.7, 25.1, 20.6,
Grades are not in this ratio E = 30, 40, 70, 50, 10,
Chapter 13 3. (b) -0.975 (c) strong negative correlation
25.7, 15.4, 28.3, 29.7, 37.1, 22.3, 40.9, X² = 12.0, ν = 9, (d) There is evidence of correlation between hours of
X² = 7.074, ν = 4, same proportion
accept sunshine and temperature.
7. E = 66.7, 33.3, 53.3, 26.7, X² = 6.81, ν = 1, no
(b) H0: Sex and grade are not associated, H1: There is an Exercise 13a Significance test for
association between them, ν = 4, there is an 4. (a) 0.4286
8. E = 21.0, 10.0, 7.0, 15.5, 7.5, 5, 41.5, 19.5, 14, X² = 7.86, product-moment correlation coefficient
ν = 4, accept
association (page 604)
(b) H0: ρs = 0, H1: ρs ≠ 0, no evidence of correlation
12. (a) x̄ = 0.74, combine 4 and over, E = 667.96, 494.29, between attendance and position in the league
9. E = 34.2, 29.8, 12.8, 11.2, X² = 1.22, ν = 1, no 1. Reject a, c, f, g, h; do not reject b, d, e 5. 0.527, no evidence of agreement
182.89, 45.11, 9.75, X² = 108.87, ν = 3, not adequate
10. E = 17.5, 82.5, 17.5, 82.5, X² = 0.58, ν = 1, no 2. (a) 0.3755 6. (b) 0.S35
11. E = 202.2, 260.7, 318.1, 184.8, 238.3, 290.9, X² = 2.02, (b) Not consistent, Poisson model was not adequate
(c) E = 259 (all classes), X² = 13.8, ν = 3, not consistent (b) H0: ρ = 0, H1: ρ > 0, reject H0, no evidence (c) some positive correlation
ν = 2, independent (c) X and T are jointly normally distributed with
13. (a) E = 19.33, 15.33, 10, 15.33, 9.67, 7.67, 5, 7.67, (d) Low mark in x, high mark in y
12. ν = 2, X² = 5.99 correlation coefficient ρ and that the data constitute (e) 0.794
X² = 12.08, ν = 3, mark is associated with type of
13. (a) ν = (3 - 1)(3 - 1) = 4 (b) difference a random sample from all values of x and t (f) H0: ρs = 0, H1: ρs > 0, evidence of positive correlation
14. E = 13.5, 15.5, 21, 8.64, 9.92, 13.44, 31.86, 36.58, 49.56, question
(b) Poisson, this is most similar question 3. (a) Scatter diagram 7. (a) c = 2.48 + 0.607m
X² = 11.35, ν = 4, yes (b) 0.834 (b) 0.S93
15. E = 33.6, 22.4, 63.6, 42.4, 22.8, 15.2, X² = 4.775, ν = 2, (c) E = 22.5, 22.5, 22.5, 22.5, 7.5, 7.5, 7.5, 7.5,
X² = 17.6, ν = 3, yes it is (c) reasonable (c) 11.4 (d) 0.516, no (c) r
no difference
16. E = 90.405, 56.595, 35.595, 20.405, X² = 13.3, ν = 1, (d) Contingency table - popular and well answered; (d) Student's view is wrong; correlation does not imply 8. 0.690, H0: ρs = 0, H1: ρs ≠ 0, no evidence of correlation
9. (b) 0.7830
Binomial and Poisson fits - average popularity, causation; in this case there may be a common
related underlying cause such as wealth. (c) H0: ρ = 0, H1: ρ > 0, evidence of positive correlation
relatively badly answered, normal fit- unpopular but
well answered by those who attempted it. 4. (a) -0.690, reject H0 in favour of H1 (c) Data constitute a random sample of all values of x
Miscellaneous exercise 12d (page 594)
14. (a) E = 480 (all classes), X² = 14.8, ν = 4, there is evidence (b) 0.686, reject H0 in favour of H1 and y, years selected may not be representative.
1. H0: Preference for proposed route is independent of where (b) E = 6.405, 6.51, 8.085, 24.705, 25.11, 31.185, 29.89, (d) lower
people live (no association between them), H1: There is an 30.38, 37.73, X² = 16.9, ν = 4, length of employment Exercise 13b Significance test for Spearman's 10. 0.825, 0.929, evidence of positive correlation (1% level)
association between them, E = 47, 28, 31.33, 18.67, 15.67, is associated with grade rank correlation coefficient (page 607) 11. -0.3341, not significant (5%), -0.6939, significant (2.5%)
9.33, X² = 1.479, ν = 2, no association 15. E = 38.78, 34.02, 18.2, 39.21, 34.39, 18.4, 27.28, 23.92, 1. Reject b, f, h; do not reject a, c, d, g, i, e
2. (a) H 0 : Occurrence of shoplifting is uniformly distributed 12.8, 22.59, 19.81, 10.6, 22.59, 19.81, 10.6, 28.55, 2. 0.714, no evidence of agreement (only just)
Mixed test 13A Correlation coefficients
between the months, H 1: Shoplifting is more likely to 25.05, 13.4, X 2 = 16.0, v = 10, no association, expected 3. (a) 0.52
(page 615)
occur in some months than others. frequency must be greater than 5, would not make sense;
(b) H0: ρs = 0, H1: ρs > 0, do not reject H0, no evidence of (a) 0.473 (b) evidence
E = 14.5 (all classes), X² = 14.268, ν = 11, no association E = 60 (all classes), X² = 13.2, ν = 6, reject hypothesis
agreement between the interviewers 2. (a) 0.667
16. (a) x̄ = 1, E = 36.79, 36.79, 18.39, 8.026 (3 or more), 4. 0.745, evidence of correlation
3. H0: No association between reaction and eye colour, (b) H0: ρs = 0, H1: ρs > 0, judges in broad overall
X² = 12.9, ν = 2, not suitable 5. (a) 0.66 agreement
H1: There is an association between them, (b) E = 26.25, 6.75, 8.75, 2.25, X² = 3.77, ν = 1, yes (b) evidence of positive correlation (c) evidence of correlation
E = 15.675, 7.425, 9.9, 26.125, 12.375, 16.5, 15.2, 7.2,
6. 0.4286, H0: ρs = 0; no evidence of positive correlation (d) Spearman's rank
9.6, X² = 20.9, ν = 4, association between reaction and eye Mixed test 12A (page 598) 3. r = 0.310, not significant, no evidence of possible
colour correlation
4. E = 6.3 (combines first 4 classes), 8.85, 14.66, 18.99, 1. (a) 1.04 (b) Calls occur at random Miscellaneous exercise 13c (page 612)
(c) E = 58.01, 55.11, 26.18, 10.70, X² = 4.86, ν = 3, 4. (a) 0.619
19.28, 15.32, 9.52, 7.05 (combine last 3 classes), 1. (a) 0.636
Po(0.95) is suitable (b) H0: ρs = 0, H1: ρs > 0, no evidence of positive
X² = 4.908, ν = 6, good fit (b) H0: ρs = 0, H1: ρs > 0. Accept H0, no evidence of correlation (just)
5. (a) H0: No association between brand of fertiliser and 2. E = 6, 20, 12, 2, 9, 30, 18, 3, X² = 13.11, ν = 3, there is a positive correlation.
link between General Studies performance and degree class (c) Two very different sets of data being compared
yield, H1: There is an association between them, 2. (b) 0.916
E = 10, 12, 8, 8, 9.6, 6.4, 7, 8.4, 5.6, X² = 7.811, 3. E = 15.9, 21.1, 26.1, 21.1, 15.9 (combine first 2 and last 2
(c) Evidence of positive correlation between the number
ν = 4, no association classes), X² = 7.08, ν = 4, N(180, 9) is suitable
of wren territories recorded and the number of adult
(b) ν = 2, there is an association between choice of 4. H0: There is no association between gender and passing a wrens trapped.
company and yield driving test.
(c) Quickgrow H1: There is an association.
6. (a) H0: Peak flow measurements are normally distributed E = 27.5, 22.5, 27.5, 22.5, X² = 2.585, ν = 1, results do
(with mean and variance as estimated from the data), not indicate link
accept
(b) Expected frequency must be greater than 5, combine Mixed test 12B (page 599)
classes 1. E = 32 (all classes), X² = 1.6875, ν = 3, no particular
7. (a) E = 18.33, 20.67, 20.68, 23.32, 7.99, 9.01 preference, data cannot be used to discredit claim
(b) X² = 6.88, ν = 2, there is an association 2. E = 21.34, 43.66, 21.66, 44.34, X² = 4.72, ν = 1, there is
8. (a) H0: No association between candidates' grades in an association between the two factors
3. E = 16.67, 16.67, 16.67, 16.67, 33.33, 50, X² = 4.65,
mathematics and physics, H1: There is an association
between them, E = 17.2, 13.8, 12.8, 10.2, X² = 5.672, ν = 5, die is biased in the way described
ν = 1, there is an association 4. (b) Random positions (c) 37.24, 2.50
(b) Expected frequency might drop below 5 (d) H0: The distribution can be modelled by Po(2.59), H1:
9. (a) E = 44.62, 66.94, 50.2, 25,12, 13.12 (combine last The distribution cannot be modelled by Po(2.59),
two), X² = 10.6, ν = 4, at 5% level, no X² = 7.55, ν = 5, Poisson model is supported by data
(b) Use mean from data for λ, ν = n - 2 = 3
(c) Do not have independent events with a constant
probability of success.
10. E = 644 (all classes), X² = 10.95, ν = 4, data do not
support claim, random events, x̄ = 1.1, E = 33.3, 36.6,
20.1, 10 (combining last 2 classes), X² = 0.48, ν = 2,
accept

INDEX

acceptance region 485,509 critical region


addition law (or rule) 183 critical value 485,509
alternative hypothesis (H1) 485,511,566,583,601 cumulative distribution function continuous
485,509
and rule, probability 186 discrete ' 334
approximations, normal to binomial 382 uniform distribution 253
normal to Poisson 390 348
cumulative frequency curve, polygon 61
Poisson to binomial 299 percentage diagrams
arrangements 206 step diagram 65
59
back-to-back stem plots 7 cumulative probability tables, binomial 645
Bayes' theorem 197 Poisson
647
best estimate 447 data, continuous
bias 423 discrete
2
binomial distribution 279 grouping
1
cumulative probability tables 9
use of
645
degrees of freedom, χ² distribution 561,579,585
283 t-distribution 462
diagrammatic representation 289 dependent variable
expectation and variance 121
286 difference between means hypothesis test 534
fitting a theoretical distribution 288 difference between random variables
goodness of fit test {x2 ) 257
571 normal variables 407
mode 289 distribution, frequency
normal approximation to 2,3,9
382 probability 233
Poisson approximation to 299 function (cumulative)
significance (hypothesis) test, n large 334
528 of sample mean 436
n small 483-492 of sample proportion
box and whisker diagram (box plot} 445
92 shape 20
usc to identify outliers 98
equally likely events 171
calcu.lator, usc of for mean 31 errors, type I and type II
product-moment correlation coefficient 493,520
141 estimation of population parameters 447
regression lines 133 exhaustive events
standard deviation 180
40 expectation, continuous random variable 320
census 422 discrete random variable
central limit theorem 237
442 experimental probability 169
chi-squared distribution {z2 ) 561
degrees of freedom 561,579,585 frequency density 12
procedure for goodness of fit test 566 frequency distribution 9
test for independence (association) 566 curve, polygon 17, 19
tables 651 geometric distribution
use of 271
562 diagrammatic repn~sentation 272
test, binomial 571 expectation and variance
contingency tables 275
582 mode 273
normal 576 progression {use in probability)
Poisson 205
573 histogram
ratio {in a given) 569 11
uniform 567 hypotheses, alternative, null 485,511,566, 583, 601
circular diagrams 24 hypothesis tests 483,507,560,600
class boundaries 3 independence {z2 test for)
cluster sampling 582
429 independent events 185
coding to find mean and standard deviation 56 variable
coefficient of rank correlation, Spearman's 121
146 interpercentilc range 68
product-moment correlation coefficient 139 interquartile range
skewness, Pearson's 68
85 interval estimation {confidence intervals) 449
quartile 88 width
combinations 457
214
combining sets of data 47 least squares regression lines 121
comparative frequency polygons 18 level of significance 485,509
complementary event (probability) 172 linear combination of normal variables 403
conditional probability 182 of random variables 336
confidence interval 449 linear correlation 119
difference between means 535 linear interpolation for median, quartiles 78
mean 450-457, 462 lower quartile 69, 71, 75
proportion 469 continuous random variable 336
contingency tables S82 mean, use of calculator
continuity correction, normal to binomial 31
383 confidence interval of 450
proportion test (large sample) 528 discrete data
Yates' (x 2 test) 28
586 distribution of sample 436
contirmous data 2 frequency distribution 30
random variable 314
correlation hypothesis test, difference between means 534
119 mean 514-520,524
cocfficieut, product-moment 144 Poisson mean
Spearman's rank 496
146 unbiased estimate 447
hypothesis test, product-moment 600 weighted
Spearman's rank 36
tables 605 median, data 69
use of 652 continuous random variable 335
602, 605 linear interpolation 78
