
Course Notes Part 1 - Chapters 1 To 4

The document defines key terminology used in statistics, including: data, statistics, descriptive statistics, population, census, sample, statistical inference, variable, measurement scales, experiment, parameter, and statistic. It also discusses different sampling methods and the concept of a sampling frame.

Chapter 1 – Terminology

1.1 Definitions
Data/Data set – Set of values collected or obtained when gathering information on some
issue of interest.

Examples

1) The monthly sales of a certain vehicle collected over a period.

2) The number of passengers using a certain airline on various routes.

3) Rating (on a scale from 1 to 5) of a new product by customers.

4) The yields of a certain crop obtained after applying different types of fertilizer.

Statistics – Collection of methods for planning experiments, obtaining data, and then
organizing, summarizing, presenting, analyzing and interpreting the data, and drawing
conclusions from it.

Statistics in the above sense refers to the methodology used in drawing meaningful
information from a data set. This use of the term should not be confused with statistics
(referring to a set of numerical values) or statistics (referring to measures of description
obtained from a data set).

Descriptive Statistics – Collection, organization, summarization and presentation of data.
To be discussed in chapter 2.

Population – All subjects possessing a common characteristic that is being studied.

Examples

1) The population of people inhabiting a certain country.

2) The collection of all cars of a certain type manufactured during a particular month.

3) All patients in a certain area suffering from AIDS.

4) Exam marks obtained by all students studying a certain statistics course.

Census – A study where every member (element) of the population is included.



Examples

1) Study of the entire population carried out by the government every 10 years.

2) Special investigations e.g. tax study commissioned by a government.

3) Any study of all the individuals/elements in a population.

A census is usually very costly and time-consuming and is therefore not carried out very often.
A study of a population is usually confined to a subgroup of the population.

Sample – A subgroup or subset of the population.

The number of values in the sample (sample size) is denoted by n. The number of values in
the population (population size) is denoted by N.

Statistical Inference – Generalizing from samples to populations and expressing the
conclusions in the language of probability (chance). To be discussed in chapters 5 – 9.

Variable – Characteristic or attribute that can assume different values.

Discrete variables – Variables that can assume a finite or countable number of possible
values. Such variables are usually obtained by counting.

Examples

1) The number of cars parked in a parking lot.

2) The number of students attending a statistics lecture.

3) A person’s response (agree, not agree) to a statement. A one (1) is recorded when
the person agrees with the statement, a zero (0) is recorded when a person does not
agree.

Continuous variables – Variables that can assume any value within an interval, i.e. an
uncountably infinite number of possible values. Such variables are usually obtained by measurement.

Examples

1) The body temperature of a person.

2) The weight of a person.

3) The height of a tree.

4) The contents of a bottle of cool drink.



Measurement scales

Qualitative variables – Variables that assume non-numerical values.

Examples

1) The course of study at university (B.Com, B.Eng, BA etc.).

2) The grade (A, B, C, D or E) obtained in an examination.

Nominal scale – Level of measurement which classifies data into categories in which no
order or ranking can be imposed on the data.

A variable can be treated as nominal when its values represent categories with no intrinsic
ranking, for example the department of the company in which an employee works. Other
examples of nominal variables include region, postal code or religious affiliation.

Ordinal scale – Level of measurement which classifies data into categories that can be
ordered or ranked. Differences between the ranks are not meaningful.

A variable can be treated as ordinal when its values represent categories with some intrinsic
order or ranking.

Examples

1) Levels of service satisfaction from very dissatisfied to very satisfied.

2) Attitude scores representing degree of satisfaction or confidence and preference
rating scores (low, medium or high).

3) Likert scale responses to statements (strongly agree, agree, neutral, disagree,
strongly disagree).

Quantitative variables – Variables which assume numerical values.

Examples
The examples of discrete and continuous variables given above.

Interval scale – Level of measurement which classifies data that can be ordered and ranked
and where differences are meaningful. However, there is no meaningful zero and ratios are
meaningless.

Examples

1) The difference between a temperature of 100 degrees and 90 degrees is the same
difference as that between 90 degrees and 80 degrees. Taking ratios in such a case
does not make sense, e.g. 100 degrees is not twice as hot as 50 degrees.

2) When referring to dates (years) or temperatures measured (degrees Fahrenheit or
Celsius) there is no natural zero point.

Ratio scale – Level of measurement where differences and ratios are meaningful and there
is a natural zero. This is the “highest” level of measurement in terms of possible operations
that can be performed on the data.

Examples

Variables like height, weight, mark (in a test) and speed are ratio variables. These variables
have a natural zero and ratios make sense when doing calculations, e.g. a weight of 80
kilograms is twice as heavy as one of 40 kilograms.

Summary of 4 measurement scales

Measurement scale | Examples | Meaningful calculations
Nominal | Types of music; university faculties; vehicle makes | Put into categories
Ordinal | Motion picture ratings: G – General audiences, PG – Parental guidance, PG-13 – Parents cautioned, R – Restricted, NC-17 – No one 17 and under admitted | Put into categories; put into order
Interval | Years: 2009, 2010, 2011; months: 1, 2, . . . , 12 | Put into categories; put into order; differences between values are meaningful
Ratio | Rainfall; humidity; income | Put into categories; put into order; differences between values are meaningful; ratios are meaningful

Experiment – The process of observing some phenomenon that occurs.

An experiment can be observational or designed.

1) A designed experiment can be controlled to a certain extent by the experimenter.
Consider a study of the effect of 4 fuel additives on the reduction in oxides of nitrogen.
You may have 4 drivers and 4 cars at your disposal. You are not particularly interested in
any effects of particular cars or drivers on the resultant oxide reduction. However, you
do not want the results for the fuel additives to be influenced by the driver or car. An
appropriate design of the experiment (way of performing the experiment) will allow
you to estimate the effects of all factors of interest without these outside factors
influencing the results.

2) An observational study is not controlled by the experimenter. The characteristic of
interest is simply observed and the results recorded. For example:

2.1) Collecting data that compares reckless driving of female and male drivers.
2.2) Collecting data on smoking and lung cancer.

Parameter – Characteristic or measure of description obtained from a population.

Examples

1) Mean (average) age of all employees working at a certain company.

2) The proportion of registered female voters in a certain country.

Statistic – Characteristic or measure of description obtained from a sample.

Examples

1) The mean (average) monthly salary of 50 selected employees in a certain
government department.

2) The proportion of smokers in a sample of 60 university students.

1.2 Sampling methods


When selecting a sample, the main objective is to ensure that it is as representative as
possible of the population it is drawn from. When a sample fails to achieve this objective, it
is said to be biased.

Sampling frame (synonyms: "sample frame", "survey frame") – This is the actual set of units
from which a sample is drawn.

Example

Consider a survey aimed at establishing the number of potential customers for a new
service in a certain city. The research team has drawn 1000 numbers at random from a
telephone directory for the city, made 200 calls each day from Monday to Friday from 8am
to 5pm and asked some questions.

In this example, the population of interest is all the inhabitants in the city. The sampling
frame includes only those city dwellers that satisfy all the following conditions:

1) They have a telephone.

2) The telephone number is included in the directory.

3) They are likely to be at home from 8am to 5pm from Monday to Friday.

4) They are not people who refuse to answer telephone surveys.

The sampling frame in this case definitely differs from the population. For example, it
under-represents people who have no telephone (e.g. the poorest), who have an unlisted
number, who are not at home at the time of the calls (e.g. employed people) or who do not
like to participate in telephone interviews (e.g. busier and more active people).
Such differences between the sampling frame and the population of interest are a main cause
of bias when drawing conclusions based on the sample.

Probability samples – Samples drawn according to the laws of chance. These include simple
random sampling, systematic sampling and stratified random sampling.

Simple random sampling – Sampling in which every possible sample of a given size has the
same chance of being drawn. Most of the theory in statistical inference assumes that random
sampling is used.

Examples

1) The 6 winning numbers (drawn from 49 numbers) in a Lotto draw. Each potential
sample of 6 winning numbers has the same chance of being drawn.

2) Each name in a telephone directory could be numbered sequentially. If the sample
is to include 2 000 people, then 2 000 numbers could be randomly generated
by computer or numbers could be picked out of a hat. These numbers could then be
matched to names in the telephone directory, thereby providing a list of 2 000
people.
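The computer-based selection in example 2 can be sketched in Python as below. This is a minimal illustration, not a prescribed method: it assumes the directory entries are already numbered 1 to N, and the directory size of 50 000 is a made-up value. `random.sample` draws without replacement, so every possible set of n entry numbers is equally likely to be chosen, which is the defining property of a simple random sample.

```python
import random

def simple_random_sample(population_size, n, seed=None):
    """Return n distinct entry numbers from 1..population_size.

    random.sample draws without replacement, so every possible
    subset of size n has the same chance of being selected.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, population_size + 1), n))

# Hypothetical directory of 50 000 numbered names; select 2 000 of them.
chosen = simple_random_sample(50_000, 2_000, seed=1)
print(len(chosen), chosen[:5])

# The Lotto draw in example 1 is the same operation: 6 numbers from 1..49.
lotto = simple_random_sample(49, 6)
print(lotto)
```

The names found by matching `chosen` to the numbered directory would form the sample.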

A random sample can be selected by using a table of random numbers.

Example

Suppose the first 6 random numbers in the table of random numbers are:
10480, 22368, 24130, 42167, 37570, 77921.
Use these numbers to select the 6 winning numbers in a Lotto draw.

The 49 numbers from which the draw is made are all written with 2 digits, i.e. 01, 02, . . . , 49.
Putting the above numbers from the table of random numbers next to each other in a string
of digits and splitting the string into pairs gives: 10 48 02 23 68 24 13 04 21 67 37 57 07 79 21.

The winning numbers can be selected by reading the pairs of digits in the above string one at a
time, working either from left to right or from right to left, and keeping those between 01 and 49
(discarding any numbers outside this range and any repeats) until 6 numbers have been obtained.

By working from left to right the winning numbers are: 10, 48, 2, 23, 24 and 13.
By working from right to left the winning numbers are: 21, 7, 37, 4, 13 and 24.
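The reading rule can be checked with a short Python sketch. The function name and its parameters are illustrative only. As in the example, the string is split into two-digit groups first, and only the direction in which the groups are read changes.

```python
def select_from_random_digits(table_numbers, k=6, low=1, high=49, reverse=False):
    """Pick k distinct two-digit numbers in [low, high] from a random-number table.

    The table entries are joined into one string of digits and split into
    consecutive pairs, which are read left to right (or right to left when
    reverse=True). Pairs outside the range and repeats are discarded.
    """
    digits = "".join(table_numbers)
    pairs = [int(digits[i:i + 2]) for i in range(0, len(digits) - 1, 2)]
    if reverse:
        pairs.reverse()
    chosen = []
    for value in pairs:
        if low <= value <= high and value not in chosen:
            chosen.append(value)
            if len(chosen) == k:
                break
    return chosen

table = ["10480", "22368", "24130", "42167", "37570", "77921"]
print(select_from_random_digits(table))                # [10, 48, 2, 23, 24, 13]
print(select_from_random_digits(table, reverse=True))  # [21, 7, 37, 4, 13, 24]
```

Reading right to left, the second 21 is skipped as a repeat.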

The advantage of simple random sampling is that it is simple and easy to apply when small
populations are involved. However, because every person or item in a population has to be
listed before the corresponding random numbers can be read, this method is very
cumbersome to use for large populations and cannot be used if no list of the population
items is available. It can also be very time consuming to try and locate every person included
in the sample. There is also a possibility that some of the persons in the sample cannot be
contacted at all.

Systematic sampling – Sampling in which data is obtained by selecting every kth object,
where k is approximately $\frac{N}{n}$.

Examples

1) A manufacturer might decide to select every 20th item on a production line to test
for defects and quality. This technique requires the first item to be selected at
random as a starting point for testing and, thereafter, every 20th item is chosen.

2) A market researcher might select every 10th person who enters a particular store,
after selecting a person at random as a starting point; or interview occupants of
every 5th house in a street, after selecting a house at random as a starting point.

3) A systematic sample of 500 students is to be selected from a university with an
enrolled population of 10 000. In this case the population size is N = 10 000 and the
sample size is n = 500. Then every $k = \frac{10000}{500} = 20$th student will be included in the sample.
The first student in the sample can be randomly selected from among the first 20 names on an
alphabetical list of students, and thereafter every 20th student can be selected until 500 names
have been obtained.
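A minimal Python sketch of systematic sampling, assuming the population is available as an ordered list (here the students are simply numbered 1 to 10 000 in place of an alphabetical list). The interval is taken as k = N // n, and the random start is chosen from the first k positions so that all n selections stay inside the list.

```python
import random

def systematic_sample(population, n, seed=None):
    """Select every kth element after a random start, with k = N // n."""
    N = len(population)
    k = N // n                      # sampling interval, approximately N/n
    rng = random.Random(seed)
    start = rng.randrange(k)        # random starting point among the first k
    return [population[start + j * k] for j in range(n)]

# Example 3: N = 10 000 students (hypothetically numbered 1..10 000), n = 500.
students = list(range(1, 10_001))
sample = systematic_sample(students, 500, seed=7)
print(len(sample), sample[:3])      # consecutive selections are 20 apart
```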

Stratified random sampling – Sampling in which the population is divided into groups
(called strata) according to some characteristic. Each of these strata is then sampled using
random sampling.

A general problem with simple random sampling is that you could, by chance, miss out a particular
group in the sample. However, if you subdivide the population into groups and sample from
each group, you can make sure that every group is represented in the sample. Some examples of strata
commonly used are those according to province, age and gender. Other strata may be
according to religion, academic ability or marital status.

Example

In a study investigating the expenditure pattern of consumers, they were divided into low,
medium and high income groups.

Income group   Percentage of population
low            40
medium         45
high           15

A stratified sample of 500 consumers is to be selected for this study.

When sampling is proportional to size (an income group comprises the same percentage of
the sample as of the population) the sample sizes for the strata should be calculated as
follows.

low: $\frac{40 \times 500}{100} = 200$, medium: $\frac{45 \times 500}{100} = 225$, high: $\frac{15 \times 500}{100} = 75$.
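The proportional allocation can be written as a small Python function (the name `proportional_allocation` is only for illustration). In this example the stratum sizes come out as whole numbers. In general they would have to be rounded, with some care taken that the rounded sizes still add up to n.

```python
def proportional_allocation(percentages, sample_size):
    """Stratum sample sizes proportional to each stratum's share of the population."""
    return {stratum: pct * sample_size / 100 for stratum, pct in percentages.items()}

income_groups = {"low": 40, "medium": 45, "high": 15}
allocation = proportional_allocation(income_groups, 500)
print(allocation)   # {'low': 200.0, 'medium': 225.0, 'high': 75.0}
```

A simple random sample of the allocated size would then be drawn from each income group.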

Convenience Sampling – Sampling in which data that is readily available is used, e.g. surveys
done on the internet. Convenience sampling methods include quota sampling.

Quota sampling – Quota sampling is performed in 4 stages.

a) Stage 1: Decide which characteristics of the elements/individuals in the population
to be sampled are of importance.
b) Stage 2: Decide on the categories to be sampled from. These categories are
determined by cross-classification according to the characteristics chosen at stage 1.
c) Stage 3: Decide on the overall number (quota) and the numbers (sub-quotas) to be
sampled from each of the categories specified in stage 2.
d) Stage 4: Collect the information required until all the numbers (quotas) are
obtained.

Example

A company is marketing a new product and needs to know how potential customers might
react to the product.

Stage 1: It is decided that age (the 3 groups under 20, 20-40, over 40) and gender
(male, female) are the characteristics that will determine the sample.

Stage 2: The 6 categories to be sampled from are (male under 20), (male 20-40),
(male over 40), (female under 20), (female 20-40) and (female over 40).

Stage 3: The numbers (sub-quotas) to be sampled are (male under 20) - 40,
(male 20-40) - 60, (male over 40) - 25, (female under 20) - 35, (female 20-40) - 65
and (female over 40) -30. The total quota is the total of all the sub-quotas i.e. 255.

Stage 4: Visit a place where individuals to be interviewed are readily available e.g. a
large shopping center and interview people until all the quotas are filled.
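The bookkeeping behind stages 2 to 4 can be sketched in Python as follows. The categories and sub-quotas are those of the example. The stream of shoppers is simulated and purely hypothetical: in practice the interviewer, not a random generator, decides whom to approach, which is exactly why quota samples are not probability samples.

```python
import random
from itertools import product

# Stages 1 and 2: the chosen characteristics, cross-classified into 6 categories.
genders = ["male", "female"]
ages = ["under 20", "20-40", "over 40"]
categories = list(product(genders, ages))

# Stage 3: sub-quotas per category (from the example); the total quota is their sum.
quotas = {
    ("male", "under 20"): 40, ("male", "20-40"): 60, ("male", "over 40"): 25,
    ("female", "under 20"): 35, ("female", "20-40"): 65, ("female", "over 40"): 30,
}
total_quota = sum(quotas.values())      # 255

def fill_quotas(respondents, quotas):
    """Stage 4: accept people in the order they are met until every quota is full."""
    counts = {category: 0 for category in quotas}
    accepted = []
    for person in respondents:          # person is a (gender, age group) pair
        if counts.get(person, 0) < quotas.get(person, 0):
            counts[person] += 1
            accepted.append(person)
            if counts == quotas:        # every sub-quota filled: stop interviewing
                break
    return accepted, counts

# Hypothetical stream of shoppers, simulated only to exercise the bookkeeping.
rng = random.Random(0)
shoppers = [(rng.choice(genders), rng.choice(ages)) for _ in range(2000)]
accepted, counts = fill_quotas(shoppers, quotas)
print(len(categories), total_quota, len(accepted))
```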

Quota sampling is a cheap and convenient way of obtaining a sample in a short space of
time. However, this method of sampling is not based on the laws of chance and cannot
guarantee a sample that is representative of the population from which it is drawn.

When obtaining a quota sample, interviewers often choose who they like (within criteria
specifications) and may therefore select those who are easiest to interview. Therefore
sampling bias can result. It is also impossible to estimate the accuracy of quota sampling
(because sampling is not random).

Chapter 2 – Descriptive Statistics
(Exploratory Data Analysis)

All the data sets used in this chapter will be regarded as samples drawn from some
population. One of the main purposes of studying a sample is to get information about the
population. The main focus here is on summarizing and describing some features of the
data.

2.1 Graphs and diagrams


Line graph – A line graph is a graph used to present some characteristic recorded over time.
Example

The graph above shows how a person's weight varied from the beginning of 1991 to the
beginning of 1995.

Bar charts

A bar chart or bar graph is a chart consisting of rectangular bars with heights proportional to
the values that they represent. Bar charts are used for comparing two or more values that
are taken over time or under different conditions.

Simple Bar Chart

In a simple bar chart the figures used to make comparisons are represented by bars. These
are drawn either vertically or horizontally. Only totals are represented. The height or length
of the bar is drawn in proportion to the size of the figure being presented. An example is
shown below.

Component Bar Chart

When you want to draw a bar chart to illustrate your data, it is often the case that the totals
of the figures can be broken down into parts or components.

Year Total Male Female


1959 51 956 000 25 043 000 26 913 000
1969 55 461 000 26 908 000 28 553 000
1979 56 240 000 27 373 000 28 867 000
1989 57 365 000 27 988 000 29 377 000
1999 59 501 000 29 299 000 30 202 000

You start by drawing a simple bar chart with the total figures as shown above. The columns
or bars (depending on whether you draw the chart vertically or horizontally) are then
divided into the component parts.

Multiple (compound) Bar Chart


You may find that your data allows you to make comparisons of the component
figures themselves. If so, you will want to create a multiple (compound) bar chart.
This type of chart enables you to trace the trends of each individual component, as
well as making comparisons between the components.

Pareto chart
A Pareto chart is a special type of bar chart where the values being plotted are
arranged in descending order. The graph is accompanied by a line graph which shows
the cumulative totals of each category, left to right.

The graph below is a Pareto chart that shows the percentage of late arrivals at a
place of work organized according to cause of late arrival (from the most common to
the least common cause). The line shows the accumulated percentages.
[Figure: Pareto chart of reasons for late arrival. Bars (percent): traffic 33, child care 25, transport 17, weather 15, overslept 8, emergency 2. Horizontal axis: reason; left axis: percent; right axis: cumulative percent (0% to 100%).]
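A short Python sketch that sorts the causes of late arrival into Pareto order and computes the cumulative percentages plotted as the line (the percentages are the bar heights of the chart):

```python
from itertools import accumulate

# Percentage of late arrivals for each cause, read from the Pareto chart.
reasons = {"traffic": 33, "child care": 25, "transport": 17,
           "weather": 15, "overslept": 8, "emergency": 2}

# Pareto order: most common cause first.
ordered = sorted(reasons.items(), key=lambda item: item[1], reverse=True)
cumulative = list(accumulate(pct for _, pct in ordered))

for (reason, pct), cum in zip(ordered, cumulative):
    print(f"{reason:<11} {pct:>3}%  cumulative {cum:>3}%")
```

The last cumulative value is 100%, which is where the line ends on the right-hand axis.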

Dot Plot

This is a diagram in which a line is drawn according to a scale that is appropriate for the data set
and the values (in the data set) are plotted at their positions on the scale. If the same value
occurs more than once, the multiple values are plotted on top of each other at the same
point on the scale. For small data sets (few values) this plot can provide useful information
regarding data patterns.

Example
Imagine that a medium-sized retailer, thinking of expanding into a new region,
identifies a business that it considers as being ready for takeover. It finds the
following annual profit figures (in tens of thousands of pounds) for the target
retailer's last ten years of trading:

9 9 7 7 7 6 5 4 3 3

To draw a dot plot we can begin by drawing a horizontal line across the page to
represent the range of values of all the numbers; then we can mark an 'x' above the
appropriate value along the line as follows:
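A rough text version of such a dot plot can be produced with a few lines of Python. This is only a sketch: the function name is illustrative, it stacks one 'x' per occurrence above each value on the scale, and its simple spacing assumes single-digit values as in this data set.

```python
from collections import Counter

def text_dot_plot(values):
    """Return a text dot plot: one 'x' per value, stacked above its scale position.

    Columns are one character wide, so this layout assumes single-digit values.
    """
    counts = Counter(values)
    scale = range(min(values), max(values) + 1)
    rows = []
    for level in range(max(counts.values()), 0, -1):   # top row first
        rows.append(" ".join("x" if counts[v] >= level else " " for v in scale))
    rows.append(" ".join(str(v) for v in scale))        # the horizontal scale
    return "\n".join(rows)

profits = [9, 9, 7, 7, 7, 6, 5, 4, 3, 3]   # tens of thousands of pounds
print(text_dot_plot(profits))
```

The tallest column sits above 7, the most frequent profit figure, and the gap at 8 shows a value that never occurred.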

Pie Chart

A pie chart is a diagram that shows the subdivision of some entity/total into subgroups. The
diagram is in the form of a circle which is divided into slices, with each slice having an area
proportional to the share that it makes up of the total.

Example
The pie chart below shows the ingredients used to make a sausage and mushroom pizza.

The number of degrees needed for each slice is found by calculating the appropriate percentage of 360,
e.g. for sausage the degrees are 0.075 × 360 = 27 and for cheese 0.25 × 360 = 90 etc.
The complete calculations are shown in the table below.

Ingredient     Percentage   Degrees
Sausage        7.5          0.075 × 360 = 27
Cheese         25           0.250 × 360 = 90
Crust          50           0.500 × 360 = 180
Tomato sauce   12.5         0.125 × 360 = 45
Mushrooms      5            0.050 × 360 = 18
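The degree calculations in the table can be reproduced with a short Python sketch. Multiplying by 360 before dividing by 100 keeps the arithmetic exact for these percentages.

```python
# Share of each ingredient (percent of the pizza), from the table.
ingredients = {"Sausage": 7.5, "Cheese": 25, "Crust": 50,
               "Tomato sauce": 12.5, "Mushrooms": 5}

# Each slice's angle is its share of the full 360-degree circle.
degrees = {name: pct * 360 / 100 for name, pct in ingredients.items()}
for name, angle in degrees.items():
    print(f"{name:<13} {angle:g} degrees")
print("Total:", sum(degrees.values()))   # the slices fill the whole circle
```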

Stem-and-leaf plot
A stem-and-leaf plot is a device used for summarizing quantitative data in a
table/graphical format to assist in visualizing the shape of a data set.

Examples

1) To construct a stem-and-leaf plot, the values must first be sorted in ascending order.
Here is the sorted set of data values that will be used in the example:

44 46 47 49 63 64 66 68 68 72 72 75 76 81 84 88 106

Next, it must be determined what the stems will represent and what the leaves will
represent. Typically, the leaf contains the last digit of the number and the stem
contains all of the other digits. In the case of very large or very small numbers, the
data values may be rounded to a particular place value (such as the hundredths
place) that will be used for the leaves. The remaining digits to the left of the rounded
place value are used as the stems.

In this example, the leaf represents the “ones” place and the stem the rest of the
number (“tens” place or higher).

The stem-and-leaf plot is drawn with two columns separated by a vertical line. The
stems are listed to the left of the vertical line. It is important that each stem is listed
only once and that no numbers are skipped, even if it means that some stems have
no leaves. The leaves are listed in increasing order in a row to the right of each stem.

4 |4679
5 |
6 |34688
7 |2256
8 |148
9 |
10 | 6

key: 5|4=54
leaf unit: 1.0
stem unit: 10.0

Conclusion: 12 of the 17 values are greater than or equal to 63 and less than or
equal to 88.

2) Two data sets can be compared by drawing a back-to-back stem-and-leaf plot.

As an example, suppose the fat contents (in grams) of English breakfasts and
cold meat sandwiches are to be compared. The fat contents are shown below.

Sandwiches: 6, 7, 12, 13, 17, 18, 20, 21, 21, 24, 26, 28, 30, 34

Breakfasts: 12, 14, 15, 16, 18, 23, 25, 25, 36, 36, 38, 41, 44, 45

A back-to-back stem-and-leaf plot is shown below.

Breakfasts Sandwiches
|0| 6 7
2 4 5 6 8 |1| 2 3 7 8
3 5 5 |2| 0 1 1 4 6 8
6 6 8 |3| 0 4
1 4 5 |4|

key: 2|4=24 for sandwiches and 2|4=42 for breakfasts


leaf unit: 1.0
stem unit: 10.0

Conclusion: The fat content in English breakfasts appears to be higher than that in
sandwiches.
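Both plots can be generated with a small Python sketch. The helper `stem_and_leaf` is hypothetical (not from any library). It uses the last digit as the leaf and the remaining digits as the stem, keeps empty stems so that none are skipped, and assumes non-negative values. In the back-to-back version the breakfast leaves are printed to the left of the stem, in the same order as in the plot.

```python
from collections import defaultdict

def stem_and_leaf(values, leaf_unit=1):
    """Return {stem: [leaves]} for every stem from the smallest to the largest.

    Values are divided by leaf_unit and rounded; the last digit becomes the
    leaf and the remaining digits the stem. Stems without leaves are kept.
    Assumes non-negative values.
    """
    scaled = sorted(round(v / leaf_unit) for v in values)
    groups = defaultdict(list)
    for v in scaled:
        groups[v // 10].append(v % 10)
    first, last = scaled[0] // 10, scaled[-1] // 10
    return {stem: groups[stem] for stem in range(first, last + 1)}

data = [44, 46, 47, 49, 63, 64, 66, 68, 68, 72, 72, 75, 76, 81, 84, 88, 106]
plot = stem_and_leaf(data)
for stem, leaves in plot.items():
    print(f"{stem:>2} |{''.join(map(str, leaves))}")

# Back-to-back plot: breakfasts on the left of the stem, sandwiches on the right.
sandwiches = [6, 7, 12, 13, 17, 18, 20, 21, 21, 24, 26, 28, 30, 34]
breakfasts = [12, 14, 15, 16, 18, 23, 25, 25, 36, 36, 38, 41, 44, 45]
s_plot, b_plot = stem_and_leaf(sandwiches), stem_and_leaf(breakfasts)
for stem in range(min(*s_plot, *b_plot), max(*s_plot, *b_plot) + 1):
    left = " ".join(map(str, b_plot.get(stem, [])))
    right = " ".join(map(str, s_plot.get(stem, [])))
    print(f"{left:>10} |{stem}| {right}")
```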

2.2 Sigma and subscript notation


The symbol sigma ∑ (capital S in the Greek alphabet) is used to denote “the sum of” values.

Suppose the symbol x is used to denote some variable of interest in a study. In order to
distinguish between values of this variable, subscripts are used.

$x_1$ – first value in the data set, which has subscript 1.
$x_2$ – second value in the data set, which has subscript 2.
.
.
$x_n$ – nth value in the data set, which has subscript n.

The sum of these values is written in shorthand notation as

$$x_1 + x_2 + \cdots + x_n = \sum_{i=1}^{n} x_i .$$

If it is understood that the range of subscript indices over which the summation is taken
involves all the x values, the summation can be written as just

$$x_1 + x_2 + \cdots + x_n = \sum x .$$

Example 1: Suppose $x_1 = 70$, $x_2 = 74$, $x_3 = 66$, $x_4 = 68$, $x_5 = 71$. Then

$$\sum_{i=1}^{5} x_i = x_1 + x_2 + \cdots + x_5 = 70 + 74 + 66 + 68 + 71 = 349.$$

The sum of the squares of a set of values is written as

$$\sum_{i=1}^{n} x_i^2 = x_1^2 + x_2^2 + \cdots + x_n^2, \quad \text{or } \sum x^2 \text{ for short.}$$

Example 2: For the data set in example 1,

$$\sum_{i=1}^{5} x_i^2 = 70^2 + 74^2 + 66^2 + 68^2 + 71^2 = 24397.$$

Note that $\sum_{i=1}^{n} x_i^2 \neq \left(\sum_{i=1}^{n} x_i\right)^2$, e.g. for the abovementioned data

$$\sum_{i=1}^{5} x_i^2 = 24397 \neq 349^2 = 121801.$$

The summation notation can also be used to write the sum of products of corresponding
values for 2 different sets of values:

$$\sum_{i=1}^{n} x_i y_i = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n .$$

Example: Consider the following values.

i 1 2 3 4 5 6
xi 11 13 7 12 10 8
yi 8 5 7 6 9 11

For this data

$$\sum_{i=1}^{6} x_i y_i = (11 \times 8) + (13 \times 5) + (7 \times 7) + (12 \times 6) + (10 \times 9) + (8 \times 11) = 88 + 65 + 49 + 72 + 90 + 88 = 452.$$

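Note that $\sum_{i=1}^{n} x_i y_i \neq \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)$, e.g. for the abovementioned data $\sum_{i=1}^{6} x_i = 61$ and $\sum_{i=1}^{6} y_i = 46$, so

$$\left(\sum_{i=1}^{6} x_i\right)\left(\sum_{i=1}^{6} y_i\right) = 2806 \neq 452 = \sum_{i=1}^{6} x_i y_i .$$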

The summation notation is used extensively in specifying calculations in statistical formulae.
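The sums from the examples in this section can be checked in Python. The order of operations mirrors the notation: squaring (or multiplying pairs) happens inside the sum, before adding.

```python
x = [70, 74, 66, 68, 71]
sum_x = sum(x)                               # sum of x_i
sum_x_sq = sum(v ** 2 for v in x)            # sum of x_i^2: square first, then add
print(sum_x, sum_x_sq, sum_x ** 2)           # 349 24397 121801

xs = [11, 13, 7, 12, 10, 8]
ys = [8, 5, 7, 6, 9, 11]
sum_xy = sum(a * b for a, b in zip(xs, ys))  # sum of x_i y_i: multiply pairs, then add
print(sum_xy, sum(xs) * sum(ys))             # 452 2806
```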

2.3 Frequency distributions and related graphs


Frequency distribution

A frequency distribution is a table in which data are grouped into classes and the number of
values (frequency) which fall in each class is recorded.
The main purpose of constructing a frequency distribution is to get insight into the
distribution pattern of the frequencies over the classes. Hence, the name frequency
distribution is used to refer to this pattern.

Example 1
In a survey of 40 families in a village, the number of children per family was recorded and
the following data obtained.
1 0 3 2 1 5 6 2
2 1 0 3 4 2 1 6
3 2 1 5 3 3 2 4
2 2 3 0 2 1 4 5
3 3 4 4 1 2 4 5

number of children   tally          frequency (f)
0                    |||            3
1                    ||||| ||       7
2                    ||||| |||||    10
3                    ||||| |||      8
4                    ||||| |        6
5                    ||||           4
6                    ||             2
Total                               40

Note: The sum of the frequencies equals the sample size, i.e. $\sum f = n$.
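The frequency table can be checked with `collections.Counter` from the Python standard library. In this sketch the tally is printed as a plain run of marks rather than in groups of five.

```python
from collections import Counter

children = [1, 0, 3, 2, 1, 5, 6, 2,
            2, 1, 0, 3, 4, 2, 1, 6,
            3, 2, 1, 5, 3, 3, 2, 4,
            2, 2, 3, 0, 2, 1, 4, 5,
            3, 3, 4, 4, 1, 2, 4, 5]

freq = Counter(children)                    # value -> number of families
for value in sorted(freq):
    print(value, "|" * freq[value], freq[value])
print("Total", sum(freq.values()))          # equals the sample size n = 40
```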

Example 2
Consider the following data of low temperatures (in degrees Fahrenheit to the nearest
degree) for 50 days. The highest temperature is 64 and the lowest temperature is 39.

Data Set - Low Temperatures for 50 Days


57 39 52 52 43
50 53 42 58 55
58 50 53 50 49
45 49 51 44 54
49 57 55 64 45
50 45 51 54 58
53 49 52 51 41
52 40 44 49 45
43 47 47 43 51
55 55 46 54 41

Constructing a frequency distribution

The classes into which the above values can be sorted can be found by following the steps
shown below.

1. Find the maximum (=64) and minimum (=39) values and calculate the

range = maximum – minimum = 64 – 39 = 25.

2. Decide on the number of classes. Use Sturges’ rule, which states that

No. of classes = k = the rounded-up value of $1 + 1.44 \ln n$.

For n = 50, $1 + 1.44 \ln(50) = 6.63$, i.e. k = 7.

3. Calculate the class width such that no. of classes × class width > range,

i.e. 7 × class width > 25, so the class width must exceed 25/7 ≈ 3.57.

This suggests a class width of 4.

4. Find the lower value that defines the first class. This is usually a value just below the
minimum value in the data set. Since the minimum value for this data set is 39, the
lowest class can have a minimum value one below this i.e. 38.

5. Find the lower values that define each of the classes that follow by successively adding
the class width to the lower value of the previous class.

lower value of the second class = 38 + 4 = 42.

lower value of the third class = 42 + 4 = 46 etc.

The data values are then sorted into the classes

38 – 41, 42 – 45, 46 – 49, 50 – 53, 54 – 57, 58 – 61, 62 – 65.

The table below shows the classes and their frequencies for the temperature data set.

class limits f
38 – 41 4
42 – 45 10
46 – 49 8
50 – 53 15
54 – 57 9
58 – 61 3
62 – 65 1
Total 50
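Steps 1 to 5 can be sketched in Python as follows. This assumes integer-valued data (as here), so the classes are inclusive integer ranges; the width is taken as the smallest whole number for which k × width > range, and the natural-logarithm form of Sturges' rule from step 2 is used.

```python
import math

temps = [57, 39, 52, 52, 43, 50, 53, 42, 58, 55,
         58, 50, 53, 50, 49, 45, 49, 51, 44, 54,
         49, 57, 55, 64, 45, 50, 45, 51, 54, 58,
         53, 49, 52, 51, 41, 52, 40, 44, 49, 45,
         43, 47, 47, 43, 51, 55, 55, 46, 54, 41]

n = len(temps)
data_range = max(temps) - min(temps)          # step 1: 64 - 39 = 25
k = math.ceil(1 + 1.44 * math.log(n))         # step 2: Sturges' rule, 6.63 -> 7
width = data_range // k + 1                   # step 3: smallest integer with k * width > range
start = min(temps) - 1                        # step 4: one below the minimum

# Step 5: inclusive classes start..start+width-1, then count the values in each.
classes = [(start + i * width, start + (i + 1) * width - 1) for i in range(k)]
freqs = [sum(lower <= t <= upper for t in temps) for lower, upper in classes]
for (lower, upper), f in zip(classes, freqs):
    print(f"{lower} - {upper}: {f}")
```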

The values in the above example that define the classes of the frequency distribution are
called class limits. Classes of the type 38 – 41, 42 – 45, … in which both the upper and
lower limits are included are called “inclusive classes”. For example, the class 38 – 41
includes all the values from 38 to 41.
In spite of the great importance of classification in statistical analysis, no hard and fast rules can
be laid down for it.
The following points must be kept in mind for classification:

1) The classes should be clearly defined and should not lead to any ambiguity.
2) Each of the given values in the data set should be included in one of the classes.
3) The classes should be of equal width, otherwise the different class frequencies will
not be comparable. If the class widths are unequal, then comparable figures can
be obtained by dividing the frequencies by the corresponding widths
of the class intervals. The ratios thus obtained are called ‘frequency densities’.
4) The number of classes should be neither too large nor too small.

Continuous Frequency Distribution

If we deal with a continuous variable, it is not possible to arrange the data in class
intervals of the above type. Let us consider the distribution of age in years. If the class intervals are
15 – 19, 20 – 24, … then persons with ages between 19 and 20 years are not taken into
consideration. In such a case we form the class intervals as 0 – 5, 5 – 10, 10 – 15,
15 – 20, … Here all the persons with any fraction of age are included in one group or the
other. In these classes, the upper limit of each class is excluded from that class and is
included in the next class; such classes are known as ‘exclusive classes’.
The upper and lower class limits of the new exclusive type classes are known as class
boundaries.

If d is the gap between the upper limit of any class and the lower limit of the succeeding
class, the class boundaries for any class are then given by :

Upper class boundary = upper class limit + (d/2)


Lower class boundary = Lower class limit – (d/2)

Example 2 continued (temperature data)


The frequency distribution below includes the class boundaries.

class limits class boundaries f


38 – 41 37.5 – 41.5 4
42 – 45 41.5 – 45.5 10
46 – 49 45.5 – 49.5 8
50 – 53 49.5 – 53.5 15
54 – 57 53.5 – 57.5 9
58 – 61 57.5 – 61.5 3
62 – 65 61.5 – 65.5 1
Total 50
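The boundary formulas can be applied in Python as follows. This sketch assumes that the gap d between consecutive classes is the same throughout, as it is for the temperature classes.

```python
# Class limits of the temperature frequency distribution (inclusive classes).
limits = [(38, 41), (42, 45), (46, 49), (50, 53), (54, 57), (58, 61), (62, 65)]

d = limits[1][0] - limits[0][1]        # gap between consecutive classes: 42 - 41 = 1
boundaries = [(lower - d / 2, upper + d / 2) for lower, upper in limits]

for (lower, upper), (lb, ub) in zip(limits, boundaries):
    print(f"{lower} - {upper}  ->  {lb} - {ub}")
```

Each upper boundary coincides with the next lower boundary, so the classes now cover the scale without gaps.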

Example 3
The monthly expenditures (thousands of rands) of 60 households are shown below.
The values of this data set were accurately recorded (not rounded).

7.21741 7.8989 6.85461 10.31167 8.48253 5.17069


5.09063 8.16412 5.67094 7.7394 7.87423 5.41634
9.37265 10.14436 7.15675 10.31107 8.86571 10.1734
5.99276 6.5738 7.06965 8.82439 7.47467 9.50018
4.90014 5.50273 8.12516 5.51933 7.43641 10.95599
5.87188 9.36936 9.83773 10.18893 5.12028 9.60018
8.56534 9.27719 8.37107 7.03318 10.78344 9.08941
6.85749 7.7887 9.68159 6.75009 8.0521 8.19638
10.17312 7.51527 11.31383 8.5765 7.48021 8.39881
7.37565 7.28159 8.81773 5.53182 5.98515 7.71778

The frequency distribution shown below is a summary of this data set.

classes f
4.5 – 5.5 5
5.5 – 6.5 7
6.5 – 7.5 13
7.5 – 8.5 13
8.5 – 9.5 9
9.5 – 10.5 10
10.5 – 11.5 3
Total 60

For this distribution the lower (upper) class limit equals the lower (upper) class boundary for each
of the classes.
A value that falls on the boundary of 2 classes is allocated to the higher of the two classes,
e.g. 5.50000 is allocated to the class 5.5 – 6.5 (not 4.5 – 5.5).
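A minimal Python sketch of this allocation rule using the standard `bisect` module: `bisect_right` returns the position after any equal boundary, so a value such as 5.5 lands in the higher class.

```python
from bisect import bisect_right

expenditure = [
    7.21741, 7.8989, 6.85461, 10.31167, 8.48253, 5.17069,
    5.09063, 8.16412, 5.67094, 7.7394, 7.87423, 5.41634,
    9.37265, 10.14436, 7.15675, 10.31107, 8.86571, 10.1734,
    5.99276, 6.5738, 7.06965, 8.82439, 7.47467, 9.50018,
    4.90014, 5.50273, 8.12516, 5.51933, 7.43641, 10.95599,
    5.87188, 9.36936, 9.83773, 10.18893, 5.12028, 9.60018,
    8.56534, 9.27719, 8.37107, 7.03318, 10.78344, 9.08941,
    6.85749, 7.7887, 9.68159, 6.75009, 8.0521, 8.19638,
    10.17312, 7.51527, 11.31383, 8.5765, 7.48021, 8.39881,
    7.37565, 7.28159, 8.81773, 5.53182, 5.98515, 7.71778,
]

edges = [4.5 + i for i in range(8)]          # class boundaries 4.5, 5.5, ..., 11.5
freqs = [0] * (len(edges) - 1)
for v in expenditure:
    # bisect_right puts a value equal to a boundary into the higher class
    i = bisect_right(edges, v) - 1
    if 0 <= i < len(freqs):                  # ignore values outside 4.5 - 11.5
        freqs[i] += 1

for lower, upper, f in zip(edges, edges[1:], freqs):
    print(f"{lower} - {upper}: {f}")
print("Total", sum(freqs))
```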

Class midpoints

The midpoint of a class ($x_{mid}$) can be calculated from

$$x_{mid} = \frac{\text{lower class limit} + \text{upper class limit}}{2}.$$

Examples

1) For the frequency distribution in example 2 (temperature data), the class midpoints
are given below.

class limits class boundaries f midpoints


38 – 41 37.5 – 41.5 4 39.5
42 – 45 41.5 – 45.5 10 43.5
46 – 49 45.5 – 49.5 8 47.5
50 – 53 49.5 – 53.5 15 51.5
54 – 57 53.5 – 57.5 9 55.5
58 – 61 57.5 – 61.5 3 59.5
62 – 65 61.5 – 65.5 1 63.5

2) For the frequency distribution in example 3 (expenditure data), the class midpoints are
given below.

classes midpoints
4.5 – 5.5 5
5.5 – 6.5 6
6.5 – 7.5 7
7.5 – 8.5 8
8.5 – 9.5 9
9.5 – 10.5 10
10.5 – 11.5 11

Cumulative frequencies

The “less than” cumulative frequency of a class is the sum of the frequency of that class and
the frequencies of all the classes below it, i.e. the number of values in the sample that lie
below the upper class boundary of the class.

Examples

1) For the frequency distribution in example 2 (temperature data) the cumulative
frequencies are calculated as shown below.

class boundaries   f    cumulative frequency   calculation
37.5 – 41.5         4    4                     4
41.5 – 45.5        10   14                     4+10
45.5 – 49.5         8   22                     4+10+8
49.5 – 53.5        15   37                     4+10+8+15
53.5 – 57.5         9   46                     4+10+8+15+9
57.5 – 61.5         3   49                     4+10+8+15+9+3
61.5 – 65.5         1   50                     4+10+8+15+9+3+1

2) For the frequency distribution in example 3 (expenditure data) the cumulative
frequencies are calculated as shown below.

classes        f    cumulative frequency   calculation
4.5 – 5.5       5    5                     5
5.5 – 6.5       7   12                     5+7
6.5 – 7.5      13   25                     5+7+13
7.5 – 8.5      13   38                     5+7+13+13
8.5 – 9.5       9   47                     5+7+13+13+9
9.5 – 10.5     10   57                     5+7+13+13+9+10
10.5 – 11.5     3   60                     5+7+13+13+9+10+3
Total          60
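The running totals in these tables can be produced with `itertools.accumulate` from Python's standard library. This is a minimal sketch, and the function name `less_than_cumulative` is chosen for illustration:

```python
from itertools import accumulate


def less_than_cumulative(freqs):
    """Running totals f1, f1 + f2, f1 + f2 + f3, ... of the class frequencies."""
    return list(accumulate(freqs))


temperature_f = [4, 10, 8, 15, 9, 3, 1]
expenditure_f = [5, 7, 13, 13, 9, 10, 3]
print(less_than_cumulative(temperature_f))  # [4, 14, 22, 37, 46, 49, 50]
print(less_than_cumulative(expenditure_f))  # [5, 12, 25, 38, 47, 57, 60]
```

The last cumulative frequency always equals the sample size n, which is a useful check.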

Relative and percentage frequencies


• Relative frequency = frequency/sample size, i.e. $R_f = \dfrac{f}{n}$.
• The percentage frequency of a class is calculated as relative frequency × 100.

Examples

1) The relative and percentage frequencies for the frequency distribution in example 2
(temperature data) are shown below.

class boundaries f relative frequency percentage frequency


37.5 – 41.5 4 0.08 8
41.5 – 45.5 10 0.2 20
45.5 – 49.5 8 0.16 16
49.5 – 53.5 15 0.3 30
53.5 – 57.5 9 0.18 18
57.5 – 61.5 3 0.06 6
61.5 – 65.5 1 0.02 2

2) The relative and percentage frequencies for the frequency distribution in example 3
(expenditure data) are shown below.

classes        f    relative frequency   percentage frequency
4.5 – 5.5       5   0.083                  8.3
5.5 – 6.5       7   0.117                 11.7
6.5 – 7.5      13   0.217                 21.7
7.5 – 8.5      13   0.217                 21.7
8.5 – 9.5       9   0.15                  15
9.5 – 10.5     10   0.167                 16.7
10.5 – 11.5     3   0.05                   5
Total          60   1                    100
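This is a minimal Python sketch of the two formulas above. The function names are illustrative assumptions. As in the table, rounding is applied only when printing:

```python
def relative_frequencies(freqs):
    """Rf = f / n for every class, where n is the sample size."""
    n = sum(freqs)
    return [f / n for f in freqs]


def percentage_frequencies(freqs):
    """Percentage frequency = relative frequency x 100."""
    return [rf * 100 for rf in relative_frequencies(freqs)]


expenditure_f = [5, 7, 13, 13, 9, 10, 3]
print([round(rf, 3) for rf in relative_frequencies(expenditure_f)])
# [0.083, 0.117, 0.217, 0.217, 0.15, 0.167, 0.05]
print([round(p, 1) for p in percentage_frequencies(expenditure_f)])
# [8.3, 11.7, 21.7, 21.7, 15.0, 16.7, 5.0]
```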

Histogram

A histogram is a graphical representation of a frequency distribution. The frequency for
each class is represented by a rectangular bar with the class boundaries as base and the
frequency as height.

Example

A histogram of the frequency distribution in example 2 (temperature data) is shown below.

[Figure: histogram of the temperature data; horizontal axis: temperature (classes 37.5-41.5 to 61.5-65.5); vertical axis: frequency]

Frequency polygon

This is also a graphical representation of a frequency distribution. For each class the
class midpoint is plotted against the frequency and the plotted points joined by
means of straight lines.

Example

For the temperature data the following values are plotted.

midpoint 35.5 39.5 43.5 47.5 51.5 55.5 59.5 63.5 67.5
f 0 4 10 8 15 9 3 1 0

The plot is shown below.

[Figure: frequency polygon of the temperature data; horizontal axis: midpoint; vertical axis: frequency]

Note:
The two plotted values at the lower and upper ends were added to anchor the graph to the
horizontal axis. The lower end value is a plot of 0 versus the midpoint of the class below the
first (lowest) class (35.5). This midpoint is obtained by subtracting the class width (4) from
the midpoint of the lowest class (39.5). The upper end value is a plot of 0 versus the
midpoint of the class above the last class (67.5). This midpoint is obtained by adding the
class width (4) to the midpoint of the last (highest) class (63.5).
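The anchoring rule in the note can be sketched in Python as follows. The helper `frequency_polygon_points` is hypothetical. It assumes all classes have the same width, so the width can be read off from two neighbouring midpoints.

```python
def frequency_polygon_points(midpoints, freqs):
    """(midpoint, frequency) pairs plus a zero-frequency point one class width
    beyond each end, which anchors the polygon to the horizontal axis.
    Assumes equal class widths and at least two classes."""
    width = midpoints[1] - midpoints[0]
    xs = [midpoints[0] - width] + list(midpoints) + [midpoints[-1] + width]
    ys = [0] + list(freqs) + [0]
    return list(zip(xs, ys))


temp_mid = [39.5, 43.5, 47.5, 51.5, 55.5, 59.5, 63.5]
temp_f = [4, 10, 8, 15, 9, 3, 1]
points = frequency_polygon_points(temp_mid, temp_f)
print(points[0], points[-1])  # (35.5, 0) (67.5, 0)
```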

The histogram and frequency polygon are equivalent graphical representations of the
pattern of the frequencies shown in the frequency distribution.

The histogram can provide an estimate of the probability (chance) that a value drawn at
random from the data set will lie between two values.

Examples

1) For the frequency distribution in example 2 (temperature data), the estimated
chance that a randomly drawn value will be at least 45.5 but less than 57.5 is

$$\frac{8 + 15 + 9}{50} = 0.64.$$

2) For the frequency distribution in example 3 (monthly expenditure), the estimated
chance that a randomly drawn value will be at least 7.5 is

$$\frac{13 + 9 + 10 + 3}{60} = 0.583.$$

“Less than” ogive


This is the graph of the cumulative frequencies versus the upper class boundaries.

Example

For the “less than” ogive of the frequency distribution in example 2 (temperature data)
the following values are plotted.

class boundary         37.5   41.5   45.5   49.5   53.5   57.5   61.5   65.5
cumulative frequency      0      4     14     22     37     46     49     50

[Figure: “less than” ogive of the temperature data; horizontal axis: class boundary; vertical axis: cumulative frequency]

Note:
The plotted value at the lower end was added to anchor the graph to the horizontal axis.
The lower end value is a plot of 0 versus the upper class boundary of the class below the
first (lowest) class (37.5). This upper class boundary is obtained by subtracting the class
width (4) from the upper class boundary of the lowest class (41.5).

A percentage “less than” ogive can be plotted by just changing the vertical scale. In this
example the frequencies add up to 50, so a frequency is converted to a percentage by
multiplying it by 100/50 = 2. To draw the percentage ogive, each cumulative frequency in the
above table is therefore multiplied by 2 (giving 0, 8, 28, 44, 74, 92, 98 and 100). The
resulting graph is shown below. The value below which a given percentage of the
observations in the data set lie can be read off from the ogive.
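The Python sketch below generates the plotted points of both the ordinary and the percentage ogive. The helper name `less_than_ogive` is illustrative. The sketch prepends the anchoring point (lowest boundary, 0) and, for the percentage version, scales every cumulative frequency by 100/n:

```python
from itertools import accumulate


def less_than_ogive(boundaries, freqs, percentage=False):
    """Points (boundary, cumulative frequency) of a "less than" ogive.

    `boundaries` holds every class boundary from the lowest lower boundary to the
    highest upper boundary, so the first point (lowest boundary, 0) anchors the
    graph. With percentage=True each cumulative frequency is scaled by 100/n.
    """
    n = sum(freqs)
    cumulative = [0] + list(accumulate(freqs))
    if percentage:
        cumulative = [100 * c / n for c in cumulative]
    return list(zip(boundaries, cumulative))


boundaries = [37.5, 41.5, 45.5, 49.5, 53.5, 57.5, 61.5, 65.5]
temperature_f = [4, 10, 8, 15, 9, 3, 1]
print(less_than_ogive(boundaries, temperature_f))
print([p for _, p in less_than_ogive(boundaries, temperature_f, percentage=True)])
# [0.0, 8.0, 28.0, 44.0, 74.0, 92.0, 98.0, 100.0]
```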

[Figure: percentage “less than” ogive of the temperature data; horizontal axis: class boundaries; vertical axis: % cumulative frequency]

The shape of a distribution

The main purpose of drawing a histogram is to describe the clustering pattern of the values
in the data set. For a large sample size, the histogram (frequency polygon) can be fairly well
approximated by a smooth curve (called a frequency curve) that is fitted to the frequencies.
The following patterns of the shape of the frequency curve appear regularly in data sets.

Symmetric bell shape

[Figure: symmetric bell-shaped frequency curve; horizontal axis: x; vertical axis: frequency]

This shape is for data sets where the majority of values are in the central portion of the
scale with fewer and fewer values the further away from the center (in both directions).
Many data sets have this shape. Examples are

1) Marks obtained in an examination.


2) Heights of a large group of adult males.
3) IQ scores in a large population.

Uniform (rectangular) shape

[Figure: uniform (rectangular) frequency curve; horizontal axis: x; vertical axis: frequency]

This shape occurs when all the values in the data set occur approximately the same number
of times. Examples are

1) Frequencies of winning numbers in a large number of Lotto draws.

2) Frequencies of winning numbers in a large number of roulette games.

3) Frequencies obtained when tossing an unbiased coin and recording 0 if tails come up and 1
if heads come up.

Bimodal shape
[Figure: bimodal frequency curve; horizontal axis: body length (mm); vertical axis: frequency]

This pattern, which shows two distinct peaks (hence the name bimodal), appears when there
are two subgroups with different sets of values in the same data set.

Examples

1) Measuring the body lengths of ants when there are adults and juveniles together in
the same data set. The two peaks in the curve reflect the fact that juvenile ants have
shorter body lengths than adult ants.

2) Heights of a population of males and females. Since females are, on average, shorter
than males, the frequency curve will have two peaks. One peak will be located where most
of the female heights are concentrated and one where most of the male heights are
concentrated.

Positive skew shape

[Figure: positively skewed frequency curve; horizontal axis: x; vertical axis: frequency]

This shape shows a high clustering of values at the lower end of the scale and less and less
clustering further away from the lower end towards the upper end.

Example
The time it takes to serve a customer at a supermarket. For most customers the service time
is quite short, and the longer the service time, the fewer the customers.

Negative skew shape

[Figure: negatively skewed frequency curve; horizontal axis: x; vertical axis: frequency]

This shape shows a high clustering of values at the upper end of the scale and less and less
clustering further away from the upper end towards the lower end.

Example
Marks in a test where most students did well, but a few performed poorly.

2.4 Measures of central tendency (location)


A measure of central tendency is a value that shows the location on the scale where a data
set is centrally located (most values are clustered around it).

In the calculations a distinction will be made between methods used when the data are in
raw form (values as collected) or grouped form (form of a frequency distribution).

2.4.1 Raw data: The mean (average), median and mode

Mean:
The mean (or average) of a set of data values is the sum of all of the data values in
the set divided by n, the number of data values. That is

$$\bar{x} = \frac{1}{n}\sum x.$$

$\bar{x}$ is pronounced “x bar”.

Example
The marks of seven students in a mathematics test with a maximum possible mark of 20 are
given below:
15 13 18 16 14 17 12

$$\bar{x} = \frac{\sum x}{n} = \frac{15 + 13 + 18 + 16 + 14 + 17 + 12}{7} = \frac{105}{7} = 15.$$

Median:
The median is the value in the data set which is such that half of the values in the
data set are less than or equal to it and half greater than or equal to it.

For an odd number of values in the data set, the median is the middle value of the
data set when it has been arranged in ascending order. That is, from the smallest
value to the largest value.

If the number of values in the data set is even, then the median is the average of the
two middle values.

Examples

1) The marks of nine students in a geography test that had a maximum possible mark of 50
are given below:

47 35 37 32 38 39 36 34 35

Find the median of this set of data values.

Arrange the data values in order from the lowest value to the highest value:

32 34 35 35 36 37 38 39 47

Since n = 9 is odd, the median is the middle (5th) value, i.e. median = 36.

2) Consider the above data set with the first value (47) omitted.

Arrange the data values in order from the lowest value to the highest value:

32 34 35 35 36 37 38 39

In this case the number of values n = 8, which is an even number. The two middle values in
the data set are in positions $\frac{n}{2} = \frac{8}{2} = 4$ and $\frac{n}{2} + 1 = 5$, i.e. the values 35 and 36.

$$\text{Median} = \frac{35 + 36}{2} = 35.5.$$

Mode:
The mode of a set of data values is the value(s) that occurs most often.
Example:
Find the mode of the following data set:
48 44 48 45 42 49 48
The mode is 48 since it occurs most often.

Note

1) It is possible for a set of data values to have more than one mode.
2) If there are two data values that occur most frequently, we say that the set of data
values is bimodal e.g. the data set 2 2 4 5 5 6 has two modes (2 and 5).
3) If no value in the data set occurs more than once, it has no mode, e.g. the data set
4 5 7 9 has no mode.
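For raw data, Python's standard `statistics` module provides `mean` and `median`. Its `statistics.multimode` function, however, returns every value when no value repeats, whereas these notes say such a data set has no mode. The sketch below therefore uses a small hypothetical helper `modes` that follows the notes' convention:

```python
from collections import Counter
from statistics import mean, median


def modes(values):
    """Most frequent value(s), following the notes: if no value occurs more than
    once, the data set has no mode and an empty list is returned."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)


print(mean([15, 13, 18, 16, 14, 17, 12]))            # 15
print(median([47, 35, 37, 32, 38, 39, 36, 34, 35]))  # 36
print(median([35, 37, 32, 38, 39, 36, 34, 35]))      # 35.5
print(modes([48, 44, 48, 45, 42, 49, 48]))           # [48]
print(modes([2, 2, 4, 5, 5, 6]))                     # [2, 5]
print(modes([4, 5, 7, 9]))                           # []
```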

Comparison of mean, median and mode

1) The mean is used as a measure of central tendency for symmetrical, bell-shaped data
that do not have extreme values (extreme values are called outliers).

2) The median may be more useful than the mean when there are extreme values in the
data set as it is not affected by the extreme values.

3) The mode is useful when the most common item, characteristic or value of a data set is
required.

Examples

1) The amounts (thousands) for which each of 7 properties were sold are shown below.

280, 390, 412, 555, 698, 725, 2 350

For this data set the mean is $\bar{x} = 772.86$. This value of the mean is not a central value
for the data set (it is greater than every value except the largest one). The reason for this
is that the last value (2 350) has a considerable influence on the value of the mean.

The median = 555 is more centrally located than the mean. Unlike the mean, the median is
not influenced by the single large value at the end of the data set.

2) For qualitative (non-numerical) data only the mode can be calculated. For example,
suppose 10 ratepayers are asked whether they think the percentage increase in
rates is reasonable. They can either agree (A), disagree (D) or be neutral (N) on the
issue. Their responses are shown below.

A, A, D, N, D, A, D, D, N, N.

For this data set the modal response is D (since D occurs more times than the other
responses). It is not possible to calculate a median or a mean for this data set.

The weighted mean

When calculating the mean for raw data, it is usually assumed that all the values in the data
set are equally important. If the values are not all considered equally important, the
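weighted mean ($\bar{x}_w$) is calculated according to the formula below.

$$\bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \dots + w_r x_r}{w_1 + w_2 + \dots + w_r} = \frac{\sum_{i=1}^{r} w_i x_i}{\sum_{i=1}^{r} w_i}$$

In the formula $x_1, x_2, \dots, x_r$ are the values and $w_1, w_2, \dots, w_r$ their respective weights.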

Example

The final mark (percentage) in a certain course is based on an assignment mark (which
counts for 10% of the final mark), a test mark (which counts for 30% of the final mark) and
an exam mark (which counts for 60% of the final mark). Calculate the final mark of a student
who gets a 65% assignment mark, a 70% test mark and a 55% exam mark.

Solution:
The above formula is applied with
x1 = 65, x2 = 70, x3 = 55,
w1 = 10, w2 = 30, w3 = 60.

$$\bar{x}_w = \frac{65 \times 10 + 70 \times 30 + 55 \times 60}{10 + 30 + 60} = \frac{6050}{100} = 60.5.$$

2.4.2 Grouped data


Mean:
For grouped data the mean is calculated from the formula below.

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{k} x_{mid(i)} f_i$$

where $x_{mid(i)}$ is the midpoint and $f_i$ the frequency of the ith class, k the number of classes
and n the sample size. This formula is a special case of the weighted mean formula with
$w_i = f_i$ and $\sum_{i=1}^{k} w_i = n$.

Example

For the frequency distribution of temperatures (example 2 of the frequency distributions),


the mean can be calculated as shown below.

Class boundaries xmid(i) fi xmid(i) fi


37.5 – 41.5 39.5 4 158
41.5 – 45.5 43.5 10 435
45.5 – 49.5 47.5 8 380
49.5 – 53.5 51.5 15 772.5
53.5 – 57.5 55.5 9 499.5
57.5 – 61.5 59.5 3 178.5
61.5 – 65.5 63.5 1 63.5
Total 50 2487

$$\text{mean} = \bar{x} = \frac{2487}{50} = 49.74.$$
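The grouped mean can be sketched in Python as follows. Each midpoint is computed as the average of its class boundaries. The function name `grouped_mean` is illustrative:

```python
def grouped_mean(boundaries, freqs):
    """Mean of grouped data: sum(x_mid(i) * f_i) / n, with x_mid(i) the class midpoint."""
    midpoints = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]
    n = sum(freqs)
    return sum(m * f for m, f in zip(midpoints, freqs)) / n


temp_boundaries = [37.5, 41.5, 45.5, 49.5, 53.5, 57.5, 61.5, 65.5]
temp_f = [4, 10, 8, 15, 9, 3, 1]
print(grouped_mean(temp_boundaries, temp_f))  # 49.74
```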

2.5 Measures of variability (variation, spread, dispersion)
Variability refers to the extent to which the values in a data set vary around (differ from)
the associated measure of central tendency.

Example

The performance of 2 different stocks is monitored over a period of 8 days. Their values are
shown in the table below.

day 1 2 3 4 5 6 7 8
A 103 120 112 108 130 106 120 112
B 112 97 85 123 153 85 146 110

The dot plot that follows shows the performance of each stock.

The mean values for the two stocks are the same (=113.875), but they differ in variability
(extent of spread around the mean). Stock B has a far wider spread around the mean than
stock A.

2.5.1 Raw data


Range: range = (maximum value in data set) – (minimum value in data set)

Example: For the stocks data sets


Range for stock A = 130 – 103 = 27
Range for stock B = 153 – 85 = 68

The larger (wider) spread in the stock B values is reflected in the larger range (more
than twice that of stock A).

Standard deviation and variance

The sample variance (denoted by $S^2$) is a measure of variability based on squared
differences between the values in the data set and the mean:

$$S^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{\sum x^2 - \left(\sum x\right)^2 / n}{n - 1}.$$

The variance is expressed in the data units squared.

The standard deviation $S = \sqrt{S^2}$, which is the positive square root of the variance,
is expressed in the same units as the data.

Example
For stock A the standard deviation is calculated as follows.

x (stock A)   x²
103 10609
120 14400
112 12544
108 11664
130 16900
106 11236
120 14400
112 12544
sum 911 104297
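
$$S^2 = \frac{104297 - 911^2/8}{7} = \frac{556.875}{7} = 79.5536, \qquad S = \sqrt{79.5536} = 8.919.$$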

For stock B the standard deviation is 25.682 (check this using STATMODE).

Interpretation: Roughly speaking, the stock A values differ from the mean by about 8.919 on
average, while the stock B values differ from the mean by almost 3 times this amount
(25.682/8.919 ≈ 2.9).
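The shortcut formula and the definition-based formula give the same result. The Python sketch below uses the shortcut form. The helper name `sample_variance` is an illustrative assumption. The sketch cross-checks the result with `statistics.stdev`, which also divides by n − 1:

```python
import math
from statistics import stdev


def sample_variance(xs):
    """S^2 = (sum x^2 - (sum x)^2 / n) / (n - 1), the shortcut form of the definition."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


stock_a = [103, 120, 112, 108, 130, 106, 120, 112]
stock_b = [112, 97, 85, 123, 153, 85, 146, 110]
for name, xs in (("A", stock_a), ("B", stock_b)):
    s = math.sqrt(sample_variance(xs))
    print(name, round(s, 3), round(stdev(xs), 3))
# A 8.919 8.919
# B 25.682 25.682
```

For data with very large values the shortcut form can lose precision in floating-point arithmetic, so a library routine is the safer choice there.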

2.5.2 Grouped data

Standard deviation and variance

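For grouped data, the raw data formulae for the variance and standard deviation are
modified slightly by using the class midpoints weighted by the class frequencies:

$$S^2 = \frac{\sum_{i=1}^{k} x_{mid(i)}^2 f_i - \left(\sum_{i=1}^{k} x_{mid(i)} f_i\right)^2 / n}{n - 1}.$$

As before, standard deviation $= S = \sqrt{S^2}$.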

Example

For the frequency distribution of temperatures (example 2 of the frequency distributions),


the variance and standard deviation can be calculated as shown below.

class boundaries   xmid(i)   fi   xmid(i)fi   xmid(i)²fi


37.5 – 41.5 39.5 4 158 6241
41.5 – 45.5 43.5 10 435 18922.5
45.5 – 49.5 47.5 8 380 18050
49.5 – 53.5 51.5 15 772.5 39783.75
53.5 – 57.5 55.5 9 499.5 27722.25
57.5 – 61.5 59.5 3 178.5 10620.75
61.5 – 65.5 63.5 1 63.5 4032.25
Total 50 2487 125372.5

$$\text{variance} = S^2 = \frac{125372.5 - 2487^2/50}{49} = 34.06367$$

$$\text{standard deviation} = S = \sqrt{34.06367} = 5.836.$$
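Here is a Python sketch of the grouped variance formula, applied to the temperature table. The helper name `grouped_variance` is an illustrative assumption:

```python
import math


def grouped_variance(midpoints, freqs):
    """S^2 = (sum x_mid^2 f - (sum x_mid f)^2 / n) / (n - 1) for grouped data."""
    n = sum(freqs)
    sum_mf = sum(m * f for m, f in zip(midpoints, freqs))
    sum_m2f = sum(m * m * f for m, f in zip(midpoints, freqs))
    return (sum_m2f - sum_mf ** 2 / n) / (n - 1)


temp_mid = [39.5, 43.5, 47.5, 51.5, 55.5, 59.5, 63.5]
temp_f = [4, 10, 8, 15, 9, 3, 1]
variance = grouped_variance(temp_mid, temp_f)
print(round(variance, 5), round(math.sqrt(variance), 3))  # 34.06367 5.836
```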



2.6 Coefficient of variation


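The standard deviations of 2 data sets that are expressed in different units cannot be
directly compared. Such a comparison can be done by calculating the coefficient of
variation (CV), which expresses the standard deviation as a percentage of the mean:

$$CV = \frac{S}{\bar{x}} \times 100\%.$$

Example:

For the temperature data, $\bar{x} = 49.74$ and S = 5.836.

For the expenditure data (see example 3 of the frequency distributions) $\bar{x} = 7.93333$ and
S = 1.65567.

Since the two standard deviations that were calculated above are in different units, they
cannot be compared directly. The coefficients of variation are

$$CV_{temperature} = \frac{5.836}{49.74} \times 100\% = 11.73\%, \qquad CV_{expenditure} = \frac{1.65567}{7.93333} \times 100\% = 20.87\%.$$

The coefficient of variation calculations show that in relative terms the variability of the
expenditure data set is greater than that of the temperature data set.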
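This is a minimal Python sketch of the coefficient of variation. The function name `coefficient_of_variation` is illustrative. The sketch uses the means and standard deviations quoted above:

```python
def coefficient_of_variation(mean, sd):
    """CV = S / x-bar * 100, the standard deviation as a percentage of the mean."""
    return sd / mean * 100


cv_temperature = coefficient_of_variation(49.74, 5.836)
cv_expenditure = coefficient_of_variation(7.93333, 1.65567)
print(round(cv_temperature, 2), round(cv_expenditure, 2))  # 11.73 20.87
```

Because the CV has no units, the two values can be compared directly even though the data are measured in different units.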

2.7 Bell-shaped data


If it is known that the data set of interest has a bell-shaped clustering pattern of the values,
the Empirical rule says that
(i) Approximately 68% of data values are within 1 standard deviation of the mean.
(ii) Approximately 95% of data values are within 2 standard deviations of the mean.
(iii) Approximately 99.7% of data values are within 3 standard deviations of the mean.

Example:
Men’s heights have a bell-shaped distribution with a mean of 69.2 inches and a standard
deviation of 2.9 inches.

Approximately 68% of data values are within 69.2 ± 2.9 = (66.3, 72.1).
Approximately 95% of data values are within 69.2 ± 5.8 = (63.4, 75).
Approximately 99.7% of data values are within 69.2 ± 8.7 = (60.5, 77.9).
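The three intervals can be generated in a loop. This Python sketch is illustrative only, and the helper name `empirical_rule_intervals` is hypothetical. Rounding to one decimal matches the example:

```python
def empirical_rule_intervals(mean, sd):
    """Intervals mean +/- k * sd for k = 1, 2, 3, which for bell-shaped data
    contain roughly 68%, 95% and 99.7% of the values."""
    return {
        share: (round(mean - k * sd, 1), round(mean + k * sd, 1))
        for k, share in ((1, "68%"), (2, "95%"), (3, "99.7%"))
    }


for share, interval in empirical_rule_intervals(69.2, 2.9).items():
    print(share, interval)
# 68% (66.3, 72.1)
# 95% (63.4, 75.0)
# 99.7% (60.5, 77.9)
```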

2.8 Measures of position – percentiles

2.8.1 Definitions
The ith percentile, Pi, is the value that has i% of the values in a data set less than or equal
to it (0 < i ≤ 100).

Examples

• Median = me = 50th percentile = P50.

• First quartile = Q1 = 25th percentile = P25.

• Third quartile = Q3 = 75th percentile = P75.

• The 9 deciles D1, D2, . . . , D9 are the values that have 10%, 20%, . . . , 90%
respectively of the values in the data set less than or equal to them.

D1 = P10, D2 = P20, … , D5 = P50 = me, … , D9 = P90.

2.8.2 Calculation of quartiles and quartile deviation for raw data
For raw data the calculations of the first and third quartiles are based on the same principles
as that of the median.

Steps to be followed in calculating the first and third quartiles for raw data

1) Arrange the values in the data set in ascending order of magnitude.

2) Find the median.

3) Divide the data set into 2 portions with equal numbers of values: set 1 consists of
those values less than or equal to the median and set 2 consists of those values
greater than or equal to the median. When the data set has an odd number of values,
the median itself is excluded from both portions.

4) The first quartile (Q1) is the median of set 1 and the third quartile (Q3) is the median
of set 2.

Example

The distances from home to work (kilometers) of 11 employees at a certain company are
shown below. Calculate Q1 and Q3.

6, 47, 49, 15, 42, 41, 7, 39, 43, 40, 36

1) Ordered data set: 6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49

2) Median = 40. After this step the median is deleted from the data set.

3) Set 1 – 5 values less than median i.e. 6, 7, 15, 36, 39.

4) Set 2 – 5 values greater than the median i.e. 41, 42, 43, 47, 49.

5) Q1 = median of set 1 = 15,
Q3 = median of set 2 = 43.

Example

Suppose the data set consists of the above values and 56 (12 values).

6, 47, 49, 15, 42, 41, 7, 39, 43, 40, 36, 56

1) Ordered Data Set: 6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49, 56

2) Median = $\frac{40 + 41}{2} = 40.5$. Unlike what was done in example 1, no values are
deleted from the data set.

3) Set 1 – 6 values less than or equal to the median, i.e. 6, 7, 15, 36, 39, 40.

Set 2 – 6 values greater than or equal to the median, i.e. 41, 42, 43, 47, 49, 56.

4) Q1 = median of set 1 = $\frac{15 + 36}{2} = 25.5$, Q3 = median of set 2 = $\frac{43 + 47}{2} = 45$.

The quartile deviation $Q = \frac{Q_3 - Q_1}{2}$ can also be used as a measure of variability.
For the data set in example 1, quartile deviation = Q = (43 – 15)/2 = 14.

The quartile deviation value shows the extent to which the values in the data set deviate
from the median. For a skewed data set (heavy clustering at the lower or upper end of the
scale) the quartile deviation is a more appropriate measure of variability than the standard
deviation, which is better suited as a measure of variability for symmetric data sets.
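The quartile steps above can be sketched in Python. The helper `quartiles` is hypothetical and implements the split-the-data method of these notes. Library routines such as `statistics.quantiles` use interpolation rules of their own and may give slightly different values.

```python
from statistics import median


def quartiles(values):
    """(Q1, median, Q3) by the method in these notes: order the data, split it into
    a lower and an upper half (leaving out the median itself when n is odd) and
    take the median of each half. Assumes at least two values."""
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 == 1 else xs[half:]
    return median(lower), median(xs), median(upper)


def quartile_deviation(values):
    """Q = (Q3 - Q1) / 2."""
    q1, _, q3 = quartiles(values)
    return (q3 - q1) / 2


distances = [6, 47, 49, 15, 42, 41, 7, 39, 43, 40, 36]
print(quartiles(distances))           # (15, 40, 43)
print(quartiles(distances + [56]))    # (25.5, 40.5, 45.0)
print(quartile_deviation(distances))  # 14.0
```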

2.8.3 Calculation of median, quartiles and percentiles for grouped data

Percentile class – class that contains the percentile that is calculated.

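A formula for calculating the ith percentile Pi for grouped data is shown below.

$$P_i = L_i + \frac{\left(\dfrac{i \times n}{100} - F_{less}\right) \times c}{f_i}, \qquad i = 1, 2, \dots, 100,$$

where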

Li = lower class boundary of percentile class.

fi = frequency of percentile class

n = sample size

Fless = Sum of frequencies of classes less than percentile class.

c = class width.

Example

For the frequency distribution of temperatures (example 2 of the frequency distributions,
table given below), the calculations of the median, first quartile, third quartile, 4th decile and
65th percentile are shown below.

class boundaries   f    cumulative frequency
37.5 – 41.5         4    4
41.5 – 45.5        10   14
45.5 – 49.5         8   22
49.5 – 53.5        15   37
53.5 – 57.5         9   46
57.5 – 61.5         3   49
61.5 – 65.5         1   50
Total              50

Median

The above formula with i = 50 , n = 50 applies.

Step 1: Calculate position of median = $\frac{i \times n}{100} = \frac{50 \times 50}{100} = 25$.

Step 2: Median class (class that contains 25th observation) is the class 49.5 – 53.5.

Step 3: L50 = 49.5, f50 = 15, Fless = 22, c = 4.

Step 4: Substitute into the above formula.


$$\text{Median} = 49.5 + \frac{(25 - 22) \times 4}{15} = 50.3.$$

First quartile

The above formula with i = ___ , n =____ applies.

Step 1: Calculate position of first quartile :

Step 2: First quartile class (class that contains 12.5th observation) is the class __________

Step 3: L25 = ______ , f25 =______ , Fless = ______ , c = ______.

Step 4: Substitute into the above formula.

Q1 =

Third quartile

The above formula with i =_____ , n = _____ applies.

Step 1: Calculate position of third quartile :

Step 2: Third quartile class (class that contains 37.5th observation) is the class ___________

Step 3: L75 = ______ , f75 = ______ , Fless = ______ , c = ______.

Step 4: Substitute into the above formula.

Q3 =

Fourth decile

The above formula with i = 40 , n = 50 applies.


Step 1: Calculate position of 4th decile = $\frac{i \times n}{100} = \frac{40 \times 50}{100} = 20$.

Step 2: 4th decile class (class that contains 20th observation) is the class 45.5-49.5.

Step 3: L40 = 45.5, f40 = 8, Fless = 14, c = 4.

Step 4: Substitute into the above formula.


$$D_4 = 45.5 + \frac{(20 - 14) \times 4}{8} = 48.5.$$

65th Percentile

The above formula with i = 65, n = 50 applies.


Step 1: Calculate position of 65th percentile = $\frac{i \times n}{100} = \frac{65 \times 50}{100} = 32.5$.

Step 2: 65th percentile class (class that contains 32.5th observation) is the class 49.5 – 53.5.

Step 3: L65 = 49.5, f65 = 15 , Fless = 22, c = 4.

Step 4: Substitute into the above formula.

$$P_{65} = 49.5 + \frac{(32.5 - 22) \times 4}{15} = 52.3.$$
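The grouped percentile formula can be sketched in Python. The helper `grouped_percentile` is an illustrative assumption. It locates the percentile class as the first class whose cumulative frequency reaches the position i × n / 100, then applies the formula with L_i, F_less, c and f_i for that class:

```python
from itertools import accumulate


def grouped_percentile(i, boundaries, freqs):
    """P_i = L_i + (i*n/100 - F_less) * c / f_i for grouped data.

    boundaries: class boundaries b0 < b1 < ... < bk (k classes)
    freqs:      class frequencies f1, ..., fk
    """
    n = sum(freqs)
    position = i * n / 100
    cumulative = list(accumulate(freqs))
    # percentile class: first class whose cumulative frequency reaches the position
    k = next(j for j, cf in enumerate(cumulative) if cf >= position)
    f_less = cumulative[k - 1] if k > 0 else 0
    lower = boundaries[k]
    width = boundaries[k + 1] - boundaries[k]
    return lower + (position - f_less) * width / freqs[k]


temp_boundaries = [37.5, 41.5, 45.5, 49.5, 53.5, 57.5, 61.5, 65.5]
temp_f = [4, 10, 8, 15, 9, 3, 1]
for label, i in (("median", 50), ("D4", 40), ("P65", 65)):
    print(label, round(grouped_percentile(i, temp_boundaries, temp_f), 2))
# median 50.3
# D4 48.5
# P65 52.3
```

The same helper can be used to check the first and third quartile exercises above (i = 25 and i = 75).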

Percentiles can also be read off from a “less than” ogive.

Example

The cumulative frequency graph on the following page shows the distribution of marks
scored by a class of 40 students in a test.

From the graph Q1 = 36, Me = 44, Q3 =52.

2.9 Five number summary and Box-and-Whisker plot

Five number summary

• A five number summary of a data set is a summary using the minimum, 1st quartile,
median, 3rd quartile and maximum as summary measures.
• The five number summary shows the following types of information.

type                value(s)
central tendency    median
deviation           quartile deviation = Q = (Q3 – Q1)/2
extremes            minimum and maximum

Example
The IQs of 13 people are shown below.

92, 104, 93, 98, 112, 145, 88, 90, 104, 119, 101, 95, 154

minimum = 88
Q1 = 92.5
median = 101
Q3 = 115.5
maximum = 154

Interquartile range = IQR = Q3 – Q1

This gives the range of the middle 50% of the data.
Example continued:
115.5 – 92.5 = 23
The middle 50% of the data values are spread over 23 units; they range from 92.5
to 115.5.

Quartile deviation = Q = IQR ÷ 2

This is also sometimes called the semi-interquartile range. It shows the extent to
which the values in the data set deviate from the median.
Example continued:
Q = 23 ÷2 = 11.5

Box-and-Whisker plot

A box-and-whisker plot is a graphical representation of some important values of a data set.

• The median – a measure of central location – is given in the box-and-whisker plot.
• Q1 and Q3 are shown, so the interquartile range and the quartile deviation – both
measures of variation – can be found.
• The minimum and maximum values are also shown, so the range – a measure of
variation – can be calculated.
• Outliers are shown. Outliers are values that are unusually small or unusually large
relative to the other values in the data set.

To find outliers “cut-off” values must be found.

Lower cut-off value = Q* = Q1 – 1.5×IQR


Any value in the data set that is smaller than the lower cut-off value is considered
too small and so is an outlier.

Upper cut-off value = Q** = Q3 + 1.5×IQR


Any value in the data set that is larger than the upper cut-off value is considered too
large and so is an outlier.

Example continued:
IQR = Q3 – Q1 = 23
Q* = Q1 – 1.5×IQR
= 92.5 – (1.5)(23) = 58

None of the values in the data set are smaller than the lower cut-off value so there are no
values that are “too small”.

Q** = Q3 + 1.5×IQR
=115.5 + (1.5)(23)
=150

The only value in the data set that is larger than this is 154. This value (154) is “too big”
and so is an outlier.
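The five number summary and the outlier cut-offs can be computed together. In this Python sketch the helper names are hypothetical. The quartiles are found by the same split-the-data method used in section 2.8.2:

```python
from statistics import median


def five_number_summary(values):
    """(minimum, Q1, median, Q3, maximum), with the quartiles found by splitting the
    ordered data into halves as in section 2.8.2 (median left out when n is odd)."""
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 == 1 else xs[half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


def outlier_check(values):
    """Return the lower and upper cut-off values and the outliers beyond them."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    lower_cut = q1 - 1.5 * iqr
    upper_cut = q3 + 1.5 * iqr
    outliers = [x for x in values if x < lower_cut or x > upper_cut]
    return lower_cut, upper_cut, outliers


iq = [92, 104, 93, 98, 112, 145, 88, 90, 104, 119, 101, 95, 154]
print(five_number_summary(iq))  # (88, 92.5, 101, 115.5, 154)
print(outlier_check(iq))        # (58.0, 150.0, [154])
```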

Drawing a box-and-whisker plot

• Mark Q1 and Q3 on a suitable section of the number line. Draw a box extending from
Q1 to Q3.
• Mark the position of the median on the number line. Draw a vertical line in the box
at this position.
• Mark the positions of the outliers (if any) with a star or some other bold mark.
• Draw the “whiskers” by drawing a horizontal line connecting Q1 to the minimum
value and Q3 to the maximum value. Special case: when there are outliers, the left
whisker extends to the smallest value that is not an outlier and the right whisker
extends to the largest value that is not an outlier.

Example continued:

A Box-and-Whisker plot can also be used to assess the skewness (departure from symmetry)
of a variable.
• For positively skewed data most of the values are at the lower end of the scale
(mean > median, “box” section of the plot towards the lower end of the scale).
• For negatively skewed data most of the values are at the upper end of the scale
(mean < median, “box” section of the plot towards the upper end of the scale).
• In the previous example the data set is positively skewed.

When several data sets are to be compared, several Box-and-Whisker plots can be plotted
side-by-side.

Example

The Box-and-Whisker plot shown below enables one to compare delays in departing flights
(in minutes) for certain days in December (16th to the 26th).

For all the days the data sets are positively skewed (data sets all have the “box” section
closer to the lower end of the scale with a long upper whisker). This means that on all the
days most flights departed with only short delays. The long upper whiskers show that there
were some quite late departures on 16, 17, 21, 22, 23, 24 and 25 December.

Chapter 3 – Probability

3.1 Terminology

Probability (Chance)
• A probability is the chance that something of interest will happen.
• A probability is expressed as a proportion, i.e. it ranges from 0 to 1.
• Chance can also be expressed as a percentage, i.e. it ranges from 0 to 100.

Examples

1) The probability of rain tomorrow is 0.40.
There is a 40% chance of rain tomorrow.

2) The probability of winning the Lotto is $\dfrac{1}{13\,983\,816}$.

3) The probability of a certain new product being successful is 0.75.

Random experiment
This is an experiment that can give different outcomes when repeated under similar
conditions. It has the following properties:

1) The experiment can have more than one possible outcome.

2) All possible outcomes can be listed.

3) The outcome that will occur when the experiment is performed depends on
chance.

Examples

1) Tossing a coin (possible outcomes: heads, tails).

2) Rolling a die (possible outcomes: 1, 2, 3, 4, 5, 6).

3) Asking a person to assign a rating to a product (possible outcomes: A, B, C, D, E).

4) Drawing a card from a deck of cards (possible outcomes: 13 hearts, 13 clubs, 13 spades,
13 diamonds).

Set
A set is a collection of outcomes.

Sample space
The sample space is the set of all possible outcomes of a random experiment. A
sample space is usually denoted by the symbol S and the collection of elements
contained in S enclosed in curly brackets { }.

Sample point
A sample point is an individual outcome (element) in a sample space.

Examples

1) Tossing a single coin. S = {h, t}.

2) Tossing a die. S = {1, 2, 3, 4, 5, 6}.

3) Tossing a pair of dice


S= { (1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6),
(2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6),
(3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6),
(4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6),
(5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6),
(6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6) }.

4) Tossing two coins. S = {hh, ht, th, tt}.

5) Drawing a card from a deck of cards. The elements in the sample space are listed
below.

S = {2♦ 3♦ 4♦ 5♦ 6♦ 7♦ 8♦ 9♦ 10♦ J♦ Q♦ K♦ A♦
2♥ 3♥ 4♥ 5♥ 6♥ 7♥ 8♥ 9♥ 10♥ J♥ Q♥ K♥ A♥
2♣ 3♣ 4♣ 5♣ 6♣ 7♣ 8♣ 9♣ 10♣ J♣ Q♣ K♣ A♣
2♠ 3♠ 4♠ 5♠ 6♠ 7♠ 8♠ 9♠ 10♠ J♠ Q♠ K♠ A♠ }

Each outcome listed in the above examples is a sample point.

Event
An event is a subset of a sample space i.e. a collection of sample points taken from a sample
space.

Impossible event
An impossible event is an event that cannot happen (has probability zero).

Certain event
A certain event is an event that is sure to happen (has probability 1).

Simple events are events that involve only one sample point (outcome) of the sample space.
Examples

1) Let E denote the event “an odd number is obtained when tossing a single die”.
Then E = {1, 3, 5}.

2) Let H denote the event “at least one head appears when tossing two coins”.
H = {hh, ht, th}.

3) Let B denote the event “obtaining a club and a heart in a single draw from a deck of
cards”. The event B is impossible. The set of outcomes of B is an empty set, denoted by
B = { } = ∅.

4) Let A denote the event “obtaining a 1, 2, 3, 4, 5 or 6 when tossing a single die”. The
event A is a certain event i.e. one of the outcomes belonging to the set describing the
event must happen. This is denoted by A = S, where S is the sample space.

Venn diagrams
• A Venn diagram is a drawing in which circular areas represent groups of items
usually sharing common properties.
• The drawing consists of two or more circles, each representing a specific group or
set, contained within a square that represents the sample space. Venn diagrams are
often used as a visual display when referring to sample spaces, events and
operations involving events.

3.2 Complements, Unions and Intersections of events
Compound events
These are events that involve more than one event. Such events can be obtained by
performing various operations involving two or more events.
Some of the operations that can be performed are described in the sections that follow.

Complementary events
The complementary event Ā (sometimes written A′) of an event A consists of all the
outcomes in S that are not in A.

Examples

1) Consider the experiment of tossing a single die. S = {1, 2, 3, 4, 5, 6}. The complement
of the event A = “obtaining a 3 or less” = {1, 2, 3} is
Ā = “obtaining a 4 or more” = {4, 5, 6}.

2) Consider the experiment of tossing two coins. S = {hh, ht, th, tt}. The complement of
the event H = “at least one head” = {hh, ht, th} is H̄ = “no heads” = {tt}.

Union and intersection of events

• The union of two events A and B, denoted by A ∪ B, is the set of outcomes that are
in A or in B or in both A and B, i.e. the event that
“either A or B or both A and B occur”
or “at least one of A or B occurs”.

• The intersection of two events A and B, denoted by A ∩ B, is the set of outcomes
that are in both A and B, i.e. the event that
“both A and B occur”.

The Venn diagrams below show the sets A ∪ B and A ∩ B.

Ā ∩ B is the event “a sample point is in B but not in A”.

A ∩ B̄ is the event “a sample point is in A but not in B”.

These definitions involving two events can be extended to ones involving 3 or more events,
e.g. for the 3 events A1, A2 and A3 the event A1 ∪ A2 ∪ A3 is the event “at least one of A1, A2
or A3 occurs” and A1 ∩ A2 ∩ A3 the event “A1 and A2 and A3 occur”.

Examples

1) Consider the events A = {1, 3, 6, 7, 8} and B = {2, 3, 5, 7, 9} defined on a sample space
S = {1, 2, 3, . . . , 10}.

A ∪ B = {1, 2, 3, 5, 6, 7, 8, 9}, A ∩ B = {3, 7},
Ā ∩ B = {2, 5, 9}, A ∩ B̄ = {1, 6, 8}.
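Python's built-in `set` type implements these operations directly (`|` for union, `&` for intersection, `-` for set difference), so hand-worked answers like the ones above can be checked mechanically. A minimal sketch using the sets of example 1; the variable names are illustrative only:

```python
# Sample space and events from example 1
S = set(range(1, 11))          # {1, 2, ..., 10}
A = {1, 3, 6, 7, 8}
B = {2, 3, 5, 7, 9}

union = A | B                  # A ∪ B
intersection = A & B           # A ∩ B
B_not_A = (S - A) & B          # Ā ∩ B: in B but not in A
A_not_B = A & (S - B)          # A ∩ B̄: in A but not in B

print(sorted(union))           # [1, 2, 3, 5, 6, 7, 8, 9]
print(sorted(intersection))    # [3, 7]
print(sorted(B_not_A))         # [2, 5, 9]
print(sorted(A_not_B))         # [1, 6, 8]
```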

2) Let C be the event “drawing a face card from a deck of cards” and A the event “drawing
a king or an ace from a deck of cards”.

C = {J♦, Q♦, K♦, J♥, Q♥, K♥, J♠, Q♠, K♠, J♣, Q♣, K♣}

A = {A♦, A♥, A♠, A♣, K♦, K♥, K♠, K♣}.

C ∪ A = {J♦, Q♦, K♦, J♥, Q♥, K♥, J♠, Q♠, K♠, J♣, Q♣, K♣, A♦, A♥, A♠, A♣}.

C ∩ A = {K♦, K♥, K♠, K♣}.

Mutually exclusive (disjoint) events

Two events A and B are mutually exclusive (disjoint) if they have no elements
(outcomes) in common. This also means that these events cannot occur together.

Examples

1) Let B be the event “drawing a black card from a deck of cards” and R the event “drawing
a red card from a deck of cards”.

The events B and R have no outcomes in common, i.e. B ∩ R = ∅ (empty set). Hence B
and R are mutually exclusive.

2) Let E be the event “an even number with a single throw of a die” and O the event “an
odd number with a single throw of a die” i.e. E = {2, 4, 6} and O = {1, 3, 5}.

E and O have no outcomes in common, i.e. E ∩ O = ∅, and are therefore mutually
exclusive.

3.3 Definitions of probability

Classical definition of probability

If an experiment has n equally likely outcomes in total, of which m are favourable to
an event A, then the probability of occurrence of the event A, denoted as P(A), is
given by

$$P(A) = \frac{N(A)}{N(S)} = \frac{m}{n},$$

where N(A) = m is the number of outcomes favourable to the event A and N(S) = n
the number of outcomes in the sample space S, i.e. the total number of outcomes.

Note: Since N(A) ≥ 0 and N(A) ≤ N(S), 0 ≤ P(A) ≤ 1.

Examples

1) Two coins are tossed. Find the probability of getting

(i) exactly two heads.
(ii) at least one head.

Solution:

Here, S = {hh, ht, th, tt}.

(i) Let A = getting exactly two heads = {hh}
∴ P(A) = ¼.

(ii) Let B = getting at least one head = {hh, ht, th}
∴ P(B) = ¾.

2) Two dice are rolled. Find the probability that a sum of 7 will occur.

Solution:
The number of sample points in S is 36 (see example 3 under sample space).

Let A = “a sum of 7 will occur” = {(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)}.

∴ P(A) = 6/36 = 1/6.
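When the sample space is small, the classical formula can be checked by listing every outcome and counting the favourable ones. The sketch below assumes every listed outcome is equally likely (exactly the condition the classical definition needs); the helper name `classical_probability` is hypothetical:

```python
from fractions import Fraction
from itertools import product

def classical_probability(sample_space, event):
    """P(A) = N(A) / N(S), assuming every outcome in sample_space is equally likely."""
    outcomes = list(sample_space)
    favourable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favourable, len(outcomes))

# Two coins: S = {hh, ht, th, tt}
coins = ["".join(p) for p in product("ht", repeat=2)]
p_two_heads = classical_probability(coins, lambda s: s == "hh")
p_at_least_one_head = classical_probability(coins, lambda s: "h" in s)

# Two dice: 36 ordered pairs (first die, second die)
dice = list(product(range(1, 7), repeat=2))
p_sum_7 = classical_probability(dice, lambda pair: sum(pair) == 7)

print(p_two_heads, p_at_least_one_head, p_sum_7)  # 1/4 3/4 1/6
```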

The classical definition of probability requires the assumption that all the outcomes in the
sample space are equally likely. If this assumption is not met, this formula cannot be used.

Example
The possible temperatures (degrees Celsius) in Durban on a particular day in December are

15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39.

Durban is hot in December so, for example, a temperature of 15 degrees is much less likely than
one of 30 degrees. Treating all 25 values as equally likely, which would give
P(temperature = 15) = 1 ÷ 25 = 0.04, therefore does not seem reasonable.

Relative frequency (empirical) definition of probability

If an experiment is repeated n times and an event A is observed f times, then the
estimated probability of occurrence of the event A is given by

$$P(A) = \frac{f}{n}.$$

Note: This formula differs from the classical formula in the sense that the classical
formula uses all the outcomes in the sample space as the total number of outcomes,
while the relative frequency formula uses the number of repetitions (n) of the
experiment as the total number of outcomes. In the classical formula the number of
outcomes in the sample space is fixed, while the number of repetitions of an
experiment (n) can vary. It can be shown (the law of large numbers) that the empirical
probability is a good approximation of the true probability when n is sufficiently large.

Examples

1) A bent coin is tossed 1000 times with heads coming up 692 times.
An estimate of P(h) is 692/1000 = 0.692.

2) A summary of the final marks in a certain statistics course is shown below
(f = number of students).

Mark            f
less than 30    6
30 – 39        26
40 – 49        45
50 – 59        64
60 – 69        82
70 – 79        37
80 – 89        22
90 – 99         8
Total         290

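From the table (using the empirical formula), the following probabilities can be
estimated.

$$\text{(a)}\quad P(\text{mark less than 40}) = \frac{26 + 6}{290} \approx 0.110.$$

$$\text{(b)}\quad P(\text{pass}) = \frac{64 + 82 + 37 + 22 + 8}{290} = 1 - \frac{6 + 26 + 45}{290} = \frac{213}{290} \approx 0.73.$$

$$\text{(c)}\quad P(\text{mark of 80 or more}) = \frac{22 + 8}{290} \approx 0.103.$$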
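The empirical calculations above amount to adding class frequencies and dividing by the total. A short sketch; it assumes, as part (b) implicitly does, that a pass means a mark of 50 or more, and the helper name `empirical` is illustrative:

```python
from fractions import Fraction

# Class frequencies from the table of final marks
marks = {
    "less than 30": 6, "30-39": 26, "40-49": 45, "50-59": 64,
    "60-69": 82, "70-79": 37, "80-89": 22, "90-99": 8,
}
n = sum(marks.values())   # 290 students

def empirical(classes):
    """Relative frequency f/n for the union of the listed (disjoint) classes."""
    return Fraction(sum(marks[c] for c in classes), n)

p_below_40 = empirical(["less than 30", "30-39"])
p_pass = empirical(["50-59", "60-69", "70-79", "80-89", "90-99"])  # assumes a pass mark of 50
p_80_plus = empirical(["80-89", "90-99"])

print(n, round(float(p_below_40), 3), round(float(p_pass), 2), round(float(p_80_plus), 3))
# 290 0.11 0.73 0.103
```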

Marginal and joint probabilities

• Probabilities involving the occurrence of single events are called marginal
probabilities.
• Probabilities involving the occurrence of two or more events are called joint
probabilities.

Example

The preference probabilities according to gender for 2 different brands of a certain product
are summarized in the table below.
The gender marginal probabilities are obtained by summing the joint probabilities over the
brands. The brand marginal probabilities are obtained by summing the joint probabilities
over the genders.

                          Brand 1   Brand 2   Marginal probability
Gender   Male              0.20      0.32      0.52
         Female            0.40      0.08      0.48
Marginal probability       0.60      0.40      1

Joint probabilities: P(male ∩ brand 1) = 0.20, P(male ∩ brand 2) = 0.32,
P(female ∩ brand 1) = 0.40, P(female ∩ brand 2) = 0.08.

Marginal probabilities: P(male) = 0.52, P(female) = 0.48,
P(brand 1) = 0.60, P(brand 2) = 0.40.
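Each marginal probability is a sum of joint probabilities over the other variable, which can be sketched as follows (the dictionary layout and the helper name `marginal` are just one convenient, illustrative choice):

```python
# Joint probabilities P(gender ∩ brand) from the table
joint = {
    ("male", 1): 0.20, ("male", 2): 0.32,
    ("female", 1): 0.40, ("female", 2): 0.08,
}

def marginal(joint, position):
    """Sum joint probabilities over the other variable (position 0 = gender, 1 = brand)."""
    totals = {}
    for key, p in joint.items():
        totals[key[position]] = totals.get(key[position], 0.0) + p
    return totals

gender = marginal(joint, 0)
brand = marginal(joint, 1)
print({g: round(p, 2) for g, p in gender.items()})  # {'male': 0.52, 'female': 0.48}
print({b: round(p, 2) for b, p in brand.items()})   # {1: 0.6, 2: 0.4}
```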

3.4 Counting formulae


The computation of probabilities using the classical definition involves counting the number
of outcomes favourable to the event of interest (say event A) and the total number of
possible outcomes in the sample space. The following formulae can be used to count
numbers of outcomes to be used in the classical definition formula.

Addition and Multiplication formulae for counting


Addition formula – If an experiment can be performed in n ways, and another experiment
can be performed in m ways, then one of the two experiments (either the first or the second)
can be performed in (n + m) ways, provided the two experiments have no ways in common.

This rule can be extended to any finite number of experiments. If one experiment can be
done in n1 ways, a second one in n2 ways, . . . , a kth one in nk ways, then one of the k
experiments can be done in n1 + n2 +. . . + nk ways.

Example:
Suppose a man is standing in a room which has 2 doors to his left and 1 door to his
right. In how many ways can he leave the room?

Solution:
Let “leave the room by going to the left” be experiment 1 and “leave the room by
going to the right” be experiment 2. There are n=2 ways to do experiment 1 (he can
leave by door A or door B) and there is m=1 way to do experiment 2 (he can leave by
door C). In total there are n+m = 2+1 = 3 ways to leave the room.

Multiplication formula – If an experiment can be done in n ways and another experiment
can be done in m ways, then both of the experiments can be done in n × m ways.

This rule can be extended to any finite number of experiments. If one experiment can be
done in n1 ways, a second one in n2 ways, . . . , a kth one in nk ways, then the k
experiments together can be done in n1×n2×…×nk ways.

Example 1:
A basic meal consists of soup, a sandwich and a beverage. If a person having this
meal has 3 choices of soup, 4 choices of sandwiches and a choice of coffee or tea as
a beverage, how many such meals are possible?

Choosing soup (experiment 1) has 3 possibilities.
Choosing a sandwich (experiment 2) has 4 possibilities.
Choosing a beverage (experiment 3) has 2 possibilities.

Number of choices of meals = 3 × 4 × 2 = 24.

Example 2:
A PIN to be used at an ATM can be formed by selecting 4 digits from the digits
0, 1, 2, . . . , 9. How many choices of PIN are there if

(a) digits may be repeated?
(b) digits may not be repeated?

(a) First digit – 10 choices,
second digit – 10 choices,
third digit – 10 choices,
fourth digit – 10 choices.
Number of choices = 10 × 10 × 10 × 10 = 10⁴ = 10 000.

(b) First digit – 10 choices,
second digit – 9 choices,
third digit – 8 choices,
fourth digit – 7 choices.
Number of choices = 10 × 9 × 8 × 7 = 5040.
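For small problems the multiplication formula can be confirmed by brute force: `itertools.product` lists every ordered choice when repetition is allowed, and `itertools.permutations` lists ordered choices of distinct items. A sketch covering the PIN and meal examples:

```python
from itertools import permutations, product

digits = range(10)

# (a) digits may be repeated: every ordered 4-tuple of digits
with_repetition = sum(1 for _ in product(digits, repeat=4))

# (b) digits may not be repeated: ordered 4-tuples of distinct digits
without_repetition = sum(1 for _ in permutations(digits, 4))

# Meal example: 3 soups × 4 sandwiches × 2 beverages
meals = sum(1 for _ in product(range(3), range(4), ["coffee", "tea"]))

print(with_repetition, without_repetition, meals)  # 10000 5040 24
```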

Factorial notation
In how many ways can n distinct objects (n a positive integer) be arranged in a row?

Let n = 2: 1st position – 2 choices,
2nd position – 1 choice.
Number of ways = 2 × 1 = 2.

Let n = 3: 1st position – 3 choices,
2nd position – 2 choices,
3rd position – 1 choice.
Number of ways = 3 × 2 × 1 = 6.

In general, the number of ways is n × (n – 1) × (n – 2) × . . . × 2 × 1 = n! (n factorial).

Using this notation, 2 × 1 = 2! = 2,
3 × 2 × 1 = 3! = 6,
4 × 3 × 2 × 1 = 4! = 24, etc.

Note: 1! = 1 and, by convention, 0! = 1.

The factorial notation is used in counting formulae.

Examples
1) In how many ways can 7 people be placed in a queue at a bus stop?

The 7 people have to be placed in the 7 positions from 1st to 7th.

No. of ways = 7 × 6 × 5 × . . . × 2 × 1 = 7! = 5040.

2) In how many ways can 5 books be arranged in a row?

No. of ways = 5 × 4 × 3 × 2 × 1 = 5! = 120.
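A quick check that n! really counts the arrangements of n distinct objects, including the convention 0! = 1 (there is exactly one arrangement of no objects). The helper `arrangements` is illustrative and only practical for small n:

```python
from itertools import permutations
from math import factorial

def arrangements(n):
    """Count the orderings of n distinct objects by listing them all (small n only)."""
    return sum(1 for _ in permutations(range(n)))

for n in range(0, 8):
    assert arrangements(n) == factorial(n)   # brute force agrees with n!

print(factorial(5), factorial(7))  # 120 5040
```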



Permutations and combinations

Permutation
• A permutation is an arrangement of a group of items in which order matters.
• The number of permutations of n objects taken r at a time is calculated from

$${}_nP_r = P(n, r) = \frac{n!}{(n - r)!}.$$

Combination
• A combination is a selection of a group of items in which order does not matter.
• The number of combinations of n objects taken r at a time is calculated from

$${}_nC_r = C(n, r) = \binom{n}{r} = \frac{n!}{(n - r)!\,r!}.$$
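The two formulae translate directly into code. Python 3.8 and later also provide `math.perm` and `math.comb`, which the sketch uses as a cross-check; the pairs (n, r) are taken from the worked examples in this section, and the function names `n_p_r` and `n_c_r` are illustrative:

```python
from math import comb, factorial, perm

def n_p_r(n, r):
    """nPr = n! / (n - r)!  (ordered selections)."""
    return factorial(n) // factorial(n - r)

def n_c_r(n, r):
    """nCr = n! / ((n - r)! r!)  (unordered selections)."""
    return factorial(n) // (factorial(n - r) * factorial(r))

# Cross-check against the standard-library versions
for n, r in [(4, 2), (4, 3), (6, 4), (20, 5), (52, 5)]:
    assert n_p_r(n, r) == perm(n, r)
    assert n_c_r(n, r) == comb(n, r)

print(n_p_r(4, 2), n_c_r(4, 2), n_c_r(52, 5))  # 12 6 2598960
```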

Examples:

1) Four people (A, B, C, D) serve on a board of directors. A chairman and vice-chairman are
to be chosen from these 4 people. In how many ways can this be done?

Chairman Vice-chairman
A B
B A
A C
C A
A D
D A
B C
C B
B D
D B
C D
D C

Number of ways = 12.

2) Four people (A, B, C, D) serve on a board of directors. Two people are to be chosen from
them as members of a committee that will investigate fraud allegations. In how many
ways can this be done?

People chosen: A and B; A and C; A and D; B and C; B and D; C and D.

Number of ways = 6.

In both these examples a choice of 2 people from 4 people is made. However, in example 1
the order of choice of the 2 people matters (since the one person chosen is chairman and
the other one vice-chairman). In example 2 the order does not matter. The only interest is in
who serves on the committee.

Application of formulae
In example 1 the permutations formula applies with n = 4, r = 2.

$$\text{Number of ways} = P(4, 2) = \frac{4!}{(4 - 2)!} = 12.$$

In example 2 the combinations formula applies with n = 4, r = 2.

$$\text{Number of ways} = C(4, 2) = \frac{4!}{2!\,(4 - 2)!} = 6.$$

3) Find the number of ways to take 4 people and place them in groups of 3 at a time where
order does not matter.

Solution:
Since order does not matter, use the combination formula.

$$C(4, 3) = \frac{4!}{3!\,(4 - 3)!} = \frac{24}{6} = 4.$$

4) Find the number of ways to arrange 6 items in groups of 4 at a time where order matters.

Solution:

$$P(6, 4) = \frac{6!}{(6 - 4)!} = \frac{720}{2!} = 360.$$

There are 360 ways to arrange 6 items taken 4 at a time when order matters.

5) Find the number of ways to take 20 objects and choose them in groups of 5 at a time
where order does not matter.

Solution:

$$C(20, 5) = \frac{20!}{5!\,(20 - 5)!} = \frac{20 \cdot 19 \cdot 18 \cdot 17 \cdot 16}{1 \cdot 2 \cdot 3 \cdot 4 \cdot 5} = 15\,504.$$

There are 15 504 ways to choose 5 objects from 20 when order does not matter.

6) Determine the total number of five-card hands that can be drawn from a deck of 52
cards.

Solution:
When a hand of cards is dealt, the order of the cards does not matter. Thus the
combinations formula is used.

There are 52 cards in a deck and we want to know in how many different ways we can
draw them in groups of five at a time when order does not matter. Using the
combination formula gives
C(52,5) = 2 598 960.

7) There are five women and six men in a group. From this group a committee of 4 is to be
chosen. In how many ways can the committee be formed if the committee is to have at least
3 women in it?

Solution:

8) In how many ways can a phone number consisting of 5 digits be chosen from the digits
1, 2, 3, . . . , 9 if no digits are to be repeated?

Solution:

9) In how many ways can the 6 winning numbers in a Lotto draw be selected?

Solution:

10) In how many ways can a five-card hand consisting of three eights and two sevens be dealt?

Solution:

11) How many different 5-card hands include 4 of a kind and one other card?

Solution:

We have 13 different ways to choose 4 of a kind: 2's, 3's, 4's, … Queens, Kings and
Aces.

Once a set of 4 of a kind has been removed from the deck, 48 cards are left.

Remember OR means add.

The possible situations that will satisfy the above requirement are:

4 Aces and one other card: C(4,4) × C(48,1) = 48,

or 4 Kings and one other card: C(4,4) × C(48,1) = 48,

or 4 Queens and one other card: C(4,4) × C(48,1) = 48,

. . .

or 4 twos and one other card: C(4,4) × C(48,1) = 48.

Total: 48 × 13 = 624 ways.
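The four-of-a-kind count, and the total number of five-card hands from example 6, can be reproduced with `math.comb` (a sketch; Python 3.8 or later):

```python
from math import comb

ranks = 13   # 2, 3, ..., 10, J, Q, K, A
# For each rank: take all 4 cards of that rank (C(4,4) ways) and 1 of the other 48 cards
four_of_a_kind = ranks * comb(4, 4) * comb(48, 1)
all_hands = comb(52, 5)

print(four_of_a_kind, all_hands)    # 624 2598960
print(four_of_a_kind / all_hands)   # classical probability of being dealt four of a kind
```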

3.5 Basic probability formulae

Complementary events
For any event A defined on some sample space,

P(Ā) = 1 – P(A).

Union of two or more events

For any two events A and B defined on some sample space,

P(A ∪ B) = P(A) + P(B) for mutually exclusive events.

P(A ∪ B) = P(A) + P(B) – P(A ∩ B) for events that are not mutually exclusive.

These formulae can be extended to probabilities involving more than two events,
e.g. for 3 events A, B and C defined on some sample space

P(A ∪ B ∪ C) = P(A) + P(B) + P(C) for mutually exclusive events.

P(A ∪ B ∪ C) = P(A) + P(B) + P(C) – P(A ∩ B) – P(A ∩ C) – P(B ∩ C) + P(A ∩ B ∩ C)

for events that are not mutually exclusive.

This formula can easily be verified with the aid of the Venn diagram shown below.

From the above diagram the following sets can be written down.

A = {1, 2, 4, 5}; B = {2, 3, 5, 6}; C = {4, 5, 6, 7};
A ∩ B = {2, 5}; A ∩ C = {4, 5}; B ∩ C = {5, 6};
A ∩ B ∩ C = {5}; A ∪ B ∪ C = {1, 2, 3, 4, 5, 6, 7}.

Exercise: Complete the verification of the result for P(A ∪ B ∪ C).

De Morgan’s Laws

$$(1)\quad P(\bar{A} \cap \bar{B}) = P(\overline{A \cup B})$$

$$(2)\quad P(\bar{A} \cup \bar{B}) = P(\overline{A \cap B})$$

Venn diagram verification of second result

Right hand side: [Venn diagram]

Left hand side: [Venn diagram]
Exercise: Verify the first result by using Venn diagrams.
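De Morgan's laws are identities between sets, so they can also be checked mechanically. The sketch below is a brute-force check on a small, hypothetical four-point sample space (not a proof, and not a substitute for the Venn diagram exercise); it tests every pair of events A, B that can be defined on that space:

```python
from itertools import combinations

S = frozenset(range(1, 5))   # a small hypothetical sample space

def subsets(universe):
    """Yield every subset (event) of the universe."""
    items = sorted(universe)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def complement(E):
    return S - E

all_events = list(subsets(S))   # 2**4 = 16 events
law_1 = all((complement(A) & complement(B)) == complement(A | B)
            for A in all_events for B in all_events)
law_2 = all((complement(A) | complement(B)) == complement(A & B)
            for A in all_events for B in all_events)
print(law_1, law_2)  # True True
```

Because equal events have equal probabilities, the set identities imply the probability statements (1) and (2).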

Total probability formulae

P(A) = P(A ∩ B) + P(A ∩ B̄)

P(B) = P(A ∩ B) + P(Ā ∩ B)

These formulae can be verified from the Venn diagram shown on the following page.

The formulae can be extended to probabilities involving more than two events.

Examples

1) There are two telephone lines, A and B. Line A is engaged 50% of the time and line B is
engaged 60% of the time. Both lines are engaged 30% of the time. Calculate the
probability that

(a) at least one of the lines is engaged.
(b) neither line is engaged.
(c) line B is not engaged.
(d) line A is engaged, but line B is not engaged.
(e) only one line is engaged.

Solution:

Let E1 denote the event “line A is engaged” and E2 the event “line B is engaged”.

Given: P(E1) = 0.5, P(E2) = 0.6, P(E1 ∩ E2) = 0.3.

(a) P(at least one of the lines is engaged) = P(E1 ∪ E2)
= P(E1) + P(E2) – P(E1 ∩ E2)
= 0.5 + 0.6 – 0.3
= 0.8

(b) P(neither line is engaged) = 1 – P(at least one of the lines is engaged)
= 1 – 0.8
= 0.2

(c) P(line B not engaged) = 1 – P(line B engaged) = 1 – P(E2) = 1 – 0.6 = 0.4.

(d) The event “line A is engaged, but line B is not engaged” can be written in symbols as
E1 ∩ Ē2, so

P(E1 ∩ Ē2) = P(E1) – P(E1 ∩ E2)
= 0.5 – 0.3
= 0.2.
(Using the total probability formula)

(e) P(only one line is engaged) = P(line A is engaged, but line B is not engaged)
+ P(line B is engaged, but line A is not engaged)
= P(E1 ∩ Ē2) + P(Ē1 ∩ E2)

P(Ē1 ∩ E2) = P(E2) – P(E1 ∩ E2) = 0.6 – 0.3 = 0.3. (Using the total probability
formula)

P(only one line is engaged) = 0.2 + 0.3 = 0.5
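The telephone-line calculations use only the addition rule, the complement rule and the total probability formula, so they can be reproduced in a few lines (variable names are illustrative):

```python
# Given probabilities for the two telephone lines
p_A = 0.5    # P(E1): line A engaged
p_B = 0.6    # P(E2): line B engaged
p_AB = 0.3   # P(E1 ∩ E2): both engaged

p_at_least_one = p_A + p_B - p_AB       # (a) addition rule
p_neither = 1 - p_at_least_one          # (b) complement rule
p_B_not = 1 - p_B                       # (c) complement rule
p_A_only = p_A - p_AB                   # (d) total probability: P(E1 ∩ Ē2)
p_B_only = p_B - p_AB                   #     and likewise P(Ē1 ∩ E2)
p_exactly_one = p_A_only + p_B_only     # (e)

for label, p in [("a", p_at_least_one), ("b", p_neither), ("c", p_B_not),
                 ("d", p_A_only), ("e", p_exactly_one)]:
    print(label, round(p, 2))
```

Rounded to two decimals, the printed values match the hand calculations: 0.8, 0.2, 0.4, 0.2 and 0.5.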

2) Let O be the event that a certain lecturer will be in his/her office on a particular
afternoon and L the event that he/she will be at a lecture. Suppose P(O) = 0.48 and P(L)
= 0.27.

(a) State in words the event O  L .


(b) Calculate P( O  L ).

Solution:

(a) O is the event that

L is the event that

O  L is the event that

(b) P( O  L ) =

3) A batch of 20 computers contains 3 that are faulty. Four (4) computers are selected at
random without replacement from this batch. Calculate the probability that

(a) none of the 4 computers selected is faulty.
(b) at least 2 of the computers selected are faulty.

Solution:

There are C(20,4) = 4845 [why not P(20,4)?] ways of selecting the 4 computers from the
batch of 20. Since random selection is used, all 4845 selections are equally likely. Let A
denote the event “none of the 4 computers selected is faulty” and B the event “at least
2 of the computers selected are faulty”.

Using the classical probability result,

(a) P(A) =

(b) P(B) =

3.6 Conditional probability

The conditional probability of an event A occurring, given that another event B has occurred,
is given by

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad \text{where } P(B) > 0.$$

Also

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}, \quad \text{where } P(A) > 0.$$

Example 1
Five hundred (500) TV viewers consisting of 300 males and 200 females were asked whether
they were satisfied with the news coverage on a certain TV channel. Their replies are
summarized in the table below.

                         Answer
                 Satisfied   Not satisfied   Total
Gender  Male        180           120         300
        Female       90           110         200
        Total       270           230         500

$$P(\text{satisfied} \mid \text{male}) = \frac{180}{300} = 0.6.$$

$$P(\text{satisfied} \mid \text{female}) = \frac{90}{200} = 0.45.$$

P(not satisfied | male) =

P(not satisfied | female) =

P(satisfied) = 270/500 = 0.54 and P(not satisfied) =
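Restricting the sample space to one row of the table is easy to express in code. A sketch using exact fractions; the helper names `conditional` and `marginal_answer` are hypothetical:

```python
from fractions import Fraction

# Counts from the survey table: (gender, answer) -> number of viewers
counts = {
    ("male", "satisfied"): 180, ("male", "not satisfied"): 120,
    ("female", "satisfied"): 90, ("female", "not satisfied"): 110,
}

def conditional(answer, gender):
    """P(answer | gender): restrict the sample space to the given gender's row."""
    row_total = sum(n for (g, _), n in counts.items() if g == gender)
    return Fraction(counts[(gender, answer)], row_total)

def marginal_answer(answer):
    """P(answer) over all 500 viewers."""
    total = sum(counts.values())
    return Fraction(sum(n for (_, a), n in counts.items() if a == answer), total)

print(float(conditional("satisfied", "male")))    # 0.6
print(float(conditional("satisfied", "female")))  # 0.45
print(float(marginal_answer("satisfied")))        # 0.54
```

The same two functions can be used to check the remaining blanks in the example.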

Note

1) When calculating a conditional probability, the sample space is restricted to that
associated with the event that is known to occur.

2) The probability of a person being satisfied depends on the gender of the person being
interviewed. In this case females are less satisfied than males with the news coverage.

Example 2
At a certain university the probability of passing accounting is 0.68, the probability of
passing statistics 0.65 and the probability of passing both statistics and accounting is 0.57.
Calculate the probability that a student

(a) passes statistics when it is known that he/she passed accounting.

(b) passes accounting when it is known that he/she passed statistics.

(c) passes statistics when it is known that he/she did not pass accounting.

Solution:

Let A denote the event “a student passes accounting” and


B the event “a student passes statistics”.

Then Ā is the event “a student did not pass accounting”,
A ∩ B the event “a student passes both statistics and accounting” and
Ā ∩ B the event “a student passes statistics, but not accounting”.

Given: P(A) = 0.68, P(B) = 0.65, P(A ∩ B) = 0.57.

$$\text{(a)}\quad P(B \mid A) = \frac{P(A \cap B)}{P(A)} = \frac{0.57}{0.68} \approx 0.838.$$

$$\text{(b)}\quad P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{0.57}{0.65} \approx 0.877.$$

(c) P(B | Ā) =

Multiplication rule of probabilities

Suppose the joint probability P(A ∩ B) is to be calculated when one of the conditional
probabilities [P(A|B) or P(B|A)] and the corresponding unconditional probability [P(B) or
P(A)] are known. Then the conditional probability formulae can be rearranged to obtain
the joint probability, i.e.

P(A ∩ B) = P(B) P(A|B) = P(A) P(B|A).

These formulae are known as the multiplication formulae of probabilities.

Examples

1) A box has 12 bulbs, 3 of which are defective. If two bulbs are selected at random
without replacement, then what is the probability that both are defective?

Solution:

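Let d1 denote the event “the first bulb is defective” and d2 the event “the second bulb is
defective”. Then

$$P(d_1) = \frac{3}{12} \quad\text{and}\quad P(d_2 \mid d_1) = \frac{2}{11},$$

since once a defective bulb has been removed, 2 of the remaining 11 bulbs are defective.
Using the above mentioned multiplication formula,

$$P(d_1 \cap d_2) = P(d_1)\,P(d_2 \mid d_1) = \frac{3}{12} \times \frac{2}{11} \approx 0.045.$$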
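The answer 3/12 × 2/11 can be confirmed by listing every ordered pair of distinct bulbs, all of which are equally likely under random selection without replacement. A sketch:

```python
from fractions import Fraction
from itertools import permutations

# 12 bulbs: True marks one of the 3 defective bulbs
bulbs = [True] * 3 + [False] * 9

# Every ordered draw of 2 distinct bulbs is equally likely (sampling without replacement)
draws = list(permutations(range(len(bulbs)), 2))
both_defective = sum(1 for i, j in draws if bulbs[i] and bulbs[j])
p_enumerated = Fraction(both_defective, len(draws))

# Multiplication rule: P(d1) * P(d2 | d1)
p_rule = Fraction(3, 12) * Fraction(2, 11)

print(p_enumerated, p_rule, round(float(p_rule), 3))  # 1/22 1/22 0.045
```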

2) Two cards are drawn at random from a deck of playing cards. What is the
probability that both these cards are aces?

Solution:

Since there are 4 aces in a deck of 52 cards, the probability of drawing an ace on the first
draw is 4/52. Removing one ace and not replacing it reduces the probability of drawing
another ace on the second draw. The 51 cards remaining contain 3 aces and therefore
the probability of drawing an ace on the second draw is 3/51. We can multiply these
probabilities to determine the probability of drawing two aces.

P(drawing 2 aces) = (4/52) × (3/51) = 1/221.

• The multiplication rule can be extended to involve more than 2 events,
e.g. for 3 events A1, A2 and A3 defined on the same sample space,

P(A1 ∩ A2 ∩ A3) = P(A1) P(A2|A1) P(A3|A1 ∩ A2).

3) Three cards are drawn at random from a deck of playing cards. What is the
probability that all 3 of these cards are aces?

Solution: P(drawing 3 aces) = (4/52) × (3/51) × (2/50) = 1/5525.
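Both card examples follow the same pattern of conditional probabilities: each draw without replacement removes one ace and one card from the deck. A sketch of the extended multiplication rule for this situation (the function name and its defaults are illustrative):

```python
from fractions import Fraction

def all_same_kind(k, favourable=4, deck=52):
    """P(the first k cards drawn without replacement are all of one marked kind).

    Computed as P(A1) P(A2|A1) P(A3|A1 ∩ A2) ...; each factor has one fewer
    favourable card and one fewer card in the deck than the previous factor.
    """
    p = Fraction(1)
    for i in range(k):
        p *= Fraction(favourable - i, deck - i)
    return p

print(all_same_kind(2), all_same_kind(3))  # 1/221 1/5525
```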

Independent events

Two events A and B are said to be independent if P(A|B) = P(A) or P(B|A) = P(B).
This means that the occurrence of B does not affect the probability that A occurs.

Substituting this result into the multiplication formula for two events gives
P(A ∩ B) = P(A) P(B) if A and B are independent.

This formula is known as the product formula for independent events.

Examples

1) The probability that person A will be alive in 20 years is 0.7 and the probability that
person B will be alive in 20 years is 0.5, while the probability that they will both be alive
in 20 years is 0.45. Are the events E1 “A is alive in 20 years” and E2 “B is alive in 20 years”
independent?

Solution:

P(E1) = 0.7, P(E2) = 0.5, P(E1 ∩ E2) = 0.45

Since P(E1) P(E2) = 0.7 × 0.5 = 0.35 ≠ P(E1 ∩ E2), the events E1 and E2 are not
independent.

2) Two coins are tossed. Using the classical definition of probability,
P(both tosses heads) = ¼.
Assuming that both coins are unbiased, P(1st coin is heads) = P(2nd coin is heads) = ½.

Since P(1st coin is heads) × P(2nd coin is heads) = ½ × ½ = ¼ = P(both tosses heads),
the events “heads on the first toss” and “heads on the second toss” are independent.

• The multiplication rule for independent events can be extended to involve more
than 2 events. In general, if the events A1, A2, . . . , An are independent then

P(A1 ∩ A2 ∩ . . . ∩ An) = P(A1) P(A2) . . . P(An).

Examples

1) A coin is tossed and a single six-sided die is rolled. Find the probability of getting heads
and rolling a 3 with the die.
P(heads) = ½ and P(3) = 1/6.

Since the results of the coin and the die are independent,
P(heads and 3) = P(heads) P(3) = (1/2) × (1/6) = 1/12.

2) A school survey found that 9 out of 10 students like pizza. If three students are
chosen at random with replacement, what is the probability that all three students
like pizza?

Solution
P(student 1 likes pizza) = P(student 2 likes pizza) = P(student 3 likes pizza) = 9/10.

P(student 1 likes pizza and student 2 likes pizza and student 3 likes pizza)
= P(student 1 likes pizza) × P(student 2 likes pizza) × P(student 3 likes pizza)
= (9/10)³ = 0.729.
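The product rule for independent events, together with a simulation check, can be sketched as follows. The simulation assumes each student independently likes pizza with probability 0.9, as in the example; `math.prod` requires Python 3.8 or later, and the helper name `p_all_independent` is illustrative:

```python
import random
from math import prod

def p_all_independent(probabilities):
    """P(A1 ∩ A2 ∩ ... ∩ An) = P(A1) P(A2) ... P(An) for independent events."""
    return prod(probabilities)

p_exact = p_all_independent([9 / 10] * 3)   # three students, chosen with replacement

# Simulation check: each simulated student likes pizza with probability 0.9, independently
rng = random.Random(7)
trials = 100_000
hits = sum(all(rng.random() < 0.9 for _ in range(3)) for _ in range(trials))

print(round(p_exact, 3), hits / trials)   # 0.729 and a simulated value close to it
```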

3) It is known that 8% of all cars of a certain make that are sold encounter engine
overheating problems within 50 000 kilometers of travel. During the past week 4
such cars were sold. Assuming that engine overheating problems for the 4 cars are
encountered independently, what is the probability that
(a) all 4
(b) none
(c) at least one of these cars sold
encounter engine overheating problems within 50 000 kilometers of travel ?

Solution:
Let A denote the event “overheating problems within 50 000 kilometers of travel”.

(a) P(A) = 0.08.

P(all 4 have overheating problems) = [P(A)]⁴ = 0.08⁴ = 0.00004096.

(b) P(no overheating problems) =

So
P(none) =

(c) P(at least 1) =

Bayes’ theorem

In order to apply the conditional probability formula

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)},$$

values for P(A ∩ B) and P(B) are needed.

Suppose that only the values for P(A), P(B|A) and P(B|Ā) are available.
In this case the probabilities P(A ∩ B) and P(B) required for calculating P(A|B) can be
calculated from

P(A ∩ B) = P(A) P(B|A)
(using the conditional probability multiplication formula)

and

P(B) = P(A ∩ B) + P(Ā ∩ B) = P(A) P(B|A) + P(Ā) P(B|Ā)
(using the total probability formula and the conditional probability multiplication formula).

Substituting these probabilities into the first conditional probability formula gives

$$P(A \mid B) = \frac{P(A)\,P(B \mid A)}{P(A)\,P(B \mid A) + P(\bar{A})\,P(B \mid \bar{A})}.$$

This result is known as Bayes’ theorem (named after Thomas Bayes, who proposed the
method).

Example 1

When testing a person for a certain disease, the test can show either a positive result (the
person has the disease) or a negative result (the person does not have the disease).

When a person actually has the disease, the test shows positive 99% of the time. When the
person actually does not have the disease, the test shows negative 95% of the time. Suppose
it is known that only 0.1% of the people in the population have the disease.

a) If a test turns out to be positive, what is the probability that the person has the
disease?

b) If the test turns out to be negative, what is the probability that the person does not
have the disease?

Solution:

Let A = the person has the disease,
B = the test returns a positive result.
Then

Ā is the event “the person does not have the disease”,
B|A is the event “the test is positive given the person has the disease”,
B|Ā is the event “the test is positive given the person does not have the disease” and
B̄|Ā is the event “the test is negative given the person does not have the disease”.

(a) P(A) = 0.001 (given),
P(Ā) = 1 – P(A) = 0.999,
P(B|A) = 0.99 (given),
P(B̄|Ā) = 0.95 (given),
P(B|Ā) = 1 – P(B̄|Ā) = 0.05.

Substitution into the above formulae gives

Numerator: P(A ∩ B) = P(A) P(B|A) = 0.001 × 0.99 = 0.00099

Denominator:
P(B) = P(A ∩ B) + P(Ā ∩ B)
= P(A) P(B|A) + P(Ā) P(B|Ā)
= (0.001 × 0.99) + (0.999 × 0.05)
= 0.00099 + 0.04995
= 0.05094

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{0.00099}{0.05094} \approx 0.0194.$$

$$\text{(b)}\quad P(\bar{A} \mid \bar{B}) = \frac{P(\bar{A} \cap \bar{B})}{P(\bar{B})} = \frac{P(\bar{A})\,P(\bar{B} \mid \bar{A})}{1 - P(B)} = \frac{0.999 \times 0.95}{0.94906} \approx 0.9999895.$$

From the above it can be seen that a negative test result is very reliable (it will be
wrong only about 105 times in 10 million cases). On the other hand, the chance that a person
has the disease when the test result is positive is only about 194 in 10 000.
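Bayes' theorem can be written as one small function that works for any set of mutually exclusive events A1, …, An whose union is S, the two-event case {A, Ā} being n = 2. The sketch below reproduces the disease-test figures above; for part (b) it uses P(B̄ | A) = 1 – 0.99 = 0.01. The function name `bayes` is hypothetical:

```python
def bayes(priors, likelihoods):
    """Posterior probabilities P(A_k | B) for mutually exclusive A_1..A_n covering S.

    priors[k]      = P(A_k)       (should sum to 1)
    likelihoods[k] = P(B | A_k)
    The denominator P(B) = sum_i P(A_i) P(B | A_i) is the total probability formula.
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]
    p_b = sum(joint)
    return [j / p_b for j in joint], p_b

# Disease example: events {A, Ā}, B = positive test
posterior, p_positive = bayes(priors=[0.001, 0.999], likelihoods=[0.99, 0.05])
print(round(p_positive, 5), round(posterior[0], 4))   # 0.05094 0.0194

# Part (b): condition on a negative test, with P(B̄ | A) = 0.01 and P(B̄ | Ā) = 0.95
posterior_neg, p_negative = bayes(priors=[0.001, 0.999], likelihoods=[0.01, 0.95])
print(round(posterior_neg[1], 7))                     # 0.9999895
```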

Bayes’ theorem can be extended:

Suppose A1, A2, …, An are mutually exclusive events whose union is the sample space
S and P(Ai) > 0 for each i. Then, for any event B with P(B) > 0 and any k = 1, 2, …, n,

$$P(A_k \mid B) = \frac{P(A_k)\,P(B \mid A_k)}{\sum_{i=1}^{n} P(A_i)\,P(B \mid A_i)}.$$

Example 2

Suppose that Bob can decide to go to work by one of three modes of transportation – car,
bus, or commuter train. Because of high traffic, if he decides to go by car, there is a 50%
chance he will be late. If he goes by bus, which has special reserved lanes but is sometimes
overcrowded, the probability of being late is only 20%. The commuter train is more
expensive than the other modes of transport but is late only 1% of the time.

a) Suppose that Bob is late one day and his boss wishes to estimate the probability that
he drove to work that day by car. Since he does not know which mode of
transportation Bob usually uses, he assumes that each mode is equally likely to be
used. What is the boss’ estimate of the probability that Bob drove to work by car?

b) Suppose that a co-worker of Bob’s knows that Bob drives to work by car 10% of the
time, he almost always takes the commuter train to work, and he never takes the bus.
Given that Bob is late to work today, the co-worker believes there is a ____% chance
that Bob came to work by train.

Solution

There are two events of interest: being late and the choice of transport. There are 3 options for
the choice of transport.
Let
L = Bob is late to work
B = Bob takes the bus
C = Bob takes the car
T = Bob takes the train

Each mode of transport is equally likely, so

Solution (a)

Denominator:

Solution (b)
Try for yourself

3.7 Probabilities and odds


Let a to b be the odds in favour of some event A. Suppose P(A) = p. Then P(Ā) = 1 – p.
The odds in favour of A are then defined by

$$\frac{a}{b} = \frac{p}{1 - p}.$$

From the above it can be shown that

$$p = \frac{a}{a + b} \quad\text{and}\quad 1 - p = \frac{b}{a + b}.$$

The odds against A are b to a. From the above,

$$\frac{b}{a} = \frac{1 - p}{p}.$$
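The conversions between probabilities and odds are one-liners with exact fractions. A sketch (the function names are illustrative), checked against the dice and red-number examples that follow:

```python
from fractions import Fraction

def odds_in_favour(p):
    """Return (a, b) in lowest terms with a/b = p/(1 - p)."""
    ratio = Fraction(p) / (1 - Fraction(p))
    return ratio.numerator, ratio.denominator

def probability_from_odds(a, b):
    """p = a / (a + b) for odds of a to b in favour."""
    return Fraction(a, a + b)

a, b = odds_in_favour(Fraction(5, 36))   # sum of 6 with two dice
print(a, b, round(b / a, 1))             # 5 31 6.2  ("5 to 31", i.e. "1 to 6.2")
print(probability_from_odds(18, 19))     # 18/37 (odds of 18 to 19 in favour of red)
```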

Examples

a) A pair of balanced dice is tossed. What are the odds in favour of the sum of the numbers
showing being 6?
Total number of outcomes = 6 × 6 = 36.
Possible ways of getting a sum of 6: (1, 5), (2, 4), (3, 3), (4, 2), (5, 1).
Number of ways of getting a sum of 6 is 5.

p = probability sum equals 6 = 5/36, 1 – p = 31/36.

The odds in favour of a sum of 6 are 5 to 31, or 1 to 6.2.

b) A roulette wheel has 37 numbers, 18 of which are red. What are the odds in favour of a
red number coming up?

p = probability (red number) = 18/37 and 1 – p = 19/37.
So the odds in favour of a red are 18 to 19, or 1 to 1.056.

c) The table below shows data that were collected from 781 middle-aged female patients at a
certain hospital.

                Heart problems
Smoker        yes     no     total
yes           172    173      345
no             90    346      436
total         262    519      781

From the table it can be seen that

(i) for smokers the odds in favour of heart problems are 172 to 173, or 1 to 1.0058;

(ii) for non-smokers the odds in favour of heart problems are 90 to 346, or 1 to 3.8444.

From this it can be seen that smokers are much more at risk of heart problems than
non-smokers.

Chapter 4 – Probability distributions of discrete random variables

4.1 Discrete random variables


A random variable is a variable whose value depends on the outcome of a random
experiment. A random variable is denoted by a capital letter and a particular value of a
random variable by a lower case (small) letter.

Examples:

1) T = the number of tails (t) when a coin is flipped 3 times.


2) X = the sum of the values (x) showing when two dice are rolled.
3) H = the height (h) of a woman chosen at random from a group.
4) V = the liquid volume (v) of soda in a can marked 12 oz.

There are two types of random variables:

Discrete Random Variables

• Variables that have a finite or countably infinite number of possible values.
• These variables usually occur in counting experiments.

Continuous Random Variables

• Variables that can take on any value in some interval, i.e. they can take an
(uncountably) infinite number of possible values.
• These variables usually occur in experiments where measurements are taken.

Examples:

1) The variables T and X from the above examples are discrete random variables.

2) The variables H and V from the above examples are continuous random variables.
4.2 Discrete probability distributions and their graphical representations
A discrete probability distribution is a list of the possible distinct values of the random
variable together with their corresponding probabilities. The probability of the random
variable X assuming a particular value x is denoted by P(X = x) = p(x). Sometimes the notation
P(X = x) = f(x) or P(X = x) = P(x) is used. This probability, which is a function of x, is referred to
as the probability mass function.

Examples:

1) As above, let T be the random variable that represents the number of tails obtained
when a coin is flipped three times. Then T has 4 possible values 0, 1, 2, and 3. The
outcomes of the experiment and the values of T are summarized in the next table.

Outcomes T
hhh 0
hht, hth, thh 1
tth, tht, htt 2
ttt 3

Assuming that the outcomes are all equally likely, the probability distribution for T is
given in the following table.

t 0 1 2 3 Total
p(t) 1/8 3/8 3/8 1/8 1

2) Let Y denote the number of tosses of a coin until a head first appears. Then

S = {h, th, tth, ttth, . . . } and Y = 1, 2, 3, 4, . . .

y       1      2       3      . . .   Total
p(y)    ½    (½)²    (½)³     . . .     1

Why is ½ + (½)² + (½)³ + . . . = 1?

3) A pair of dice is tossed. Let X denote the sum of the digits. The probability
distribution of X can be found from the following table. The entry in any particular
cell is the sum of the row and column values.
79

                 1st die
             1   2   3   4   5   6
         1   2   3   4   5   6   7
         2   3   4   5   6   7   8
2nd die  3   4   5   6   7   8   9
         4   5   6   7   8   9  10
         5   6   7   8   9  10  11
         6   7   8   9  10  11  12

x 2 3 4 5 6 7 8 9 10 11 12
P(X=x) 1/36 2/36 3/36 4/36 5/36 6/36 5/36 4/36 3/36 2/36 1/36
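The distribution of the sum can be reproduced by tallying the 36 cells of the table. A minimal Python sketch, assuming two fair dice so that every (1st die, 2nd die) pair is equally likely; `pairs` and `pmf_X` are illustrative names:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# The 36 equally likely (1st die, 2nd die) pairs, i.e. the cells of the table above.
pairs = list(product(range(1, 7), repeat=2))

# Tally how many cells give each sum x.
counts = Counter(a + b for a, b in pairs)

# p(x) = (number of cells with sum x) / 36
pmf_X = {x: Fraction(counts[x], 36) for x in sorted(counts)}

for x, p in pmf_X.items():
    print(x, p)
```

Note that `Fraction` prints values in lowest terms, so 2/36 appears as 1/18 and 6/36 as 1/6.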

Note:
For any discrete random variable X, the probabilities satisfy

$0 \le p(x) \le 1$ for every x, and $\sum_{x} p(x) = 1$.

The cumulative distribution function

The cumulative distribution function (CDF) of X is defined as

$F(x) = P(X \le x) = \sum_{t \le x} p(t)$.

Examples

1) For the probability mass function in example 1 the cumulative distribution function is

x 0 1 2 3
F(x) 1/8 ½ 7/8 1

2) For the probability mass function in example 3 the cumulative distribution function is

x 2 3 4 5 6 7 8 9 10 11 12
F(x) 1/36 3/36 6/36 10/36 15/36 21/36 26/36 30/36 33/36 35/36 1

3) Consider a discrete random variable with probability mass function given below.

x 1 2 3 4
P(X=x) 0.1 0.3 0.4 0.2

[Figure: (a) CDF and (b) PMF of the random variable in example 3]

The figure plots the cumulative distribution function (graph (a), on the left) and the
probability mass function (graph (b), on the right) of this random variable.
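Because F(x) is a running total of p(x), the CDF tables above can be generated with a cumulative sum. A minimal Python sketch; the helper `cdf_table` is a hypothetical name, and the pmf is passed as a dictionary {x: p(x)}:

```python
from fractions import Fraction
from itertools import accumulate

def cdf_table(pmf):
    """Return {x: F(x)} where F(x) = P(X <= x), for a pmf given as {x: p(x)}."""
    xs = sorted(pmf)
    # Running totals of p(x), taken in increasing order of x, give F(x).
    return dict(zip(xs, accumulate(pmf[x] for x in xs)))

# pmf of T (number of tails in three flips), example 1
pmf_T = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

# pmf from example 3 (decimal strings converted to exact fractions)
pmf3 = {1: Fraction("0.1"), 2: Fraction("0.3"), 3: Fraction("0.4"), 4: Fraction("0.2")}

print(cdf_table(pmf_T))
print(cdf_table(pmf3))
```

For example 3 this gives F(1) = 0.1, F(2) = 0.4, F(3) = 0.8 and F(4) = 1, the step heights in graph (a).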

A random variable can only take on one value at a time, i.e. the events X = x₁ and X = x₂ for
x₁ ≠ x₂ are mutually exclusive. The probability that the variable takes on any one of several
different values can therefore be found by simply adding the appropriate probabilities.

Examples

1) Find the probability of getting 2 or more tails when a coin is flipped 3 times.

P(T ≥ 2) = 3/8 + 1/8 = ½.

2) Find the probability of getting at least one tail when a coin is flipped 3 times.

P(at least 1) = p(1) + p(2) + p(3)
= 3/8 + 3/8 + 1/8
= 7/8

Or

P(at least 1) = 1 – p(0) = 1 – 1/8 = 7/8.

3) Find the probability of needing at most 3 tosses of a coin to get the first heads.

P(at most 3) = p(1) + p(2) + p(3)
= ½ + (½)² + (½)³
= 7/8

4) Find the probability of getting a sum of
(a) 7 when tossing a pair of dice.
(b) at least 4 when tossing a pair of dice.

(a) P(7) = P(1st is 6, 2nd is 1) + P(1st is 5, 2nd is 2) + P(1st is 4, 2nd is 3)
+ P(1st is 3, 2nd is 4) + P(1st is 2, 2nd is 5) + P(1st is 1, 2nd is 6)
= 6/36
= 1/6.

(b) P(at least 4) = p(4) + p(5) + . . . + p(12)
= 1 – [p(2) + p(3)]
= 1 – 3/36
= 33/36
= 11/12.
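Both parts can be checked by counting outcomes directly, with part (b) using the complement rule as above. A minimal Python sketch, assuming fair dice; the helper `prob` is a hypothetical name that takes a condition on the sum:

```python
from fractions import Fraction
from itertools import product

# Equally likely outcomes for a pair of dice: 36 (1st, 2nd) pairs.
pairs = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) = (number of pairs whose sum satisfies event) / 36."""
    return Fraction(sum(1 for a, b in pairs if event(a + b)), len(pairs))

p_seven = prob(lambda s: s == 7)            # part (a)
p_at_least_4 = 1 - prob(lambda s: s <= 3)   # part (b), via the complement
print(p_seven, p_at_least_4)                # 1/6 11/12
```

The complement only needs the two sums 2 and 3, which is why it is quicker than adding p(4) through p(12).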

4.3 Mean, variance and standard deviation of a discrete random variable

The mean or expected value of a random variable X is the average value that we would
expect for X when performing the random experiment many times.

Notation: The mean or expected value of a random variable X will be represented by μ or E(X).

We can calculate the mean by using the formula

$E(X) = \mu = \sum_{x} x\,p(x)$.
Examples

1) The expected value of the random variable T from above is:

$E(T) = 0 \times \tfrac{1}{8} + 1 \times \tfrac{3}{8} + 2 \times \tfrac{3}{8} + 3 \times \tfrac{1}{8} = \tfrac{12}{8} = 1.5$.

Thus if 3 coins are flipped a large number of times, we should expect the average
number of tails (per 3 flips) to be about 1.5. Since the number of tails is an integer,
T never actually takes the mean value 1.5. Rather, the mean reflects the symmetry of the
distribution: the extreme values (0 and 3) each occur with the same probability (one
eighth) and the middle values (1 and 2) each occur with the same probability (three
eighths), so the distribution balances at 1.5.

2) The score S obtained in a certain quiz is a random variable with probability distribution
given below.

s 0 1 2 3 4 5
p(s) 0.12 0.04 0.16 0.32 0.24 0.12

The mean of the random variable S can be calculated as shown below.

s 0 1 2 3 4 5 sum
p(s) 0.12 0.04 0.16 0.32 0.24 0.12 1
s × p(s) 0 0.04 0.32 0.96 0.96 0.60 2.88

μ = E(S) = 2.88

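The tabular method above (multiply each value by its probability, then add) translates directly into code. A minimal Python sketch; `expected_value` and `pmf_S` are illustrative names, and floating-point arithmetic is assumed, so the result is rounded for display:

```python
def expected_value(pmf):
    """E(X) = sum of x * p(x) over all values x in the pmf {x: p(x)}."""
    return sum(x * p for x, p in pmf.items())

# Quiz score distribution from the table above
pmf_S = {0: 0.12, 1: 0.04, 2: 0.16, 3: 0.32, 4: 0.24, 5: 0.12}
mu = expected_value(pmf_S)
print(round(mu, 2))  # 2.88
```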
Variance
For a random variable X, the variance, denoted by σ², can be calculated by using the
formula

$\sigma^2 = \sum_{x} (x - \mu)^2\, p(x) = \sum_{x} x^2\, p(x) - \mu^2$.

The standard deviation of X, denoted by σ, is the positive square root of σ². It
measures the extent to which the values are spread around the mean.

The calculation of the standard deviation for a random variable is similar to that of the
calculation of the standard deviation for grouped data.

Example

Calculate the standard deviation of the random variable T from above.

t 0 1 2 3 sum
p(t) 1/8 3/8 3/8 1/8 1
t × p(t) 0 3/8 6/8 3/8 1.5
t² × p(t) 0 3/8 12/8 9/8 3

σ² = 3 − (1.5)² = 0.75, so σ = √0.75 ≈ 0.866.
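The same two column sums (of t × p(t) and t² × p(t)) give the variance through the shortcut σ² = Σ x² p(x) − μ². A minimal Python sketch; `mean_var_sd` is a hypothetical helper name:

```python
import math
from fractions import Fraction

def mean_var_sd(pmf):
    """Return (mu, sigma^2, sigma) for a discrete pmf {x: p(x)},
    using sigma^2 = sum(x^2 p(x)) - mu^2, as in the table above."""
    mu = sum(x * p for x, p in pmf.items())
    ex2 = sum(x * x * p for x, p in pmf.items())
    var = ex2 - mu * mu
    return mu, var, math.sqrt(var)

pmf_T = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
mu, var, sd = mean_var_sd(pmf_T)
print(mu, var, round(sd, 3))  # 3/2 3/4 0.866
```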

4.4 Binomial, hypergeometric and Poisson distributions

Bernoulli trial:
Consider an experiment in which there are two complementary outcomes. One
outcome is labelled “success” (s) and the other is labelled “failure” (f). Such an
experiment is called a Bernoulli trial.
We denote the probability of success as P(s)= p and the probability of failure as
P(f) = 1–p = q

Binomial random variable:
Random variable X is said to have a Binomial distribution if it counts the number of
successes when a fixed number n of identical, independent Bernoulli trials are
performed.

So, to identify a Binomial random variable, 5 questions must be asked:
1) Is there a fixed number of trials (n)?
2) Is each trial a Bernoulli trial, i.e. does each trial have 2 complementary outcomes?
3) Are the Bernoulli trials identical, i.e. is the probability of success p the same for all
the trials?
4) Are the trials independent, i.e. does the outcome of one trial not affect the
outcome of any other trial?
5) Does X count the number of successes?
If the answer is “yes” to all 5 questions, then X is a Binomial random variable.

Notation:
A shorthand way of referring to a binomially distributed random variable X, based
on n trials with probability of success p, is X ~ B(n, p) or X ~ Bin(n, p).

Examples:

1) Consider the experiment of flipping a coin 5 times. If we label getting “tails” on a flip
as “success” and getting “heads” as “failure”, and if the random variable T represents
the number of tails obtained, then T will be binomially distributed with n = 5, p = ½
and q = ½.

2) A student answers 10 questions in a multiple-choice test by guessing each answer. For
each question, there are 5 possible answers, only one of which is correct. If we consider
a “success” as getting a question right and consider the 10 questions as 10 independent
Bernoulli trials, then the random variable X representing the number of correct answers
will be binomially distributed with n = 10, p = 0.2 and q = 0.8.

3) Fourteen percent of flights from a certain airport are delayed. If 20 flights are chosen at
random, then we can consider each flight to be an independent Bernoulli trial. If we
define a successful trial to be one where a flight takes off on time, then the random
variable Z representing the number of on-time flights will be binomially distributed with
n = 20, p = 0.86 and q = 0.14.

Tree diagram
The possible outcomes of a binomial experiment can be listed with the help of a diagram
such as the one below. This diagram, called a tree diagram, enables one to write down all
the outcomes when a Bernoulli trial is performed 3 times.

[Figure: tree diagram with columns labelled 1st, 2nd and 3rd; starting from “start”, each trial branches into s (success) and f (failure)]

The following outcomes and their respective number of successes (x) can be written down
from the above tree diagram.

Outcomes x
fff 0
ffs, fsf, sff 1
ssf, sfs, fss 2
sss 3
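Each path through the tree is one ordered sequence of s and f values, so the table can be generated by enumerating all such sequences and grouping them by the number of successes. A minimal Python sketch; `paths` and `by_successes` are illustrative names:

```python
from itertools import product

# Each path through the tree is one sequence of three trial results, e.g. "sfs".
paths = ["".join(seq) for seq in product("sf", repeat=3)]

# Group the 8 paths by x = number of successes.
by_successes = {}
for path in paths:
    by_successes.setdefault(path.count("s"), []).append(path)

for x in sorted(by_successes):
    print(x, by_successes[x])
```

The group sizes 1, 3, 3, 1 are exactly the counts 3C0, 3C1, 3C2, 3C3 used in the formula that follows.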

Formula for the calculation of binomial probabilities

A formula for the binomial probability mass function for the case n = 3 can be written down
from the above table by noting the following.

1) Each outcome is a sequence of s (success) and f (failure) values e.g. fff, ffs, ssf etc.

2) In a particular sequence s occurs x times and f occurs (3 – x) times for x = 0, 1, 2, 3.

3) Since the trials are independent, the probability of a particular sequence of s’s and
f’s is given by a product of p (the probability of success) and q (the probability of
failure) values, where p occurs x times and q occurs (3 – x) times, e.g. P(fff) = q³,
P(ffs) = pq², P(ssf) = p²q etc.

4) The number of outcomes with x successes and (3 – x) failures can be counted by
using the formula C(3, x) = 3Cx.

By using the above, the binomial formula for n = 3 can be written down as

$P(X = x) = {}_{3}C_{x}\, p^{x} q^{3-x}$ for x = 0, 1, 2, 3.

To write down the general formula, the same reasoning as explained above applies to
sequences with n outcomes consisting of s (x of these) and f (n – x of these) values. In the
formula the number 3 is simply replaced by n, i.e.

$P(X = x) = f(x) = {}_{n}C_{x}\, p^{x} q^{n-x}$ for x = 0, 1, 2, …, n.
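The general formula is a one-liner once a function for nCx is available; Python's standard `math.comb` computes it. A minimal sketch (the function name `binomial_pmf` is illustrative, not from the notes):

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) = nCx * p**x * q**(n - x) for X ~ Bin(n, p), with q = 1 - p."""
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)

# T ~ Bin(3, 0.5): the values match the coin-flip table 1/8, 3/8, 3/8, 1/8.
print([binomial_pmf(t, 3, 0.5) for t in range(4)])  # [0.125, 0.375, 0.375, 0.125]
```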

Examples

1) As in the previous examples, let T be the random variable representing the number of
tails when a coin is flipped 3 times. Then T ~ Bin(3, 0.5). Using the formula above with
n = 3 and p = 0.5, we can calculate the probability of exactly 2 tails as:

$f(2) = {}_{3}C_{2}\,(0.5)^{2}(0.5)^{1} = 0.375$

2) A student answers 10 questions in a multiple-choice test by guessing each answer. For
each question, there are 5 possible answers, only one of which is correct. If we consider
a “success” as getting a question right and consider the 10 questions as 10 independent
Bernoulli trials, then X ~ Bin(10, 0.2) where X is the random variable representing the
number of correct answers. What is the probability that the student chooses

a) 3 answers correctly?
b) 7 answers correctly?
c) fewer than 3 answers correctly?
d) at least 5 answers correctly?

Solution:
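a) $P(X = 3) = f(3) = {}_{10}C_{3}\,(0.2)^{3}(0.8)^{7} = 0.2013$

b) $P(X = 7) = f(7) = {}_{10}C_{7}\,(0.2)^{7}(0.8)^{3} = 0.000786$.

c) $P(X < 3) = f(0) + f(1) + f(2) = (0.8)^{10} + {}_{10}C_{1}(0.2)(0.8)^{9} + {}_{10}C_{2}(0.2)^{2}(0.8)^{8} = 0.1074 + 0.2684 + 0.3020 = 0.6778$

d) $P(X \ge 5) = 1 - P(X \le 4) = 1 - [f(0) + f(1) + f(2) + f(3) + f(4)] = 1 - 0.9672 = 0.0328$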
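All four parts can be computed by summing the binomial pmf, with part (d) using the complement. A minimal Python sketch; `binomial_pmf` is a hypothetical helper, and float results are rounded for display:

```python
from math import comb

def binomial_pmf(x, n, p):
    """f(x) = nCx p^x q^(n-x) for X ~ Bin(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, 0.2  # 10 guessed questions, 5 options each
a = binomial_pmf(3, n, p)                              # P(X = 3)
b = binomial_pmf(7, n, p)                              # P(X = 7)
c = sum(binomial_pmf(x, n, p) for x in range(3))       # P(X < 3) = f(0)+f(1)+f(2)
d = 1 - sum(binomial_pmf(x, n, p) for x in range(5))   # P(X >= 5) = 1 - P(X <= 4)
print(round(a, 4), round(b, 6), round(c, 4), round(d, 4))  # 0.2013 0.000786 0.6778 0.0328
```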

Cumulative Binomial Distribution Tables

Notice that the calculations needed in parts (c) and (d) of the previous example are
time-consuming. Instead of using the probability mass function f(x) to solve such problems,
the CDF F(x) can be used. Values of the CDF are found in the Cumulative Binomial
Distribution tables at the end of the notes (Table A).
There are several tables, one for each value of n. The first column gives the value of n,
while the second column gives the possible values that the random variable X can take on.
The top row gives common values of p.

Remember: These tables give cumulative probabilities, so situations that involve the
“<”, “>” and “≥” signs must be rewritten in a form that uses the “≤” sign, i.e. a
“less than or equal to” situation. For example, P(X < 3) = P(X ≤ 2) and
P(X ≥ 5) = 1 – P(X ≤ 4).

Examples
1) Suppose X ~ Bin(12 , 0.6). Find the probability that X is less than, or equal to, 5.

Here we want to find F(5) = P(X ≤ 5).


Step 1: Find the table that has n = 12 in the first column.
Step 2: Choose the value x=5 in the second column.
Step 3: Find p = 0.6 in the top row.
The value given at the intersection of the “5 row” and the “0.6 column” is
F(5) = P(X ≤ 5) = 0.1582
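Where the printed table is not at hand, the same cumulative value can be computed by summing the pmf up to x. A minimal Python sketch; `binomial_cdf` is an illustrative name, and it is assumed that Table A rounds to four decimal places:

```python
from math import comb

def binomial_cdf(x, n, p):
    """F(x) = P(X <= x) for X ~ Bin(n, p): the quantity tabulated in Table A."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

print(round(binomial_cdf(5, 12, 0.6), 4))   # 0.1582
print(round(binomial_cdf(2, 10, 0.2), 4))   # 0.6778
```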

2) In the multiple choice test example X ~ Bin(10, 0.2).

Part (c): What is the probability that the student chooses fewer than 3 answers
correctly?
P(X < 3) = P(X ≤ 2) = 0.6778
To find this, go to the table with n = 10 in the first column. Choose x = 2 in the second
column and p = 0.2 in the top row. Lining up the row and column gives
F(2) = P(X ≤ 2) = 0.6778.

Part (d): What is the probability that the student chooses at least 5 answers
correctly?
P(X ≥ 5) = 1 – P(X ≤ 4) = 1 – 0.9672 = 0.0328

Mean and standard deviation of a binomial random variable

If X is a binomial random variable with n trials, probability of success p and probability of
failure q, then the mean, variance and standard deviation of X can be calculated by using the
following formulae.
mean = E(X) = µ = np
var(X) = σ² = npq
standard deviation of X = σ = √(npq).

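Example

For T = the number of tails when a coin is flipped 3 times, n = 3 and p = q = ½, so
µ = np = 3 × ½ = 1.5, σ² = npq = 3 × ½ × ½ = 0.75 and σ = √0.75 ≈ 0.866,
which agrees with the values calculated directly from the probability distribution of T.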
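The shortcut formulae np and npq can be checked against the direct definitions Σ x f(x) and Σ x² f(x) − µ². A minimal Python sketch; `binomial_moments_direct` is a hypothetical helper name:

```python
from math import comb, sqrt

def binomial_moments_direct(n, p):
    """Mean and variance of Bin(n, p) computed directly from the pmf,
    i.e. sum x f(x) and sum x^2 f(x) - mu^2, for comparison with np and npq."""
    pmf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]
    mu = sum(x * f for x, f in enumerate(pmf))
    var = sum(x * x * f for x, f in enumerate(pmf)) - mu**2
    return mu, var

n, p = 3, 0.5
q = 1 - p
mu, var = binomial_moments_direct(n, p)
print(mu, n * p)                   # direct mean vs np
print(var, n * p * q)              # direct variance vs npq
print(round(sqrt(n * p * q), 3))   # standard deviation
```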

Note: A Binomial random variable with n=1 is simply a Bernoulli trial and is sometimes
referred to as a Bernoulli distribution.

Shape of the binomial distribution

A binomial distribution is symmetric if p = q, positively skewed if p < q and negatively
skewed if p > q. These shapes are illustrated in the graphs for n = 20 shown below.
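One way to attach a number to these three shapes, going slightly beyond the notes, is the standard coefficient of skewness of a binomial distribution, (q − p)/√(npq): it is zero when p = q, positive when p < q and negative when p > q. A minimal Python sketch for n = 20; the p values 0.2, 0.5 and 0.8 are illustrative choices, not necessarily those used in the graphs:

```python
from math import comb, sqrt

def binomial_pmf(x, n, p):
    """f(x) = nCx p^x q^(n-x) for X ~ Bin(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binomial_skewness(n, p):
    """Standard coefficient of skewness of Bin(n, p): (q - p) / sqrt(npq)."""
    q = 1 - p
    return (q - p) / sqrt(n * p * q)

n = 20
for p in (0.2, 0.5, 0.8):   # illustrative values with p < q, p = q, p > q
    print(p, round(binomial_skewness(n, p), 3))

# With p = q = 0.5 the pmf is a mirror image about n/2: f(x) = f(n - x).
symmetric = all(binomial_pmf(x, n, 0.5) == binomial_pmf(n - x, n, 0.5)
                for x in range(n + 1))
print(symmetric)
```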

Hypergeometric distribution: Bernoulli trials where sampling is without replacement
The following experimental model is sometimes associated with the binomial distribution.

Consider a bowl with N marbles of which Np are blue and Nq are red, where p + q = 1.
Suppose that drawing a blue marble is labelled “success” and drawing a red marble
“failure”. If sampling is done with replacement, then P(success) = Np/N = p and
P(failure) = Nq/N = q on every draw, and the binomial formula applies for calculating
P(x blue marbles in n draws). If sampling is without replacement, the probability of
success on a draw depends on the outcomes of the earlier draws, so the trials are no longer
independent (condition 4 of a binomial experiment is violated) and the binomial formula no
longer applies for calculating this probability. In such a case

$P(X = x) = \dfrac{{}_{Np}C_{x}\;{}_{Nq}C_{n-x}}{{}_{N}C_{n}}$

This distribution is known as the hypergeometric distribution.

Example

A bowl contains 10 blue and 7 red marbles. Four (4) marbles are drawn at random from the
bowl. Calculate the probability of
(a) two
(b) at least 3
blue marbles drawn when sampling is done
1) with replacement.
2) without replacement.

N = 17, Np = 10, Nq = 7 and n = 4

1a) $P(X = 2) = {}_{4}C_{2}\left(\tfrac{10}{17}\right)^{2}\left(\tfrac{7}{17}\right)^{2} = 0.352$.

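1b) P(X ≥ 3) = P(X = 3) + P(X = 4)
= 0.335 + 0.120
= 0.455.

2a) $P(X = 2) = \dfrac{{}_{10}C_{2}\;{}_{7}C_{2}}{{}_{17}C_{4}} = \dfrac{45 \times 21}{2380} = 0.397$.

2b) $P(X \ge 3) = P(X = 3) + P(X = 4) = \dfrac{{}_{10}C_{3}\;{}_{7}C_{1} + {}_{10}C_{4}\;{}_{7}C_{0}}{{}_{17}C_{4}} = \dfrac{120 \times 7 + 210 \times 1}{2380} = \dfrac{1050}{2380} = 0.441$.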
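The two sampling schemes can be compared side by side with exact fractions. A minimal Python sketch; the names `blue` and `red` stand for Np and Nq, and the two function names are illustrative:

```python
from fractions import Fraction
from math import comb

N, blue, red, n = 17, 10, 7, 4   # bowl of 17 marbles (Np = 10 blue, Nq = 7 red), 4 drawn

def with_replacement(x):
    """Binomial: nCx p^x q^(n-x) with p = 10/17, q = 7/17."""
    p = Fraction(blue, N)
    return comb(n, x) * p**x * (1 - p)**(n - x)

def without_replacement(x):
    """Hypergeometric: (Np C x)(Nq C n-x) / (N C n)."""
    return Fraction(comb(blue, x) * comb(red, n - x), comb(N, n))

for name, f in (("with", with_replacement), ("without", without_replacement)):
    p2 = f(2)                   # part (a)
    p_at_least_3 = f(3) + f(4)  # part (b)
    print(name, round(float(p2), 3), round(float(p_at_least_3), 3))
```

The printed rows reproduce 0.352 and 0.455 (with replacement) and 0.397 and 0.441 (without replacement).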

4.5 Poisson distribution


Poisson random variable:
If random variable X counts the number of events that occur at random in an interval
of time or space, then X is a Poisson random variable. The average number of events
that occur in the time/space interval is denoted by μ.
A shorthand way of referring to a Poisson distributed random variable X with
average (mean) rate of occurrence µ is X ~ Po(µ).

Examples
1) The number of bad cheques presented for daily payment at a bank.
2) The number of road deaths per month.
3) The number of bacteria in a given culture.
4) The number of defects per square meter on metal sheets being manufactured.
5) The number of mistakes per typewritten page.

Probability mass function
The probability that x events occur in the time/space interval is given by

$P(X = x) = f(x) = \dfrac{e^{-\mu}\mu^{x}}{x!}$ for x = 0, 1, 2, …

Examples

1) A secretary claims an average mistake rate of 1 per page. A sample page is selected
at random and 5 mistakes are found. What is the probability of her making 5 or more
mistakes if her claim of 1 mistake per page on average is correct?

Solution:

In this case μ = 1 is claimed, and we need the probability that X, the number of mistakes
on a page, is 5 or more. If the claim is true,

$P(X \ge 5) = 1 - P(X \le 4) = 1 - \left[e^{-1} + e^{-1} + \dfrac{e^{-1}}{2!} + \dfrac{e^{-1}}{3!} + \dfrac{e^{-1}}{4!}\right] = 1 - 0.9963 = 0.0037.$

The above calculation shows that if the claim of 1 mistake per page on average is true,
there is only a 37 in 10 000 chance of getting 5 or more mistakes per page. This remote
chance of 5 or more mistakes when an average of 1 mistake per page is true casts doubt
on whether the claim of 1 mistake per page on average is in fact true.
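The Poisson formula and its running sum can be coded directly, which also confirms the figure 0.0037. A minimal Python sketch; `poisson_pmf` and `poisson_cdf` are illustrative names:

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """P(X = x) = e^(-mu) mu^x / x! for X ~ Po(mu)."""
    return exp(-mu) * mu**x / factorial(x)

def poisson_cdf(x, mu):
    """F(x) = P(X <= x) = sum of the pmf for k = 0, 1, ..., x."""
    return sum(poisson_pmf(k, mu) for k in range(x + 1))

# Secretary example: mu = 1 mistake per page, P(X >= 5) = 1 - P(X <= 4)
tail = 1 - poisson_cdf(4, 1)
print(round(tail, 4))   # 0.0037
```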

2) At a particular restaurant 4 plates are broken, on average, each week. What is the
probability that
a) 2 plates are broken next week?
b) at most 4 plates are broken next week?
c) more than 3 plates are broken next week?

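Solution:

Let X = number of plates broken in a week. Then X ~ Po(4).

a) $P(X = 2) = \dfrac{e^{-4}\,4^{2}}{2!} = 0.1465$

b) $P(X \le 4) = e^{-4}\left[1 + 4 + \dfrac{4^{2}}{2!} + \dfrac{4^{3}}{3!} + \dfrac{4^{4}}{4!}\right] = 0.6288$

c) $P(X > 3) = 1 - P(X \le 3) = 1 - e^{-4}\left[1 + 4 + \dfrac{4^{2}}{2!} + \dfrac{4^{3}}{3!}\right] = 1 - 0.4335 = 0.5665$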
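The three parts follow the same pattern as in the binomial case: one pmf value, a partial sum, and a complement. A minimal Python sketch; `poisson_pmf` is an illustrative helper name:

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """P(X = x) = e^(-mu) mu^x / x! for X ~ Po(mu)."""
    return exp(-mu) * mu**x / factorial(x)

mu = 4  # average number of plates broken per week
a = poisson_pmf(2, mu)                              # P(X = 2)
b = sum(poisson_pmf(x, mu) for x in range(5))       # P(X <= 4)
c = 1 - sum(poisson_pmf(x, mu) for x in range(4))   # P(X > 3) = 1 - P(X <= 3)
print(round(a, 4), round(b, 4), round(c, 4))        # 0.1465 0.6288 0.5665
```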

Cumulative Poisson Distribution Tables

Notice that the calculations needed in parts (b) and (c) of the previous example are
time-consuming. Instead of using the probability mass function f(x) to solve such problems,
the CDF F(x) can be used. Values of the CDF are found in the Cumulative Poisson
Distribution table at the end of the notes (Table B).
The top row gives some values of µ and the first column gives some values that the Poisson
random variable X can take on. The cumulative probabilities F(x) = P(X ≤ x) can be found by
lining up the relevant row and column.

Reminder: As with the Cumulative Binomial Distribution tables, these tables give
cumulative probabilities so situations that involve the “<”, “>” and “≥” signs must be
adjusted so that they are in a form that uses the “≤” sign i.e. a “less than or equal to”
situation.

Example 2
Part (b): Step 1 – Find µ = 4 in the top row of the table.
Step 2 – Find x = 4 in the first column.
Step 3 – Line up the column and row.
At the intersection is the value F(4) = P(X ≤ 4) = 0.6288.

Part (c): First find P(X ≤ 3).
Step 1 – Find µ = 4 in the top row.
Step 2 – Find x = 3 in the first column.
Step 3 – Line up the column and the row.
At the intersection is the value F(3) = P(X ≤ 3) = 0.4335.
So, P(X > 3) = 1 – P(X ≤ 3) = 1 – 0.4335 = 0.5665.

Poisson approximation of binomial distribution

The Poisson random variable can also be seen as an approximation to a binomial random
variable with a large number of trials (n) and a small probability of success (p) such that
the mean μ = np is of moderate size. This approximation is good when n ≥ 20 and p ≤ 0.05,
or when n ≥ 100 and np ≤ 10.

Example

A life insurance company has found that the probability is 0.000015 that a person aged 40-
50 will die from a certain rare disease. If the company has 100 000 policy holders in this age
group, what is the probability that this company will have to pay out 4 claims or more
because of death from this disease?

Solution:

For the following reasons a binomial distribution with n = 100 000 and p = 0.000015 is
reasonable in this case.

1) A person either dies or does not die from this disease (two outcomes).

2) The probability of dying from the disease is the same for every policy holder.

3) Whether or not one person dies from this disease does not affect whether or not
another person does.

The Poisson distribution with µ = 100 000×(0.000015) = 1.5 can be used to approximate this
probability.
P(X ≥ 4) = 1 – P(X ≤ 3)

= 1 – 0.9344
= 0.0656.
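Since n is known here, the approximation can be compared with the exact binomial probability. A minimal Python sketch; the variable names are illustrative, and no expected output is claimed for the exact binomial value beyond its being close to the Poisson one:

```python
from math import comb, exp, factorial

n, p = 100_000, 0.000015
mu = n * p   # about 1.5

# Exact binomial P(X <= 3) and its Poisson approximation with mu = np
binom_le_3 = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(4))
poisson_le_3 = sum(exp(-mu) * mu**k / factorial(k) for k in range(4))

# P(X >= 4) under each model
print(round(1 - binom_le_3, 4), round(1 - poisson_le_3, 4))
```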

Mean and standard deviation of a Poisson random variable

• The mean and variance of the Poisson distribution are given by E(X) = µ and
var(X) = µ.
• In the case of the Poisson approximation to the binomial distribution,

E(X) = var(X) = np
standard deviation = √(np).

If the average rate of occurrence µ is given for an interval of a particular length/size,
probability calculations can also be carried out for an interval of a different length/size
by scaling µ in proportion to the interval length/size.

Example

Calls arrive at a switchboard at an average rate of 1 every 15 seconds. What is the
probability of not more than 5 calls arriving during a particular minute?

Solution:

A mean rate of 1 every 15 seconds is equivalent to a mean rate of 4 every minute. Since the
question concerns an interval of 1 minute, X ~ Po(4) with µ = 4 (not µ = 1), so

P(X ≤ 5) = F(5) = 0.7851.
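The rescaling step is a single multiplication, after which the usual cumulative sum applies. A minimal Python sketch; the variable names are illustrative, and floating-point rounding means µ is only approximately 4:

```python
from math import exp, factorial

def poisson_cdf(x, mu):
    """P(X <= x) for X ~ Po(mu)."""
    return sum(exp(-mu) * mu**k / factorial(k) for k in range(x + 1))

rate_per_second = 1 / 15                   # 1 call every 15 seconds
interval_seconds = 60                      # the interval asked about: one minute
mu = rate_per_second * interval_seconds    # scale the mean to the interval: 4 calls

print(round(poisson_cdf(5, mu), 4))        # P(X <= 5) = 0.7851
```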
