STA112 Week 2 Class Note
Introduction
When summarizing large masses of raw data, it is often useful to
distribute the data into classes or categories and to determine the
number of individuals belonging to each class, called the class
frequency.
Definition:
A tabular arrangement of data by classes together with the
corresponding class frequencies is called a frequency
distribution or frequency table. It is a summary of the number of
times (how frequently) each category on a measurement scale
occurs within a given set of measurements.
It shows how the measurements are distributed (or spread) across the used
part of the measurement scale. Frequency distributions can be presented in
summary tables, known as frequency tables. The unorganized data
collected during an investigation are known as raw data.
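As a minimal sketch of how a frequency table can be built from raw data, the Python snippet below counts how often each value occurs. The sample values are the first ten miscounts from the assignment that follows; the function name `frequency_table` is an illustrative assumption, not part of the course material.

```python
from collections import Counter

def frequency_table(raw_data):
    """Return (value, frequency) pairs sorted by value."""
    counts = Counter(raw_data)
    return sorted(counts.items())

# First ten observations of the cashier miscount data (illustration only).
sample = [2, 0, 1, 3, 5, 6, 4, 3, 2, 3]
table = frequency_table(sample)

for value, freq in table:
    # One stroke per observation serves as a simple tally.
    print(f"{value:>5}  {'|' * freq:<5}  {freq}")
```

The same function can be applied to the full sixty-day data set.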
ASSIGNMENT 2
• Assume that during a sixty-day working period, a cashier made the following
miscounts:
2 0 1 3 5 6 4 3 2 3
1 2 2 0 1 2 1 1 3 2
0 4 1 1 2 0 2 2 4 4
4 1 2 2 0 1 0 0 3 5
5 0 2 1 1 2 2 1 4 3
3 1 0 3 2 2 1 2 3 3
Summarize the raw data by the use of tally marks.
2. Grouped Frequency Distributions
In many practical applications, however, the data under consideration are
large in volume and continuous in nature. In such cases, it is more
appropriate to group the data into classes.
EXAMPLE
Suppose a teacher records raw scores of 50 students in a statistics test as
follows:
58 15 81 79 92 58 69 32 45 56
41 85 43 52 61 75 85 69 56 49
57 87 89 49 85 45 69 75 65 61
25 72 67 58 84 60 32 57 69 68
73 42 65 55 74 58 36 78 68 79
Present the data in a frequency table.
Frequency Table:
Classes   Tally            Frequency
11‐20     |                1
21‐30     |                1
31‐40     |||              3
41‐50     ||||| ||         7
51‐60     ||||| ||||| |    11
61‐70     ||||| ||||| |    11
71‐80     ||||| |||        8
81‐90     ||||| ||         7
91‐100    |                1
Total                      50
Data organized and summarized as in the above frequency distribution are often called
grouped data.
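The grouping step can be sketched in Python as follows. This is an illustrative sketch only: the function name `grouped_frequencies` and its parameters are assumptions, and the classes are the ones used in the table above (11-20, 21-30, ..., 91-100).

```python
def grouped_frequencies(scores, lower=11, width=10, n_classes=9):
    """Count the scores falling in consecutive classes such as 11-20, 21-30, ..."""
    table = []
    for k in range(n_classes):
        lo = lower + k * width      # lower class limit
        hi = lo + width - 1         # upper class limit
        freq = sum(1 for s in scores if lo <= s <= hi)
        table.append((lo, hi, freq))
    return table

# Raw scores of the 50 students from the example.
scores = [
    58, 15, 81, 79, 92, 58, 69, 32, 45, 56,
    41, 85, 43, 52, 61, 75, 85, 69, 56, 49,
    57, 87, 89, 49, 85, 45, 69, 75, 65, 61,
    25, 72, 67, 58, 84, 60, 32, 57, 69, 68,
    73, 42, 65, 55, 74, 58, 36, 78, 68, 79,
]

grouped = grouped_frequencies(scores)
for lo, hi, freq in grouped:
    print(f"{lo}-{hi}: {freq}")
```

The frequencies should add up to the number of students, which is a quick check that no score was missed or counted twice.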
DEFINITION OF SOME TERMS:
In practice, the class boundaries are obtained by adding the upper limit of
one class interval to the lower limit of the next-higher class interval and
dividing by 2.
Sometimes, class boundaries are used to symbolize classes. For example,
the various classes in the first column of the last example could be
indicated by:
10.5–20.5, 20.5–30.5, etc.
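The boundary rule above can be checked with a short Python sketch. It assumes that the gap between adjacent classes is the same throughout, takes the class mark as the average of the class limits, and takes the class width as the difference between the upper and lower class boundaries; the function name `class_summary` is hypothetical.

```python
def class_summary(limits):
    """Return boundaries, class mark and class width for each (lower, upper) limit pair."""
    # Gap between adjacent classes, e.g. 21 - 20 = 1 (assumed constant).
    gap = limits[1][0] - limits[0][1]
    rows = []
    for lower, upper in limits:
        lower_boundary = lower - gap / 2
        upper_boundary = upper + gap / 2
        rows.append({
            "boundaries": (lower_boundary, upper_boundary),
            "mark": (lower + upper) / 2,
            "width": upper_boundary - lower_boundary,
        })
    return rows

limits = [(11, 20), (21, 30), (31, 40)]
summary = class_summary(limits)
for (lower, upper), row in zip(limits, summary):
    print(lower, upper, row["boundaries"], row["mark"], row["width"])
```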
RELATIVE FREQUENCY DISTRIBUTIONS AND PERCENTAGE
DISTRIBUTIONS
The relative frequency of each measurement category in a sample is the
frequency of that category divided by the total frequency. It is symbolized
by: $\frac{f_i}{n}$
The percentage for each category is the percent of the total frequency (n
or N) that is found in that category. This percentage is obtained by
multiplying the relative frequency by 100, which is symbolized for a
sample by: $\frac{f_i}{n} \times 100\%$
and for a population by: $\frac{f_i}{N} \times 100\%$
Example
Calculate the relative frequency for the table below:
Weight (kg), x_i    Frequency, f_i
1.0                 2
1.1                 0
1.2                 4
1.3                 8
1.4                 4
1.5                 2
Total               20
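A minimal Python sketch of the calculation for the weight table above is shown below. The variable names are illustrative assumptions; the data are the ones in the table.

```python
weights = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5]   # x_i, in kg
freqs = [2, 0, 4, 8, 4, 2]                  # f_i

n = sum(freqs)                                  # total frequency
relative = [f / n for f in freqs]               # f_i / n
percentage = [100 * r for r in relative]        # (f_i / n) * 100 %

for x, f, r, p in zip(weights, freqs, relative, percentage):
    print(f"{x:.1f}  {f:>2}  {r:.2f}  {p:.0f}%")
```

The relative frequencies should sum to 1 and the percentages to 100%.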
With the aid of the table below, draw a frequency table with class
boundaries, class mark, class width and relative frequency.
Class    Frequency
21-30    4
31-40    6
41-50    7
51-60    3
Cumulative Frequency Distribution
The total frequency of all values less than the upper class boundary of a
given class interval is called the cumulative frequency up to and including
that class interval.
The entry for any score value or class interval is the sum of the frequencies
for that value or that interval plus the frequencies of all lower score values.
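The running total described above can be sketched with `itertools.accumulate` from the Python standard library. The frequencies used here are those of the weight table in the relative frequency example (2, 0, 4, 8, 4, 2); the function name `cumulative_frequencies` is an assumption.

```python
from itertools import accumulate

def cumulative_frequencies(freqs):
    """Each entry is its own frequency plus the frequencies of all lower values."""
    return list(accumulate(freqs))

freqs = [2, 0, 4, 8, 4, 2]
cum = cumulative_frequencies(freqs)

# Percentage cumulative frequency: cumulative frequency as a percent of the total.
total = cum[-1]
pct_cum = [100 * c / total for c in cum]

print(cum)
print(pct_cum)
```

The last cumulative frequency always equals the total frequency.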
Example:
In a survey of monthly rents payable in a certain town, the following sample
data were collected for 48 residential flats in the town:
200 250 335 230 270 190 270 490
255 240 445 290 256 245 310 480
340 580 280 235 274 220 435 525
330 260 310 172 405 295 385 535
160 250 190 210 390 230 280 460
315 370 345 355 232 170 270 362
Present the data in a frequency distribution table using the classes
150-199, 200-249, ... Construct the cumulative frequency and percentage
relative frequency.
Exercise
The regulation board of health in a particular state specifies that the
fluoride level must not exceed 1.5 ppm (parts per million). The 25
measurements below represent the fluoride level for a sample of 25 days.
Although fluoride levels are measured more than once per day, these data
represent the early morning readings for the 25 days sampled:
0.75, 0.86, 0.84, 0.85, 0.97, 0.94, 0.89, 0.84, 0.83, 0.89, 0.88, 0.78, 0.77,
0.76, 0.82, 0.71, 0.92, 1.05, 0.94, 0.83, 0.85, 0.97, 0.93, 0.79, 0.81
i. Construct a frequency table for the above data using classes of
0.71–0.75, 0.76–0.80, 0.81–0.85, etc.
Graphical Displays of Data
Data can be presented with more visual impact in any of the following
forms:
1. Pictograms
2. Pie chart
3. Bar charts:
i. Simple bar chart
ii. Multiple bar chart
iii. Component bar chart
4. Histogram
5. Cumulative frequency curve (ogive)
Pictogram
A pictogram (or pictograph) is a way of representing data using images. For
instance, if the data are on population, the pictogram will contain diagrams of
human beings. If they are about cars, the pictogram will contain cars.
Note: The number of diagrams drawn is usually proportional to the given data.
• In the pictogram above:
1. Who read the most books?
2. Who read the fewest books?
• Who scored the most goals?
• Who scored the fewest goals?
• Which two pupils scored the same number of goals?
  Answer: Jessica and David
• How many more goals did Sam score than Will?
Example
Year                       1960   1970   1980   1990
Population (in millions)   12.6   26.2   38.4   50.1
Bar Chart
A bar graph is constructed on a rectangular coordinate
system in which the X axis represents the independent
variable, the Y axis represents the dependent variable,
and rectangles (or bars) show the relationship between
the variables. The height of the vertical rectangle above
each measurement value is proportional to the frequency
of that value.
Note the following
SIMPLE BAR CHART
• This is a chart in which the information is represented by a series of
bars, all of the same width. The height or length of each bar
represents the magnitude of the figure. The bars may be
drawn vertically or horizontally. The chart consists of a set of
non-joining bars, each showing just one data value
proportionately. It deals with only one set of data.
Example 1
The table below shows the distribution of weight gains (kg) in lambs feeding
on a certain diet over a specified amount of time. Draw a simple bar chart for
the information.
Weight gain Frequency
5 1
6 3
7 2
8 6
9 7
10 5
11 4
MULTIPLE/COMPOSITE BAR CHART
• This consists of grouped bars. Each bar in a group shows a different
characteristic corresponding to a common variate value. The lengths of the
bars are proportional to the magnitudes of the characteristics they represent.
Each of the grouped bars may be coloured for ease of identification. A multiple
bar chart is a good device for visual comparison of two or more kinds of
information.
Example
• The table below shows the intake through JAMB by the Faculty of
Science of a certain University in three consecutive years. Present
the data on a multiple bar chart.
Department 2002 2003 2004
Micro-biology 43 40 35
Biochemistry 45 35 42
Applied Biology 28 40 28
Physics 33 25 35
Mathematics 35 35 38
Computer Science 40 42 45
Chemistry 37 40 42
Total 261 257 265
Example
• Draw a multiple bar diagram for the following data, which represent
agricultural production for the period 2017‐2020.
COMPONENT BAR CHART
• This is the chart in which each bar is divided into two or
more sections proportional in size to the component parts of
the total quantity being represented by each bar. The
component bar chart also aids visual comparison.
ASSIGNMENT
• The data below show the number of students who passed with an
A in a course at a university in Nigeria from 2004 to 2006.
PIE CHART
• A pie chart is simply a circle divided into sectors. The
circle represents the total of the data being presented, and
each sector is drawn proportional to its share of the total. A
pair of compasses and a protractor are needed to
construct a pie chart.
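Because a full circle is 360 degrees, the angle of each sector is its share of the total multiplied by 360. The sketch below illustrates this with the 2002 intake figures from the multiple bar chart example earlier in these notes; the function name `pie_angles` is an assumption made for illustration.

```python
def pie_angles(values):
    """Sector angle in degrees for each value: 360 * value / total."""
    total = sum(values)
    return [360 * v / total for v in values]

# 2002 intake by department (from the multiple bar chart example).
intake_2002 = {
    "Micro-biology": 43,
    "Biochemistry": 45,
    "Applied Biology": 28,
    "Physics": 33,
    "Mathematics": 35,
    "Computer Science": 40,
    "Chemistry": 37,
}

angles = pie_angles(list(intake_2002.values()))
for department, angle in zip(intake_2002, angles):
    print(f"{department}: {angle:.1f} degrees")
```

For a percentage pie chart, the same shares are multiplied by 100 instead of 360.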
Example
• The table below represents the skill classification of the
workforce at two factories. Draw a degree and
percentage pie chart to represent the data.
EXAMPLE
Represent the information in the table below on a
histogram.
Class    Frequency   Class boundaries
10-14    3           9.5-14.5
15-19    6           14.5-19.5
20-24    10          19.5-24.5
25-29    11          24.5-29.5
30-34    5           29.5-34.5
35-39    3           34.5-39.5
ADVANTAGES OF HISTOGRAM
1). They display the comparative frequency of occurrence of data items within each
class interval and so show which class intervals occur most frequently
and which occur least.
2). They indicate whether the range of values is wide or narrow and whether
most values occur in the middle of the range or whether the frequencies are
more evenly spread.
Frequency Polygon
This is a line graph of the class frequency plotted
against the class mark. It can be obtained by
connecting the midpoints of the tops of the rectangles
in the histogram. (Note: the class mark or midpoint is the
average of the class limits.)
CUMULATIVE FREQUENCY CURVE (OGIVE)
The graph of a cumulative frequency distribution is called a
cumulative frequency curve or ogive. In its
construction, each cumulative frequency is plotted against
the upper class boundary of its class interval.
When the cumulative totals of successive frequencies of a
distribution are plotted against the corresponding class
boundaries, we have a cumulative frequency curve
(also known as an ogive). Since cumulative frequencies are
formed by successive additions, the cumulative frequency
for a class can never be less than the cumulative frequency of the
preceding class. For this reason the cumulative
frequency curve either increases or remains level, and can
never drop down towards the x-axis. The last cumulative
frequency is the total of the frequencies in the distribution.
Examples
1. The number of maids in ten houses
Number of maids 6 1 2 3 4
Number of houses 1 3 4 1 1
Frequency 1 4 12 8 3
Solution
QUARTILES FROM CUMULATIVE FREQ. CURVE
To help read from the ogive, percentage cumulative frequencies are
marked on the vertical axis and the corresponding values are read
from the horizontal axis. These are quartiles and percentiles.
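(a) The lower quartile or first quartile: $Q_1$ is read at $\frac{N}{4}$, or 25% of N
(b) The median: $Q_2$ is read at $\frac{N}{2}$, or 50% of N
(c) The upper quartile or third quartile: $Q_3$ is read at $\frac{3N}{4}$, or 75% of N
(d) Decile: $D$ is read at $\frac{N}{10}$, or 10% of N
(e) Quintile: $Q$ is read at $\frac{N}{5}$, or 20% of N
(f) Inter‐quartile range = $Q_3 - Q_1$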
Example
• The table below shows the weight of 40 female students in
a school. Form a cumulative frequency table and use it to
draw an ogive.
Weight (kg)       118-124  125-131  132-138  139-145  146-152  153-159  160-166  167-173  174-180
No. of students   1        3        7        8        9        5        4        2        1
Using your graph, determine the following:
(a). Lower quartile (137.5kg)
(b). Median (146.3kg)
(c). Upper Quartile (155.3kg)
(d). Decile (131.8kg)
(e). Quintile (135.7kg)
(f). Interquartile range (17.8kg)
(g). Semi‐interquartile range. (8.9kg)
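Reading a value off the ogive can be approximated in code by joining the plotted points with straight lines and interpolating. The sketch below is an approximation under that assumption, so its results can differ slightly from values read off a hand-drawn smooth curve (for example the decile and quintile quoted above). The function name `ogive_value` and the class boundaries 117.5, 124.5, ..., 180.5 are derived for illustration from the class limits in the table.

```python
from itertools import accumulate

def ogive_value(upper_bounds, freqs, first_lower_bound, target_cf):
    """Read the x-value at a given cumulative frequency from a straight-line ogive."""
    xs = [first_lower_bound] + list(upper_bounds)
    cfs = [0] + list(accumulate(freqs))
    for i in range(1, len(xs)):
        # Find the segment of the ogive that contains the target.
        if cfs[i] > cfs[i - 1] and cfs[i - 1] <= target_cf <= cfs[i]:
            fraction = (target_cf - cfs[i - 1]) / (cfs[i] - cfs[i - 1])
            return xs[i - 1] + fraction * (xs[i] - xs[i - 1])
    raise ValueError("target cumulative frequency is outside the table")

upper_bounds = [124.5, 131.5, 138.5, 145.5, 152.5, 159.5, 166.5, 173.5, 180.5]
freqs = [1, 3, 7, 8, 9, 5, 4, 2, 1]
N = sum(freqs)

q1 = ogive_value(upper_bounds, freqs, 117.5, N / 4)
median = ogive_value(upper_bounds, freqs, 117.5, N / 2)
q3 = ogive_value(upper_bounds, freqs, 117.5, 3 * N / 4)
iqr = q3 - q1

print(round(q1, 1), round(median, 1), round(q3, 1), round(iqr, 1), round(iqr / 2, 1))
```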
MEASURES OF CENTRAL TENDENCY OR
LOCATION
• As earlier stated, descriptive statistics aims at describing data by
summarising the values in the data set. One of the ways to achieve
this is to find a single value that will describe the general location
of the data. This single value which is a central point of the
distribution is referred to as measure of central tendency or
location.
Measures of central tendency are typical and
representative of a data set. The other values in the
distribution cluster around the measure of
location. These measures include:
arithmetic mean, median, mode, harmonic
mean and geometric mean.
• Measures of partition are measures that divide a distribution into specified
fractions. These measures are also known as fractiles; they
include the median, quartiles, deciles and percentiles.
The Arithmetic Mean
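It is the usual average of a set of values, i.e. the
equal sharing of the total among all the values in the data set.
It is denoted mathematically, for a sample of n values, as
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
and for a population of N values as
$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$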
Example
1. Find the mean of 2g, 4g, 6g, 8g and 10g
2. Calculate the mean of 2.3, 5.4, 0 , 6.2, 7.9, 8.1, 0,
3.4
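Solution
1. $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{2 + 4 + 6 + 8 + 10}{5} = \frac{30}{5} = 6\,\text{g}$
2. $\bar{x} = \frac{\sum_{i=1}^{8} x_i}{8} = \frac{2.3 + 5.4 + 0 + 6.2 + 7.9 + 8.1 + 0 + 3.4}{8} = \frac{33.3}{8} \approx 4.163$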
Arithmetic Mean of an Ungrouped Frequency
Distribution
It is denoted mathematically as
$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum f}$
where
$x_i$ represents the ith observation,
$f_i$ represents the frequency of the ith observation,
$\sum f$ is the total number of cases.
Example
The table below indicates the number of children in the
families of twenty teachers in a school. Compute the
arithmetic mean
Number of children 1 2 3 4 5
Number of teachers 4 2 6 5 3
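A minimal Python sketch of the formula, applied to the table above, is given below. The function name `frequency_mean` is an illustrative assumption.

```python
def frequency_mean(values, freqs):
    """Arithmetic mean of an ungrouped frequency distribution: sum(f * x) / sum(f)."""
    total_fx = sum(f * x for x, f in zip(values, freqs))
    return total_fx / sum(freqs)

children = [1, 2, 3, 4, 5]   # number of children, x
teachers = [4, 2, 6, 5, 3]   # number of teachers, f

mean_children = frequency_mean(children, teachers)
print(round(mean_children, 2))
```

Here the sum of fx is 4 + 4 + 18 + 20 + 15 = 61 and the total frequency is 20, so the mean is 61/20 = 3.05 children per family.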
Arithmetic Mean : Method of Assumed Mean
This can be used to find the arithmetic mean of any
distribution, whether grouped or ungrouped. It
involves subtracting a specified assumed mean from
each value of x. The assumed mean is usually taken as
the value (or the class mark of the class) with
the highest frequency.
If A is the assumed mean and the deviation of each
value from it is $d_i = x_i - A$, then
for a frequency distribution, the arithmetic mean is
given by:
$\bar{x} = A + \frac{\sum f d}{n}$
where $n = \sum f$.
Example
x   3   4   5   6
f   2   6   8   4
Solution
The highest frequency is 8.
The value of x corresponding to this is 5; hence, 5 is the
assumed mean.
x       f    d = x - 5   fd
3       2    -2          -4
4       6    -1          -6
5       8    0           0
6       4    1           4
Total   20               -6
Arithmetic mean: $\bar{x} = A + \frac{\sum f d}{n} = 5 + \frac{-6}{20} = 5 + (-0.3) = 4.7$
Assignment:
Attempt to solve the above using: $\bar{x} = \frac{\sum f x}{\sum f}$
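The assumed mean method can be sketched in Python as below. To leave the assignment for the reader, the sketch is applied to the number-of-children table from the earlier example rather than to the distribution just solved. The function name `assumed_mean_method` and its optional `assumed` parameter are assumptions made for illustration.

```python
def assumed_mean_method(values, freqs, assumed=None):
    """Mean = A + sum(f * d) / n, where d = x - A and n = sum(f)."""
    if assumed is None:
        # Take the value with the highest frequency as the assumed mean A.
        assumed = values[freqs.index(max(freqs))]
    n = sum(freqs)
    sum_fd = sum(f * (x - assumed) for x, f in zip(values, freqs))
    return assumed + sum_fd / n

children = [1, 2, 3, 4, 5]
teachers = [4, 2, 6, 5, 3]

mean_default = assumed_mean_method(children, teachers)            # A = 3
mean_other = assumed_mean_method(children, teachers, assumed=10)  # any A gives the same mean
print(round(mean_default, 2), round(mean_other, 2))
```

The choice of A only affects how convenient the arithmetic is, not the result, which is why both calls agree.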
Properties of the Arithmetic Mean
1. It is unique, since there is only one mean in a set of data
2. It makes use of every value in the data, making it
suitable for further statistical analysis
3. It can sometimes give rise to unrealistic values, e.g.
4.67 students
4. It is the most stable and widely used of the measures
of central tendency
5. It is unwise to use the mean as an average if the distribution
is open ended
6. It represents equal sharing of the items in the distribution
7. The algebraic sum of all deviations from the arithmetic
mean is always zero, i.e.
$\sum (X - \bar{X}) = 0$
MEDIAN
It is the middle value in a distribution.
To obtain the median of a data set, arrange the values
in either ascending or descending
order and select the middle value(s).
If the number of items is odd, the median is the single
middle value. If it is even, the median is the
average of the two middle values.
Example
1. Find the median of 6, 7, 2, 4, 9, 0, 3
2. Find the median of 163, 149, 152, 160, 195, 180
Solution
1. Rearrange the numbers (ascending or descending): 0, 2, 3, 4, 6,
7, 9. The middle value is 4.
2. Rearrange: 149, 152, 160, 163, 180, 195. Here there are two middle
values, hence the median is
$\frac{160 + 163}{2} = 161.5$
Median of an Ungrouped Frequency Table
First, find the total frequency n, add 1 and
divide by two to obtain the position of the median,
$\frac{n + 1}{2}$. Then create a column of cumulative
frequencies to help locate the value at that position.
Example
1. Find the median of the following ungrouped frequency
table
x   2    4    6    8   10
f   15   12   23   6   4
2. Find the median of the following ungrouped frequency
table
Age              3   5   7   9   10
No of Students   2   3   4   5   6
Solution
1. Total frequency is 15 + 12 + 23 + 6 + 4 = 60, hence
$\frac{n + 1}{2} = \frac{60 + 1}{2} = 30.5$
x       f    Cumulative Freq
2       15   15
4       12   27
6       23   50
8       6    56
10      4    60
Total   60
From the cumulative frequency column, position 30.5 falls
within the cumulative frequency of 50 (positions 28 to 50),
and the corresponding value of x is 6. Therefore, the median is 6.
2. Total frequency is 20, hence
$\frac{n + 1}{2} = \frac{20 + 1}{2} = 10.5$
x       f    Cumulative Freq
3       2    2
5       3    5
7       4    9
9       5    14
10      6    20
Total   20
From the cumulative frequency column, position 10.5 falls within
the cumulative frequency of 14 (positions 10 to 14), and the
corresponding x value is 9.
The median is 9.
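The lookup in the cumulative frequency column can be sketched in Python as follows. This is a hedged sketch: the function name `frequency_median` is an assumption, the x values are assumed to be listed in ascending order, and for an even total the position (n + 1)/2 is treated as halfway between two neighbouring observations, whose values are averaged.

```python
from itertools import accumulate

def frequency_median(values, freqs):
    """Median of an ungrouped frequency table whose values are in ascending order."""
    n = sum(freqs)
    cum = list(accumulate(freqs))

    def item_at(position):
        # Value of the observation at a 1-based position in the ordered data.
        for x, cf in zip(values, cum):
            if position <= cf:
                return x

    if n % 2 == 1:
        return item_at((n + 1) // 2)
    # For even n, (n + 1) / 2 lies between positions n/2 and n/2 + 1.
    return (item_at(n // 2) + item_at(n // 2 + 1)) / 2

median_1 = frequency_median([2, 4, 6, 8, 10], [15, 12, 23, 6, 4])
median_2 = frequency_median([3, 5, 7, 9, 10], [2, 3, 4, 5, 6])
print(median_1, median_2)
```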
Properties of the Median
1. It is the central observation of a data set
2. The median gives an actual value for a set of discrete data with an
odd number of items
3. It can be estimated from incomplete data
4. The median always exists
5. The median cannot be used for further statistical computation
6. It is unique because there is only one value for the median in a
data set
MODE
It is the most frequently occurring item in a set of
observations.
When there are two modes, the distribution is called
bimodal; with three modes it is called trimodal;
and with more than three it is
called multimodal.
Example
Find the mode of the following;
i. 2, 5, 2, 3, 7, 1, 5, 6, 5
ii. 1, 6, 7, 6, 8, 4, 1
iii. 93, 72, 24, 43, 67, 93, 24, 43, 72, 67
Solution
i. The number that occurs most often is 5, since it occurs
three times
ii. There is a tie for the most frequent value: 1
and 6 each occur twice, hence this distribution is
bimodal
iii. This set has no mode, since all the items occur equally often
(twice). This indicates that the mode may not exist in some
cases
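The three cases above can be reproduced with a short Python sketch. It follows the convention used in these notes, in which a data set whose values all occur equally often has no mode; the function name `modes` is an assumption. Note that Python's built-in `statistics.multimode` uses a different convention and would return every value in case iii.

```python
from collections import Counter

def modes(data):
    """Return the list of modes, or an empty list if no mode exists."""
    if not data:
        return []
    counts = Counter(data)
    highest = max(counts.values())
    # If several distinct values all occur equally often, there is no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(x for x, c in counts.items() if c == highest)

modes_i = modes([2, 5, 2, 3, 7, 1, 5, 6, 5])
modes_ii = modes([1, 6, 7, 6, 8, 4, 1])
modes_iii = modes([93, 72, 24, 43, 67, 93, 24, 43, 72, 67])
print(modes_i, modes_ii, modes_iii)
```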
Properties of the Mode
1. It is the most frequently occurring value
2. It may or may not exist and if it exists, it may not be
unique
3. It ignores a large part of the data, hence it is not widely
used in research
4. It can be estimated from incomplete data
5. It cannot be used for further statistical analysis
Relationship between Mean, Median and Mode
An empirical relationship exists in unimodal
distributions which are not symmetrical in nature. The
relationship is: mean - mode = 3(mean - median)
Solution
1. mean - mode = 3(mean - median)
   97.68 - 81.93 = 3(97.68 - median)
   15.75 = 3(97.68 - median)
   15.75 / 3 = 97.68 - median
   5.25 = 97.68 - median
   median = 97.68 - 5.25
   median = 92.43
2. mean - mode = 3(mean - median)
   181 - 160 = 3(181 - median)
   21 = 3(181 - median)
   21 / 3 = 181 - median
   7 = 181 - median
   median = 181 - 7
   median = 174
Weighted Arithmetic Mean, Geometric Mean, and Harmonic Mean
Geometric Mean
This is the nth root of the product of n numbers. If
$X_1, X_2, X_3, \ldots, X_n$ are observations then the geometric
mean G is:
$G = \sqrt[n]{X_1 X_2 X_3 \cdots X_n}$
Take the log of both sides:
$\log G = \log \sqrt[n]{X_1 X_2 X_3 \cdots X_n}$
Simplifying with the laws of logarithms:
$\log G = \frac{\log X_1 + \log X_2 + \log X_3 + \cdots + \log X_n}{n}$
$\log G = \frac{\sum \log X}{n}$   (a)
$G = \operatorname{antilog}\left(\frac{\sum \log X}{n}\right)$   (b)
Either equation (a) or (b) can be used to obtain the
geometric mean.
Example
Find the geometric mean of 6, 8, 10, 16
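As a rough check on hand calculations, the sketch below computes the geometric mean of this example in two ways: directly as the nth root of the product, and through logarithms (base 10 is assumed here, as with log tables). The function names are illustrative assumptions.

```python
import math

def geometric_mean_root(xs):
    """nth root of the product of the n observations."""
    return math.prod(xs) ** (1 / len(xs))

def geometric_mean_logs(xs):
    """Antilog of the mean of the base-10 logarithms."""
    mean_log = sum(math.log10(x) for x in xs) / len(xs)
    return 10 ** mean_log

data = [6, 8, 10, 16]
g_root = geometric_mean_root(data)
g_logs = geometric_mean_logs(data)
print(round(g_root, 2), round(g_logs, 2))
```

Both routes should agree up to rounding; here the product is 7680 and its fourth root is about 9.36.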
Geometric Mean of a Frequency Distribution
It is denoted mathematically by
$\log G = \frac{f_1 \log X_1 + f_2 \log X_2 + f_3 \log X_3 + \cdots + f_n \log X_n}{\sum f}$
$\log G = \frac{\sum f \log X}{\sum f}$
For ungrouped data, $X_i$ represents an
individual observation while $f_i$ represents the
corresponding frequency.
For grouped data, $X_i$ represents the mid mark of the class.
Example
1. The table below shows the distribution of
the life span of 50 batteries in hours,
calculate the geometric mean
Harmonic Mean
$H = \frac{1}{\frac{1}{N} \sum \frac{1}{X}} = \frac{N}{\sum \frac{1}{X}}$
Example
1. Calculate the harmonic mean of 6,7,8 and 9
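A minimal Python sketch of the harmonic mean formula, applied to this example, is shown below. The function name `harmonic_mean` is an assumption; the standard library also provides `statistics.harmonic_mean`, which is used here only as a cross-check.

```python
import statistics

def harmonic_mean(xs):
    """N divided by the sum of the reciprocals of the observations."""
    return len(xs) / sum(1 / x for x in xs)

data = [6, 7, 8, 9]
h = harmonic_mean(data)
h_check = statistics.harmonic_mean(data)   # standard-library cross-check
print(round(h, 2))
```

The sum of the reciprocals is about 0.5456, so H is roughly 4 / 0.5456, or about 7.33.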
The relationship between harmonic mean, arithmetic
mean and geometric mean is
$H \le G \le \bar{x}$ for positive observations, with equality only
when all the observations are equal.
MEASURES OF PARTITION
These are measures that divide a distribution into
specified number of parts.
They include quartiles, deciles and percentiles.
QUARTILES
When an ordered set of data is divided into four equal parts,
the division points are called quartiles.
The first or lower quartile, $Q_1$, is a value that has
approximately 25% or $\frac{1}{4}$ of the observations below it and
approximately 75% of the observations above it. Its position is
determined by: $\frac{1}{4}(n + 1)$
The second quartile, $Q_2$, has approximately 50% or $\frac{2}{4} = \frac{1}{2}$ of the
observations below its value.
The third or upper quartile, $Q_3$, has approximately 75% or $\frac{3}{4}$ of
the observations below its value. Its position is determined by:
$\frac{3}{4}(n + 1)$
Examples
1. Given the data below 3, 4, 1, 2, 7, 12, 5 . Find,
i. The lower quartile
ii. The upper quartile
2. Find $Q_1$ and $Q_3$ of 1, 5, 3, 6, 9, 8
3. Find $Q_1$ and $Q_3$ of 4, 12, 2.2, 14, 23, 10, 16.4, 2, 15,
19.6, 20.6, 8
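Solution
Rearrange each data set in ascending order.
1. 1, 2, 3, 4, 5, 7, 12
Lower quartile: 2nd item, hence 2
Upper quartile: 6th item, hence 7
2. 1, 3, 5, 6, 8, 9
Lower quartile: position 1.75, between the 1st and 2nd items, hence 2.5
Upper quartile: position 5.25, between the 5th and 6th items, hence 8.25
3. 2, 2.2, 4, 8, 10, 12, 14, 15, 16.4, 19.6, 20.6, 23
Lower quartile: position 3.25, between the 3rd and 4th items, hence 5
Upper quartile: position 9.75, between the 9th and 10th items, hence 18.8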
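The quartile positions used in these solutions can be sketched in Python as below. The sketch assumes the position rule fraction × (n + 1) with straight-line interpolation between neighbouring items, which reproduces the answers above; the function name `fractile` is an assumption. Passing a fraction of k/10 or k/100 gives deciles and percentiles under the same assumption.

```python
def fractile(data, fraction):
    """Value at position fraction * (n + 1) in the sorted data, with linear interpolation."""
    xs = sorted(data)
    n = len(xs)
    position = fraction * (n + 1)            # 1-based position, may be fractional
    position = min(max(position, 1), n)      # keep the position inside the data
    lower = int(position)                    # whole part of the position
    remainder = position - lower             # fractional part of the position
    if lower >= n:
        return xs[-1]
    return xs[lower - 1] + remainder * (xs[lower] - xs[lower - 1])

data_1 = [3, 4, 1, 2, 7, 12, 5]
data_2 = [1, 5, 3, 6, 9, 8]
data_3 = [4, 12, 2.2, 14, 23, 10, 16.4, 2, 15, 19.6, 20.6, 8]

for data in (data_1, data_2, data_3):
    print(round(fractile(data, 0.25), 2), round(fractile(data, 0.75), 2))
```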
DECILES
When an ordered set of data is divided into ten equal
parts, the division points are called deciles.
The first decile is $\frac{1}{10}$ and the third decile is $\frac{3}{10}$ of the
distribution. Thus the nth decile is $\frac{n}{10}$.
PERCENTILES
When an ordered set of data is divided into one hundred equal
parts, the division points are called percentiles.
The 30th percentile is $\frac{30}{100}$ of the distribution. Hence the lower
quartile, which is 25% or $\frac{1}{4}$, is known as the 25th percentile, the
median is the 50th percentile and the upper quartile is the 75th
percentile.
The method of computing any fractile is similar to the method
of computing the median and the quartiles.
Examples
1. Calculate the
i. Third decile
ii. 80th percentile
of 1, 5, 3, 6, 9, 8
2. Find $Q_1$ and $Q_3$ of 4, 12, 2.2, 14, 23, 10, 16.4, 2, 15, 19.6,
20.6, 8