INSTITUTE OF ACCOUNTANCY
ARUSHA
Lecturer: Laurent L. Lulu
Module Name: Probability and Statistics
MEASURES OF CENTRAL TENDENCY.
• This module gives a general overview of the measures of central tendency and
dispersion that are commonly used in the field of cybersecurity and ICT.
• Measures of central tendency are statistics that describe the typical or average value of
a data set. These measures can be used to determine the most common value or the
value around which the data tends to cluster. Common measures of central tendency
include:
1. Mean: The mean is calculated by adding up all of the values in a data set and
dividing by the total number of values.
The mean can be useful for understanding the average value of a particular attribute
in a cybersecurity context, such as the average number of attempted cyber-attacks per
day on a particular network. Each average is formally defined in the sections that follow.
2. Median: The median is the middle value in a data set when the
values are arranged in order. This measure is useful when there are
extreme values or outliers in the data set that may distort the mean.
For example, if a company experiences a large-scale cyber-attack
that results in a significant spike in the number of attempted attacks,
the median may provide a more accurate representation of the
typical number of attacks per day.
3. Mode: The mode is the most common value in a data set. In the
context of cybersecurity, this measure may be useful for identifying
the most common type of attack or the most frequently targeted
system.
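The contrast between the three measures can be illustrated with a minimal Python sketch. The daily attack counts below are hypothetical and chosen only to show how one large spike pulls the mean upward while the median and mode stay near the typical day.

```python
import statistics

# Hypothetical daily counts of attempted attacks; the last day is a spike.
attacks_per_day = [12, 15, 11, 14, 13, 12, 250]

mean_attacks = statistics.mean(attacks_per_day)      # pulled upward by the spike
median_attacks = statistics.median(attacks_per_day)  # middle value of the sorted data
mode_attacks = statistics.mode(attacks_per_day)      # most frequent daily count

print(f"mean   = {mean_attacks:.1f}")
print(f"median = {median_attacks}")
print(f"mode   = {mode_attacks}")
```

In this sketch the median and mode describe an ordinary day, while the mean is dominated by the single outlier.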
Characteristics of a Good Measure of Central Tendency
1. It should be rigidly defined.
2. It should be easy to understand and compute.
3. It should be based on all items in the data.
4. Its definition should be in the form of a mathematical formula.
5. It should be capable of further algebraic treatment.
6. It should have sampling stability.
7. It should be capable of being used in further statistical
computations or processing.
Besides the above requisites, a good measure of central tendency
should represent the maximum characteristics of the data, and its value
should be nearest to most items of the given series.
Arithmetic mean or mean
• The arithmetic mean, or simply the mean, of a variable is
defined as the sum of the observations divided by the
number of observations. If the variable x assumes n
values x1, x2, …, xn, then the mean, x̄, is given by
$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Short-Cut method/Assumed mean
• Under this method an assumed or arbitrary average (indicated by A) is used as the
basis for calculating the deviations of the individual values. The formula is
$\bar{x} = A + \frac{\sum d}{n}$, where $d = x - A$ is the deviation of each value from A.
• Example 2:
• A student's marks in 5 subjects are 75, 68, 80, 92, 56. Find his
average mark.
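The example can be worked through with a short Python sketch that applies both the direct definition and the short-cut (assumed mean) method. The choice A = 68 is an arbitrary assumption made for illustration; any value of A gives the same mean.

```python
marks = [75, 68, 80, 92, 56]
n = len(marks)

# Direct method: sum of the observations divided by their number.
direct_mean = sum(marks) / n

# Short-cut method: take deviations d = x - A from an assumed mean A.
A = 68  # arbitrary assumed mean, chosen only for illustration
deviations = [x - A for x in marks]
shortcut_mean = A + sum(deviations) / n

print(direct_mean, shortcut_mean)
```

Both methods give an average mark of 74.2, since the marks sum to 371 and the deviations from 68 sum to 31.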
Merits and demerits of Arithmetic mean
Merits:
1. It is rigidly defined.
2. It is easy to understand and easy to calculate.
3. If the number of items is sufficiently large, it is more accurate and more reliable.
4. It is a calculated value and is not based on its position in the series.
5. It is possible to calculate even if some of the details of the data are lacking.
6. Of all averages, it is affected least by fluctuations of sampling.
7. It provides a good basis for comparison.
Demerits:
1. It cannot be obtained by inspection, nor can it be located through a frequency graph.
2. It cannot be used in the study of qualitative phenomena that are not capable of numerical measurement, e.g. intelligence, beauty, honesty.
3. It can ignore any single item only at the risk of losing its accuracy.
4. It is affected very much by extreme values.
5. It cannot be calculated for open-end classes.
6. It may lead to fallacious conclusions if the details of the data from which it is computed are not given.
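Harmonic mean (H.M):
The harmonic mean of n observations x1, x2, …, xn is the reciprocal of the arithmetic mean of their reciprocals:
$H.M. = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$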
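A minimal Python sketch of the harmonic mean, computed as the number of observations divided by the sum of their reciprocals. The two speeds are hypothetical and stand for a journey covered at 40 km/h in one direction and 60 km/h on the return over the same distance.

```python
import statistics

speeds = [40, 60]  # hypothetical speeds in km/h over equal distances

# n divided by the sum of the reciprocals of the observations.
harmonic = len(speeds) / sum(1 / x for x in speeds)

# The standard library offers the same calculation.
harmonic_stdlib = statistics.harmonic_mean(speeds)

print(harmonic, harmonic_stdlib)
```

The result is about 48 km/h, lower than the arithmetic mean of 50, because the harmonic mean gives more weight to the smaller observation.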
Merits of H.M
1. It is rigidly defined.
2. It is defined on all observations.
3. It is amenable to further algebraic treatment.
4. It is the most suitable average when it is desired to give greater weight to smaller observations and less weight to the larger ones.
Demerits of H.M
1. It is not easily understood.
2. It is difficult to compute.
3. It is only a summary figure and may not be an actual item in the series.
4. It gives greater importance to small items and is therefore useful only when small items have to be given greater weight.
Geometric mean :
The geometric mean of a series containing n observations is the nth
root of the product of the values. If x1, x2, …, xn are the observations, then
$G.M. = \sqrt[n]{x_1 x_2 \cdots x_n} = (x_1 x_2 \cdots x_n)^{1/n}$
Merits and demerits of Geometric mean
Merits:
1. It is rigidly defined.
2. It is based on all items.
3. It is very suitable for averaging ratios, rates and percentages.
4. It is capable of further mathematical treatment.
5. Unlike the AM, it is not affected much by the presence of extreme values.
Demerits:
1. It cannot be used when any of the observations is negative or zero.
2. It is difficult to calculate, particularly when the items are very large or when there is a frequency distribution.
3. It reflects the ratio of change rather than the absolute difference of change that the arithmetic mean reflects.
4. The GM may not be an actual value of the series.
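The geometric mean, the nth root of the product of n positive observations, can be sketched in Python as follows. The yearly growth ratios are hypothetical. The logarithmic form is shown as well because it is a common way to ease the calculation when the items are large.

```python
import math
import statistics

ratios = [1.05, 1.20, 1.10]  # hypothetical yearly growth ratios, all positive
n = len(ratios)

# Definition: nth root of the product of the values.
gm_root = math.prod(ratios) ** (1 / n)

# Equivalent form: antilog of the mean of the logarithms.
gm_log = math.exp(sum(math.log(x) for x in ratios) / n)

# The standard library function (Python 3.8+) for comparison.
gm_stdlib = statistics.geometric_mean(ratios)

print(gm_root, gm_log, gm_stdlib)
```

All three values agree up to floating-point rounding. The logarithmic form also shows why the geometric mean cannot be used with zero or negative observations.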
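Combined mean
If two groups contain n1 and n2 observations with means x̄1 and x̄2 respectively, the mean of the combined group is
$\bar{x}_{12} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$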
• Example
• Find the combined mean for the data given below:
n1 = 20, x̄1 = 4, n2 = 30, x̄2 = 3
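A short Python sketch of this example, assuming the usual combined-mean rule in which each group mean is weighted by its group size. The variable names are illustrative.

```python
n1, mean1 = 20, 4  # size and mean of the first group
n2, mean2 = 30, 3  # size and mean of the second group

# Weight each group mean by its number of observations.
combined_mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)

print(combined_mean)
```

The group totals are 80 and 90, so the combined mean is 170 / 50 = 3.4. It lies closer to 3 than to 4 because the second group is larger.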
Median
The median is that value of the variate which divides the
group into two equal parts, one part comprising all values
greater than, and the other all values less than, the median.
• Ungrouped or raw data: Arrange the given values in
increasing or decreasing order. If the number of values is
odd, the median is the middle value. If the number of values is
even, the median is the mean of the two middle values.
• By formula: Median = value of the $\left(\frac{n+1}{2}\right)$th item of the ordered data.
Example
An even number of values is given. Find the median of the
following data using the formula: 5, 8, 12, 30, 18,
10, 2, 22.
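This example can be sketched in Python. The sketch assumes the usual positional rule, in which the median is the ((n + 1) / 2)th item of the ordered data, and a fractional position means averaging the two neighbouring items.

```python
import statistics

data = [5, 8, 12, 30, 18, 10, 2, 22]
ordered = sorted(data)  # [2, 5, 8, 10, 12, 18, 22, 30]
n = len(ordered)

position = (n + 1) / 2  # 4.5, i.e. halfway between the 4th and 5th items

if n % 2 == 1:
    median = ordered[n // 2]  # single middle value
else:
    median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2  # mean of the two middle values

print(position, median, statistics.median(data))
```

The 4th and 5th ordered values are 10 and 12, so the median is 11.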
Median for the Grouped Data.
• In a grouped distribution, values are associated with frequencies.
• Grouping can be in the form of a discrete frequency distribution
or a continuous frequency distribution.
• Whatever the type of distribution, cumulative
frequencies have to be calculated to know the total number of
items.
• Cumulative frequency (cf):
• The cumulative frequency of each class is the sum of the frequency
of the class and the frequencies of the previous classes, i.e. the
frequencies are added successively, so that the last cumulative
frequency gives the total number of items.
Median for Discrete data
• Example
The following data pertain to the number of members in a family.
Find the median size of the family.
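The data table for this example is not reproduced here, so the sketch below uses a hypothetical family-size distribution. It assumes the commonly taught rule for discrete data: compute the cumulative frequencies, then take the value whose cumulative frequency first reaches (N + 1) / 2.

```python
from itertools import accumulate

# Hypothetical distribution: number of members and number of families.
family_size = [1, 2, 3, 4, 5, 6]
frequency = [2, 5, 9, 12, 7, 3]

# Add the frequencies successively; the last value is the total N.
cumulative = list(accumulate(frequency))  # [2, 7, 16, 28, 35, 38]
N = cumulative[-1]

# The median is the value whose cumulative frequency first reaches (N + 1) / 2.
target = (N + 1) / 2
median_size = next(x for x, cf in zip(family_size, cumulative) if cf >= target)

print(N, target, median_size)
```

With these made-up figures N = 38 and the target position is 19.5, which first falls inside the cumulative frequency 28, so the median family size is 4. When N is even and the two middle items differ, this rule returns the larger of them, so it is an approximation in that case.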
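Median for Continuous data
• In a continuous frequency distribution the median class is located from the cumulative frequencies, and the median is obtained by interpolation within that class:
$\text{Median} = l + \frac{\frac{N}{2} - m}{f} \times c$
• Where
• l = Lower limit of the median class
• m = Cumulative frequency preceding the median class
• c = Width of the median class
• f = Frequency of the median class
• N = Total frequency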
• Note: If the class intervals are given in inclusive form, convert
them into exclusive form (the true class intervals) and
use the lower limit of the true median class.
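A minimal Python sketch of the interpolation for continuous data, using the symbols l, m, c, f and N as defined above. It assumes the standard form Median = l + ((N/2 - m) / f) × c, and the class table is hypothetical, with exclusive classes 0-10, 10-20, and so on.

```python
from itertools import accumulate

# Hypothetical continuous distribution with equal class width.
lower_limits = [0, 10, 20, 30, 40]
c = 10                            # width of each class
frequencies = [5, 8, 12, 10, 5]

cumulative = list(accumulate(frequencies))  # [5, 13, 25, 35, 40]
N = cumulative[-1]

# Median class: first class whose cumulative frequency reaches N / 2.
k = next(i for i, cf in enumerate(cumulative) if cf >= N / 2)

l = lower_limits[k]                    # lower limit of the median class
m = cumulative[k - 1] if k > 0 else 0  # cumulative frequency preceding it
f = frequencies[k]                     # frequency of the median class

median = l + (N / 2 - m) / f * c
print(l, m, f, median)
```

Here N / 2 = 20 falls in the class 20-30, so l = 20, m = 13 and f = 12, and the median is 20 + (7 / 12) × 10, or about 25.83.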
Merits and Demerits of Median
Merits
1. Median is not influenced by extreme values because it is a
positional average.
2. Median can be calculated in the case of distributions with
open-end intervals.
3. Median can be located even if the data are incomplete.
4. Median can be located even for qualitative factors such as
ability, honesty etc.
Demerits
1. A slight change in the series may bring a drastic change in the median
value.
2. In the case of an even number of items or a continuous series, the median is an
estimated value rather than an actual value in the series.
3. It is not suitable for further mathematical treatment except its use in
mean deviation.
4. It does not take all the observations into account.
Mode
The mode refers to that value in a distribution which occurs most
frequently. It is an actual value, which has the highest concentration of
items in and around it.
According to Croxton and Cowden, “The mode of a distribution is
the value at the point around which the items tend to be most heavily
concentrated. It may be regarded as the most typical of a series of
values”.
It shows the centre of concentration of the frequency in and around a
given value.
Therefore, where the purpose is to know the point of the highest
concentration, it is preferred.
• Its importance is very great in marketing studies where a manager is
interested in knowing the size which has the highest
concentration of items. For example, in placing an order for shoes or
ready-made garments the modal size helps, because this size and the
sizes around it are in common demand.
Computation of the mode
• Ungrouped or Raw Data:
• For ungrouped data or a series of individual
observations, mode is often found by mere inspection.
• Example:
2, 7, 10, 15, 10, 17, 8, 10, 2
Mode = 10
In some cases the mode may be absent, while in other
cases there may be more than one mode.
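Finding the mode by inspection amounts to counting how often each value occurs. A minimal Python sketch for the example data follows. The second list is hypothetical and is included only to show a case with more than one mode.

```python
from collections import Counter
import statistics

data = [2, 7, 10, 15, 10, 17, 8, 10, 2]

counts = Counter(data)          # value -> number of occurrences
highest = max(counts.values())  # largest frequency
modes = [value for value, count in counts.items() if count == highest]

print(modes)

# Hypothetical data with two modes.
bimodal = statistics.multimode([1, 1, 2, 2, 3])
print(bimodal)
```

The value 10 occurs three times, more than any other value, so it is the only mode of the example data. If every value occurs equally often, this count returns all of them, which corresponds to the case where no mode can be singled out.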
Mode for Grouped Data
• Discrete distribution: Find the highest frequency; the
corresponding value of X is the mode.
• Continuous distribution: Find the highest frequency; the
corresponding class interval is called the modal class.
Then apply the formula.
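The formula itself is not reproduced in the text. The sketch below assumes the interpolation formula commonly used for this purpose, Mode = l + ((f1 - f0) / (2 f1 - f0 - f2)) × c. The symbols are assumptions, not taken from the slides: l is the lower limit of the modal class, f1 its frequency, f0 and f2 the frequencies of the preceding and succeeding classes, and c the class width. The class table is hypothetical.

```python
# Hypothetical continuous distribution with equal class width.
lower_limits = [0, 10, 20, 30, 40]
c = 10                            # width of each class
frequencies = [5, 8, 12, 10, 5]

# Modal class: the class with the highest frequency.
k = frequencies.index(max(frequencies))

l = lower_limits[k]                                           # lower limit of the modal class
f1 = frequencies[k]                                           # frequency of the modal class
f0 = frequencies[k - 1] if k > 0 else 0                       # preceding class frequency
f2 = frequencies[k + 1] if k + 1 < len(frequencies) else 0    # succeeding class frequency

mode = l + (f1 - f0) / (2 * f1 - f0 - f2) * c
print(l, f0, f1, f2, mode)
```

With these made-up figures the modal class is 20-30, and the mode is 20 + (4 / 6) × 10, or about 26.67. It sits above the class midpoint because the class after the modal class is heavier than the class before it.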
Determination of Modal class
For a frequency distribution, the modal class
corresponds to the maximum frequency. But in
any one (or more) of the following cases:
1. the maximum frequency is repeated,
2. the maximum frequency occurs at the
beginning or at the end of the distribution,
3. there are irregularities in the distribution,
the modal class is determined by the method
of grouping.
• Steps for Calculation:
• We prepare a grouping table with 6 columns
1. In column I, we write down the given frequencies.
2. Column II is obtained by combining the frequencies two by two.
3. Leave the 1st frequency and combine the remaining frequencies two
by two and write in column III
4. Column IV is obtained by combining the frequencies three by three.
5. Leave the 1st frequency and combine the remaining frequencies three
by three and write in column V
6. Leave the 1st and 2nd frequencies and combine the remaining
frequencies three by three and write in column VI
• Mark the highest frequency in each column. Then form an analysis table
to find the modal class. After finding the modal class use the formula to
calculate the modal value.
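The grouping steps above can be sketched in Python. This is an illustration under stated assumptions: the frequencies are hypothetical (the maximum frequency 15 is repeated, so inspection alone is not decisive), tied maxima within a column are all marked, and the function names are made up for this sketch.

```python
def grouping_table(freqs):
    """Build the six columns as lists of (class_indices, combined_frequency)."""
    # (group size, number of leading frequencies left out) for columns I to VI.
    specs = [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2)]
    columns = []
    for size, skip in specs:
        groups = []
        for start in range(skip, len(freqs) - size + 1, size):
            idx = tuple(range(start, start + size))
            groups.append((idx, sum(freqs[i] for i in idx)))
        columns.append(groups)
    return columns


def modal_class_by_grouping(freqs):
    """Analysis table: count how often each class lies in a column maximum."""
    tally = [0] * len(freqs)
    for groups in grouping_table(freqs):
        if not groups:
            continue
        best = max(total for _, total in groups)
        for idx, total in groups:
            if total == best:       # mark every group that attains the maximum
                for i in idx:
                    tally[i] += 1
    return tally.index(max(tally)), tally


frequencies = [5, 9, 12, 15, 15, 10, 4]  # hypothetical class frequencies
modal_index, tally = modal_class_by_grouping(frequencies)
print(modal_index, tally)
```

For these frequencies the fourth class (index 3) is marked in all six columns, so it is taken as the modal class even though the fifth class has the same frequency. The modal value would then be calculated from that class with the formula.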
Merits and Demerits of Mode.
Merits
1. It is easy to calculate and in some cases it can be located by mere inspection.
2. Mode is not at all affected by extreme values.
3. It can be calculated for open-end classes.
4. It is usually an actual value of an important part of the series.
5. In some circumstances it is the best representative of data.
Demerits
1. It is not based on all observations.
2. It is not capable of further mathematical treatment.
3. Mode is generally ill-defined; in some cases it is not possible to find a mode.
4. Compared with the mean, the mode is affected to a great extent by sampling
fluctuations.
5. It is unsuitable in cases where relative importance of items has to be considered.
EMPIRICAL RELATIONSHIP BETWEEN AVERAGES
• In a symmetrical distribution the three simple averages
coincide: mean = median = mode. For a moderately
asymmetrical distribution, the relationship between
them was given by Prof. Karl Pearson as
Mode = 3 Median - 2 Mean.
Example. If the mean and median of a moderately
asymmetrical series are 26.8 and 27.9 respectively, what
would be its most probable mode?
Example. In a moderately asymmetrical distribution the
values of mode and mean are 32.1 and 35.4 respectively.
Find the median value.
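Both examples follow directly from Mode = 3 Median - 2 Mean, rearranged as needed. A minimal Python sketch is given below. The function names are illustrative only.

```python
def mode_from(mean, median):
    """Empirical mode from the mean and median: 3 * median - 2 * mean."""
    return 3 * median - 2 * mean


def median_from(mean, mode):
    """Rearranged relation: median = (mode + 2 * mean) / 3."""
    return (mode + 2 * mean) / 3


# First example: mean 26.8, median 27.9.
mode_1 = mode_from(mean=26.8, median=27.9)

# Second example: mode 32.1, mean 35.4.
median_2 = median_from(mean=35.4, mode=32.1)

print(round(mode_1, 1), round(median_2, 1))
```

The most probable mode in the first example is 83.7 - 53.6 = 30.1, and the median in the second is (32.1 + 70.8) / 3 = 34.3.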