
Stats Notes

Measures of central tendency are used to describe data sets with a single number. There are three main types: mean, median, and mode. [1] The mean is the average value found by adding all values and dividing by the number of values. [2] The median is the middle value of the data set when arranged in order. [3] The mode is the value that occurs most frequently in the data set. Each measure of central tendency has strengths and weaknesses in accurately representing different types of data distributions.

Uploaded by

Divyanshu
Copyright
© All Rights Reserved

MEASURES OF CENTRAL TENDENCY

Measures of Central Tendency:

In the study of a population with respect to a characteristic in which we are interested, we may get a
large number of observations. It is not possible to grasp any idea about the characteristic by looking at
all the observations. So it is better to get one number for each group. That number must be a good
representative of all the observations and give a clear picture of the characteristic. Such a
representative number can be a central value for all these observations. This central value is called a
measure of central tendency, an average, or a measure of location. There are five averages. Among
them, the mean, median and mode are called simple averages, and the other two, the geometric mean
and the harmonic mean, are called special averages.

The meaning of average is nicely given in the following definitions.

“A measure of central tendency is a typical value around which other figures congregate.”

“An average stands for the whole group of which it forms a part yet represents the whole.”

“One of the most widely used set of summary figures is known as measures of location.”

Characteristics for a good or an ideal average :

An ideal average should possess the following properties.

1. It should be rigidly defined.

2. It should be easy to understand and compute.

3. It should be based on all items in the data.

4. Its definition shall be in the form of a mathematical formula.

5. It should be capable of further algebraic treatment.

6. It should have sampling stability.

7. It should be capable of being used in further statistical computations or processing.

Besides the above requisites, a good average should represent the maximum characteristics of the data,
and its value should be nearest to most of the items of the given series.

Arithmetic mean or mean :

Arithmetic mean or simply the mean of a variable is defined as the sum of the observations divided by
the number of observations. If the variable x assumes n values x1, x2, …, xn then the mean, $\bar{x}$, is
given by

$$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

This formula is for the ungrouped or raw data.
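
As a minimal sketch of the raw-data formula, the following Python snippet adds up the observations and divides by their number. The function name `arithmetic_mean` and the sample values are illustrative assumptions and do not come from the notes.

```python
def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Hypothetical raw data, chosen only for illustration.
observations = [12, 15, 11, 18, 14]
print(arithmetic_mean(observations))  # 70 / 5 = 14.0
```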


Merits and demerits of Arithmetic mean :

Merits:

1. It is rigidly defined.

2. It is easy to understand and easy to calculate.

3. If the number of items is sufficiently large, it is more accurate and more reliable.

4. It is a calculated value and is not based on its position in the series.

5. It is possible to calculate even if some of the details of the data are lacking.

6. Of all averages, it is affected least by fluctuations of sampling.

7. It provides a good basis for comparison.

Demerits:

1. It cannot be obtained by inspection nor located through a frequency graph.

2. It cannot be used in the study of qualitative phenomena not capable of numerical measurement, e.g.
intelligence, beauty, honesty etc.

3. It can ignore any single item only at the risk of losing its accuracy.

4. It is affected very much by extreme values.

5. It cannot be calculated for open-end classes.

6. It may lead to fallacious conclusions, if the details of the data from which it is computed are not
given.

Weighted Arithmetic mean :


For calculating the simple mean, we suppose that all the values or the sizes of items in the distribution
have equal importance. But in practical life this may not be so. In case some items are more important
than others, a simple average is not representative of the distribution. Proper weightage has to be
given to the various items. For example, to have an idea of the change in cost of living of a certain group
of persons, the simple average of the prices of the commodities consumed by them will not do because
all the commodities are not equally important, e.g. rice, wheat and pulses are more important than tea,
confectionery etc. It is the weighted arithmetic average which helps in finding out the average value of
the series after giving proper weight to each group.
Definition:

The weighted arithmetic mean is the average in which each component item is multiplied by a certain
value known as its “weight”, and the sum of these products is divided by the total of the weights.

If x1, x2, …, xn are the values of a variable x with respective weights w1, w2, …, wn assigned to them,
then

$$\text{Weighted A.M.} = \bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum w_i x_i}{\sum w_i}$$

Uses of the weighted mean:

Weighted arithmetic mean is used in:

a. Construction of index numbers.

b. Comparison of results of two or more universities where number of students differ.

c. Computation of standardized death and birth rates.
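
The weighted mean defined above can be sketched in Python as follows. The prices and weights are made-up numbers for a cost-of-living style illustration, and `weighted_mean` is a hypothetical helper name, not something taken from the notes.

```python
def weighted_mean(values, weights):
    """Sum of (weight * value) divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical prices of three commodities and their assumed importance.
prices = [40, 30, 100]
weights = [5, 3, 2]

print(weighted_mean(prices, weights))  # (200 + 90 + 200) / 10 = 49.0
print(sum(prices) / len(prices))       # simple mean, which ignores importance
```

With these numbers the heavily weighted cheap items pull the weighted mean below the simple mean, which is the effect the passage describes.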

Merits of H.M :

1. It is rigidly defined.

2. It is defined on all observations.

3. It is amenable to further algebraic treatment.

4. It is the most suitable average when it is desired to give greater weight to smaller observations
and less weight to the larger ones.

Demerits of H.M :

1. It is not easily understood.

2. It is difficult to compute.

3. It is only a summary figure and may not be the actual item in the series.

4. It gives greater importance to small items and is, therefore, useful only when small items have to
be given greater weightage.

Merits of Geometric mean :

1. It is rigidly defined

2. It is based on all items

3. It is very suitable for averaging ratios, rates and percentages

4. It is capable of further mathematical treatment.

5. Unlike AM, it is not affected much by the presence of extreme values

Demerits of Geometric mean:

1. It cannot be used when the values are negative or if any of the observations is zero
2. It is difficult to calculate particularly when the items are very large or when there is a frequency
distribution.

3. It brings out the property of the ratio of the change and not the absolute difference of change as
the case in arithmetic mean.

4. The GM may not be the actual value of the series.

Therefore the level of knowledge in Statistics is higher than that in Accountancy.

Grouped Data:

In a grouped distribution, values are associated with frequencies. Grouping can be in the form of a
discrete frequency distribution or a continuous frequency distribution. Whatever the type of
distribution, cumulative frequencies have to be calculated to know the total number of items.

Cumulative frequency : (cf)

The cumulative frequency of each class is the sum of the frequency of the class and the frequencies of
the previous classes, i.e. the frequencies are added successively, so that the last cumulative frequency
gives the total number of items.
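
A small sketch of successive addition, using Python's standard `itertools.accumulate`. The class frequencies are invented for illustration.

```python
from itertools import accumulate

# Hypothetical class frequencies, in class order.
frequencies = [3, 7, 12, 6, 2]

# Each entry is the frequency of the class plus all previous frequencies.
cumulative = list(accumulate(frequencies))

print(cumulative)      # [3, 10, 22, 28, 30]
print(cumulative[-1])  # the last cumulative frequency is the total, 30
```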

Discrete Series:

Positional Averages:

These averages are based on the position of the given observation in a series arranged in ascending
or descending order. The magnitude or the size of the values does not matter as it did in the case of the
arithmetic mean. It is because of this basic difference that the median and mode are called the
positional measures of an average.

Median :

The median is that value of the variate which divides the group into two equal parts, one part
comprising all values greater, and the other, all values less than median.

Ungrouped or Raw data :

Arrange the given values in increasing or decreasing order. If the number of values is odd, the median
is the middle value. If the number of values is even, the median is the mean of the two middle values.
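
The odd and even cases can be sketched directly in Python. The function name `median_raw` and the sample values are assumptions made for this illustration; the standard library's `statistics.median` follows the same rule.

```python
def median_raw(values):
    """Median of ungrouped data: the middle value, or the mean of the two middle values."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(values)          # arrange in increasing order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd number of values
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even number of values


print(median_raw([7, 3, 9, 1, 5]))  # ordered: 1 3 5 7 9, middle value is 5
print(median_raw([7, 3, 9, 1]))     # ordered: 1 3 7 9, (3 + 7) / 2 = 5.0
```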

Merits of Median :

1. Median is not influenced by extreme values because it is a positional average.

2. Median can be calculated in case of distribution with open- end intervals.

3. Median can be located even if the data are incomplete.

4. Median can be located even for qualitative factors such as ability, honesty etc.
Demerits of Median :

1. A slight change in the series may bring drastic change in median value.

2. In case of even number of items or continuous series, median is an estimated value other than
any value in the series.

3. It is not suitable for further mathematical treatment except its use in mean deviation.

4. It does not take into account all the observations.

Quartiles :

The quartiles divide the distribution into four parts. There are three quartiles. The second quartile
divides the distribution into two halves and therefore is the same as the median. The first (lower)
quartile (Q1) marks off the first one-fourth, and the third (upper) quartile (Q3) marks off the first
three-fourths.

Mode :

The mode refers to that value in a distribution which occurs most frequently. It is an actual value,
which has the highest concentration of items in and around it.

According to Croxton and Cowden, “The mode of a distribution is the value at the point around which
the items tend to be most heavily concentrated. It may be regarded as the most typical of a series of
values”.

It shows the centre of concentration of the frequency in and around a given value. Therefore, where the
purpose is to know the point of the highest concentration, it is preferred. It is, thus, a positional
measure.

Its importance is very great in marketing studies where a manager is interested in knowing about the
size which has the highest concentration of items. For example, in placing an order for shoes or
ready-made garments the modal size helps because this size and the sizes around it are in common
demand.
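
To illustrate the modal size idea, here is a hedged Python sketch that counts how often each value occurs. The shoe sizes are invented, and `modes` is a hypothetical helper that returns every value tied for the highest frequency, since a data set can have more than one mode.

```python
from collections import Counter


def modes(values):
    """Return all values that occur with the highest frequency, in sorted order."""
    counts = Counter(values)
    highest = max(counts.values())   # raises ValueError on empty input
    return sorted(v for v, c in counts.items() if c == highest)


# Hypothetical shoe sizes sold in one day.
shoe_sizes = [7, 8, 8, 9, 8, 7, 10, 8, 9]

print(modes(shoe_sizes))    # size 8 occurs four times, so the result is [8]
print(modes([1, 1, 2, 2]))  # two values tie, so the result is [1, 2]
```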

Merits of Mode:

1. It is easy to calculate and in some cases it can be located by mere inspection.

2. Mode is not at all affected by extreme values.

3. It can be calculated for open-end classes.

4. It is usually an actual value of an important part of the series.

5. In some circumstances it is the best representative of data.

Demerits of mode:

1. It is not based on all observations.

2. It is not capable of further mathematical treatment.


3. Mode is sometimes ill-defined; in some cases it is not possible to find the mode.

4. As compared with mean, mode is affected to a great extent, by sampling fluctuations.

5. It is unsuitable in cases where relative importance of items has to be considered.

MEASURES OF DISPERSION –

SKEWNESS AND KURTOSIS

Introduction :
The measures of central tendency serve to locate the center of the distribution, but they do not reveal
how the items are spread out on either side of the center. This characteristic of a frequency distribution
is commonly referred to as dispersion. In a series all the items are not equal. There is difference or
variation among the values. The degree of variation is evaluated by various measures of dispersion.
Small dispersion indicates high uniformity of the items, while large dispersion indicates less uniformity.
For example, consider the following marks of two students.

Student I    Student II

68           85
75           90
65           80
67           25
70           65

Both have got a total of 345 and an average of 69 each.

The fact is that the second student has failed in one paper. When the averages alone are considered,
the two students are equal. But first student has less variation than second student. Less variation is a
desirable characteristic.

Characteristics of a good measure of dispersion:

An ideal measure of dispersion is expected to possess the following properties

1. It should be rigidly defined

2. It should be based on all the items.

3. It should not be unduly affected by extreme items.


4. It should lend itself for algebraic manipulation.

5. It should be simple to understand and easy to calculate.

Absolute and Relative Measures :

There are two kinds of measures of dispersion, namely

1. Absolute measure of dispersion

2. Relative measure of dispersion.

Absolute measure of dispersion indicates the amount of variation in a set of values in terms of units of
observations. For example, when rainfalls on different days are available in mm, any absolute measure
of dispersion gives the variation in rainfall in mm. On the other hand relative measures of dispersion are
free from the units of measurements of the observations. They are pure numbers. They are used to
compare the variation in two or more sets, which are having different units of measurements of
observations.

The various absolute and relative measures of dispersion are listed below.

Absolute measure           Relative measure

1. Range                   1. Co-efficient of Range
2. Quartile deviation      2. Co-efficient of Quartile deviation
3. Mean deviation          3. Co-efficient of Mean deviation
4. Standard deviation      4. Co-efficient of variation

Range and coefficient of Range:

Range:

This is the simplest possible measure of dispersion and is defined as the difference between the largest
and smallest values of the variable.

In symbols, Range = L – S.

Where L = Largest value.

S = Smallest value.
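
The range can be sketched in a few lines of Python. The marks are those of the two students in the introduction. The coefficient of range is not written out in these notes; the sketch assumes the usual textbook form (L - S) / (L + S), and both function names are illustrative.

```python
def value_range(values):
    """Range = L - S, the largest value minus the smallest value."""
    return max(values) - min(values)


def coefficient_of_range(values):
    """Assumed standard relative form: (L - S) / (L + S)."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)


student_1 = [68, 75, 65, 67, 70]
student_2 = [85, 90, 80, 25, 65]

print(value_range(student_1))  # 75 - 65 = 10
print(value_range(student_2))  # 90 - 25 = 65
print(coefficient_of_range(student_1), coefficient_of_range(student_2))
```

Both students have the same mean of 69, yet the second range is far larger, which matches the point made about their unequal variation.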

Merits and Demerits of Range :

Merits:

1. It is simple to understand.

2. It is easy to calculate.

3. In certain types of problems like quality control, weather forecasts, share price analysis, etc.,
range is most widely used.
Demerits:

1. It is very much affected by the extreme items.

2. It is based on only two extreme observations.

3. It cannot be calculated from open-end class intervals.

4. It is not suitable for mathematical treatment.

5. It is a very rarely used measure.

Quartile Deviation and Co efficient of Quartile Deviation :

Quartile Deviation ( Q.D) :

Definition: Quartile Deviation is half of the difference between the first and third quartiles. Hence, it is
called Semi Inter Quartile Range.

In symbols,

$$Q.D. = \frac{Q_3 - Q_1}{2}$$

Among the quartiles Q1, Q2 and Q3, the range $Q_3 - Q_1$ is called the inter quartile range and
$\frac{Q_3 - Q_1}{2}$ the semi inter quartile range.

Co-efficient of Quartile Deviation :

$$\text{Co-efficient of Q.D.} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$
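
A hedged sketch of the quartile deviation and its coefficient, taking the coefficient in its usual form (Q3 - Q1) / (Q3 + Q1). It relies on `statistics.quantiles` from the Python standard library (Python 3.8 or later). The choice of `method="exclusive"`, which places the i-th quartile at position i(n + 1)/4, is an assumption; other quartile conventions give slightly different values. The data are invented.

```python
from statistics import quantiles


def quartile_deviation(values):
    """Half of the difference between the third and first quartiles."""
    q1, _q2, q3 = quantiles(values, n=4, method="exclusive")
    return (q3 - q1) / 2


def coefficient_of_quartile_deviation(values):
    """Relative measure: (Q3 - Q1) / (Q3 + Q1)."""
    q1, _q2, q3 = quantiles(values, n=4, method="exclusive")
    return (q3 - q1) / (q3 + q1)


# Hypothetical observations; with 7 values Q1 and Q3 fall on the 2nd and 6th items.
data = [10, 20, 30, 40, 50, 60, 70]

print(quartile_deviation(data))                 # (60 - 20) / 2 = 20.0
print(coefficient_of_quartile_deviation(data))  # (60 - 20) / (60 + 20) = 0.5
```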

Merits and Demerits of Quartile Deviation :

Merits:

1. It is simple to understand and easy to calculate.

2. It is not affected by extreme values.

3. It can be calculated for data with open end classes also.

Demerits:

1. It is not based on all the items. It is based on two positional values Q1 and Q3 and ignores the
extreme 50% of the items

2. It is not amenable to further mathematical treatment.

3. It is affected by sampling fluctuations.

Mean Deviation and Coefficient of Mean Deviation:

Mean Deviation:
The range and quartile deviation are not based on all observations. They are positional measures of
dispersion. They do not show any scatter of the observations from an average. The mean deviation is a
measure of dispersion based on all items in a distribution.

Definition:

Mean deviation is the arithmetic mean of the deviations of a series computed from any measure of
central tendency, i.e., the mean, median or mode. All the deviations are taken as positive, i.e., signs
are ignored. According to Clark and Schekade,

“Average deviation is the average amount of scatter of the items in a distribution from either the mean
or the median, ignoring the signs of the deviations”.

We usually compute mean deviation about any one of the three averages: mean, median or mode.
Sometimes the mode may be ill-defined, and as such mean deviation is computed from the mean or the
median. The median is preferred as a choice between mean and median. But in general practice, and
due to the wide applications of the mean, the mean deviation is generally computed from the mean.
M.D can be used to denote mean deviation.

Coefficient of mean deviation:

Mean deviation calculated by any measure of central tendency is an absolute measure. For the purpose
of comparing variation among different series, a relative mean deviation is required. The relative mean
deviation is obtained by dividing the mean deviation by the average used for calculating mean deviation.

$$\text{Coefficient of mean deviation} = \frac{\text{Mean deviation}}{\text{Mean or Median or Mode}}$$

If the result is desired in percentage, then

$$\text{Coefficient of mean deviation} = \frac{\text{Mean deviation}}{\text{Mean or Median or Mode}} \times 100$$
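
The following sketch computes the mean deviation about a chosen average by ignoring the signs of the deviations, and then the coefficient of mean deviation. The marks are those of Student I from the introduction; `mean_deviation` and `coefficient_of_mean_deviation` are illustrative names.

```python
from statistics import mean, median


def mean_deviation(values, center):
    """Arithmetic mean of the absolute deviations from the chosen average."""
    return sum(abs(x - center) for x in values) / len(values)


def coefficient_of_mean_deviation(values, center):
    """Mean deviation divided by the average it was computed from."""
    return mean_deviation(values, center) / center


marks = [68, 75, 65, 67, 70]

# About the mean (69): |deviations| are 1, 6, 4, 2, 1, total 14.
print(mean_deviation(marks, mean(marks)))
# About the median (68): |deviations| are 0, 7, 3, 1, 2, total 13.
print(mean_deviation(marks, median(marks)))
print(coefficient_of_mean_deviation(marks, mean(marks)) * 100)  # as a percentage
```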

Merits and Demerits of M.D :

Merits:

1. It is simple to understand and easy to compute.

2. It is rigidly defined.

3. It is based on all items of the series.

4. It is not much affected by the fluctuations of sampling.

5. It is less affected by the extreme items.

6. It is flexible, because it can be calculated from any average.


7. It is a better measure of comparison.

Demerits:

1. It is not a very accurate measure of dispersion.

2. It is not suitable for further mathematical calculation.

3. It is rarely used. It is not as popular as standard deviation.

4. Algebraic positive and negative signs are ignored. It is mathematically unsound and illogical.

Standard Deviation and Coefficient of variation:

Standard Deviation :

Karl Pearson introduced the concept of standard deviation in 1893. It is the most important measure of
dispersion and is widely used in many statistical formulae. Standard deviation is also called Root-Mean
Square Deviation. The reason is that it is the square root of the mean of the squared deviations from the
arithmetic mean. It provides accurate results. The square of the standard deviation is called the
variance.

Definition:

It is defined as the positive square root of the arithmetic mean of the squares of the deviations of the
given observations from their arithmetic mean.

The standard deviation is denoted by the Greek letter $\sigma$ (sigma).

Calculation of Standard deviation-Individual Series :

There are two methods of calculating Standard deviation in an individual series.

a) Deviations taken from Actual mean

b) Deviation taken from Assumed mean

a) Deviation taken from Actual mean:

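This method is adopted when the mean is a whole number.

Steps:

1. Find out the actual mean of the series ($\bar{x}$).

2. Find out the deviation of each value from the mean.

3. Square the deviations and take the total of the squared deviations, $\sum x^2$, where x here denotes
a deviation from the mean.

4. Divide the total ($\sum x^2$) by the number of observations n. The square root of $\sum x^2 / n$ is
the standard deviation.

Thus, writing the deviations out in terms of the original observations,

$$\sigma = \sqrt{\frac{\sum x^2}{n}} \quad \text{or} \quad \sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$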
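
The actual-mean method can be sketched step by step in Python. The data are invented so that the mean is a whole number, and the function name is an assumption. Note that the sum of squares is divided by n, as in these notes; this is what `statistics.pstdev` computes, whereas `statistics.stdev` divides by n - 1.

```python
from math import sqrt


def std_dev_actual_mean(values):
    """Square root of the mean of the squared deviations from the actual mean."""
    n = len(values)
    mean = sum(values) / n                               # step 1
    deviations = [x - mean for x in values]              # step 2
    total_squared = sum(d * d for d in deviations)       # step 3
    return sqrt(total_squared / n)                       # step 4


# Hypothetical observations with mean 5 and squared deviations totalling 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(std_dev_actual_mean(data))  # sqrt(32 / 8) = 2.0
```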

b) Deviations taken from assumed mean:

This method is adopted when the arithmetic mean is a fractional value.

Taking deviations from a fractional value would be a very difficult and tedious task. To save time and
labour, we apply the short-cut method: deviations are taken from an assumed mean. The formula is:

$$\sigma = \sqrt{\frac{\sum d^2}{N} - \left(\frac{\sum d}{N}\right)^2}$$

where d stands for the deviation from the assumed mean, d = (X - A), and N is the number of
observations.

Steps:

1. Assume any one of the items in the series as an average (A).

2. Find out the deviations from the assumed mean, i.e., X - A, denoted by d, and also the total of the
deviations, $\sum d$.

3. Square the deviations, i.e., $d^2$, and add up the squares of the deviations, i.e., $\sum d^2$.

4. Then substitute the values in the following formula:

$$\sigma = \sqrt{\frac{\sum d^2}{n} - \left(\frac{\sum d}{n}\right)^2}$$

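Note: We can also use the simplified formula for standard deviation.

$$\sigma = \frac{1}{n} \sqrt{n \sum d^2 - \left(\sum d\right)^2}$$

For the frequency distribution

$$\sigma = \frac{c}{N} \sqrt{N \sum f d^2 - \left(\sum f d\right)^2}$$

where c is the common factor by which the deviations have been divided (the class width when step
deviations are used).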
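
A minimal sketch of the assumed-mean short-cut for an individual series, under the formula given in the steps above. The data and the function name are illustrative assumptions. Any assumed mean A gives the same answer, because the correction term removes the effect of measuring deviations from A rather than from the actual mean.

```python
from math import sqrt


def std_dev_assumed_mean(values, assumed_mean):
    """sigma = sqrt( sum(d^2)/n - (sum(d)/n)^2 ), with d = X - A."""
    n = len(values)
    d = [x - assumed_mean for x in values]   # deviations from the assumed mean A
    sum_d = sum(d)
    sum_d2 = sum(v * v for v in d)
    return sqrt(sum_d2 / n - (sum_d / n) ** 2)


# Hypothetical observations (actual mean 5, standard deviation 2).
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(std_dev_assumed_mean(data, 4))  # sum d = 8,   sum d^2 = 40
print(std_dev_assumed_mean(data, 7))  # sum d = -16, sum d^2 = 64
```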

Calculation of standard deviation:

Discrete Series:

There are three methods for calculating standard deviation in discrete series:

(a) Actual mean method

(b) Assumed mean method

(c) Step-deviation method.

(a) Actual mean method:

Steps:

1. Calculate the mean of the series.

2. Find the deviations of the various items from the mean, i.e., $x - \bar{x} = d$.

3. Square the deviations ($d^2$) and multiply by the respective frequencies (f) to get $f d^2$.

4. Total the products ($\sum f d^2$). Then apply the formula:

$$\sigma = \sqrt{\frac{\sum f d^2}{\sum f}}$$

If the actual mean is a fraction, the calculation takes a lot of time and labour, and as such this method
is rarely used in practice.

(b) Assumed mean method:

Here deviations are taken not from the actual mean but from an assumed mean. This method is also
used if the given variable values are not in equal intervals.

Steps:

1. Assume any one of the items in the series as an assumed mean and denoted by A.

2. Find out the deviations from assumed mean, i.e, X-A and denote it by d.

3. Multiply these deviations by the respective frequencies and get $\sum f d$.

4. Square the deviations ($d^2$).

5. Multiply the squared deviations ($d^2$) by the respective frequencies (f) and get $\sum f d^2$.

6. Substitute the values in the following formula:

$$\sigma = \sqrt{\frac{\sum f d^2}{\sum f} - \left(\frac{\sum f d}{\sum f}\right)^2}$$

where d = X - A and N = $\sum f$.
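
For a discrete series, the same short-cut applies with each deviation weighted by its frequency. The sketch below assumes the formula sigma = sqrt(sum(f d^2)/N - (sum(f d)/N)^2) with d = X - A and N = sum(f). The values, frequencies and function name are invented for illustration.

```python
from math import sqrt


def std_dev_discrete(values, frequencies, assumed_mean):
    """Assumed-mean method for a discrete frequency distribution."""
    if len(values) != len(frequencies):
        raise ValueError("values and frequencies must have the same length")
    n = sum(frequencies)                                   # N = sum of f
    d = [x - assumed_mean for x in values]                 # d = X - A
    sum_fd = sum(f * v for f, v in zip(frequencies, d))
    sum_fd2 = sum(f * v * v for f, v in zip(frequencies, d))
    return sqrt(sum_fd2 / n - (sum_fd / n) ** 2)


# Hypothetical discrete series: value x occurs f times.
x = [2, 4, 5, 7, 9]
f = [1, 3, 2, 1, 1]

print(std_dev_discrete(x, f, assumed_mean=4))  # sum fd = 8, sum fd^2 = 40
print(std_dev_discrete(x, f, assumed_mean=5))  # sum fd = 0, sum fd^2 = 32
```

When the assumed mean happens to equal the actual mean (5 here), the correction term vanishes and the formula reduces to the actual mean method.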

Merits and Demerits of Standard Deviation:

Merits:

1. It is rigidly defined and its value is always definite and based on all the observations and the
actual signs of deviations are used.

2. As it is based on arithmetic mean, it has all the merits of arithmetic mean.

3. It is the most important and widely used measure of dispersion.


4. It is capable of further algebraic treatment.

5. It is less affected by the fluctuations of sampling and hence stable.

6. It is the basis for measuring the coefficient of correlation and sampling.

Demerits:

1. It is not easy to understand and it is difficult to calculate.

2. It gives more weight to extreme values because the values are squared up.

3. As it is an absolute measure of variability, it cannot be used for the purpose of comparison.

Coefficient of Variation :

The standard deviation is an absolute measure of dispersion. It is expressed in terms of the units in
which the original figures are collected and stated. The standard deviation of heights of students cannot
be compared with the standard deviation of weights of students, as both are expressed in different
units, i.e. heights in centimetres and weights in kilograms. Therefore the standard deviation must be
converted into a relative measure of dispersion for the purpose of comparison. The relative measure is
known as the coefficient of variation.
The coefficient of variation is obtained by dividing the standard deviation by the mean and multiplying
the result by 100. Symbolically,

$$\text{Coefficient of variation (C.V)} = \frac{\sigma}{\bar{x}} \times 100$$

If we want to compare the variability of two or more series, we can use C.V. The series or groups of data
for which the C.V. is greater indicate that the group is more variable, less stable, less uniform, less
consistent or less homogeneous. If the C.V. is less, it indicates that the group is less variable, more
stable, more uniform, more consistent or more homogeneous.
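
As an illustration of comparing consistency with the C.V., the sketch below applies it to the marks of the two students from the introduction. It uses `statistics.pstdev`, which divides by n as the standard deviation in these notes does; the function name is an assumption.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """C.V = (standard deviation / mean) * 100."""
    return pstdev(values) / mean(values) * 100


student_1 = [68, 75, 65, 67, 70]
student_2 = [85, 90, 80, 25, 65]

cv_1 = coefficient_of_variation(student_1)
cv_2 = coefficient_of_variation(student_2)
print(cv_1, cv_2)

# The smaller C.V. marks the more consistent student.
print("Student I" if cv_1 < cv_2 else "Student II", "is more consistent")
```

Both students average 69, but the first has the smaller C.V. and is therefore the more consistent of the two.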

Skewness:

Meaning:

Skewness means ‘lack of symmetry’. We study skewness to have an idea about the shape of the curve
which we can draw with the help of the given data. If in a distribution mean = median = mode, then that
distribution is known as a symmetrical distribution. If in a distribution mean ≠ median ≠ mode, then it is
not a symmetrical distribution and it is called a skewed distribution. Such a distribution could either
be positively skewed or negatively skewed.

a) Symmetrical distribution:

[Diagram: symmetrical curve with Mean = Median = Mode]

It is clear from the above diagram that in a symmetrical distribution the values of mean, median and
mode coincide. The spread of the frequencies is the same on both sides of the center point of the curve.

b) Positively skewed distribution:

[Diagram: positively skewed curve with Mode < Median < Mean]

It is clear from the above diagram, in a positively skewed distribution, the value of the mean is maximum
and that of the mode is least, the median lies in between the two. In the positively skewed distribution
the frequencies are spread out over a greater range of values on the right hand side than they are on
the left hand side.

c) Negatively skewed distribution:

[Diagram: negatively skewed curve with Mean < Median < Mode]

It is clear from the above diagram, in a negatively skewed distribution, the value of the mode is
maximum and that of the mean is least. The median lies in between the two. In the negatively skewed
distribution the frequencies are spread out over a greater range of values on the left hand side than
they are on the right hand side.

Measures of skewness:

The important measures of skewness are

(i) Karl Pearson’s coefficient of skewness

(ii) Bowley’s coefficient of skewness

(iii) Measure of skewness based on moments

Karl Pearson’s Coefficient of skewness:

According to Karl Pearson, the absolute measure of skewness = mean - mode. This measure is not
suitable for making valid comparisons of the skewness in two or more distributions because the unit of
measurement may be different in different series. To avoid this difficulty we use a relative measure of
skewness called Karl Pearson’s coefficient of skewness, given by:

$$\text{Karl Pearson’s coefficient of skewness} = \frac{\text{Mean} - \text{Mode}}{\text{S.D.}}$$

In case the mode is ill-defined, the coefficient can be determined by the formula:

$$\text{Coefficient of skewness} = \frac{3(\text{Mean} - \text{Median})}{\text{S.D.}}$$

Bowley’s Coefficient of skewness:

In Karl Pearson’s method of measuring skewness the whole of the series is needed. Prof. Bowley has
suggested a formula based on the relative position of the quartiles. In a symmetrical distribution, the
quartiles are equidistant from the value of the median, i.e.,

Median - Q1 = Q3 - Median. But in a skewed distribution, the quartiles will not be equidistant from the
median. Hence Bowley has suggested the following formula:

$$\text{Bowley’s coefficient of skewness (sk)} = \frac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1}$$
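
Both coefficients can be sketched with the Python standard library. The Pearson version below uses the median-based form 3(Mean - Median)/S.D., with `pstdev` (division by n) for the standard deviation. The quartiles come from `statistics.quantiles` with `method="exclusive"`, which is an assumed convention, and the data sets are invented.

```python
from statistics import mean, median, pstdev, quantiles


def pearson_skewness(values):
    """Median-based form: 3 * (mean - median) / standard deviation."""
    return 3 * (mean(values) - median(values)) / pstdev(values)


def bowley_skewness(values):
    """(Q3 + Q1 - 2 * median) / (Q3 - Q1), using the quartiles only."""
    q1, q2, q3 = quantiles(values, n=4, method="exclusive")
    return (q3 + q1 - 2 * q2) / (q3 - q1)


symmetric = [1, 2, 3, 4, 5, 6, 7]     # hypothetical symmetrical data
right_tail = [1, 2, 2, 3, 4, 7, 12]   # hypothetical data with a long right tail

print(pearson_skewness(symmetric), bowley_skewness(symmetric))    # both zero
print(pearson_skewness(right_tail), bowley_skewness(right_tail))  # both positive
```

For the second data set Q1 = 2, the median is 3 and Q3 = 7, so Bowley's coefficient is (7 + 2 - 6) / (7 - 2) = 0.6, a positive skew.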

Kurtosis:

The expression ‘Kurtosis’ is used to describe the peakedness of a curve.

The three measures (central tendency, dispersion and skewness) describe the characteristics of
frequency distributions. But these studies alone will not give us a complete picture of the
characteristics of a distribution.

As far as the measurement of shape is concerned, we have two characteristics: skewness, which refers
to the asymmetry of a series, and kurtosis, which measures the peakedness of a frequency curve. All
frequency curves show different degrees of flatness or peakedness. This characteristic of a frequency
curve is termed kurtosis. Measures of kurtosis denote the shape of the top of a frequency curve. They
tell us the extent to which a distribution is more peaked or more flat topped than the normal curve.
The normal curve, which is symmetrical and bell-shaped, is designated as mesokurtic. If a curve is
relatively narrower and more peaked at the top, it is designated as leptokurtic. If the frequency curve is
flatter than the normal curve, it is designated as platykurtic.

Measure of Kurtosis:

The measure of kurtosis of a frequency distribution based on moments is denoted by $\beta_2$ and is
given by

$$\beta_2 = \frac{\mu_4}{\mu_2^2}$$

where $\mu_2$ and $\mu_4$ are the second and fourth moments about the mean.

If $\beta_2 = 3$, the distribution is said to be normal and the curve is mesokurtic.

If $\beta_2 > 3$, the distribution is said to be more peaked and the curve is leptokurtic.

If $\beta_2 < 3$, the distribution is said to be flat topped and the curve is platykurtic.
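
A hedged sketch of the moment-based measure follows. The formula is not legible in these notes, so the code assumes the standard definition, the fourth moment about the mean divided by the square of the second moment about the mean. The function names, the tolerance and the data are illustrative assumptions.

```python
def beta_2(values):
    """Moment coefficient of kurtosis: mu_4 / mu_2 ** 2 (central moments, divided by n)."""
    n = len(values)
    mean = sum(values) / n
    mu_2 = sum((x - mean) ** 2 for x in values) / n
    mu_4 = sum((x - mean) ** 4 for x in values) / n
    return mu_4 / mu_2 ** 2


def classify_kurtosis(b2, tolerance=1e-9):
    """Label the curve by comparing beta_2 with 3."""
    if abs(b2 - 3) <= tolerance:
        return "mesokurtic"
    return "leptokurtic" if b2 > 3 else "platykurtic"


# Hypothetical evenly spread data: mu_2 = 2, mu_4 = 6.8, so beta_2 = 1.7.
data = [1, 2, 3, 4, 5]
b2 = beta_2(data)
print(b2, classify_kurtosis(b2))
```

Evenly spread values like these give a flat-topped shape, so the value falls below 3 and the curve is labelled platykurtic.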
