When you have a huge dataset

Uploaded by

Thajes Santy

When you have a huge dataset, it is convenient to work with a few numbers that summarize its important
properties instead of inspecting every value and trying to figure it out. Statistics provides these
through Measures of Central Tendency, which include the very common terms mean, median, and mode.

Mean, median, and mode are the most fundamental terms for analyzing data through statistics. In this
tutorial on the Measures of Central Tendency, you will look at each of these terms with definitions,
formulae, and solved examples.

Of mean, median, and mode, let's first look at the various types of mean, starting with the
arithmetic mean.

What Is Arithmetic Mean?

In statistics, the arithmetic mean is the average of all the data values you work with. It gives you a
single value around which your data values are spread.

Generally, when working with data, you may want to know the average data value. The mean incorporates
every data value from the dataset in its calculation. It is also the single value with the smallest total
squared error: the sum of squared differences between the data values and the mean is lower than for
any other candidate value. This makes the mean a cumulative term that sums up the dataset well.
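The claim that the mean keeps the overall error small can be checked numerically. The sketch below uses hypothetical marks (not the data from the figures in this tutorial) and a helper named `sum_squared_error` introduced only for illustration. It compares the total squared error at the mean with the error at a few nearby candidate values.

```python
# Hypothetical marks, chosen only for illustration.
marks = [30, 35, 35, 40, 45]


def sum_squared_error(values, center):
    """Total squared distance between each value and a candidate center."""
    return sum((v - center) ** 2 for v in values)


mean = sum(marks) / len(marks)

# Compare the mean with a few nearby candidate centers.
candidates = [mean - 2, mean - 1, mean, mean + 1, mean + 2]
best = min(candidates, key=lambda c: sum_squared_error(marks, c))

for c in candidates:
    print(c, sum_squared_error(marks, c))
print("best candidate:", best)
```

Among these candidates, the mean itself gives the smallest total squared error.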

Figure 1: Arithmetic Mean

To find the mean, add up all the values in your data and divide the sum by the total number of data
values. Consider n terms X_1, X_2, X_3, …, X_n. The mean is the sum of the terms divided by the
number of terms: Mean = (X_1 + X_2 + … + X_n) / n.

Figure 2: Arithmetic Mean formula
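As a minimal sketch, the arithmetic mean can be written in a few lines of Python. The function name `arithmetic_mean` and the marks are hypothetical. The standard library's `statistics.mean` is used only as a cross-check.

```python
from statistics import mean


def arithmetic_mean(values):
    """Sum of the values divided by how many there are."""
    if not values:
        raise ValueError("the mean of an empty dataset is undefined")
    return sum(values) / len(values)


# Hypothetical marks out of 50, not the data from the figures.
marks = [28, 35, 35, 40, 47]

print(arithmetic_mean(marks))
print(mean(marks))  # standard-library cross-check
```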

Now, look at the mean with the help of an example. Consider a class whose students have
obtained the following marks out of 50 in mathematics:

Figure 3: Class marks data

You can see that there are 12 data points. So all you have to do is add up all the values and divide the
result by 12, as shown below:

Figure 4: Class marks mean

Hence, you get the mean as 37. This means that, on average, a student belonging to the above class will
score 37 out of 50 in mathematics.

Next in this mean, median, and mode tutorial, we move on to the median.


What Is Median?

The median is the middle value of your data. To find it, you first sort the data in either
ascending or descending order and then take the value in the middle position.

You can use the median to figure out the point around which your data is centered. It divides the sorted
data into two halves, with the same number of data points above and below it.

The median is especially useful when you have skewed data, that is, data with a few extreme values on
one side. In this case, the mean is pulled towards those extreme values and does not give you a fair
mid-value, so you can use the middle data point as the central point instead.
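The effect of skew is easy to see with a small example. The sketch below uses a hypothetical dataset with one extreme value and compares the two measures using the standard library's `statistics` module.

```python
from statistics import mean, median

# Hypothetical values with a single extreme observation on the high side.
values = [30, 32, 35, 36, 40, 400]

print("mean:  ", mean(values))    # pulled up by the extreme value
print("median:", median(values))  # stays near the bulk of the data
```

Here the mean is far above five of the six values, while the median stays between the two middle values.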

Consider n terms X_1, X_2, X_3, …, X_n, sorted in order. The position of the median depends on n.
For an odd number of terms, there is exactly one middle term, the ((n + 1) / 2)th, with the same
number of terms above and below it. For an even number of terms, take the two middle terms, the
(n / 2)th and the ((n / 2) + 1)th, and find their average.

Figure 5: Median Formula
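The odd and even cases can be sketched as follows. The function name `median_of` is hypothetical and the inputs are made up for illustration. In practice, `statistics.median` in the standard library does the same job.

```python
def median_of(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    if not values:
        raise ValueError("the median of an empty dataset is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)th term, which is index n // 2 when counting from 0.
        return ordered[mid]
    # Even n: average of the (n / 2)th and ((n / 2) + 1)th terms.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_of([5, 1, 3]))     # odd number of terms
print(median_of([4, 1, 3, 2]))  # even number of terms
```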

Now, use the same example of a class of 12 students and their marks in mathematics and find the
median of this data.

Figure 6: Class marks

To find the middle term, you first have to sort the data in ascending or descending order. This puts
the values in rank order, so that the middle position holds the middle value.

Figure 7: Sorted class marks

You can see that we have 12 data points, so use the median formula for even numbers.

Figure 8: Class marks median

So, the median of the marks is 37. This means that half of the students scored 37 or less and the
other half scored 37 or more.

We now come to the last of the mean, median, and mode trio - mode.

What Is Mode?

The mode refers to the most frequently occurring value in your data. You find the frequency of
occurrence of each number and the number with the highest frequency is your mode. If there are no
recurring numbers, then there is no mode in the data.

Using the mode, you can find the most commonly occurring point in your data. This is helpful when you
have to find the central tendency of categorical values, like the flavor of the most popular chip sold by a
brand. You cannot find the average based on the orders; instead, you choose the chip flavor with the
highest orders.
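Counting frequencies works the same way for numbers and for categories. The sketch below uses `collections.Counter`. The function name `modes`, the marks, and the flavor names are hypothetical. It returns every value tied for the highest frequency and follows this tutorial's convention that data with no recurring value has no mode.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s), or an empty list if nothing repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        # No recurring values: treated here as "no mode".
        return []
    return [value for value, count in counts.items() if count == top]


print(modes([35, 40, 35, 28]))                       # numeric data
print(modes(["salted", "chili", "salted", "onion"]))  # categorical data
print(modes([1, 2, 3]))                              # nothing repeats
```

Returning a list rather than a single value keeps ties visible, since a dataset can have more than one mode.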

Usually, you can count how often each value occurs and take the most frequent one as your mode. But
this only works when the values are discrete. Now, again take the example of class marks.

Figure 9: Class marks

Over here, the value 35 occurs the most frequently and hence is the mode. But what if the values are
grouped into class intervals instead of listed individually? In that case, you can estimate the mode
with the formula below:

Figure 10: Mode

Mode = l + ((f1 - f0) / (2 × f1 - f0 - f2)) × h

Where,

l = lower limit of modal class

h = width (size) of the modal class interval

f1 = frequency of modal class

f0 = frequency of class preceding modal class

f2 = frequency of class succeeding modal class

The modal class is simply the class with the highest frequency. Consider the frequencies given
for the marks obtained by students in a class:

Marks                 10-20   20-30   30-40   40-50
Number of Students    1       3       5       4

Table 1: Class Marks

In this case, you can see that class 30-40 has the highest frequency, hence it is the modal class. The
remaining values are as follows:

l = 30

h = 40 - 30 = 10

f1 = 5

f0 = 3

f2 = 4

In that case, the mode becomes:

Figure 11: Class marks mode

Mode = 30 + ((5 - 3) / (2 × 5 - 3 - 4)) × 10 = 30 + (2 / 3) × 10 ≈ 36.67

Hence, the estimated mode is about 36.67, which lies inside the modal class 30-40, as it must.
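The grouped-data calculation can be sketched in Python using the frequencies from Table 1. The function name `grouped_mode` and its parameters are hypothetical. It takes h as the width of the modal class (10 for the class 30-40) and treats a missing neighbouring class as having frequency 0, which is an assumption of this sketch.

```python
def grouped_mode(class_limits, frequencies):
    """Estimate the mode of grouped data.

    class_limits: list of (lower, upper) pairs, one per class.
    frequencies:  list of counts, one per class, in the same order.
    """
    if not frequencies or len(class_limits) != len(frequencies):
        raise ValueError("need one frequency per class")

    # The modal class is the class with the highest frequency.
    i = max(range(len(frequencies)), key=lambda k: frequencies[k])
    lower, upper = class_limits[i]
    h = upper - lower  # class width
    f1 = frequencies[i]
    f0 = frequencies[i - 1] if i > 0 else 0                     # preceding class
    f2 = frequencies[i + 1] if i + 1 < len(frequencies) else 0  # succeeding class

    denominator = 2 * f1 - f0 - f2
    if denominator == 0:
        raise ValueError("formula undefined: neighbouring classes match the modal class")
    return lower + (f1 - f0) / denominator * h


limits = [(10, 20), (20, 30), (30, 40), (40, 50)]
counts = [1, 3, 5, 4]
estimate = grouped_mode(limits, counts)
print(round(estimate, 2))  # 36.67
```

Because the correction term (f1 - f0) / (2 × f1 - f0 - f2) lies between 0 and 1 whenever the modal class has the strictly highest frequency, the estimate always falls inside the modal class.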

What Is Geometric Mean?

So far, you only looked at mean, median, and mode, the basic measures of central tendency. But the
mean itself is of many types. Let's look at different types of the mean.

Unlike the arithmetic mean, which adds the numbers, the geometric mean multiplies the data points. It is
typically used to find an average rate of growth, for example population or interest growth.

The geometric mean accounts for compounding. You use it on values that are not independent of each
other but build on one another over time, such as successive growth factors. Using the geometric mean,
you can find the average growth rate and project how the data will look over time. For example, you can
calculate bacteria growth, the average return of an investment portfolio, etc. using the geometric mean.

Consider n terms X_1, X_2, X_3, …, X_n, all positive. The geometric mean is obtained by taking the nth
root of the product of all the terms: Geometric mean = (X_1 × X_2 × … × X_n)^(1/n).

Figure 12: Geometric Mean
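A minimal Python sketch of the geometric mean follows. The function name `geometric_mean` and the growth factors are hypothetical. The sketch averages logarithms instead of multiplying the values directly, which gives the same result for positive data and avoids overflow on long series. The standard library also offers `statistics.geometric_mean` (Python 3.8+).

```python
import math


def geometric_mean(values):
    """nth root of the product of n positive values, computed via logarithms."""
    if not values:
        raise ValueError("the geometric mean of an empty dataset is undefined")
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical yearly growth factors: +10%, +20%, -10%.
growth_factors = [1.10, 1.20, 0.90]
average_factor = geometric_mean(growth_factors)

print(average_factor)  # about 1.059, i.e. roughly 5.9% growth per year
# Compounding the average factor three times reproduces the overall growth.
print(average_factor ** 3, 1.10 * 1.20 * 0.90)
```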

Let’s consider the mathematics marks of a class again.

Figure 13: Class marks

In this case, the geometric mean is as shown below.

Figure 14: Geometric mean of class marks

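The geometric mean of the marks is therefore 32.201. Because marks are not growth factors, this number
is not a growth rate. It is simply another kind of average, and for positive data it is never larger
than the arithmetic mean.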


What Is Harmonic Mean?

The harmonic mean is used to average rates and ratios. You calculate it by taking the reciprocal of
each data point and then finding the arithmetic mean of those reciprocals. You then take the
reciprocal of the resulting arithmetic mean to get the harmonic mean.

The arithmetic mean works well when the values can be meaningfully added together. But sometimes your
data consists of rates or ratios, such as speeds or prices per unit, often expressed as fractions or
decimals. Averaging such values arithmetically can give a misleading result. For example, if you travel
the same distance at two different speeds, the average speed for the whole trip is the harmonic mean of
the two speeds, not their arithmetic mean. The harmonic mean is usually used for averaging ratios,
rates, fractions, and decimal numbers.

Consider n terms X_1, X_2, X_3, …, X_n, all non-zero. The harmonic mean is the reciprocal of the
arithmetic mean of the reciprocals of the terms: Harmonic mean = n / (1/X_1 + 1/X_2 + … + 1/X_n).

Figure 15: Harmonic mean
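The definition translates directly into code. In the sketch below, the function name `harmonic_mean` and the speeds are hypothetical. The example assumes a trip split into two equal distances driven at different speeds. The standard library provides `statistics.harmonic_mean` for the same purpose.

```python
def harmonic_mean(values):
    """Reciprocal of the arithmetic mean of the reciprocals of positive values."""
    if not values:
        raise ValueError("the harmonic mean of an empty dataset is undefined")
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive")
    return len(values) / sum(1 / v for v in values)


# Hypothetical speeds in km/h over two equal distances.
speeds = [40, 60]

print(harmonic_mean(speeds))      # average speed for the whole trip, about 48
print(sum(speeds) / len(speeds))  # arithmetic mean, 50.0, overstates it
```

To see why 48 is the right figure, take 120 km each way: the trip takes 3 hours plus 2 hours, so 240 km in 5 hours is 48 km/h.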

Consider a class whose students have obtained the following marks out of 50 in mathematics:

Figure 16: Class marks

The resulting harmonic mean is:

Figure 17: Harmonic mean of class marks


Conclusion

In this tutorial about the Measures of Central Tendency, you got an overview of central tendency terms
like mean, median, and mode and different types of mean like the harmonic and geometric mean, all with
the help of definitions, formulae, and solved examples.
