When you have a huge dataset, it helps to have a few numbers that summarize its most important measures. Luckily, this is
possible with a statistical concept called Measures of Central Tendency, which includes the
very common terms mean, median, and mode.
The most fundamental terms for analyzing data through statistics are mean, median, and mode. In this
tutorial on the Measures of Central Tendency, you will look at these terms and understand
mean, median, and mode with definitions, formulae, and solved examples.
Of mean, median, and mode, let's first look at the mean, which comes in several types. We start with
the arithmetic mean.
In statistics, the arithmetic mean is the average of all the data values you work with. It tells you the
central value around which your data values lie.
Generally, when working with data, you may want to know the average data value. The mean gives you a
single term that incorporates every data value in the dataset. It is also the value that minimizes the sum
of squared deviations from all the data points, so it keeps the overall error across the dataset as small as
possible instead of favoring any single point. Because every data value enters its calculation, the mean
gives a cumulative term that sums up the dataset well.
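The "minimum error" property of the mean can be checked numerically. The sketch below uses a small hypothetical dataset and a helper, `sum_squared_deviations`, introduced here only for illustration. It compares the total squared deviation at the mean with the total at nearby candidate values.

```python
def sum_squared_deviations(values, center):
    """Total squared distance between each value and a candidate center."""
    return sum((x - center) ** 2 for x in values)

# Hypothetical data, chosen only for illustration.
data = [2, 3, 7, 8, 10]
mu = sum(data) / len(data)  # arithmetic mean = 6.0

# The total is smallest at the mean and grows as the candidate moves away from it.
for candidate in (mu - 1, mu, mu + 1):
    print(candidate, sum_squared_deviations(data, candidate))
```

For squared error the minimizer is always the mean. If you measure error as absolute distance instead, the minimizer becomes the median.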
To find the mean, add up all the values in your data and divide the sum by the total number of data
values. Consider n terms X_1, X_2, X_3, …, X_n. The mean is the sum of the terms divided by the
number of terms:
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
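The formula above takes only a couple of lines of Python. This is a minimal sketch that uses hypothetical marks, not the class data from the example. The helper `arithmetic_mean` is an illustrative name, and the check against `statistics.mean` from the standard library shows that the two agree.

```python
from statistics import mean

def arithmetic_mean(values):
    """Sum all values and divide by how many there are."""
    if not values:
        raise ValueError("the mean of an empty dataset is undefined")
    return sum(values) / len(values)

# Hypothetical marks out of 50 (not the class data from Figure 3).
marks = [30, 35, 40, 45]
print(arithmetic_mean(marks))  # 37.5
print(mean(marks))             # 37.5, the standard-library equivalent
```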
Now, look at the mean with the help of an example. Consider a class whose students obtained the
following marks out of 50 in mathematics:
Figure 3: Class marks data
You can see that there are 12 data points. So all you have to do is add up the values and divide the
result by 12. This gives a mean of 37, which means that, on average, a student in this class scores 37
out of 50 in mathematics.
Next in this mean, median, and mode tutorial, we move on to the median.
What Is Median?
The median is the middle value of your data. To find it, you first sort the data in ascending or
descending order and then take the value in the middle. You can use the median to find the point
around which your data is centered: it divides the data into two halves, with the same number of data
points above and below it.
The median is especially useful when your data is skewed, that is, when most values are concentrated on
one side and there is a long tail of extreme values on the other. In that case, the mean is pulled toward the
extreme values in the tail (toward the higher values when the tail is on the high side) and no longer gives a
fair mid-value. You can use the middle data point as the central point instead.
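The effect of skew is easiest to see with one extreme value. This sketch uses hypothetical monthly incomes in which a single large value creates a long right tail. It compares `statistics.mean` with `statistics.median` on the same data.

```python
from statistics import mean, median

# Hypothetical monthly incomes: five similar values and one far higher (right skew).
incomes = [3000, 3200, 3100, 2900, 3050, 25000]

print(mean(incomes))    # about 6708.33, pulled up by the single large value
print(median(incomes))  # 3075.0, stays near the typical value
```

Only one value out of six is far from the rest, yet the mean more than doubles compared with the typical income. The median barely moves.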
Consider n terms X_1, X_2, X_3, …, X_n, sorted so that X_1 ≤ X_2 ≤ … ≤ X_n. When n is odd, there is one
middle term, at position (n + 1)/2, with the same number of terms above and below it:
$$\text{Median} = X_{(n+1)/2}$$
When n is even, there are two middle terms, at positions n/2 and n/2 + 1, and the median is their
average:
$$\text{Median} = \frac{X_{n/2} + X_{n/2+1}}{2}$$
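The odd and even cases above translate directly into code. This is a minimal sketch: it sorts the data, returns the single middle term when the count is odd, and averages the two middle terms when the count is even. The function name `median` is illustrative. The standard library's `statistics.median` follows the same rule.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values when n is even."""
    if not values:
        raise ValueError("the median of an empty dataset is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the middle term is at position (n + 1) / 2, which is index n // 2.
        return ordered[mid]
    # Even n: average the terms at positions n/2 and n/2 + 1 (indices mid - 1 and mid).
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 1, 5]))     # 5
print(median([7, 1, 5, 3]))  # 4.0
```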
Now, use the same example of a class of 12 students and their marks in mathematics and find the
median of this data.
To find the middle term, you first sort the data in ascending or descending order, so that the terms
appear in order of size.
There are 12 data points, an even number, so use the median formula for an even number of terms: the
median is the average of the 6th and 7th sorted terms. This gives a median of 37, which means that half of
the students scored 37 or less and the other half scored 37 or more.
We now come to the last of the mean, median, and mode trio: the mode.
What Is Mode?
The mode is the most frequently occurring value in your data. You find how often each value occurs,
and the value with the highest frequency is your mode. If no value recurs, then there is no mode in the
data.
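Counting frequencies is a natural fit for `collections.Counter`. This sketch assumes one convention that the text does not cover: when several values tie for the highest frequency, all of them are returned, so bimodal data yields two modes. Following the rule above, it returns an empty list when no value recurs. The helper name `modes` and the sample values are hypothetical.

```python
from collections import Counter

def modes(values):
    """Return every value with the highest frequency, or [] if nothing repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        # No value recurs, so there is no mode.
        return []
    return [value for value, count in counts.items() if count == top]

print(modes([35, 40, 35, 42, 38]))  # [35]
print(modes([1, 2, 3]))             # []
print(modes([1, 1, 2, 2, 3]))       # [1, 2] (bimodal)
```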
Using the mode, you can find the most commonly occurring point in your data. This is helpful when you
need the central tendency of categorical values, like the most popular chip flavor sold by a brand. You
cannot average flavors; instead, you choose the flavor with the most orders.
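The chip-flavor case works the same way as numeric data, because counting does not require the values to be numbers. The order list below is hypothetical. `Counter.most_common` and `statistics.mode` are both standard-library calls that accept strings.

```python
from collections import Counter
from statistics import mode

# Hypothetical orders, one entry per bag sold.
orders = ["salted", "paprika", "salted", "sour cream", "salted", "paprika"]

flavor, count = Counter(orders).most_common(1)[0]
print(flavor, count)  # salted 3
print(mode(orders))   # salted
```

`most_common` also reports how many orders the winning flavor had, which is useful when you want to see how clear the lead is.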
Usually, you can count how often each value occurs and take the most frequent one as your mode. But
this only works when the values are discrete. Now, take the example of the class marks again.
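Here, the value 35 occurs most frequently and hence is the mode. But what if the data is grouped into
class intervals, as with continuous values? In that case, you use the following formula:
$$\text{Mode} = l + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h$$
Where,
l = lower limit of the modal class
h = width of the class interval
f1 = frequency of the modal class
f0 = frequency of the class preceding the modal class
f2 = frequency of the class succeeding the modal class
The modal class is simply the class with the highest frequency. Consider the following frequency
distribution of the marks obtained by students in a class: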
Number of Students 1 3 5 4
l = 30
h = 20
f1 = 5
f0 = 3
f2 = 4
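The grouped-data formula is easy to check in code. This is a sketch assuming the standard formula Mode = l + (f1 − f0) / (2·f1 − f0 − f2) × h, where l is the lower limit of the modal class, h is the class width, f1 is the modal class frequency, and f0 and f2 are the frequencies of the classes before and after it. The function `grouped_mode` is a hypothetical helper, and the inputs are the values listed above.

```python
def grouped_mode(l, h, f1, f0, f2):
    """Estimate the mode of grouped data from the modal class.

    l  : lower limit of the modal class
    h  : class width
    f1 : frequency of the modal class
    f0 : frequency of the class before it
    f2 : frequency of the class after it
    """
    denominator = 2 * f1 - f0 - f2
    if denominator == 0:
        raise ValueError("2*f1 - f0 - f2 is zero; the formula is undefined")
    return l + (f1 - f0) / denominator * h

# Values listed above for the modal class.
print(grouped_mode(l=30, h=20, f1=5, f0=3, f2=4))  # roughly 43.33 with these inputs
```

The estimate always lies inside the modal class. It is shifted toward whichever neighboring class has the higher frequency.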
So far, you have looked at mean, median, and mode, the basic measures of central tendency. But the
mean itself comes in several types. Let's look at the different types of mean.
Unlike the arithmetic mean, which adds the numbers, the geometric mean multiplies the data points. It is
used to find average rates of growth, such as population growth or compound interest.
The geometric mean accounts for compounding. You use it on values that are not independent of each
other but build on one another over time, such as growth over successive periods. Using the geometric
mean, you can find the average growth rate of values and project how the data will develop over time.
For example, you can use the geometric mean to calculate bacterial growth or the average return of an
investment portfolio.
Consider n terms X_1, X_2, X_3, …, X_n. The geometric mean is the nth root of the product of the
terms:
$$G = \sqrt[n]{X_1 \times X_2 \times \cdots \times X_n} = \left(\prod_{i=1}^{n} X_i\right)^{1/n}$$
Figure 12: Geometric Mean
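The geometric mean is meant for compounding values, so this sketch applies it to growth factors. The yearly factors are hypothetical: +10%, +50%, and −20%, written as 1.10, 1.50, and 0.80. The helper `geo_mean` takes the nth root of the product. `statistics.geometric_mean` (Python 3.8+) is the standard-library version.

```python
import math
from statistics import geometric_mean

def geo_mean(values):
    """nth root of the product of n positive values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("the geometric mean needs a non-empty list of positive values")
    return math.prod(values) ** (1 / len(values))

# Hypothetical yearly growth factors: +10%, +50%, -20%.
factors = [1.10, 1.50, 0.80]
g = geo_mean(factors)
print(g)                        # about 1.097, i.e. roughly 9.7% growth per year
print(geometric_mean(factors))  # standard-library version, same value up to rounding
```

Compounding 1.097 over three years reproduces the overall factor of 1.32. The arithmetic mean of the three rates, about 13.3%, would overstate the growth. For long inputs, the product can overflow or underflow. Averaging logarithms and exponentiating the result avoids that problem.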
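For the class marks, the geometric mean works out to 32.201. Because the marks are not growth factors,
this value is not a growth rate but simply a different kind of average. It is lower than the arithmetic mean
of 37, since for positive values the geometric mean never exceeds the arithmetic mean.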
The harmonic mean is used to average rates and ratios. You calculate it by taking the reciprocal of each
data point and finding the arithmetic mean of those reciprocals. You then take the reciprocal of that
arithmetic mean to get the harmonic mean.
The harmonic mean is usually used for averaging ratios, rates, fractions, and decimal numbers, especially
rates measured over a fixed quantity. For such data, the arithmetic mean can give a misleading answer.
For example, if you drive a fixed distance at 30 km/h and return the same distance at 60 km/h, your
average speed is the harmonic mean of the two speeds, 40 km/h, not their arithmetic mean of 45 km/h.
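Consider n terms X_1, X_2, X_3, …, X_n. The harmonic mean is the reciprocal of the arithmetic mean of
the reciprocals of the terms:
$$H = \frac{n}{\frac{1}{X_1} + \frac{1}{X_2} + \cdots + \frac{1}{X_n}} = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}}$$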
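The reciprocal, mean, reciprocal recipe for the harmonic mean fits in a few lines. This is a minimal sketch that handles positive values only. The helper name `harm_mean` is hypothetical, and `statistics.harmonic_mean` is the standard-library equivalent. The example uses two speeds over equal distances: a trip driven at 30 km/h and the return trip driven at 60 km/h.

```python
from statistics import harmonic_mean

def harm_mean(values):
    """Reciprocal of the arithmetic mean of the reciprocals."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("this sketch only handles a non-empty list of positive values")
    reciprocal_mean = sum(1 / v for v in values) / len(values)
    return 1 / reciprocal_mean

# Equal distances driven at 30 km/h and then at 60 km/h.
speeds = [30, 60]
print(harm_mean(speeds))      # approximately 40, the true average speed
print(harmonic_mean(speeds))  # standard-library version
```

The arithmetic mean of the two speeds would be 45 km/h. That figure is too high, because the slower leg of the trip takes twice as long as the faster one.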
Consider a class whose students have obtained the following marks out of 50 in mathematics:
Conclusion
In this tutorial about the Measures of Central Tendency, you got an overview of central tendency terms
like mean, median, and mode, and of different types of mean like the geometric and harmonic means,
all with the help of definitions, formulae, and solved examples.
If you need any further clarifications or want to learn more about the measures of central tendency and
mean, median, and mode, share your queries with us in the comments section of this page, and our
experts will review them at the earliest. You can also learn about mean, median, and mode and other
concepts by checking out this video on our YouTube channel.
Are you perhaps looking to learn more about data analytical concepts and looking to build a robust
career in Data Analytics? If yes, Simplilearn’s Data Analytics Certification Program should be the program
for you to check out. This program is offered in partnership with Purdue University and in collaboration
with IBM to offer you an industry-ready curriculum delivered by world-class practitioners and trainers.
The program features live online classes and unique masterclasses from Purdue University and IBM
experts. Do take a walkthrough of the course details. It just might be the solution you are looking for.