Descriptive Statistics: College of Information and Computing Sciences
MODULE 2:
DESCRIPTIVE STATISTICS
Course overview
The course introduces the students to various methods of statistical analyses as applied in various industries and
enterprises. Through the use of primary statistical techniques, the students attain a meaningful understanding of
statistical reasoning within the context of management decision-making. Topics essentially focus on statistical
description, statistical induction, and analysis of statistical relationship.
Objectives
After successful completion of this module, the student will be able to:
• Identify their learning outcomes and expectations for the course;
• Recognize their capacity to create new understandings from reflecting on the course;
• Know the capabilities of Descriptive Statistics.
Module Content:
Descriptive Statistics
o Measure of Frequency distribution
o Measure of Central Tendency
o Measure of Dispersion or Variation
o Measure of Position
Supplemental Videos
Descriptive Statistics
What are descriptive statistics?
Descriptive Statistics
Descriptive statistics quantitatively summarize information in a meaningful way so that whoever is looking at it can detect relevant
patterns quickly. Descriptive statistics are divided into measures of variability and measures of central tendency.
Measures of variability include the standard deviation, the minimum and maximum values, skewness, kurtosis, and
variance, while measures of central tendency include the mean, median, and mode.
Descriptive statistics can be used to describe a single variable (univariate analysis) or more than one variable
(bivariate/multivariate analysis). In the case of more than one variable, descriptive statistics can help
summarize relationships between variables using tools such as scatter plots.
As discussed in Module 1, Lesson 3 (Statistical Research Process), there are two types of
statistical data analysis: descriptive and inferential statistics. This module focuses on descriptive
statistics.
In a nutshell, descriptive statistics describe and summarize data but do not allow us to draw conclusions about
the whole population from which we took the sample.
You are simply summarizing the data with charts, tables, and graphs.
Conversely, with inferential statistics, you are using statistics to test a hypothesis, draw conclusions and make
predictions about a whole population, based on your sample.
Example 1:
Descriptive statistics about a college involve the average math test score for incoming students. It says nothing
about why the data is so or what trends we can see and follow.
Descriptive statistics help you to simplify large amounts of data in a meaningful way. They reduce lots of data into a
summary.
Example 2:
You have surveyed 40 respondents about their favorite car color, and now you have a spreadsheet with
the results.
However, this spreadsheet is not very informative, and you want to summarize the data with some graphs and charts
that allow you to come up with some simple conclusions (e.g. 24% of people said that white is their favorite
color).
This would be much more informative and clearer than a raw spreadsheet. You have plenty of
options to visualize data, such as pie charts, line charts, etc.
That’s the core of descriptive statistics. Note that you are not drawing any conclusions about the full population.
The frequency distribution table refers to the data in the tabular form with two columns corresponding to
the particular data and its frequency.
What is Frequency?
Here, two students said their favourite colour is red. So, the frequency of red colour is two.
Thus, the frequency of the data tells the number of times that value appears in the given data.
In our daily life, we will get a lot of information in the form of charts, figures and graphs,
etc. There can be varied information, such as marks secured by the students, population of
different countries, temperatures of various cities, etc.
Thus, the information that is collected is called the data. Well, once the data is collected, it
should be represented in a meaningful way to be understood easily. A frequency
distribution table is one of the ways to organise the data. The frequency distribution table
summarises the complete collected data in the form of a table.
An NGO conducted a blood donation camp for 30 people, whose blood groups are
recorded as follows:
The above data can be represented in the form of a frequency distribution table as follows:
From the above table, we can observe that all the data is arranged in two columns, which
can easily be understood.
In this article, our scope of discussion will be limited to an ungrouped and grouped frequency
distribution table only.
The general types of frequency distribution tables are grouped and ungrouped frequency
distribution tables.
Ungrouped Frequency Distribution Table
An ungrouped frequency distribution table is the representation of each data separately with its
frequency. This type of table is used for the smaller set of data. Ungrouped data is the data given
in individual points.
Example:
The marks scored by 20 students in a test are given below:
The above tabular form of representing the data is known as the ungrouped frequency
table, as it describes the frequency of individual data.
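An ungrouped frequency table can also be built by tallying each distinct value. The following is a minimal sketch; the marks are hypothetical and are not the data from the example above.

```python
from collections import Counter

# Hypothetical marks for 20 students (illustrative only).
marks = [5, 7, 8, 7, 6, 9, 5, 7, 8, 6, 7, 9, 10, 8, 7, 6, 5, 8, 7, 9]

# Counter tallies how many times each distinct mark occurs.
frequency = Counter(marks)

# Print the table with one row per distinct mark, in ascending order.
print("Marks  Frequency")
for value in sorted(frequency):
    print(f"{value:>5}  {frequency[value]:>9}")
```

The frequencies always add up to the number of observations, which is a quick check that no value was missed.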
Example:
The marks secured by 100 students are given as follows:
The frequency table for the above data can be drawn as follows by using the class intervals
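Grouping into class intervals can be sketched as below. The marks and the class width of 10 are assumptions chosen for illustration, not the 100 students' data.

```python
from collections import Counter

# Hypothetical marks (illustrative only).
marks = [12, 47, 35, 8, 23, 41, 19, 30, 27, 44, 5, 38, 16, 22, 49, 33]
width = 10  # assumed class width

# Each mark is assigned to the interval [lower, lower + width),
# so a mark of exactly 10 belongs to 10-20 and not to 0-10.
grouped = Counter((mark // width) * width for mark in marks)

print("Class interval  Frequency")
for lower in sorted(grouped):
    print(f"{lower:>3}-{lower + width:<3}         {grouped[lower]}")
```

Whether a boundary value goes to the lower or the upper class is a convention; the sketch puts it in the upper class.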
Example:
Consider a jar containing the different colours of pieces of bread as shown below:
Let us understand the concept through some frequency distribution table examples.
Q.1. There are 55 students in a classroom. The teacher asked the students
to talk about their favourite subjects. The results are listed below:
From the above table, we can see that the largest number of students, 7, like
mathematics.
Q.2. Construct the frequency distribution table for the data on heights
in cm of 20 boys using the class intervals 130–135, 135–140 and
so on. The heights of the boys in cm are:
Ans: The frequency distribution for the above data can be constructed as follows :
Q.3. Runs scored by Rohit Sharma in 10 International matches are
recorded as follows:
Q.4. The frequency distribution of the weights (in kg) of 40 persons is
given below:
Which class interval has the highest frequency and which has the lowest
frequency?
Ans:
The frequency distribution of the weights (in kg) of 40 persons is given below:
From the above data, the frequency of the class interval 40–45 is 14, which is
the maximum among all the frequencies.
The frequency of the class interval 50–55 is 3, which is the minimum among all the
frequencies.
Therefore, the class interval 40–45 has the highest frequency, and the
interval 50–55 has the lowest frequency.
Q.5. The marks of the 30 students of the class in Mathematics are given
below:
The grouped frequency distribution table is given for the above-given data as
follows:
Summary
In this article, we have studied the frequency distribution table and its types. The frequency
distribution table in statistics helps to find the data in simple tabular form, which is easy to
understand. We have discussed the frequency, tally marks, which are the main features of
constructing a frequency distribution table.
This article helps us understand one of the easy ways of representing data using a frequency
distribution table. The properties and applications of the frequency distribution table help us
explore the data features easily.
Q.4. What are the differences between the frequency table and the frequency distribution table?
Ans: The frequency table is a tabular method where each part of the data is assigned to its corresponding
frequency. In comparison, a frequency distribution is generally the graphical representation of the frequency
table.
1. Central Tendency
Central tendency (also called measures of location or central location) is a method to describe what’s
typical for a group (set) of data.
It means central tendency doesn’t show us what is typical about each one piece of data, but it gives us an
overview of the whole picture of the entire data set.
It tells us what is normal or average for a given set of data. There are three key methods to show central
tendency: mean, mode, and median.
Mean
As the name suggests, mean is the average of a given set of numbers. The mean is calculated in
two very easy steps:
1. Find the sum by adding all the data values together
2. Divide the sum by the total number of data values
Example 3:
The girls’ heights in inches are: 62, 70, 60, 63, 66.
To calculate the mean height for the group of girls you need to add the data together:
62 + 70 + 60 + 63 + 66 = 321.
Now, you take the sum (321) and divide it by the total number of girls (5): 321 / 5 = 64.2.
So, our mean is 64.2.
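The two steps can be checked in a few lines. This sketch uses the five heights as listed in Example 3 and compares the hand calculation with the standard library's `statistics.mean`.

```python
from statistics import mean

# Heights in inches, as listed in Example 3.
heights = [62, 70, 60, 63, 66]

# Step 1: add the data together.
total = sum(heights)

# Step 2: divide the sum by the number of data values.
average = total / len(heights)

print("sum:", total)
print("mean:", average)
print("library mean:", mean(heights))
```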
A key advantage of the mean is that it can be used with both continuous and discrete
numerical data.
Of course, the mean has limitations. Data must be numerical in order to calculate the mean. You
cannot work with the mean when you have nominal data.
Mode
The mode of a set of data is the number in the set that occurs most often.
Let’s see the next of our descriptive statistics examples, problems and solutions.
Example 4:
Consider you have a dataset with the retirement age of 10 people, in whole years:
To illustrate this let’s see table below that shows the frequency of the retirement age data.
As you see, the most common value is 55. That is why the mode of this data set is 55 years.
The mode has one very important advantage over the median and the mean. It can be calculated for
both numerical and categorical data.
Limitations of the mode: In some data sets, the mode may not reflect the centre of the set. In the
above example, if we order the retirement ages from lowest to highest, we would see that the centre
of the data set is 57 years, but the mode is lower, at 55 years.
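A short sketch of the mode for numerical and categorical data follows. The retirement ages are hypothetical, chosen so that 55 is the most common value and 57 is the centre; the colour list is also made up for illustration.

```python
from statistics import median, mode, multimode

# Hypothetical retirement ages of 10 people (illustrative only).
ages = [55, 55, 55, 56, 57, 57, 58, 58, 59, 60]

# Hypothetical categorical data: the mode works here, the mean does not.
colours = ["white", "red", "white", "black", "blue", "white", "red"]

print("mode of ages:", mode(ages))
print("median of ages:", median(ages))
print("mode of colours:", mode(colours))

# multimode returns every value tied for the highest frequency.
print("all modes of [1, 1, 2, 2]:", multimode([1, 1, 2, 2]))
```

In this made-up data the mode sits below the middle of the ordered values, which is the limitation described above.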
Median
Simply said, the median is the middle value in a data set. As you might guess, in order to calculate
the middle, you need:
– first listing the data in a numerical order
– second, locating the value in the middle of the list.
Example 5:
The middle number in the below set is 26 as there are 4 numbers above it and 4 numbers below:
But this was an odd set of data – you have 9 numbers. How to find the middle if you have an even
set of data?
Easily – you just need to find the average of the two middle numbers.
For example, in the below dataset of 10 numbers, the median is the average of the two middle numbers: (26 + 27) / 2 = 26.5.
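Both cases can be sketched in one small function. The two lists are hypothetical; they were chosen only so that the middle values match the ones named in the text (26 for the odd set, 26 and 27 for the even set).

```python
from statistics import median

def median_by_hand(values):
    """Order the data, then take the middle value (or average the two middle values)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                      # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average of the two

# Hypothetical data sets (illustrative only).
odd_set = [21, 22, 24, 25, 26, 27, 28, 30, 31]        # 9 numbers
even_set = [21, 22, 24, 25, 26, 27, 28, 30, 31, 33]   # 10 numbers

print(median_by_hand(odd_set), median(odd_set))
print(median_by_hand(even_set), median(even_set))
```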
An advantage of the median is that it is less affected by outliers and skewed data than
the mean. We usually prefer the median when the data set is not symmetrical.
As a limitation, because nominal data cannot be ordered in a logical way, the median
cannot be calculated for nominal data.
Having trouble remembering the difference between the mode, mean, and median? Here are some
hints:
The word MOde is very like MOst (the most frequent number)
“Mean” requires you do some arithmetic (adding all the numbers together and dividing).
“Median” practically means “Middle” and has the same number of letters.
Having trouble deciding which measure to use when you have nominal, ordinal or interval data? The
above table can help.
2. Dispersion or Variation
Central tendency tells us important information but it doesn’t show everything we want to know about average
values. Central tendency fails to reveal the extent to which the values of the individual items differ in a data set.
Measures of dispersion do a lot more – they complement the averages and allow us to interpret them much better.
Dispersion in statistics describes the spread of the data values in a given dataset. In other words, it shows how the
data is “dispersed” around the mean (the central value).
Example 6:
Imagine you have to compare the performance of two groups of students on the final math exam. You find that
the average math test results are identical for both groups.
Does that mean the students in the two groups are performing equally? No. Let’s see why.
In group A, the individual scores are concentrated around the center, 60. All students in A have a
very similar performance. There is consistency.
On the other hand, in group B the mean is also 60 but the individual scores are not even close to the
center. One score is quite small – 40 and one score is very large – 80.
We can conclude that there is greater dispersion in group B.
Note:
The study of dispersion has a key role in statistical data. If in a given country there are very poor people and
very rich people, we say there is serious economic disparity. Dispersion also is very useful when we want to
find the relation between the set of data.
There are two popular measures of dispersion: standard deviation and range.
Let’s see some more descriptive statistics examples and definitions for dispersion measures.
The Range
The range is simply the difference between the largest and smallest value in a data set. It gives a quick,
rough indication of how spread out the values are.
A low range tells us that the data points are close together, and therefore close to the mean. A high
range shows the opposite.
Example 7:
You see that the data values in Group A are much closer to the mean than the ones in Group B.
A serious disadvantage of the Range is that it only provides information about the minimum and
maximum of the data set. It tells nothing about the values in between.
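As a rough sketch, the range of two groups with the same mean can be compared as follows. The scores are hypothetical; only the mean of 60 and the extreme scores 40 and 80 in Group B come from the text.

```python
from statistics import mean

def value_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)

# Hypothetical exam scores (illustrative only); both groups have a mean of 60.
group_a = [58, 59, 60, 61, 62]
group_b = [40, 50, 60, 70, 80]

for name, scores in (("A", group_a), ("B", group_b)):
    print(f"Group {name}: mean = {mean(scores)}, range = {value_range(scores)}")
```

The means are identical, but the range of Group B is ten times that of Group A.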
The Standard Deviation
As with the range, a low standard deviation tells us that the data points are very close to the mean. And a
high standard deviation shows the opposite.
Example 8:
Let’s find the standard deviation of the math exam scores by hand. We use simple values for the
purposes of easy calculations.
Now, let’s replace the values in the formula:
The result above shows that, on average, every math exam score in group A is
approximately 2.45 points away from the mean of 60.
Of course, you can calculate the above values with a calculator instead of by hand.
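The same calculation can be sketched in code. The scores below are hypothetical: they were picked so that the mean is 60 and the sample standard deviation comes out near 2.45, and they are not necessarily the scores used in the example.

```python
import math
from statistics import mean, pstdev, stdev

# Hypothetical exam scores (illustrative only).
scores = [57, 58, 61, 61, 63]

x_bar = mean(scores)
squared_deviations = [(x - x_bar) ** 2 for x in scores]

# Sample standard deviation divides by n - 1.
sample_sd = math.sqrt(sum(squared_deviations) / (len(scores) - 1))

# Population standard deviation divides by N.
population_sd = math.sqrt(sum(squared_deviations) / len(scores))

print("mean:", x_bar)
print("sample SD:", round(sample_sd, 2), "library:", round(stdev(scores), 2))
print("population SD:", round(population_sd, 2), "library:", round(pstdev(scores), 2))
```

The library functions `statistics.stdev` and `statistics.pstdev` implement the sample and population versions respectively.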
Note: The above formula is for a sample of a population. The standard deviation of an entire population
is represented by the Greek lowercase letter sigma (σ).
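In conventional notation, the two formulas can be written as follows. The symbols are the usual textbook ones and are assumed here: $x_i$ are the data values, $\bar{x}$ and $n$ are the sample mean and sample size, and $\mu$ and $N$ are the population mean and population size.

```latex
s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}
\qquad
\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}
```

The only structural difference is the divisor: $n - 1$ for a sample and $N$ for a whole population.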
Conclusion:
The above 8 descriptive statistics examples, problems and solutions are simple but aim to make you
understand the descriptive data better.
As you saw, descriptive statistics are used just to describe some basic features of the data in a study.
They provide simple summaries about the sample and enable us to present data in a meaningful way. It
allows a simpler interpretation of the data.
Together with some plain graphics analysis, they form a solid basis for almost every quantitative analysis
of data.
Descriptive statistics cannot, however, be used for making conclusions beyond the data we have
analyzed or making conclusions regarding any hypotheses.
In descriptive statistics, the quartiles of a set of values are the three points that divide the data set into four equal
groups, each representing a fourth of the population being sampled. A quartile is a type of quantile.
In epidemiology, sociology and finance, the quartiles of a population are the four subpopulations defined by
classifying individuals according to whether the value concerned falls into one of the four ranges defined by the three
values discussed above. Thus an individual item might be described as being "in the upper quartile".
Definitions
first quartile (designated Q1) = lower quartile = splits lowest 25% of data = 25th percentile
second quartile (designated Q2) = median = cuts data set in half = 50th percentile
third quartile (designated Q3) = upper quartile = splits highest 25% of data, or lowest 75% = 75th percentile
The difference between the upper and lower quartiles is called the interquartile range.
Computing methods
Case 1: If L is a whole number, then the value will be found halfway between positions L and L+1.
Case 2: If L is a fraction, round to the nearest whole number. (for example, L = 1.2 becomes 1)
Examples:
Method 1
Use the median to divide the ordered data set into two halves. Do not include the median into the halves, or the
minimum and maximum.
The lower quartile value is the median of the lower half of the data. The upper quartile value is the median of the
upper half of the data.
This rule is employed by the TI-83 calculator boxplot and "1-Var Stats" functions.
Method 2
Use the median to divide the ordered data set into two halves. If the median is a datum (as opposed to being the
average of the middle two data), include the median in both halves.
The lower quartile value is the median of the lower half of the data. The upper quartile value is the median of the
upper half of the data.
Example 6
3, 5, 2, 7, 6, 4, 9.
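As a sketch, both methods can be applied to the seven values above. The function name and its `include_median` flag are illustrative choices, not part of any library.

```python
from statistics import median

def quartiles(values, include_median=False):
    """Return (Q1, Q2, Q3) as medians of the lower half, whole set, and upper half.

    include_median=False follows Method 1 (median left out of both halves).
    include_median=True follows Method 2 (a median that is a datum joins both halves).
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    if include_median and n % 2:
        lower = lower + [ordered[half]]
        upper = [ordered[half]] + upper
    return median(lower), median(ordered), median(upper)

data = [3, 5, 2, 7, 6, 4, 9]

q1, q2, q3 = quartiles(data)                       # Method 1
print("Method 1:", q1, q2, q3, "IQR =", q3 - q1)

q1, q2, q3 = quartiles(data, include_median=True)  # Method 2
print("Method 2:", q1, q2, q3, "IQR =", q3 - q1)
```

The two methods agree on the median but can give different quartiles for small data sets with an odd number of values.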
Decile
A decile is one of ten equal groups into which a large set of values or statistics is divided.
It is any one of the numbers or values in a series dividing the distribution of the individuals in the series into ten
groups of equal frequency.
The deciles are the nine values of the variable that divide an ordered data set into ten equal parts.
The deciles determine the values for 10%, 20%... and 90% of the data.
D5 coincides with the median.
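A minimal sketch of the nine deciles, using Python's `statistics.quantiles` with `n=10`. The scores are hypothetical, and the function's default interpolation rule is only one of several conventions, so other software may give slightly different values.

```python
from statistics import median, quantiles

# Hypothetical ordered scores (illustrative only), 19 values.
scores = [12, 15, 17, 20, 22, 25, 27, 30, 33, 35,
          38, 40, 43, 45, 48, 50, 53, 55, 58]

# n=10 asks for the 9 cut points that split the data into 10 equal parts.
deciles = quantiles(scores, n=10)

for k, value in enumerate(deciles, start=1):
    print(f"D{k} = {value}")

# D5 is the fifth cut point and coincides with the median.
print("median:", median(scores))
```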
In Maple's Statistics package, the Decile function computes the specified decile of the specified random variable or data set.
The first parameter can be a data set (represented as an Array), a distribution, a random variable, or an algebraic
expression involving random variables.
3, 5, 2, 7, 6, 4, 9.
Example 3:
3, 5, 2, 7, 6, 4, 9.
Percentiles
In statistics, a percentile (or centile) is the value of a variable below which a certain percent of observations fall. For
example, the 20th percentile is the value (or score) below which 20 percent of the observations may be found. The
term percentile and the related term percentile rank are often used in the reporting of scores from norm-referenced
tests.
The 25th percentile is also known as the first quartile (Q1), the 50th percentile as the median or second quartile
(Q2), and the 75th percentile as the third quartile (Q3).
There is no universally accepted definition of a percentile. Using the 65th percentile as an example, the 65th
percentile can be defined as the lowest score that is greater than 65% of the scores. This is the way we defined it
above and we will call this "Definition 1". The 65th percentile can also be defined as the smallest score that is greater
than or equal to 65% of the scores. This we will call "Definition 2". Unfortunately, these two definitions can lead to
dramatically different results, especially when there is relatively little data. Moreover, neither of these definitions is
explicit about how to handle rounding. For instance, what score is required to be higher than 65% of the scores
when the total number of scores is 50? This is tricky because 65% of 50 is 32.5. How do we find the lowest number
that is higher than 32.5 of the scores? A third way to compute percentiles (presented below), is a weighted average
of the percentiles computed according to the first two definitions. This third definition handles rounding more
gracefully than the other two and has the advantage that it allows the median (discussed later) to be defined
conveniently as the 50th percentile.
so the 40th percentile would be the third number (since 2.5 rounds up to 3), or 35.
The 100th percentile is defined to be the largest value. (In this case we do not use the above definition with P=100,
because the rank n would be greater than the number N of values in the original list.)
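The rounding rule can be sketched as below. Two things are assumptions here, since neither is shown in the text: the rank formula n = P/100 × N + 1/2 (rounded to the nearest integer, halves rounding up), and the list of five values, which was chosen to agree with the values 20 and 35 mentioned in the text.

```python
import math

def nearest_rank_percentile(values, p):
    """P-th percentile by rounding the rank n = P/100 * N + 1/2 (assumed formula)."""
    ordered = sorted(values)
    count = len(ordered)
    if p >= 100:
        # The 100th percentile is defined as the largest value.
        return ordered[-1]
    rank = p * count / 100 + 0.5
    # Round half up explicitly; Python's round() would round 2.5 down to 2.
    rounded_rank = math.floor(rank + 0.5)
    return ordered[rounded_rank - 1]  # ranks are 1-based

# Hypothetical list (illustrative only).
values = [15, 20, 35, 40, 50]

print(nearest_rank_percentile(values, 40))   # rank 2.5 rounds up to 3
print(nearest_rank_percentile(values, 100))  # largest value
```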
Linear interpolation between closest ranks
An alternative to rounding used in many applications is to use linear interpolation between the two nearest ranks.
In particular, given the N sorted values, we define the percent rank corresponding to the nth value as:
This is halfway between 20 and 35, which one would expect since the rank was calculated above as 2.5.
It is readily confirmed that the 50th percentile of any list of values according to this definition of the P-th percentile is
just the sample median.
Moreover, when N is even the 25th percentile according to this definition of the P-th percentile is the median of the
first N/2 values (i.e., the median of the lower half of the data).
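Interpolation between the two nearest ranks can be sketched as follows. The fractional rank n = P/100 × N + 1/2 is an assumption (the formula itself is not shown in the text), and both lists are hypothetical.

```python
import math

def interpolated_percentile(values, p):
    """P-th percentile by linear interpolation between the two nearest ranks.

    Assumes the fractional rank n = P/100 * N + 1/2 (1-based).
    """
    ordered = sorted(values)
    count = len(ordered)
    rank = p * count / 100 + 0.5
    if rank <= 1:
        return ordered[0]
    if rank >= count:
        return ordered[-1]
    lower_rank = math.floor(rank)
    fraction = rank - lower_rank
    lower_value = ordered[lower_rank - 1]
    upper_value = ordered[lower_rank]
    return lower_value + fraction * (upper_value - lower_value)

# Hypothetical lists (illustrative only).
five_values = [15, 20, 35, 40, 50]
four_values = [1, 2, 3, 4]

print(interpolated_percentile(five_values, 40))  # halfway between 20 and 35
print(interpolated_percentile(five_values, 50))  # the sample median
print(interpolated_percentile(four_values, 25))  # median of the lower half
```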
Weighted percentile
In addition to the percentile function, there is also a weighted percentile, where the percentage in the total weight is
counted instead of the total number. There is no standard function for a weighted percentile. One method extends
the above approach in a natural way.
Applications
The methods given above are approximations for use in small-sample statistics. In general terms, for very large
populations percentiles may often be represented by reference to a normal curve plot. The normal curve is plotted
along an axis scaled to standard deviation, or sigma, units. Mathematically, the normal curve extends to negative
infinity on the left and positive infinity on the right. Note, however, that a very small portion of individuals in a
population will fall outside the −3 to +3 range.
In humans, for example, a small portion of all people can be expected to fall above the +3 sigma height level.
Percentiles represent the area under the normal curve, increasing from left to right. Each standard deviation
represents a fixed percentile. Thus, rounding to two decimal places, −3 is the 0.13th percentile, −2 the 2.28th
percentile, −1 the 15.87th percentile, 0 the 50th percentile (both the mean and median of the distribution), +1 the
84.13th percentile, +2 the 97.72nd percentile, and +3 the 99.87th percentile. Note that the 0th percentile falls at
negative infinity and the 100th percentile at positive infinity.
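The percentiles quoted for each sigma level can be reproduced from the cumulative distribution function of the standard normal curve. This sketch uses `statistics.NormalDist` from the Python standard library; the helper name is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def sigma_to_percentile(z):
    """Percentage of a normal population lying below z standard deviations."""
    return standard_normal.cdf(z) * 100

for z in (-3, -2, -1, 0, 1, 2, 3):
    print(f"{z:+d} sigma: {sigma_to_percentile(z):.2f}th percentile")
```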
Examples:
EXAMPLE 1
Consider the 25th percentile for the 8 numbers in the table. Notice the numbers are given ranks ranging from 1 for
the lowest number to 8 for the highest number.
The first step is to compute the rank (R) of the 25th percentile. This is done using the following formula:
$R = \frac{P}{100} \times (N + 1)$
where P is the desired percentile (25 in this case) and N is the number of numbers (8 in this case). Therefore,
$R = \frac{25}{100} \times (8 + 1) = \frac{9}{4} = 2.25$
If R were an integer, the Pth percentile would be the number with rank R. When R is not an integer, we compute
the Pth percentile by interpolation as follows:
Define IR as the integer portion of R (the number to the left of the decimal point). For this
example, IR = 2.
Define FR as the fractional portion of R. For this example, FR = 0.25.
Find the scores with Rank IR and with Rank IR + 1. For this example, this means the score with Rank 2 and the score
with Rank 3. The scores are 5 and 7.
Interpolate by multiplying the difference between the scores by FR and adding the result to the lower score. For these
data, this is $0.25 \times (7 - 5) + 5 = 5.5$.
Therefore, the 25th percentile is 5.5. If we had used the first definition (the smallest score greater than 25% of the
scores) the 25th percentile would have been 7. If we had used the second definition (the smallest score greater than
or equal to 25% of the scores) the 25th percentile would have been 5.
EXAMPLE 2
$R = \frac{25}{100} \times (20 + 1) = \frac{21}{4} = 5.25$
$IR = 5$
$FR = 0.25$
Since the score with a rank of IR (which is 5) and the score with a rank of IR + 1 (which is 6) are both equal to 5, the
25th percentile is 5. In terms of the formula:
The 25th percentile equals
$0.25 \times (5 - 5) + 5 = 5$
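For the 85th percentile with N = 20:
$R = \frac{85}{100} \times (20 + 1) = 17.85$
$IR = 17$
$FR = 0.85$
CAUTION: FR does not generally equal the percentile. It is only a coincidence that it does in this example.
The score with a rank of 17 is 9 and the score with a rank of 18 is 10. Therefore, the 85th
percentile is:
$0.85 \times (10 - 9) + 9 = 9.85$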
For the 50th percentile with N = 4:
$R = \frac{50}{100} \times (4 + 1) = 2.5$
$IR = 2$
$FR = 0.5$
The score with a rank of IR is 3 and the score with a rank of IR + 1 is 5. Therefore, the 50th percentile is:
$0.5 \times (5 - 3) + 3 = 4$
EXAMPLE 3:
$R = \frac{50}{100} \times (5 + 1) = 3$
$IR = 3$
$FR = 0$
Whenever FR = 0, you simply find the number with rank IR. In this case, the third number is equal to 5, so the 50th
percentile is 5. You will also get the right answer if you apply the general
formula:
$0.00 \times (9 - 5) + 5 = 5$
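The rank-and-interpolate procedure used in these examples can be sketched as one function. The data lists are partly assumed: only the scores at the ranks named in the text (for instance 5 and 7 at ranks 2 and 3 of the eight numbers) come from the examples, and the remaining values are made up so that the lists are complete.

```python
def percentile(scores, p):
    """P-th percentile using R = P/100 * (N + 1) with linear interpolation."""
    ordered = sorted(scores)
    n = len(ordered)
    r = p * (n + 1) / 100
    ir = int(r)      # integer portion of R
    fr = r - ir      # fractional portion of R
    if ir < 1:
        return ordered[0]    # rank below the first score
    if ir >= n:
        return ordered[-1]   # rank at or beyond the last score
    lower = ordered[ir - 1]  # score with rank IR (ranks are 1-based)
    upper = ordered[ir]      # score with rank IR + 1
    return fr * (upper - lower) + lower

# Partly hypothetical data (see the note above).
eight_numbers = [3, 5, 7, 8, 9, 11, 13, 15]
four_numbers = [2, 3, 5, 9]
five_numbers = [2, 3, 5, 9, 11]

print(percentile(eight_numbers, 25))  # R = 2.25
print(percentile(four_numbers, 50))   # R = 2.5
print(percentile(five_numbers, 50))   # R = 3, so FR = 0
```

The two early returns are a design choice for ranks that fall outside the data; the text does not say how such cases should be handled.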
Example 4:
The handle of a suitcase that fits 99% of the adult population is:
An extra 2 cm also gives some margin for the largest hands. That makes 12 cm.
To calculate other percentiles, you can look up the corresponding Z-value in the Z-table. First,
find the desired percentile among the numbers in the body of the table. The bold numbers along the outside edges then give the
Z-value.
Example 5:
In the Z-table you can find 17.11, which is the closest to 17. The corresponding Z-value is then
−0.95.
Example 6:
In the Z-table you can find, in the row of 2.2 and the column of 0.08, the percentile 98.87. This means that 98.87% of
the population is smaller.
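Both directions of the Z-table lookup can be checked without a printed table. This is a sketch using `statistics.NormalDist`; the two helper names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def z_for_percentile(percent):
    """Z-value below which the given percentage of the population falls."""
    return standard_normal.inv_cdf(percent / 100)

def percentile_for_z(z):
    """Percentage of the population that falls below the given Z-value."""
    return standard_normal.cdf(z) * 100

# Percentile 17 corresponds to a Z-value of about -0.95.
print(round(z_for_percentile(17), 2))

# Row 2.2 and column 0.08 of the table means Z = 2.28.
print(round(percentile_for_z(2.28), 2))
```

Unlike a printed table, the inverse function returns the exact Z-value for the requested percentile rather than the nearest tabulated entry.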
The percentile of the corresponding fist height determines how many adults will have to bend over.
This Z-value corresponds to the 36th percentile of fist height. This means that everybody who is taller, 64% of adults, will
wash up at a height lower than their fist and will have to bend forward.
Supplemental Video:
https://www.youtube.com/watch?v=40o82o3uNfk
https://www.youtube.com/watch?v=XyVI8IfgMts
https://www.youtube.com/watch?v=1M6KDrFAYFE
https://www.youtube.com/watch?v=kKE_VnW-npQ
https://www.youtube.com/watch?v=XyVI8IfgMts&t=333s
https://www.youtube.com/watch?v=szirqaIhCyQ&list=RDCMUCYc2dDPuAzbySHNj-CzLckQ&index=5
https://www.youtube.com/watch?v=Qpy85Xsw_cs&list=RDCMUCYc2dDPuAzbySHNj-CzLckQ&index=2
https://www.youtube.com/watch?v=kKE_VnW-npQ&list=RDCMUCYc2dDPuAzbySHNj-CzLckQ&index=2
Resources:
https://aidaform.com/blog/qualitative-vs-quantitative.html
https://www.thinkdataanalytics.com/decision-tree-algorithm/#How_do_Decision_Trees_work
https://test.researchprospect.com/step-by-step-guide-to-statistical-analysis/#
http://www.fao.org/3/w3241e/w3241e05.htm
https://libguides.library.curtin.edu.au/uniskills/numeracy-skills/statistics/descriptive#s-lg-box-wrapper-25241986
https://www.embibe.com/exams/frequency-distribution-table/
https://www.embibe.com/exams/frequency-distribution-table/#Applications_of_Frequency_Distribution_Table
https://www.intellspot.com/descriptive-statistics-examples/
https://owl.purdue.edu/owl/research_and_citation/using_research/writing_with_statistics/descriptive_statistics.html
http://www.dinbelg.be/formulas.htm
http://cnx.org/content/m10805/latest/
http://en.wikipedia.org/wiki/Percentile
http://www.wordnik.com/words/decile
http://www.math.unb.ca/~knight/BasicStat/quartilx.htm
http://www.vitutor.com/statistics/descriptive/a_15.html
http://www.vitutor.com/statistics/descriptive/deciles.html
http://www.yourdictionary.com/decile
http://www.maplesoft.com/support/help/Maple/view.aspx?path=Statistics/Decile
http://www.mathsteacher.com.au/year9/ch17_statistics/06_quartiles/quartiles.htm
http://en.wikipedia.org/wiki/Quartile
https://quartilesdecilespercentiles.blogspot.com/