
Descriptive Statistics: College of Information and Computing Sciences

This document provides an overview of a module on descriptive statistics for students. It outlines the course objectives, which are for students to understand descriptive statistics capabilities and identify their own learning outcomes. The module content covers measures of frequency distribution, central tendency, dispersion, and position. Descriptive statistics are then defined as quantitatively summarizing data in a meaningful way to detect patterns. Examples are provided of how descriptive statistics can describe a single variable or relationship between multiple variables through tools like scatter plots. Frequency distribution tables are also introduced as a way to organize data into a table with two columns for values and their frequencies.

Uploaded by

Art Ijb
Copyright © All Rights Reserved

COLLEGE OF INFORMATION AND COMPUTING SCIENCES

Instructional Material for students

MODULE 2:
DESCRIPTIVE STATISTICS
Course overview

The course introduces students to methods of statistical analysis as applied in various industries and
enterprises. Through the use of primary statistical techniques, students attain a meaningful understanding of
statistical reasoning within the context of management decision-making. Topics focus on statistical
description, statistical induction, and the analysis of statistical relationships.

Objectives

After successful completion of this module, the student should be able to:
• Identify their learning outcomes and expectations for the course;
• Recognize their capacity to create new understandings from reflecting on the course;
• Know the capabilities of Descriptive Statistics.

Module Content:
• Descriptive Statistics
o Measures of Frequency Distribution
o Measures of Central Tendency
o Measures of Dispersion or Variation
o Measures of Position
• Supplemental Videos

Descriptive Statistics
What are descriptive statistics?

Descriptive Statistics
Descriptive statistics quantitatively summarize information in a meaningful way so that whoever is looking at it can detect relevant
patterns instantly. Descriptive statistics are commonly divided into measures of variability and measures of central tendency.
Measures of variability include the standard deviation, the minimum and maximum values, skewness, kurtosis, and
variance, while measures of central tendency include the mean, median, and mode.
Descriptive statistics can be used to describe a single variable (univariate analysis) or more than one variable
(bivariate/multivariate analysis). In the case of more than one variable, descriptive statistics can help
summarize relationships between variables using tools such as scatter plots.

As discussed in Module 1, Lesson 3 (Statistical Research Process), there are two types of statistical data
analysis: descriptive and inferential statistics. In this module, we will focus on descriptive statistics.

In a nutshell, descriptive statistics describe and summarize data but do not allow us to draw conclusions about
the whole population from which we took the sample.

You are simply summarizing the data with charts, tables, and graphs.

Conversely, with inferential statistics, you are using statistics to test a hypothesis, draw conclusions and make
predictions about a whole population, based on your sample.

Let’s see the first of our descriptive statistics examples.

Example 1:

A descriptive statistic about a college might be the average math test score of incoming students. It says nothing
about why the data are as they are or what trends we can see and follow.

Descriptive statistics help you simplify large amounts of data in a meaningful way. They reduce lots of data to a
summary.

Example 2:

You’ve surveyed 40 respondents about their favorite car color, and now you have a spreadsheet with
the results.

However, this spreadsheet is not very informative, so you want to summarize the data with graphs and charts
that let you draw some simple conclusions (e.g., 25% of respondents, that is, 10 out of 40, said that white is their
favorite color).

This is much clearer and more informative than a raw spreadsheet, and you have plenty of
options to visualize the data, such as pie charts, line charts, etc.

That’s the core of descriptive statistics. Note that you are not drawing any conclusions about the full population.

Frequency Distribution Table: Definition, Types, Examples


A frequency distribution table in statistics gives the number of occurrences (frequency) of different values
within a given time or over a given interval, presented as a list, a table, or a graphical representation.

A frequency distribution table presents the data in tabular form with two columns: the particular data values
and their frequencies.
What is Frequency?

The frequency of a value is the number of times that value occurs in the given data set. In general, frequency
is how often a value occurs in the data.

For example, suppose we ask five people for their favourite colours, and they say blue, red, white, black,
and red.

Here, two people said their favourite colour is red. So, the frequency of red is two.

Thus, the frequency of a value tells us the number of times that value appears in the given data.
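The counting described above can be sketched in Python with the standard library's `collections.Counter`, using the five favourite colours from the example. This is only an illustration; any method of counting occurrences gives the same frequencies.

```python
from collections import Counter

# Favourite colours of the five people in the example above
colours = ["blue", "red", "white", "black", "red"]

# Counter maps each distinct value to the number of times it occurs
frequency = Counter(colours)

for colour, count in frequency.items():
    print(colour, count)

print("Frequency of red:", frequency["red"])  # Frequency of red: 2
```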

Frequency Distribution Table Definition

In daily life, we get a lot of information in the form of charts, figures, graphs,
etc. This information can vary widely, such as marks secured by students, the populations of
different countries, or the temperatures of various cities.

The information that is collected is called data. Once the data is collected, it
should be represented in a meaningful way so that it can be understood easily. A frequency
distribution table is one of the ways to organise the data: it summarises the complete
collected data in the form of a table.

In statistics, the frequency distribution table refers to the data in the tabular form with two
columns corresponding to the particular data and its frequency.

Frequency Distribution Table Example:

An N.G.O. conducted a blood donation camp for 30 people, whose blood groups were
recorded as follows:

The above data can be represented in the form of a frequency distribution table as follows:

From the above table, we can observe that all the data is arranged in two columns, which
can easily be understood.

Types of Frequency Distribution Table


The frequency distribution table presents the collected data in a well-designed tabular form
so that it can be analysed quickly. There are different types of frequency distribution tables,
depending on how the data is represented. They are:

1. Ungrouped frequency distribution table
2. Grouped frequency distribution table
3. Cumulative frequency distribution table
4. Relative frequency distribution table
5. Relative cumulative frequency distribution table

In this section, our discussion will be limited to ungrouped and grouped frequency
distribution tables only.

The general types of frequency distribution tables are grouped and ungrouped frequency
distribution tables.
Ungrouped Frequency Distribution Table
An ungrouped frequency distribution table lists each data value separately with its
frequency. This type of table is used for smaller data sets. Ungrouped data is data given
as individual points.

Example:
The marks scored by 20 students in a test are given below:

The tabular form of the above data can be given as follows:

The above tabular form of representing the data is known as the ungrouped frequency
table, as it describes the frequency of individual data.

Grouped Frequency Distribution Table


Let us consider the marks secured by 100 students at a school. It is tedious to construct a
frequency distribution table with a row for every individual score, because the table becomes lengthy
and hard to read.
To overcome this problem, we group the data into classes known as class intervals.
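A minimal sketch of grouping data into class intervals, assuming equal-width classes where each interval includes its lower limit and excludes its upper limit (so a mark of 40 falls in 40–50, not 30–40). The function name `grouped_frequency` and the marks below are hypothetical, for illustration only.

```python
from collections import Counter

def grouped_frequency(values, start, width):
    """Count values in equal-width class intervals [start, start + width), ...

    Assumes every value is >= start.
    """
    # Index of the class interval each value falls into
    counts = Counter(int((v - start) // width) for v in values)
    table = []
    for k in range(max(counts) + 1):
        lower = start + k * width
        # Counter returns 0 for intervals that contain no values
        table.append((f"{lower}-{lower + width}", counts[k]))
    return table

# Hypothetical marks, for illustration only
marks = [12, 25, 37, 41, 45, 58, 63, 67, 72, 88]
for interval, freq in grouped_frequency(marks, start=10, width=10):
    print(interval, freq)
```

Choosing the class width is a trade-off: wider classes give a shorter table but hide more detail about individual values.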

Example:
The marks secured by 100 students are given as follows:
The frequency table for the above data can be drawn as follows by using class intervals:

How to Construct a Frequency Distribution Table


The frequency distribution table is constructed by using tally marks. Tally marks are a
counting system in which each item is recorded with a vertical line. A diagonal line is drawn
across four vertical lines to form a group representing a total of 5.

Example:
Consider a jar containing pieces of bread of different colours, as shown below:

Construct a frequency distribution table for the data mentioned above.


Answer:

Applications of Frequency Distribution Table


The two types of frequency distribution tables discussed here are the ungrouped frequency distribution
table and the grouped frequency distribution table. Some uses of the frequency distribution table
are as follows:
1. The table shows the frequency of the data.
From the table, we can see how many times each value appears in the data.
2. The table helps measure dispersion, i.e., range, variance, and standard
deviation.
The range is the difference between the highest and lowest values of the given data.
3. The table helps find measures of central tendency and location, i.e., mean, median, and mode.
4. The table helps to determine the extent of symmetry or asymmetry.

Let us understand the concept through some frequency distribution table examples.

Frequency Distribution Calculator


A frequency distribution calculator constructs a frequency distribution table, providing a
snapshot view of the characteristics of a dataset. Such a calculator typically also reports other data
descriptors, such as the mean, median, and skewness.
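A rough sketch of what such a calculator does, using only Python's standard library. The function name `describe` is hypothetical, and skewness here is the moment coefficient of skewness (the average cubed deviation divided by the cube of the population standard deviation); calculators may use other skewness formulas, so their values can differ slightly.

```python
import statistics
from collections import Counter

def describe(data):
    """Return a frequency table plus a few summary descriptors for a list of numbers."""
    n = len(data)
    mean = statistics.fmean(data)
    # Second and third central moments (population versions, dividing by n)
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    skewness = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    return {
        "frequency": dict(sorted(Counter(data).items())),
        "mean": mean,
        "median": statistics.median(data),
        "skewness": skewness,
    }

# A symmetric data set: the skewness should be 0
print(describe([1, 2, 2, 3, 3, 3, 4, 4, 5]))
```

A positive skewness indicates a longer right tail (a few unusually large values); a negative one indicates a longer left tail.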

Solved Examples – Frequency Distribution Table

Q.1. There are  55  students in a classroom. The teacher asked the students
to talk about their favourite subjects. The results are listed below:

By looking at the above data, which is the most liked subject?


Ans:
Representing the above data in a frequency distribution table using tally
marks, as follows:

From the above table, we can see that the largest number of students (7) like
mathematics.

Q.2. Construct the frequency distribution table for the data on heights
(in cm) of 20 boys using the class intervals 130–135, 135–140, and
so on. The heights of the boys in cm are:

Ans: The frequency distribution for the above data can be constructed as follows:
Q.3. Runs scored by Rohit Sharma in 10 international matches are
recorded as follows:

Construct an ungrouped frequency distribution table for the given data.


Ans:

Q.4. The frequency distribution of the weights (in kg) of 40 persons is
given below:

Which class interval has the highest frequency and which has the lowest
frequency?
Ans:
The frequency distribution of the weights (in kg) of 40 persons is given below:

From the above data, the frequency of the class interval 40–45 is 14, which is
the maximum among all the frequencies.
And the frequency of the class interval 50–55 is 3, which is the minimum among all the
frequencies.
Therefore, the class interval 40–45 has the highest frequency, and the
class interval 50–55 has the lowest frequency.

Q.5. The marks of the 30 students of the class in Mathematics are given
below:

Construct a grouped frequency distribution table by taking suitable
intervals.
Ans:

The grouped frequency distribution table is given for the above-given data as
follows:
Summary
In this section, we have studied the frequency distribution table and its types. The frequency
distribution table in statistics presents the data in a simple tabular form, which is easy to
understand. We have also discussed frequency and tally marks, which are the main tools for
constructing a frequency distribution table.
This section shows one of the easiest ways of representing data: the frequency
distribution table. Its properties and applications help us explore the features of the
data easily.

Frequently Asked Questions (FAQ)- Frequency Distribution Table


Q.1. How do you calculate frequency distribution?
Ans: To make a frequency distribution table:
1. Write the categories in the first column.
2. Tally the scores for each category of the given data in the second column.
3. Count the tally marks and write the frequency of each category in the third column.
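The three steps above can be sketched in Python. Tally marks are approximated in plain text, with each completed group of five drawn as `||||/` (four bars crossed by a diagonal); the helper names `tally` and `frequency_table` and the subject list are hypothetical, for illustration only.

```python
from collections import Counter

def tally(count):
    """Plain-text tally marks: '||||/' for every five, then single bars for the rest."""
    fives, rest = divmod(count, 5)
    groups = ["||||/"] * fives
    if rest:
        groups.append("|" * rest)
    return " ".join(groups)

def frequency_table(data):
    """Rows of (category, tally, frequency), in order of first appearance."""
    counts = Counter(data)
    return [(category, tally(n), n) for category, n in counts.items()]

# Hypothetical answers to "What is your favourite subject?"
answers = ["Maths", "Science", "Maths", "English", "Maths",
           "Science", "Maths", "Maths", "Maths", "English", "Maths"]

# Column 1: category, column 2: tally marks, column 3: frequency
for category, marks, freq in frequency_table(answers):
    print(f"{category:<10}{marks:<12}{freq}")
```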

Q.2. What are the 3 types of frequency distribution?


Ans: The 3 types are:
1. Ungrouped frequency distribution
2. Grouped frequency distribution
3. Cumulative frequency distribution

Q.3. What is a frequency distribution?


Ans: In statistics, the frequency distribution represents the data in a tabular or graphical manner, which
shows the frequency of all the given data.

Q.4. What are the differences between the frequency table and the frequency distribution table?
Ans: The frequency table is a tabular method where each part of the data is assigned to its corresponding
frequency. In comparison, a frequency distribution is generally the graphical representation of the frequency
table.

Q.5. How do you describe a frequency distribution table?


Ans: In statistics, frequency distribution tables are one of the best ways to represent data. A frequency
distribution table is especially useful for analysing larger data sets.

Descriptive statistics have 2 main types of measures:

• Measures of Central Tendency (Mean, Median, and Mode).
• Measures of Dispersion or Variation (Variance, Standard Deviation, Range).

1. Central Tendency
Central tendency (also called measures of location or central location) is a method to describe what’s
typical for a group (set) of data.
This means central tendency doesn’t show us what is typical about each individual piece of data; instead, it gives us an
overview of the entire data set.

It tells us what is normal or average for a given set of data. There are three key methods to show central
tendency: mean, mode, and median.

• Mean

The mean is the arithmetic average of a given set of numbers. It is calculated in
two easy steps:
1. Add all the data values together to find the sum.
2. Divide the sum by the total number of data values.

Below is one of the most common descriptive statistics examples.

Example 3:

Let’s say you have a sample of 5 girls and 6 boys.

The girls’ heights in inches are: 62, 70, 60, 63, 65.

To calculate the mean height for the group of girls, you need to add the data together:
62 + 70 + 60 + 63 + 65 = 320.
Now, you take the sum (320) and divide it by the total number of girls (5): 320 / 5 = 64.
So, our mean is 64.
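The same two steps in Python, using the five heights from the calculation above (62, 70, 60, 63, 65). Python's standard `statistics.mean` gives the same result.

```python
import statistics

# Heights (in inches) of the five girls, as used in the calculation above
heights = [62, 70, 60, 63, 65]

total = sum(heights)         # Step 1: add the data together
mean = total / len(heights)  # Step 2: divide by the number of values

print(total, mean)               # 320 64.0
print(statistics.mean(heights))  # 64
```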

The main advantage of the mean is that it can be calculated for both continuous and discrete
numerical data.
Of course, the mean has limitations. Data must be numerical in order to calculate the mean, so you
cannot work with the mean when you have nominal data.

• Mode

The mode of a set of data is the number in the set that occurs most often.
Let’s see the next of our descriptive statistics examples, problems and solutions.

Example 4:
Suppose you have a dataset with the retirement ages of 10 people, in whole years:

55, 55, 55, 56, 56, 57, 58, 58, 59, 60

To illustrate this, the table below shows the frequency of each retirement age.

As you see, the most common value is 55. That is why the mode of this data set is 55 years.

The mode has one very important advantage over the median and the mean: it can be calculated for
both numerical and categorical data.

Limitations of the mode: in some data sets, the mode may not reflect the centre of the set. In the
above example, if we order the retirement ages from lowest to highest, we see that the centre
of the data set is about 57 years (the median is 56.5 and the mean is 56.9), but the mode is lower, at 55 years.
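A quick check of Example 4 with Python's `statistics` module, which also prints the frequency of each age (the same information as the frequency table mentioned above).

```python
import statistics
from collections import Counter

# Retirement ages of the 10 people in Example 4
ages = [55, 55, 55, 56, 56, 57, 58, 58, 59, 60]

print(sorted(Counter(ages).items()))  # [(55, 3), (56, 2), (57, 1), (58, 2), (59, 1), (60, 1)]
print(statistics.mode(ages))    # 55, the most frequent value
print(statistics.median(ages))  # 56.5, the middle of the ordered data
print(statistics.mean(ages))    # 56.9
```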

• Median

Simply put, the median is the middle value in a data set. To find the
middle, you need to:
– first, list the data in numerical order;
– second, locate the value in the middle of the list.
Example 5:
The middle number in the set below is 26, as there are 4 numbers above it and 4 numbers below:

21, 22, 24, 24, 26, 27, 28, 29, 31.

This data set has an odd number of values (9). How do you find the middle when there is an even
number of values?

Simply take the average of the two middle numbers.

For example, in the dataset of 10 numbers below, the median is the average of the two middle numbers: (26 + 27) / 2 = 26.5.

21, 22, 24, 24, 26, 27, 28, 29, 31, 32
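The two rules (odd and even number of values) can be written as a small Python function. `median` here is a hand-written sketch; the standard library's `statistics.median` follows the same rules.

```python
def median(values):
    """Middle value of the data; the average of the two middle values when the count is even."""
    ordered = sorted(values)  # first, put the data in numerical order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: there is a single middle value
        return ordered[mid]
    # Even count: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([21, 22, 24, 24, 26, 27, 28, 29, 31]))      # 26
print(median([21, 22, 24, 24, 26, 27, 28, 29, 31, 32]))  # 26.5
```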

As an advantage of the median, we can say that it is less affected by outliers and skewed data than
the mean. We usually prefer the median when the data set is not symmetrical.

As a limitation, because nominal data cannot be ordered in a logical way, the median
cannot be calculated for nominal data.

Having trouble remembering the difference between the mode, mean, and median? Here are some
hints:
• The word MOde is very like MOst (the most frequent number).
• “Mean” requires you to do some arithmetic (adding all the numbers together and dividing).
• “Median” practically means “Middle” and has the same number of letters.

Having trouble deciding which measure to use when you have nominal, ordinal or interval data? The
above table can help.

2. Dispersion or Variation
Central tendency tells us important information, but it doesn’t show everything we want to know about the data.
In particular, central tendency fails to reveal the extent to which the values of the individual items differ in a data set.

Measures of dispersion do a lot more – they complement the averages and allow us to interpret them much better.
Dispersion in statistics describes the spread of the data values in a given dataset.  In other words, it shows how the
data is “dispersed” around the mean (the central value).

Example 6:

Imagine you have to compare the performance of 2 groups of students on the final math exam. You find that
the average math test results are identical for both groups.

Does that mean the students in the two groups are performing equally? No! Let’s see why.

Group of students A: 56, 58, 60, 62, 64
Group of students B: 40, 50, 60, 70, 80

Both of these groups have mean scores of 60.

However, in group A the individual scores are concentrated around the center – 60. All students in A have a
very similar performance. There is consistency.

On the other hand, in group B the mean is also 60, but the individual scores are not even close to the
center. One score is quite small (40) and one score is very large (80).
We can conclude that there is greater dispersion in group B.
Note:
The study of dispersion plays a key role in statistics. If a given country has both very poor people and
very rich people, we say there is serious economic disparity. Dispersion is also very useful when we want to
study the relationship between sets of data.

There are two popular measures of dispersion: standard deviation and range.
Let’s see some more descriptive statistics examples and definitions for dispersion measures.

• The Range

The range is simply the difference between the largest and smallest values in a data set. It gives a rough
idea of how spread out the data values are.

A low range tells us that the data points are close together (and therefore close to the mean), while a high
range shows the opposite.

Here is the formula for calculating the range:

Range = maximum value – minimum value

Let’s see the next of our descriptive statistics examples.

Example 7:

Using the math results from Example 6:

Group of students A: 56, 58, 60, 62, 64
Group of students B: 40, 50, 60, 70, 80

we can easily calculate the range:

Group A: 64 – 56 = 8
Group B: 80 – 40 = 40

You see that the data values in Group A are much closer to the mean than the ones in Group B.
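Example 7 in Python. `data_range` is a hypothetical helper name (chosen to avoid shadowing Python's built-in `range`).

```python
import statistics

def data_range(values):
    """Range = maximum value - minimum value."""
    return max(values) - min(values)

group_a = [56, 58, 60, 62, 64]
group_b = [40, 50, 60, 70, 80]

# Same mean, very different spread
print(statistics.mean(group_a), statistics.mean(group_b))  # 60 60
print(data_range(group_a), data_range(group_b))            # 8 40
```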

A serious disadvantage of the Range is that it only provides information about the minimum and
maximum of the data set. It tells nothing about the values in between.

• The Standard Deviation


Standard deviation also provides information on how much variation from the mean exists.
However, the standard deviation goes further than the range and takes into account how each value in a dataset
varies from the mean.

As in the Range, a low standard deviation tells us that the data points are very close to the mean. And a
high standard deviation shows the opposite.

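The standard deviation formula for a sample of a population is:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

where xᵢ are the data values, x̄ is the sample mean, and n is the number of values.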

Example 8:

If we use the math results in Example 6:


Group of students A: 56, 58, 60, 62, 64

The mean is 60.

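Let’s find the standard deviation of the math exam scores by hand. We use simple values to keep the
calculations easy. The deviations from the mean are −4, −2, 0, 2, and 4; their squares are 16, 4, 0, 4, and 16,
which add up to 40.
Now, let’s substitute the values into the formula:

$$s = \sqrt{\frac{40}{5 - 1}} = \sqrt{10} \approx 3.16$$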

The result above shows that a typical math exam score in Group A lies roughly
3.16 points away from the mean of 60.

Of course, you can calculate the above values with a calculator instead of by hand.
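Instead of a calculator, the hand calculation can be reproduced in Python. This sketch applies the sample formula (dividing by n − 1) step by step and checks it against the standard library's `statistics.stdev`; `statistics.pstdev` gives the population version (dividing by N). For Group A, the sample formula gives √10 ≈ 3.16 and the population formula gives √8 ≈ 2.83.

```python
import math
import statistics

group_a = [56, 58, 60, 62, 64]

n = len(group_a)
mean = sum(group_a) / n                                  # 60.0
squared_deviations = [(x - mean) ** 2 for x in group_a]  # 16, 4, 0, 4, 16
sum_sq = sum(squared_deviations)                         # 40.0

sample_sd = math.sqrt(sum_sq / (n - 1))  # divide by n - 1 for a sample
population_sd = math.sqrt(sum_sq / n)    # divide by N for a whole population

print(round(sample_sd, 2), round(statistics.stdev(group_a), 2))       # 3.16 3.16
print(round(population_sd, 2), round(statistics.pstdev(group_a), 2))  # 2.83 2.83
```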

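Note: The above formula is for a sample of a population. The standard deviation of an entire population
is represented by the Greek lowercase letter sigma (σ) and is calculated as:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$

where μ is the population mean and N is the population size.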

More examples of standard deviation can be found on the Explorable website.

Conclusion:

The above 8 descriptive statistics examples, problems and solutions are simple, but they aim to help you
understand descriptive statistics better.

As you saw, descriptive statistics are used simply to describe some basic features of the data in a study.

They provide simple summaries about the sample and enable us to present data in a meaningful way. This
allows a simpler interpretation of the data.

Together with some plain graphics analysis, they form a solid basis for almost every quantitative analysis
of data.

Descriptive statistics cannot, however, be used for making conclusions beyond the data we have
analyzed or making conclusions regarding any hypotheses.

Percentiles, Quartiles and Deciles


Quartile

In descriptive statistics, the quartiles of a set of values are the three points that divide the data set into four equal
groups, each representing a fourth of the population being sampled. A quartile is a type of quantile.
In epidemiology, sociology and finance, the quartiles of a population are the four subpopulations defined by
classifying individuals according to whether the value concerned falls into one of the four ranges defined by the three
values discussed above. Thus an individual item might be described as being "in the upper quartile".

Definitions

first quartile (designated Q1) = lower quartile = splits lowest 25% of data = 25th percentile
second quartile (designated Q2) = median = cuts data set in half = 50th percentile
third quartile (designated Q3) = upper quartile = splits highest 25% of data, or lowest 75% = 75th percentile
The difference between the upper and lower quartiles is called the interquartile range.

If a data set of values is arranged in ascending order of magnitude, then:


The interquartile range is a more useful measure of spread than the range as it describes the middle 50% of the
data values.

Computing methods

There is no universal agreement on choosing the quartile values.[1]


One standard formula for locating the position L of the observation at a given percentile, y, with n data points sorted in
ascending order is:[2]

$$L = \frac{y}{100}\, n$$

Case 1: If L is a whole number, then the value will be found halfway between positions L and L+1.
Case 2: If L is a fraction, round to the nearest whole number. (for example, L = 1.2 becomes 1)

Examples:

Method 1
Use the median to divide the ordered data set into two halves. Do not include the median in either half.
The lower quartile value is the median of the lower half of the data. The upper quartile value is the median of the
upper half of the data.
This rule is employed by the TI-83 calculator boxplot and "1-Var Stats" functions.

Method 2
Use the median to divide the ordered data set into two halves. If the median is a datum (as opposed to being the
average of the middle two data), include the median in both halves.
The lower quartile value is the median of the lower half of the data. The upper quartile value is the median of the
upper half of the data.
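Both methods can be sketched in Python, here applied to the two series from the exercise that follows (3, 5, 2, 7, 6, 4, 9 and 3, 5, 2, 7, 6, 4, 9, 1). The function name `quartiles` is hypothetical. Note how the two methods give different Q1 and Q3 for the odd-length series but agree for the even-length one, where the median is not a datum.

```python
from statistics import median

def quartiles(values, include_median):
    """Return (Q1, Q2, Q3) by splitting the ordered data at the median.

    include_median=False follows Method 1; include_median=True follows Method 2.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    if n % 2 == 0:
        # Even count: the median is the average of two data, so neither method includes it
        lower, upper = data[:half], data[half:]
    elif include_median:
        # Method 2: the median is a datum, so put it in both halves
        lower, upper = data[:half + 1], data[half:]
    else:
        # Method 1: leave the median out of both halves
        lower, upper = data[:half], data[half + 1:]
    return median(lower), median(data), median(upper)

odd_series = [3, 5, 2, 7, 6, 4, 9]
even_series = [3, 5, 2, 7, 6, 4, 9, 1]
print(quartiles(odd_series, include_median=False))   # (3, 5, 7)
print(quartiles(odd_series, include_median=True))    # (3.5, 5, 6.5)
print(quartiles(even_series, include_median=False))  # (2.5, 4.5, 6.5)
```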
Example 6

Given the series:


3, 5, 2, 7, 6, 4, 9.
3, 5, 2, 7, 6, 4, 9, 1.
Calculate:
The mode, median and mean.
The average deviation, variance and standard deviation.
The quartiles 1 and 3.
The deciles 2 and 7.
The percentiles 32 and 85.

3, 5, 2, 7, 6, 4, 9.
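A sketch of the first two parts of the exercise for both series, using Python's `statistics` module. It assumes each series is treated as the whole data set, so the variance and standard deviation divide by N (`pvariance`, `pstdev`); `variance` and `stdev` give the sample versions. The average deviation is taken as the mean absolute deviation from the mean. The function name `summarize` is hypothetical.

```python
import statistics

def summarize(series):
    """Mode, median, mean, average deviation, variance and standard deviation of a series."""
    counts = {x: series.count(x) for x in series}
    top = max(counts.values())
    # If every value occurs only once, the series has no mode
    modes = [] if top == 1 else [x for x, c in counts.items() if c == top]
    mean = statistics.mean(series)
    return {
        "mode": modes,
        "median": statistics.median(series),
        "mean": mean,
        "average deviation": sum(abs(x - mean) for x in series) / len(series),
        "variance": statistics.pvariance(series),
        "standard deviation": statistics.pstdev(series),
    }

for series in ([3, 5, 2, 7, 6, 4, 9], [3, 5, 2, 7, 6, 4, 9, 1]):
    print(series)
    for name, value in summarize(series).items():
        print(f"  {name}: {value}")
```

In both series every value occurs exactly once, so neither series has a mode.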
Decile

A decile is any one of the ten equal groups into which a large set of values or statistics is divided.

It is any one of the numbers or values in a series dividing the distribution of the individuals in the series into ten
groups of equal frequency.
The deciles are the nine values of the variable that divide an ordered data set into ten equal parts.
The deciles determine the values for 10%, 20%... and 90% of the data.
D5 coincides with the median.
The Decile function computes the specified decile of the specified random variable or data set.
The first parameter can be a data set (represented as an Array), a distribution, a random variable, or an algebraic
expression involving random variables.

The second parameter d is a decile or list of deciles.
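For a data set, Python's standard library offers a comparable tool: `statistics.quantiles(data, n=10)` returns the nine deciles D1 to D9. Its default method (`'exclusive'`) places the P-th percentile at rank P/100 × (N + 1) and interpolates between neighbouring values; other textbooks and tools use different conventions, so their deciles may differ. Here it is applied to the first series of the exercise:

```python
import statistics

series = [3, 5, 2, 7, 6, 4, 9]

# Nine cut points D1..D9 that split the ordered data into ten equal parts
deciles = statistics.quantiles(series, n=10)

print("D2 =", round(deciles[1], 2))  # D2 = 2.6
print("D5 =", deciles[4])            # D5 = 5.0, the median
print("D7 =", round(deciles[6], 2))  # D7 = 6.6
```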


Given the series:
3, 5, 2, 7, 6, 4, 9.
3, 5, 2, 7, 6, 4, 9, 1.
Calculate:
The mode, median and mean.
The average deviation, variance and standard deviation.
The quartiles 1 and 3.
The deciles 2 and 7.
The percentiles 32 and 85.

3, 5, 2, 7, 6, 4, 9.
Percentiles
In statistics, a percentile (or centile) is the value of a variable below which a certain percent of observations fall. For
example, the 20th percentile is the value (or score) below which 20 percent of the observations may be found. The
term percentile and the related term percentile rank are often used in the reporting of scores from norm-referenced
tests.

The 25th percentile is also known as the first quartile (Q1), the 50th percentile as the median or second quartile
(Q2), and the 75th percentile as the third quartile (Q3).

There is no universally accepted definition of a percentile. Using the 65th percentile as an example, the 65th
percentile can be defined as the lowest score that is greater than 65% of the scores. This is the way we defined it
above and we will call this "Definition 1". The 65th percentile can also be defined as the smallest score that is greater
than or equal to 65% of the scores. This we will call "Definition 2". Unfortunately, these two definitions can lead to
dramatically different results, especially when there is relatively little data. Moreover, neither of these definitions is
explicit about how to handle rounding. For instance, what score is required to be higher than 65% of the scores
when the total number of scores is 50? This is tricky because 65% of 50 is 32.5. How do we find the lowest number
that is higher than 32.5 of the scores? A third way to compute percentiles (presented below) is a weighted average
of the percentiles computed according to the first two definitions. This third definition handles rounding more
gracefully than the other two and has the advantage that it allows the median (discussed later) to be defined
conveniently as the 50th percentile.

so the 40th percentile would be the third number (since 2.5 rounds up to 3), or 35.
The 100th percentile is defined to be the largest value. (In this case we do not use the above definition with P=100,
because the rank n would be greater than the number N of values in the original list.)
Linear interpolation between closest ranks

An alternative to rounding used in many applications is to use linear interpolation between the two nearest ranks.
In particular, given the N sorted values $v_1 \le v_2 \le \dots \le v_N$, we define the percent rank corresponding to the nth value as:

$$p_n = \frac{100}{N}\left(n - \frac{1}{2}\right)$$

This is halfway between 20 and 35, which one would expect since the rank was calculated above as 2.5.
It is readily confirmed that the 50th percentile of any list of values according to this definition of the P-th percentile is
just the sample median.
Moreover, when N is even, the 25th percentile according to this definition of the P-th percentile is the median of the
first N/2 values (i.e., the median of the lower half of the data).
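A sketch of the linear-interpolation definition in Python, assuming the percent rank of the nth of N sorted values is p_n = (100/N)(n − 1/2); values of P below p_1 or above p_N are clamped to the smallest or largest value. The function name and the five-value list are hypothetical (the list is chosen so that its second and third values are 20 and 35, as in the text).

```python
import math

def percentile_interpolated(values, p):
    """P-th percentile by linear interpolation between the two closest ranks.

    Uses the percent rank p_n = (100 / N) * (n - 1/2) for the n-th sorted value.
    """
    v = sorted(values)
    n_values = len(v)
    # Solve p = (100 / N) * (rank - 1/2) for the (possibly fractional) rank
    rank = p * n_values / 100 + 0.5
    if rank <= 1:
        return v[0]
    if rank >= n_values:
        return v[-1]
    k = math.floor(rank)  # closest rank below
    fraction = rank - k   # how far we are towards the next rank
    return v[k - 1] + fraction * (v[k] - v[k - 1])

scores = [15, 20, 35, 40, 50]  # hypothetical data
print(percentile_interpolated(scores, 40))  # 27.5, halfway between 20 and 35
print(percentile_interpolated(scores, 50))  # 35.0, the median
```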

Weighted percentile

In addition to the percentile function, there is also a weighted percentile, where the percentage in the total weight is
counted instead of the total number. There is no standard function for a weighted percentile. One method extends
the above approach in a natural way.

Applications

When ISPs bill "burstable" internet bandwidth, the 95th or 98th percentile usually cuts off the top 5% or 2% of
bandwidth peaks in each month, and then bills at the nearest rate. In this way, infrequent peaks are ignored, and the
customer is charged in a fairer way. The reason this statistic is so useful in measuring data throughput is that it
gives a very accurate picture of the cost of the bandwidth. The 95th percentile says that 95% of the time, the usage
is below this amount. Just the same, the remaining 5% of the time, the usage is above that amount.
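A simplified sketch of 95th-percentile billing, assuming usage is sampled at regular intervals over the month: sort the samples, discard the top 5%, and bill at the highest remaining sample. Real providers differ in sampling interval and rounding rules, so the function name `billable_rate` and its details are hypothetical.

```python
import math

def billable_rate(samples, percentile=95):
    """Highest usage sample left after discarding the top (100 - percentile)% of samples."""
    ordered = sorted(samples)
    kept = math.ceil(len(ordered) * percentile / 100)  # number of samples kept
    return ordered[kept - 1]

# Hypothetical usage samples (e.g. in Mbit/s) with one short burst of 95
usage = [10, 12, 11, 13, 95, 12, 14, 11, 10, 13,
         12, 15, 11, 10, 14, 13, 12, 12, 11, 12]
print(billable_rate(usage))  # 15: the single burst is ignored
```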
Physicians will often use infant and children's weight and height percentile to assess their growth in comparison to
national averages.

The normal curve and percentiles

The methods given above are approximations for use in small-sample statistics. In general terms, for very large
populations percentiles may often be represented by reference to a normal curve plot. The normal curve is plotted
along an axis scaled to standard deviation, or sigma, units. Mathematically, the normal curve extends to negative
infinity on the left and positive infinity on the right. Note, however, that a very small portion of individuals in a
population will fall outside the −3 to +3 range.

In humans, for example, a small portion of all people can be expected to fall above the +3 sigma height level.

Percentiles represent the area under the normal curve, increasing from left to right. Each whole number of standard
deviations corresponds to a fixed percentile. Thus, rounding to two decimal places, −3 is the 0.13th percentile, −2 the
2.28th percentile, −1 the 15.87th percentile, 0 the 50th percentile (both the mean and median of the distribution), +1
the 84.13th percentile, +2 the 97.72nd percentile, and +3 the 99.87th percentile. Note that the 0th percentile falls at
negative infinity and the 100th percentile at positive infinity.
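These percentiles are values of the standard normal cumulative distribution function, Φ(z) = ½[1 + erf(z/√2)]. A short Python sketch using only the standard library's math.erf reproduces them when rounded to two decimal places; the function name normal_percentile is illustrative.

```python
import math


def normal_percentile(z):
    """Percentage of a normal population lying below z standard deviations from the mean."""
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))


# Print the percentile for each whole number of standard deviations from -3 to +3.
for z in range(-3, 4):
    print(f"{z:+d} sigma -> {normal_percentile(z):.2f}th percentile")
```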

Examples:

EXAMPLE 1

Consider the 25th percentile for the 8 numbers in the table. Notice the numbers are given ranks ranging from 1 for
the lowest number to 8 for the highest number.

The first step is to compute the rank (R) of the 25th percentile. This is done using the following formula:

$R = \frac{P}{100}(N + 1)$

where P is the desired percentile (25 in this case) and N is the number of numbers (8 in this case). Therefore,

$R = \frac{25}{100}(8 + 1) = \frac{9}{4} = 2.25$

If R were an integer, the Pth percentile would be the number with rank R. When R is not an integer, we compute
the Pth percentile by interpolation as follows:

Define IR as the integer portion of R (the number to the left of the decimal point). For this
example, IR = 2.

Define FR as the fractional portion of R. For this example, FR = 0.25.

Find the scores with rank IR and with rank IR + 1. For this example, this means the score with rank 2 and the score
with rank 3. The scores are 5 and 7.

Interpolate by multiplying the difference between the scores by FR and adding the result to the lower score. For these
data, this is 0.25 × (7 − 5) + 5 = 5.5.
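The interpolation steps can be collected into one formula. Writing x_{(k)} for the score with rank k (notation introduced here for compactness, not used elsewhere in the text):

$$
IR = \lfloor R \rfloor, \qquad FR = R - IR, \qquad x_P = x_{(IR)} + FR \times \left( x_{(IR+1)} - x_{(IR)} \right)
$$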

Therefore, the 25th percentile is 5.5. If we had used the first definition (the smallest score greater than 25% of the
scores) the 25th percentile would have been 7. If we had used the second definition (the smallest score greater than
or equal to 25% of the scores) the 25th percentile would have been 5.

EXAMPLE 2

For a second example, consider the 20 quiz scores in the table.


We will compute the 25th and the 85th percentiles. For the 25th,

$R = \frac{25}{100}(20 + 1) = \frac{21}{4} = 5.25$

IR=5

FR=0.25

Since the score with rank IR (rank 5) and the score with rank IR + 1 (rank 6) are both equal to 5, the
25th percentile is 5. In terms of the formula,
the 25th percentile equals

0.25×(5−5)+5=5

For the 85th percentile,

$R = \frac{85}{100}(20 + 1) = 17.85$

IR=17

FR=0.85

CAUTION:

FR does not generally equal the percentile being computed, even though it happens to here (FR = 0.85 for the 85th percentile).

The score with a rank of 17 is 9 and the score with a rank of 18 is 10. Therefore, the 85th
percentile is:

0.85×(10−9)+9=9.85

Let's consider the 50th percentile of the numbers 2, 3, 5, 9.

$R = \frac{50}{100}(4 + 1) = 2.5$

IR=2

FR=0.5

The score with a rank of IR is 3 and the score with a rank of IR+1 is 5. Therefore, the 50th percentile is:

0.5×(5−3)+3=4

EXAMPLE 3

Finally, consider the 50th percentile of the numbers 2, 3, 5, 9, 11.

$R = \frac{50}{100}(5 + 1) = 3$

IR=3

FR=0

Whenever FR=0, you simply find the number with rank IR. In this case, the third number is equal to 5, so the 50th
percentile is 5. You will also get the right answer if you apply the general
formula:

The 50th percentile equals

0.00×(9−5)+5=5
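The rank-interpolation procedure from these examples can be sketched in Python. The function name percentile is hypothetical, and clamping ranks below 1 or above N to the lowest or highest score is an assumption, since the text does not say what to do in that case. The usage lines use the two data sets that are listed in full above (2, 3, 5, 9 and 2, 3, 5, 9, 11).

```python
import math


def percentile(scores, p):
    """Pth percentile by the rank-interpolation method with R = P/100 * (N + 1)."""
    ordered = sorted(scores)
    n = len(ordered)
    if n == 0:
        raise ValueError("no scores")
    r = p / 100 * (n + 1)
    # Assumption: ranks outside 1..N are clamped to the lowest/highest score.
    if r <= 1:
        return ordered[0]
    if r >= n:
        return ordered[-1]
    ir = math.floor(r)  # IR: integer portion of R
    fr = r - ir         # FR: fractional portion of R
    # Scores with rank IR and IR + 1 (Python lists are 0-based, ranks are 1-based).
    lower, upper = ordered[ir - 1], ordered[ir]
    return lower + fr * (upper - lower)


print(percentile([2, 3, 5, 9], 50))      # 4.0
print(percentile([2, 3, 5, 9, 11], 50))  # 5.0
```

Libraries that compute percentiles offer several definitions, so their defaults may not match this (N + 1) rank method.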

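EXAMPLE 4

A suitcase handle that fits 99% of the adult population must accommodate the 99th percentile hand breadth, found from
the mean (83 mm), the Z-value for the 99th percentile (2.33), and the standard deviation (6.9 mm):

P99 hand breadth = 83 + 2.33 × 6.9 = 99 mm

An extra 2 cm gives some margin for the largest hands. That makes about 12 cm.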

To calculate other percentiles, look up the corresponding Z-value in a Z-table (standard normal table). First, find the
desired percentile among the numbers in the body of the table; the bold numbers in the margins then give the
Z-value.

EXAMPLE 5

The 17th percentile of hip breadth.

In the Z-table, the value closest to 17 is 17.11. The corresponding Z-value is then −0.95.

P17 = 387 − 0.95 × 35 = 354 mm
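Instead of reading the Z-value from a printed table, it can be computed with the inverse of the standard normal distribution. A minimal Python sketch, assuming the body dimension is normally distributed, with the means and standard deviations read off the formulas in Examples 4 and 5 (83 mm and 6.9 mm for hand breadth, 387 mm and 35 mm for hip breadth). The function name is hypothetical; statistics.NormalDist requires Python 3.8 or later.

```python
from statistics import NormalDist


def value_at_percentile(mean, sd, p):
    """Value below which p% of a normal population falls: mean + Z * sd."""
    z = NormalDist().inv_cdf(p / 100)  # plays the role of the Z-table lookup
    return mean + z * sd


# Example 4: hand breadth, mean 83 mm, SD 6.9 mm.
print(round(value_at_percentile(83, 6.9, 99)))  # 99
# Example 5: hip breadth, mean 387 mm, SD 35 mm.
print(round(value_at_percentile(387, 35, 17)))  # 354
```

The exact Z-values (about 2.326 and −0.954) differ slightly from the rounded table values 2.33 and −0.95, but both results round to the same millimetre.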

EXAMPLE 6

A man with a height of 1.92 m has the following Z-value:

Z = (1920 − 1706) / 94 = +2.28

In the Z-table, the row for 2.2 and the column for 0.08 give the percentile 98.87. This means that 98.87% of
the population is shorter.

In a kitchen with a 90 cm high worktop, the lowest point of the wash-up bowl is 75 cm high.

The percentile of the corresponding fist height determines how many adults will have to bend over.

Z = (750 − 766) / 43 = −0.37

This Z-value corresponds to the 36th percentile of fist height. This means that the 64% of adults with a greater fist
height will wash up below the level of their fist and will have to bend forward.
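Going from a measurement to its percentile can also be computed directly, assuming normally distributed body dimensions with the means and standard deviations used in Example 6 and the kitchen example (1706 mm and 94 mm for body height, 766 mm and 43 mm for fist height). The function name is hypothetical, and NormalDist().cdf replaces the Z-table. Because the sketch does not round Z to two decimals before converting it, its percentages differ slightly from the table lookups in the text (about 98.86 instead of 98.87 for the 1.92 m man, and about 35.5 instead of 36 for the kitchen).

```python
from statistics import NormalDist


def percentile_of_value(x, mean, sd):
    """Return (Z, percentage of a normal population below x), with Z = (x - mean) / sd."""
    z = (x - mean) / sd
    return z, 100 * NormalDist().cdf(z)


# Example 6: body height, mean 1706 mm, SD 94 mm.
z, pct = percentile_of_value(1920, 1706, 94)
print(f"Z = {z:+.2f}, percentile = {pct:.2f}")

# Kitchen: fist height, mean 766 mm, SD 43 mm; bottom of the bowl at 750 mm.
z, pct = percentile_of_value(750, 766, 43)
print(f"Z = {z:+.2f}, percentile = {pct:.1f}, share who must bend = {100 - pct:.1f}%")
```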

Supplemental Video:
• https://www.youtube.com/watch?v=40o82o3uNfk
• https://www.youtube.com/watch?v=XyVI8IfgMts
• https://www.youtube.com/watch?v=1M6KDrFAYFE
• https://www.youtube.com/watch?v=kKE_VnW-npQ
• https://www.youtube.com/watch?v=XyVI8IfgMts&t=333s
• https://www.youtube.com/watch?v=szirqaIhCyQ&list=RDCMUCYc2dDPuAzbySHNj-CzLckQ&index=5
• https://www.youtube.com/watch?v=Qpy85Xsw_cs&list=RDCMUCYc2dDPuAzbySHNj-CzLckQ&index=2
• https://www.youtube.com/watch?v=kKE_VnW-npQ&list=RDCMUCYc2dDPuAzbySHNj-CzLckQ&index=2

