FDA-Unit II-Notes
Descriptive Statistics
What is Statistics Used for?
• Statistics is used in all kinds of science and business applications.
• Statistics gives us more accurate knowledge which helps us make better decisions.
• Statistics can focus on making predictions about what will happen in the future. It can also
focus on explaining how different things are connected.
Statistics is mainly divided into 2 parts:
1. Descriptive
2. Inferential
Mean
• The arithmetic mean of ungrouped data is:
Mean, x̄ = ∑xi/n
• Here,
• xi = ith observation, 1 ≤ i ≤ n
• ∑xi = Sum of observations
• n = Number of observations
Examples
Question 1: Find the mean of the following data set.
10, 20, 36, 12, 35, 40, 36, 30, 36, 40
Solution:
• Given,
• xi = 10, 20, 36, 12, 35, 40, 36, 30, 36, 40
• n = 10
• Mean = ∑xi/n
• = (10 + 20 + 36 + 12 + 35 + 40 + 36 + 30 + 36 + 40)/10
• = 295/10
• = 29.5
• Therefore, the mean of the given data set is 29.5.
Example: The heights of 5 people are 142 cm, 150 cm, 149 cm, 156 cm, and 153 cm. Find the mean height.
Solution:
• Mean height, x̄ = (142 + 150 + 149 + 156 + 153)/5
= 750/5
= 150
• Mean, x̄ = 150 cm
• Thus, the mean height is 150 cm.
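A minimal Python sketch of the formula x̄ = ∑xi/n, applied to the two examples above. The function name `mean` is our own; Python's standard `statistics.mean` does the same job.

```python
def mean(values):
    """Arithmetic mean of ungrouped data: x̄ = ∑xi / n."""
    if not values:
        raise ValueError("mean() needs at least one observation")
    return sum(values) / len(values)  # ∑xi divided by n


# Data from the two worked examples
data = [10, 20, 36, 12, 35, 40, 36, 30, 36, 40]
heights_cm = [142, 150, 149, 156, 153]

print(mean(data))        # 295/10 = 29.5
print(mean(heights_cm))  # 750/5 = 150.0
```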
When a data set is large, a frequency distribution table is
often used to display the data in an organized way. A
frequency distribution table lists the data values, as well
as the number of times each value appears in the data
set. In a discrete frequency distribution the arithmetic
mean may be computed by any one of the following
methods:
• Direct Method
• Assumed Mean Method
• Step-deviation Method
Mean Formula For Grouped Data
• The mean of grouped data can be found by any of the three methods listed above; which one is most convenient depends on the size of the data values. Their formulas are given below.
Direct Method
• Suppose x1, x2, x3, …, xn are n observations with respective frequencies f1, f2, f3, …, fn. This means the observation x1 occurs f1 times, x2 occurs f2 times, x3 occurs f3 times, and so on. Hence, the formula to calculate the mean in the direct method is:
Mean, x̄ = ∑fixi / ∑fi
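Here,
• ∑fixi = sum of the products of each observation and its frequency, i.e. the sum of all the observations
• ∑fi = sum of the frequencies, i.e. the total number of observations
• This method is convenient when the observations and their frequencies are small numbers.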
Example: Suppose a frequency table gives ∑fixi = 360 and ∑fi = 40. Then
Mean, x̄ = (∑fixi)/(∑fi)
= 360/40
= 9
• Thus, Mean = 9
• Exercise: Solve the above example using the assumed mean method.
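The direct method can be sketched in Python as follows. The function name `mean_direct` and the small frequency table are hypothetical, chosen only to illustrate x̄ = ∑fixi/∑fi; they are not the data of the example above.

```python
def mean_direct(xs, fs):
    """Direct method: x̄ = ∑fi·xi / ∑fi (hypothetical helper)."""
    if len(xs) != len(fs) or sum(fs) == 0:
        raise ValueError("need matching xs/fs and a non-zero total frequency")
    sum_fx = sum(f * x for x, f in zip(xs, fs))  # ∑fi·xi
    sum_f = sum(fs)                              # ∑fi = total number of observations
    return sum_fx / sum_f


# Hypothetical table: observation xi and how often it occurs (fi)
xs = [2, 4, 6, 8, 10]
fs = [3, 5, 7, 4, 1]
print(mean_direct(xs, fs))  # 110/20 = 5.5
```

Writing out each xi exactly fi times and averaging that plain list gives the same 5.5, which is why the direct method works.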
Assumed Mean Method
• In this method, we assume a value, denoted a, as the mean. This value is used to calculate the deviations on which the formula is based. The data are usually given as a frequency distribution table with classes.
• Thus, the formula to find the mean in the assumed mean method is:
Mean, x̄ = a + (Σfidi / Σfi)
Here,
• a = assumed mean
• fi = frequency of ith class
• di = xi – a = deviation of ith class
• Σfi = N = Total number of observations
• xi = class mark; if the data are given as class intervals (a continuous series), xi is the midpoint = (upper class limit + lower class limit)/2
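A minimal Python sketch of the assumed mean method for a continuous series, assuming classes are given as (lower, upper) pairs. The function names and the data are hypothetical, used only for illustration.

```python
def class_mark(lower, upper):
    """xi = (upper class limit + lower class limit) / 2."""
    return (lower + upper) / 2


def mean_assumed(classes, fs, a):
    """Assumed mean method: x̄ = a + ∑fi·di / ∑fi, where di = xi − a."""
    xs = [class_mark(lo, hi) for lo, hi in classes]  # class marks xi
    ds = [x - a for x in xs]                         # deviations di = xi − a
    sum_fd = sum(f * d for f, d in zip(fs, ds))      # ∑fi·di
    return a + sum_fd / sum(fs)


# Hypothetical continuous series (class intervals and frequencies)
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
fs = [5, 8, 15, 16, 6]
print(mean_assumed(classes, fs, a=25))  # 25 + 100/50 = 27.0
print(mean_assumed(classes, fs, a=35))  # 35 + (-400)/50 = 27.0
```

Both choices of a give 27.0: the assumed mean only changes the size of the numbers in the intermediate sums, not the final mean.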
Assumed Mean Method Examples
• If xi and fi are numerically large, the assumed mean method is preferred. Below
are some examples of calculating the mean of grouped data by this method.
Example 1:
• The following table gives information about the marks obtained by 110 students
in an examination. Find the mean marks of the students using the assumed mean
method.
Assumed mean = a = 25
Here Σfi = 110 and Σfidi = −10, so the mean of the data is:
• Mean = a + (Σfidi / Σfi)
• = 25 + (−10/110)
• = 25 − (1/11)
• = (275 − 1)/11
• = 274/11
• ≈ 24.9
Hence, the mean mark of the students is approximately 24.9.
Example 2:
• The table below gives the percentage distribution of female employees across the various branches and departments of a company. Find the mean percentage of female employees by the assumed mean method.
Assumed mean = a = 40
• Mean = a + (Σfidi / Σfi)
• = 40 + (360/35)
• = 40 + (72/7)
• ≈ 40 + 10.29
• ≈ 50.29
Hence, the mean percentage of female employees is approximately 50.29.
Practice Questions on Assumed Mean Method
Solve the following questions using the assumed mean method formula.
1. Find the mean of the following data by assumed mean method.
2. The given distribution shows the number of runs scored by some top
batsmen of the world in one-day international cricket matches. Find the
mean of the data.
3. Find the mean of the following data using the assumed mean
method formula.
Step-deviation Method
• When the data values are large, the step-deviation method is used to find the mean. The formula is given by:
Mean, x̄ = a + h × (Σfiui / Σfi)
Here,
• a = assumed mean
• fi = frequency of ith class
• xi – a = deviation of ith class
• ui = (xi – a)/h = step deviation of ith class
• h = class width (class size)
• Σfi = N = Total number of observations
• xi = class mark = (upper class limit + lower class limit)/2
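A minimal Python sketch of the step-deviation method, assuming equal class widths h. The function name and the series below are hypothetical.

```python
def mean_step_deviation(classes, fs, a, h):
    """Step-deviation method: x̄ = a + h × (∑fi·ui / ∑fi), where ui = (xi − a)/h."""
    xs = [(lo + hi) / 2 for lo, hi in classes]   # class marks xi
    us = [(x - a) / h for x in xs]               # step deviations ui
    sum_fu = sum(f * u for f, u in zip(fs, us))  # ∑fi·ui
    return a + h * sum_fu / sum(fs)


# Hypothetical series with class width h = 10
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
fs = [5, 8, 15, 16, 6]
print(mean_step_deviation(classes, fs, a=25, h=10))  # 25 + 10 × 10/50 = 27.0
```

Here each ui is a small integer (−2, −1, 0, 1, 2), which is what makes hand calculation easier; the direct method on the same table gives 1350/50 = 27 as well.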
Example: Consider the following example to understand this method.
Find the mean of the following using the step-deviation method.
Solution: To find the mean, we first find the class marks and choose a (the assumed mean). Let a = 35. Here, h (class width) = 10.
• Using the mean formula:
• x̄ = a + h × (∑fiui / ∑fi)
• = 35 + (16/50) × 10 = 35 + 3.2 = 38.2
Mean = 38.2
What is Median?
• Generally, the median represents the middle value of a given set of data when the data are arranged in ascending or descending order.
Median
• The value of the middlemost observation, obtained after arranging
the data in ascending or descending order, is called the median of the
data.
• For example, consider the data: 4, 4, 6, 3, 2. Let's arrange this data in
ascending order: 2, 3, 4, 4, 6. There are 5 observations. Thus, median
= middle value i.e. 4.
Case 1: Ungrouped Data
Step 1: Arrange the data in ascending or descending order.
Step 2: Let the total number of observations be n.
To find the median, we need to consider whether n is even or odd. If n is odd, then use the formula:
Median = value of the ((n + 1)/2)th observation
Example 1: Let's consider the data: 56, 67, 54, 34, 78, 43, 23. What is
the median?
Solution:
Arranging in ascending order, we get: 23, 34, 43, 54, 56, 67, 78.
Here, n (number of observations) = 7
So, (7 + 1)/2 = 4
∴ Median = 4th observation
Median = 54
If n is even, then use the formula:
Median = [(n/2)th obs. + ((n/2) + 1)th obs.]/2
Example 2: Let's consider the data: 50, 67, 24, 34, 78, 43. What is the median?
Solution:
Arranging in ascending order, we get: 24, 34, 43, 50, 67, 78.
Here, n (no. of observations) = 6
6/2 = 3
Using the median formula,
Median = (3rd obs. + 4th obs.)/2
= (43 + 50)/2
Median = 46.5
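The odd/even rules for ungrouped data can be sketched in Python as below (the function name is our own; the standard `statistics.median` follows the same rules). Note the "− 1" that converts the 1-based observation numbers used above into Python's 0-based indexes.

```python
def median(values):
    """Median of ungrouped data using the odd/even rules."""
    if not values:
        raise ValueError("median() needs at least one observation")
    s = sorted(values)  # Step 1: arrange in ascending order
    n = len(s)          # Step 2: count the observations
    if n % 2 == 1:
        # n odd: the ((n + 1)/2)th observation
        return s[(n + 1) // 2 - 1]
    # n even: average of the (n/2)th and ((n/2) + 1)th observations
    return (s[n // 2 - 1] + s[n // 2]) / 2


print(median([56, 67, 54, 34, 78, 43, 23]))  # 54
print(median([50, 67, 24, 34, 78, 43]))      # 46.5
```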
Case 2: Grouped Data
• When the data are continuous and in the form of a frequency distribution, the median is found as shown below:
• Step 1: Find the median class.
• Let n = total number of observations, i.e. ∑fi.
• Note: The median class is the class in which the (n/2)th observation lies, i.e. the first class whose cumulative frequency is greater than or equal to n/2.
• Step 2: Use the following formula to find the median:
Median = l + [(n/2 − c)/f] × h
• where,
• l = lower limit of median class
• c = cumulative frequency of the class preceding the median class
• f = frequency of the median class
• h = class size
Solution: We need to calculate the cumulative frequencies to find the
median.
Calculation table:
N = 50
N/2 = 50/2 = 25
Median Class = (20 - 30)
l = 20, f = 22, c = 14, h = 10
Using Median formula:
= 20 + (25 - 14)/22 × 10
= 20 + (11/22) × 10
= 20 + 5 = 25
∴ Median = 25
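The grouped-data procedure (find the median class from the cumulative frequencies, then apply the formula) can be sketched in Python as below, assuming equal-width classes given as (lower, upper) pairs. Since the frequency table for this example is not shown, the sketch uses a hypothetical table chosen so that N = 50, the median class is 20-30, c = 14, f = 22 and h = 10, matching the values in the solution above.

```python
from itertools import accumulate


def grouped_median(classes, fs):
    """Median = l + [(n/2 − c)/f] × h for equal-width classes."""
    n = sum(fs)
    cum = list(accumulate(fs))  # cumulative frequencies
    # Median class: first class whose cumulative frequency is >= n/2
    k = next(i for i, cf in enumerate(cum) if cf >= n / 2)
    l, upper = classes[k]           # l = lower limit of the median class
    h = upper - l                   # h = class size
    f = fs[k]                       # f = frequency of the median class
    c = cum[k - 1] if k > 0 else 0  # c = cumulative frequency of the preceding class
    return l + (n / 2 - c) / f * h


# Hypothetical table: N = 50, median class 20-30, c = 14, f = 22
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
fs = [6, 8, 22, 10, 4]
print(grouped_median(classes, fs))  # 20 + (25 − 14)/22 × 10 = 25.0
```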
Mode
• The value that appears most often in the given data, i.e. the observation with the highest frequency, is called the mode of the data.
Case 1: Ungrouped Data
• For ungrouped data, we just need to identify the observation which occurs maximum times.
• Mode = Observation with maximum frequency
• For example in the data: 6, 8, 9, 3, 4, 6, 7, 6, 3, the value 6 appears the most number of
times.
• Thus, mode = 6. An easy way to remember mode is: Most Often Data Entered.
• Note: A data set may have no mode, one mode, or more than one mode. Depending on the number of modes, the data can be called unimodal, bimodal, trimodal, or multimodal.
• The example discussed above has only one mode, so it is unimodal.
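A minimal Python sketch that returns every mode, so unimodal and multimodal data are handled the same way (the function name is our own; the standard `statistics.multimode` is similar but keeps first-occurrence order). As an assumption of this sketch, if every value occurs equally often it returns all of them; texts that call such data "no mode" would need to treat that case separately.

```python
from collections import Counter


def modes(values):
    """Return all values with the highest frequency, in ascending order."""
    if not values:
        return []
    counts = Counter(values)    # value -> number of occurrences
    top = max(counts.values())  # highest frequency
    return sorted(v for v, c in counts.items() if c == top)


print(modes([6, 8, 9, 3, 4, 6, 7, 6, 3]))  # [6] -> unimodal
print(modes([1, 1, 2, 2, 3]))              # [1, 2] -> bimodal
```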
Case 2: Grouped Data
When the data is continuous, the mode can be found using the
following steps:
Step 1: Find modal class i.e. the class with maximum frequency.
Step 2: Find mode using the following formula:
Mode = l + [(fm − f1)/(2fm − f1 − f2)] × h
where,
• l = lower limit of modal class,
• fm = frequency of modal class,
• f1 = frequency of class preceding modal class,
• f2 = frequency of class succeeding modal class,
• h = class width
Solution:
The highest frequency = 12, so the modal class is 40-60.
l = lower limit of modal class = 40
fm = frequency of modal class = 12
f1 = frequency of class preceding modal class = 10
f2 = frequency of class succeeding modal class = 6
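h = class width = 20
Mode = l + [(fm − f1)/(2fm − f1 − f2)] × h
= 40 + [(12 − 10)/(2 × 12 − 10 − 6)] × 20
= 40 + (2/8) × 20
= 40 + 5 = 45
∴ Mode = 45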
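A small Python sketch of the grouped-data mode formula (the function name is our own), checked against the values l = 40, fm = 12, f1 = 10, f2 = 6, h = 20 from the solution. It raises an error when 2fm − f1 − f2 is zero, because the formula is undefined there.

```python
def grouped_mode(l, fm, f1, f2, h):
    """Mode = l + [(fm − f1) / (2fm − f1 − f2)] × h."""
    denominator = 2 * fm - f1 - f2
    if denominator == 0:
        raise ValueError("2fm − f1 − f2 is zero; the formula is undefined")
    return l + (fm - f1) / denominator * h


print(grouped_mode(l=40, fm=12, f1=10, f2=6, h=20))  # 40 + (2/8) × 20 = 45.0
```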
Relation Between Mean, Median and Mode
• The three measures of central tendency, i.e. mean, median, and mode, are approximately connected by the following relation (called the empirical relationship), which holds well for moderately skewed distributions:
• 2 × Mean + Mode = 3 × Median
• For instance, if we are asked to calculate the mean,
median, and mode of continuous grouped data, then we
can calculate mean and median using the formulas as
discussed in the previous sections and then find mode
using the empirical relation.
For example, suppose the data have mode = 65 and median = 61.6.
Then, we can find the mean using the above relation between mean, median, and mode:
2 × Mean + Mode = 3 × Median
∴ 2 × Mean = 3 × 61.6 − 65
∴ 2 × Mean = 184.8 − 65 = 119.8
⇒ Mean = 119.8/2
⇒ Mean = 59.9
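The empirical relation can be rearranged for either unknown; a minimal Python sketch (function names are our own) is below. Because the relation is only approximate, these values are estimates, not exact results.

```python
def mean_from_mode_median(mode, median):
    """Solve 2·Mean + Mode = 3·Median for Mean (empirical, approximate)."""
    return (3 * median - mode) / 2


def mode_from_mean_median(mean, median):
    """Solve the same relation for Mode: Mode = 3·Median − 2·Mean."""
    return 3 * median - 2 * mean


print(round(mean_from_mode_median(mode=65, median=61.6), 2))  # 59.9
```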
Skewness
Skewness in statistics is a measure of asymmetry, i.e. of how far the distribution of a given random variable deviates from a symmetric distribution (such as the normal distribution).
In a normal distribution, we know that:
Mean = Median = Mode
[Figure: a normal distribution (blue curve) with a histogram of data (yellow) that follows it closely, but not perfectly.]
• The normal distribution is often called the bell curve, because it looks like a bell.
Skewness in statistics can be divided into two categories.
They are:
• Positive Skewness
• Negative Skewness
Positive Skewness
• In a positively skewed distribution, the extreme data values are large, which pulls the mean of the data set upward. In other words, a positively skewed distribution has its longer tail on the right side.
• Typically, Mean > Median > Mode in a positively skewed distribution.
Negative Skewness
• In a negatively skewed distribution, the extreme data values are small, which pulls the mean of the data set downward. A negatively skewed distribution has its longer tail on the left side.
• Hence, typically Mean < Median < Mode in a negatively skewed distribution.
Skewness Formula in Statistics
• Skewness is a measure used in statistics to reveal the asymmetry of a probability distribution. It can be positive, negative, or zero (zero for a perfectly symmetric distribution).
To calculate the skewness, we have to first find the
mean and variance of the given data.
• The skewness formula is given by:
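The exact skewness formula intended here is not shown. As an assumption, the Python sketch below uses the population moment coefficient of skewness, Σ(xi − x̄)³ / (n × σ³), which follows the "mean first, then variance" recipe described above; other definitions (for example Pearson's coefficients, or sample-adjusted versions) give somewhat different values.

```python
import math


def skewness(values):
    """Assumed definition: moment coefficient of skewness ∑(xi − x̄)³ / (n · σ³).

    σ is the population standard deviation, √(∑(xi − x̄)² / n).
    The result is > 0 for a longer right tail, < 0 for a longer
    left tail and 0 for symmetric data.
    """
    n = len(values)
    if n == 0:
        raise ValueError("skewness() needs at least one observation")
    mean = sum(values) / n                               # first: the mean
    variance = sum((x - mean) ** 2 for x in values) / n  # then: the variance
    if variance == 0:
        raise ValueError("skewness is undefined when all values are equal")
    sigma = math.sqrt(variance)
    return sum((x - mean) ** 3 for x in values) / (n * sigma ** 3)


print(skewness([1, 2, 3, 4, 5]))       # 0.0 (symmetric)
print(skewness([1, 1, 1, 2, 10]) > 0)  # True (tail on the right)
```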
Variance and Standard Deviation
Variance and standard deviation are two important measures of dispersion in statistics. Variance measures how far the data points vary from the mean, whereas the standard deviation measures the spread of the data around the mean. The basic difference between the two is that the standard deviation is expressed in the same units as the mean of the data, while the variance is expressed in squared units.
Variance
In simple terms, the variance is a measure of how far a set of data values is spread out from their mean or average value. It is denoted by σ².
Properties of Variance
• It is always non-negative, since each term in the variance sum is squared; the result is therefore either positive or zero.
• Variance always has squared units. For example, the variance of a set of weights measured in kilograms is given in kg². Because its units are squared, the variance cannot be compared directly with the mean or with the data themselves.
Standard Deviation
• The standard deviation measures the spread of statistical data, i.e. how far the data deviate from their mean or average position. The degree of dispersion is computed from the deviations of the individual data points from the mean. It is denoted by the symbol σ.
Properties of Standard Deviation
• It is the square root of the mean of the squared deviations of the values in a data set from their mean, and is therefore also called the root-mean-square deviation.
• The smallest possible value of the standard deviation is 0, since it cannot be negative.
• When the data values in a group are similar, the standard deviation is very low, close to zero. When the data values differ widely from each other, the standard deviation is high, far from zero.
Variance and Standard Deviation Formula
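The formulas for the variance and the standard deviation for both population and sample data sets are given below:
• Population variance: σ² = Σ(xi − μ)²/N; population standard deviation: σ = √[Σ(xi − μ)²/N]
• Sample variance: s² = Σ(xi − x̄)²/(n − 1); sample standard deviation: s = √[Σ(xi − x̄)²/(n − 1)]
where μ is the population mean, N is the population size, x̄ is the sample mean and n is the sample size.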
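A minimal Python sketch of the population and sample formulas σ² = Σ(xi − μ)²/N and s² = Σ(xi − x̄)²/(n − 1), with the standard deviation as the square root of the variance. The function names are our own; Python's `statistics` module offers `pvariance`/`variance` and `pstdev`/`stdev` for the same purposes. The usage treats the six faces of a die as the whole population.

```python
import math


def variance(values, sample=False):
    """Population variance Σ(xi − μ)²/N, or sample variance Σ(xi − x̄)²/(n − 1)."""
    n = len(values)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough observations")
    m = sum(values) / n                     # mean
    ss = sum((x - m) ** 2 for x in values)  # sum of squared deviations
    return ss / (n - 1) if sample else ss / n


def std_dev(values, sample=False):
    """Standard deviation = square root of the variance."""
    return math.sqrt(variance(values, sample))


outcomes = [1, 2, 3, 4, 5, 6]            # faces of a die
print(variance(outcomes))                # population: 17.5/6 ≈ 2.9167
print(std_dev(outcomes))                 # ≈ 1.7078
print(variance(outcomes, sample=True))   # sample: 17.5/5 = 3.5
```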
Variance and Standard deviation Relationship
Variance is the average of the squared deviations from the mean, and the standard deviation is the square root of the variance. Both measures describe the variability in a distribution, but their units differ: the standard deviation is expressed in the same units as the original values, whereas the variance is expressed in squared units.
Example
Question: If a die is rolled, find the variance and standard deviation of the possible outcomes.