Descriptive Statistics - Numerical measure

Descriptive stats with professor....

Uploaded by

pratham
Copyright
© © All Rights Reserved

Data Science for Managerial Decisions

Jasashwi Mandal
NITIE Mumbai
Descriptive Statistics: Numerical Measures
▪ Measures of Location
▪ Measures of Variability
Measures of Location
▪ Mean
▪ Median
▪ Mode
▪ Weighted Mean
▪ Percentiles
▪ Quartiles
Mean
▪ Most important measure of location is the mean
▪ Provides a measure of central location
▪ The mean of a data set is the average of all the data values
▪ The sample mean x̄ is the point estimator of the population
mean 𝜇
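Sample Mean x̄

$$\bar{x} = \frac{\sum x_i}{n}$$

where the numerator is the sum of the values of the n observations and n is the number of observations in the sample.

Population Mean 𝜇

$$\mu = \frac{\sum x_i}{N}$$

where the numerator is the sum of the values of the N observations and N is the number of observations in the population.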
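As a minimal sketch of the mean formula, the computation can be written in Python. The rent values below are hypothetical and chosen only for illustration; they are not the 70-apartment sample from the example.

```python
from statistics import mean

# Hypothetical monthly rents (illustrative values, not the lecture's sample)
rents = [525, 550, 575, 600, 650]

# Sample mean: sum of the observations divided by the number of observations
x_bar = sum(rents) / len(rents)

# statistics.mean applies the same definition
assert x_bar == mean(rents)

print(x_bar)  # 580.0
```

The same arithmetic gives the population mean when the list holds every member of the population; only the interpretation (x̄ versus 𝜇) changes.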
Sample Mean x̄
▪ Example: Apartment Rents
▪ Seventy apartments were randomly sampled in a small college town. The
monthly rent prices for these apartments are listed below.
Median
▪ The median of a data set is the value in the middle when the data items are
arranged in ascending order.
▪ Whenever a data set has extreme values, the median is the preferred measure
of central location.
▪ The median is the measure of location most often reported for annual income
and property value data.
▪ A few extremely large incomes or property values can inflate the mean.
Median
▪ Here we have an odd number of observations:
7 observations:
26, 18, 27, 12, 14, 27, and 19.
Rewritten in ascending order:
12, 14, 18, 19, 26, 27, and 27.

▪ The median is the middle value in this list, so the median = 19.
Median
▪ Here we have an even number of observations:
8 observations:
26, 18, 27, 12, 14, 27, 19, and 30.
Rewritten in ascending order:
12, 14, 18, 19, 26, 27, 27, and 30.

▪ The median is the average of the two middle values in this list, so the
median = (19 + 26)/2 = 22.5.
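The two median examples above can be checked with a short Python sketch. It uses the standard library's `statistics.median`, which sorts the data and then takes the middle value (odd count) or the average of the two middle values (even count).

```python
from statistics import median

# The two data sets from the slides, in their original (unsorted) order
odd_sample = [26, 18, 27, 12, 14, 27, 19]        # 7 observations
even_sample = [26, 18, 27, 12, 14, 27, 19, 30]   # 8 observations

median_odd = median(odd_sample)    # middle value of the sorted list
median_even = median(even_sample)  # average of the two middle values

print(median_odd)   # 19
print(median_even)  # 22.5
```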
Median
▪ Example: Apartment Rents
Notice that there are 70 values provided which are in ascending order.
▪ Averaging the 35th and 36th values: Median = (575 + 575)/2 = 575.
Mode
▪ The mode of a data set is the value that occurs with the greatest frequency.
▪ The greatest frequency can occur at two or more different values.
▪ If the data have exactly two modes, the data are bimodal.
▪ If the data have more than two modes, the data are multimodal.

The mode is 550.
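A minimal sketch of finding the mode in Python, assuming Python 3.8 or later for `statistics.multimode`. The first list reuses the seven observations from the median example; the second list is a hypothetical bimodal data set added for illustration.

```python
from statistics import multimode

# Seven observations from the median example: 27 occurs twice, all others once
observations = [26, 18, 27, 12, 14, 27, 19]
modes = multimode(observations)

# Hypothetical data with two values tied for the greatest frequency (bimodal)
bimodal_data = [12, 12, 14, 27, 27]
bimodal_modes = multimode(bimodal_data)

print(modes)          # [27]
print(bimodal_modes)  # [12, 27]
```

`multimode` returns a list, so it handles the bimodal and multimodal cases described above without special treatment.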


Weighted Mean

▪ In some instances the mean is computed by giving each observation a weight that
reflects its relative importance.
▪ The choice of weights depends on the application.
▪ The weights might be the number of credit hours earned for each grade, as in GPA.
▪ In other weighted mean computations, quantities such as pounds, dollars, or volume are
frequently used.
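The standard formula for the weighted mean, stated here for reference (the symbol w_i for the weight of observation i is a notational assumption, since the slide's own formula is not in this text):

```latex
\bar{x} = \frac{\sum w_i x_i}{\sum w_i}
```

When every weight w_i is equal, this reduces to the ordinary mean.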
Weighted Mean
▪ Ron Butler, a home builder, is looking over the expenses he incurred for a house he just
built.
▪ For the purpose of pricing future projects, he would like to know the average wage ($/hour)
he paid the workers he employed.
▪ Listed below are the categories of workers he employed, along with their respective wage
and total hours worked.

Worker        Wage ($/hr)   Total Hours
Carpenter     21.60         520
Electrician   28.72         230
Laborer       11.80         410
Painter       19.75         270
Plumber       24.16         160


Weighted Mean
▪ Example: Construction Wages

Equally-weighted (simple) mean = $21.21
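The construction wage example can be worked through in Python as a sketch. The wages and hours come from the table above; the hours-weighted figure is computed here rather than quoted from the slides, and the variable names are illustrative.

```python
# (wage in $/hr, total hours) for each worker category, from the table
wages = {
    "Carpenter": (21.60, 520),
    "Electrician": (28.72, 230),
    "Laborer": (11.80, 410),
    "Painter": (19.75, 270),
    "Plumber": (24.16, 160),
}

# Simple mean: every category counts equally
simple_mean = sum(wage for wage, _ in wages.values()) / len(wages)

# Weighted mean: each wage is weighted by the hours worked at that wage
total_hours = sum(hours for _, hours in wages.values())
total_pay = sum(wage * hours for wage, hours in wages.values())
weighted_mean = total_pay / total_hours

print(round(simple_mean, 2))    # 21.21
print(round(weighted_mean, 2))  # 20.05
```

The weighted mean is lower than the simple mean because the lowest-paid category (Laborer) accounts for a large share of the hours.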


Percentiles
▪ A percentile provides information about how the data are spread over the
interval from the smallest value to the largest value.
▪ Admission test scores for colleges and universities are frequently reported in
terms of percentiles.
▪ The 𝑝th percentile of a data set is a value such that at least p percent of the
items take on this value or less and at least (100 – 𝑝) percent of the items take
on this value or more.
Percentiles
i. Arrange the data in ascending order (smallest value to largest value).
ii. Compute an index i = (p/100)n,
where p is the percentile of interest and n is the number of observations
iii. (a) If 𝑖 is not an integer, round up. The next integer greater than 𝑖 denotes the
position of the 𝑝𝑡ℎ percentile.
(b) If 𝑖 is an integer, the 𝑝𝑡ℎ percentile is the average of the values in positions
𝑖 and 𝑖 + 1.
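The three steps above can be sketched in Python. The function name `percentile_by_index` is hypothetical, and the sketch assumes p is an integer strictly between 0 and 100. Integer arithmetic is used for the index because a floating-point product such as 0.8 * 70 does not come out as exactly 56.

```python
def percentile_by_index(values, p):
    """Return the p-th percentile using the index rule i = (p/100)n.

    Assumes p is an integer with 0 < p < 100 and values is non-empty.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p < 100:
        raise ValueError("p must be an integer between 1 and 99")

    ordered = sorted(values)                # step i: ascending order
    n = len(ordered)
    whole, remainder = divmod(p * n, 100)   # step ii: i = whole + remainder/100

    if remainder:
        # step iii(a): i is not an integer, so round up to position whole + 1
        # (1-based), which is index `whole` in a 0-based list
        return ordered[whole]
    # step iii(b): i is an integer, so average positions i and i + 1 (1-based)
    return (ordered[whole - 1] + ordered[whole]) / 2


data = [26, 18, 27, 12, 14, 27, 19, 30]  # the 8 observations from the median example
print(percentile_by_index(data, 50))  # 22.5
print(percentile_by_index(data, 80))  # 27
```

Library routines such as NumPy's percentile function use interpolation rules by default and may return slightly different values from this index rule.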
80th Percentile
Example: Apartment Rents (There are 70 values provided which are in ascending order.)

i = (p/100)n = (80/100)70 = 56
Averaging the 56th and 57th data values:
80th Percentile = (635 + 649)/2 = 642
80th Percentile
Example: Apartment Rents (There are 70 values provided which are in ascending order.)

“At least 80% of the items take on a value of 642 or less.”
“At least 20% of the items take on a value of 642 or more.”
Quartiles

▪ Quartiles are specific percentiles.


▪ First Quartile = 25th Percentile
▪ Second Quartile = 50th Percentile = Median
▪ Third Quartile = 75th Percentile
Third Quartile
Example: Apartment Rents (There are 70 values provided which are in ascending order.)
▪ Third quartile = 75th percentile
▪ i = (p/100)n = (75/100)70 = 52.5, which is not an integer, so round up to position 53
▪ Third quartile = value in the 53rd position = 625
Measures of Variability

▪ Range
▪ Interquartile Range
▪ Variance
▪ Standard Deviation
▪ Coefficient of Variation
Range
▪ The range of a data set is the difference between the largest and smallest data value.
▪ It is the simplest measure of variability.
▪ It is very sensitive to the smallest and largest data values.

▪ Range = largest value – smallest value = 715 – 525 = 190.


Interquartile Range
▪ The interquartile range of a data set is the difference between the third
quartile and the first quartile.
▪ It is the range for the middle 50% of the data.
▪ It overcomes the sensitivity to extreme data values.
Interquartile Range
▪ 3rd Quartile (Q3) = 625
▪ 1st Quartile (Q1) = 545

▪ IQR = 625 – 545 = 80
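The range and interquartile range can be sketched in Python using the same index rule given for percentiles. The helper name `percentile_by_index` is hypothetical, and the data are the eight observations from the median example rather than the apartment rents, which are not listed in this text.

```python
def percentile_by_index(values, p):
    """p-th percentile via i = (p/100)n; assumes integer p with 0 < p < 100."""
    ordered = sorted(values)
    whole, remainder = divmod(p * len(ordered), 100)
    if remainder:
        return ordered[whole]                         # round i up
    return (ordered[whole - 1] + ordered[whole]) / 2  # average positions i, i + 1


data = [26, 18, 27, 12, 14, 27, 19, 30]

data_range = max(data) - min(data)   # largest value minus smallest value
q1 = percentile_by_index(data, 25)   # first quartile
q3 = percentile_by_index(data, 75)   # third quartile
iqr = q3 - q1                        # spread of the middle 50% of the data

print(data_range)  # 18
print(q1, q3)      # 16.0 27.0
print(iqr)         # 11.0
```

Changing the largest observation from 30 to 300 would change the range but leave the IQR at 11.0, which illustrates why the IQR is less sensitive to extreme values.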


Variance
▪ The variance is a measure of variability that utilizes all the data.
▪ It is based on the difference between the value of each observation (x_i)
and the mean (x̄ for a sample, 𝜇 for a population).
▪ The variance is useful in comparing the variability of two or more
variables.
▪ What is the sum of the deviations about the mean?
▪ What is the sum of the squared deviations about the mean?
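The first question has a definite answer, which a short derivation shows. For a sample of n observations with mean x̄:

```latex
\sum_{i=1}^{n} (x_i - \bar{x})
  = \sum_{i=1}^{n} x_i - n\bar{x}
  = n\bar{x} - n\bar{x}
  = 0
```

Positive and negative deviations always cancel, so their sum carries no information about spread. Squaring each deviation before summing removes the cancellation, and the sum of squared deviations is zero only when every observation equals the mean.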
Variance
▪ The variance is the average of the squared deviations between each
data value and the mean.
▪ The variance of a sample is:

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

▪ The variance for a population is:

$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$$
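A minimal Python sketch of the sample and population variance, using the seven observations from the median example (the apartment rent data are not listed in this text). The sample variance divides the sum of squared deviations by n − 1; the population variance divides by N.

```python
from math import isclose
from statistics import mean, pvariance, variance

data = [26, 18, 27, 12, 14, 27, 19]

x_bar = mean(data)
# Sum of squared deviations about the mean
squared_deviations = sum((x - x_bar) ** 2 for x in data)

sample_variance = squared_deviations / (len(data) - 1)  # divide by n - 1
population_variance = squared_deviations / len(data)    # divide by N

# The standard library functions follow the same definitions
assert isclose(sample_variance, variance(data))
assert isclose(population_variance, pvariance(data))

print(round(sample_variance, 2))      # 39.62
print(round(population_variance, 2))  # 33.96
```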


Standard Deviation
▪ The standard deviation of a data set is the positive square root of the
variance.
▪ It is measured in the same units as the data, making it more easily interpreted
than the variance.
▪ The standard deviation of a sample is:

$$s = \sqrt{s^2}$$

▪ The standard deviation of a population is:

$$\sigma = \sqrt{\sigma^2}$$


Coefficient of Variation
▪ The coefficient of variation indicates how large the standard deviation is
relative to the mean.
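▪ The coefficient of variation of a sample is:

$$\left( \frac{s}{\bar{x}} \times 100 \right)\%$$

▪ The coefficient of variation of a population is:

$$\left( \frac{\sigma}{\mu} \times 100 \right)\%$$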
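The sample standard deviation and coefficient of variation can be sketched together in Python. The data are the seven observations from the median example, used here only for illustration.

```python
from math import isclose, sqrt
from statistics import mean, stdev, variance

data = [26, 18, 27, 12, 14, 27, 19]

x_bar = mean(data)
s = stdev(data)  # sample standard deviation

# The standard deviation is the positive square root of the variance
assert isclose(s, sqrt(variance(data)))

# Coefficient of variation: standard deviation as a percentage of the mean
cv_percent = s / x_bar * 100

print(round(s, 2))           # 6.29
print(round(cv_percent, 1))  # 30.8
```

Because the coefficient of variation is a ratio, it has no units, which makes it convenient for comparing variables measured on different scales.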


Sample Variance, Standard Deviation, and Coefficient of Variation
Example: Apartment Rents

• The variance is:

• The standard deviation is:

• The coefficient of variation is:
