Presentation - Week 4
Statistical Characteristics
Measures of location (position)
1) Mean
2) Median
3) Quartile
Measures of dispersion
1) Range
2) Variance
3) Standard Deviation
4) Interquartile Range (IQR)
Mean
The arithmetic mean (also called the arithmetic average, or simply the
mean or the average) is the sum of a collection of numbers divided by
the count of numbers in the collection:
$A = \frac{1}{n} \sum_{i=1}^{n} a_i$
A = mean
n = number of values
a_i = data set values
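The definition above can be sketched in a few lines of Python. The function name `arithmetic_mean` and the sample values are illustrative assumptions, not taken from the slides.

```python
def arithmetic_mean(values):
    """Return the sum of the values divided by how many there are."""
    n = len(values)
    if n == 0:
        raise ValueError("the mean of an empty collection is undefined")
    return sum(values) / n


# Illustrative data (assumed, not from the slides).
data = [4, 8, 15, 16, 23, 42]
print(arithmetic_mean(data))  # 108 / 6 = 18.0
```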
https://www.khanacademy.org/math/statistics-probability/summarizing-quantitative-data/mean-median-basics/a/mean-median-and-mode-review
Weighted Mean
Geometric Mean
Harmonic Mean
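The slides name the weighted, geometric, and harmonic means without stating their formulas. The sketch below assumes the standard textbook definitions: weighted mean = sum(w_i * x_i) / sum(w_i), geometric mean = n-th root of the product of the values, harmonic mean = n / sum(1 / x_i). Function names and sample numbers are illustrative.

```python
import math


def weighted_mean(values, weights):
    """Sum of weight * value, divided by the sum of the weights."""
    total_weight = sum(weights)
    if len(values) != len(weights) or total_weight == 0:
        raise ValueError("need one weight per value and a non-zero weight sum")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


def geometric_mean(values):
    """n-th root of the product, computed through logarithms."""
    if not values or any(x <= 0 for x in values):
        raise ValueError("the geometric mean needs positive values")
    return math.exp(sum(math.log(x) for x in values) / len(values))


def harmonic_mean(values):
    """Number of values divided by the sum of their reciprocals."""
    if not values or any(x <= 0 for x in values):
        raise ValueError("the harmonic mean needs positive values")
    return len(values) / sum(1 / x for x in values)


# Illustrative inputs (assumed, not from the slides).
print(weighted_mean([80, 90], [1, 3]))  # (80*1 + 90*3) / 4 = 87.5
print(geometric_mean([2, 8]))           # approximately 4
print(harmonic_mean([40, 60]))          # approximately 48
```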
Trimmed mean
A trimmed mean is a method of averaging that
removes a small percentage of the largest and
smallest values before calculating the mean.
Let's say, as an example, a figure skating
competition produces the following scores: 6.0,
8.1, 8.3, 9.1, and 9.9.
To trim the mean by a total of 40%, we remove the
lowest 20% and the highest 20% of values,
eliminating the scores of 6.0 and 9.9.
The trimmed mean is then (8.1 + 8.3 + 9.1) / 3 = 8.5.
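The figure skating example can be reproduced with a short Python sketch. The function name `trimmed_mean` and its `percent_each_side` parameter are assumptions made for illustration; the percentage is passed as a whole number so the count of removed values is computed with integer arithmetic.

```python
def trimmed_mean(values, percent_each_side):
    """Drop percent_each_side % of the values at each end, then average the rest."""
    ordered = sorted(values)
    k = len(ordered) * percent_each_side // 100  # values removed per side
    kept = ordered[k:len(ordered) - k]
    if not kept:
        raise ValueError("trimming removed every value")
    return sum(kept) / len(kept)


scores = [6.0, 8.1, 8.3, 9.1, 9.9]
# 20% per side removes one value from each end: 6.0 and 9.9.
print(round(trimmed_mean(scores, 20), 2))  # 8.5
```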
Median
• Median: the middle number, found by ordering all data points and
picking out the one in the middle (or, if there are two middle
numbers, taking the mean of those two numbers).
• Odd n: the median is the value at position (n + 1)/2.
• Even n: the median is the mean of the values at positions n/2 and n/2 + 1.
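A minimal Python sketch of the rule above, covering both the odd and the even case. The function name `median` is chosen here for illustration, and the even-length list is an assumed example.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty collection is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([9.9, 6.0, 8.3, 8.1, 9.1]))  # odd n: 8.3
print(median([1, 2, 3, 4]))               # even n: 2.5
```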
https://www.khanacademy.org/math/statistics-probability/summarizing-quantitative-data/mean-median-basics/a/mean-median-and-mode-review
Quartile
Quartiles divide the data into four parts that are as equal as
possible. To calculate quartiles, the data must be
sorted from the smallest to the largest value.
• First quartile (Q1): the middle value between the smallest
value (minimum) and the median.
• Second quartile (Q2): the median of the data, i.e. 50% of the
values are smaller and 50% of the values are larger.
• Third quartile (Q3): the middle value between the median
and the largest value (maximum).
Interquartile Range
Interquartile Range (IQR): the range between the 75th percentile
(Q3) and the 25th percentile (Q1).
IQR = Q3 – Q1
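Quartiles and the IQR can be sketched in Python as follows. Several conventions exist for computing quartiles; this sketch assumes the "median of each half" method, leaving the overall median out of both halves when n is odd. Other methods can give slightly different values. The data set and function names are illustrative.

```python
def _median_sorted(ordered):
    """Median of an already sorted, non-empty list."""
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = ordered[:half]
    # For odd n, skip the middle element so it belongs to neither half.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return _median_sorted(lower), _median_sorted(ordered), _median_sorted(upper)


data = [1, 3, 4, 6, 7, 8, 10, 12, 15]  # illustrative values
q1, q2, q3 = quartiles(data)
print(q1, q2, q3)        # 3.5 7 11.0
print("IQR =", q3 - q1)  # IQR = 7.5
```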
Example
Range
The range is the simplest measure of dispersion. It is
calculated by subtracting the lowest value
from the highest value.
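As a quick sketch in Python, the range is the maximum minus the minimum. The function name `value_range` is illustrative; the data reuses the figure skating scores from the trimmed mean slide.

```python
def value_range(values):
    """Highest value minus lowest value."""
    if not values:
        raise ValueError("the range of an empty collection is undefined")
    return max(values) - min(values)


scores = [6.0, 8.1, 8.3, 9.1, 9.9]
print(round(value_range(scores), 1))  # 9.9 - 6.0 = 3.9
```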
Standard Deviation & Variance
Variance: the average of the squared differences from the mean. It measures
how far the data points in a data set lie from the mean.
Standard deviation: the square root of the variance. It indicates the spread of a variable around its mean value.
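A minimal Python sketch of both measures, assuming the population form that matches the wording "average of squared differences" (division by N). The sample variance divides by N - 1 instead. Function names and data are illustrative.

```python
import math


def variance(values):
    """Population variance: mean of the squared deviations from the mean."""
    n = len(values)
    if n == 0:
        raise ValueError("the variance of an empty collection is undefined")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def standard_deviation(values):
    """Square root of the population variance."""
    return math.sqrt(variance(values))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values, mean = 5
print(variance(data))            # 32 / 8 = 4.0
print(standard_deviation(data))  # 2.0
```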
Another way to calculate the variance:
$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} x_i^2 - \left( \frac{1}{N} \sum_{i=1}^{N} x_i \right)^2$
Example
Ex: We have N = 5 elements with sum of (xi) = 25 and
sum of (xi^2) = 750. Find the variance.
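This example can be worked with the shortcut form of the population variance, assumed here to be variance = sum(xi^2)/N - (sum(xi)/N)^2, i.e. the mean of the squares minus the square of the mean. The function name is illustrative.

```python
def variance_from_sums(n, sum_x, sum_x_squared):
    """Population variance from N, the sum of xi, and the sum of xi squared."""
    mean = sum_x / n
    return sum_x_squared / n - mean ** 2


# N = 5, sum(xi) = 25, sum(xi^2) = 750
# mean = 25 / 5 = 5, mean of squares = 750 / 5 = 150
print(variance_from_sums(5, 25, 750))  # 150 - 25 = 125.0
```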
Mode
The mode is the value that appears most frequently in a data
set. A set of data may have one mode, more than one mode, or no
mode at all.
Example of the Mode
• In the list 3, 3, 6, 9, 16, 16, 16, 27, 27, 37, 48, the mode is 16,
since it appears more times in the set than any other number.
• A set of numbers can have more than one mode if multiple numbers occur
with equal frequency, and more times than the others in the set:
3, 3, 3, 9, 16, 16, 16, 27, 37, 48
• In the above example, both the number 3 and the number 16 are modes, as they
each occur three times and no other number occurs more often.
• If no number in a set of numbers occurs more than once, that set has no mode:
3, 6, 9, 16, 27, 37, 48
• A set of numbers with two modes is bimodal, a set of numbers with three modes
is trimodal, and any set of numbers with more than one mode is multimodal.
How Do I Calculate the Mode?
• Calculating the mode is fairly straightforward. Place all numbers
in a given set in order; this can be from lowest to highest or
highest to lowest, and then count how many times each number
appears in the set.
• The one that appears the most is the mode.
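The counting procedure can be sketched in Python with `collections.Counter`. The function name `modes` is illustrative. Following the convention on the previous slides, it returns every value tied for the highest count, and an empty list when no value occurs more than once.

```python
from collections import Counter


def modes(values):
    """Return all modes in ascending order, or [] if no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # every value appears once: no mode
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([3, 3, 6, 9, 16, 16, 16, 27, 27, 37, 48]))  # [16]
print(modes([3, 3, 3, 9, 16, 16, 16, 27, 37, 48]))      # [3, 16]
print(modes([3, 6, 9, 16, 27, 37, 48]))                 # []
```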
Dot Plot
A survey of "How long does it take you to eat breakfast?" has these results:
Minutes:  0  1  2  3  4  5  6  7  8  9 10 11 12
People:   6  2  3  5  2  5  0  0  2  3  7  4  1
Which means that 6 people take 0 minutes to eat breakfast (they probably had no
breakfast!), 2 people say they only spend 1 minute having breakfast, etc. And here is the
dot plot:
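A text version of such a dot plot can be sketched in Python from the survey counts above: one column per number of minutes and one `*` per person. The function name `dot_plot` and the column layout are illustrative choices.

```python
minutes = list(range(13))
people = [6, 2, 3, 5, 2, 5, 0, 0, 2, 3, 7, 4, 1]


def dot_plot(values, counts):
    """Build a text dot plot: stacked '*' marks above each value on the axis."""
    lines = []
    # Draw from the tallest level down to level 1.
    for level in range(max(counts), 0, -1):
        row = "".join(" * " if c >= level else "   " for c in counts)
        lines.append(row.rstrip())
    # Axis labels, each centered in a 3-character column.
    lines.append("".join(f"{v:^3}" for v in values).rstrip())
    return "\n".join(lines)


print(dot_plot(minutes, people))
```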
https://www.cuemath.com/data/dot-plot/
Exercises
The following measurements were recorded for the drying time, in
hours, of a certain brand of latex paint
Exercises
According to the journal Chemical Engineering, an important property of a
fiber is its water absorbency. A random sample of 20 pieces of cotton
fiber was taken and the absorbency on each piece was measured. The
following are the absorbency values:
(a) Calculate the sample mean and median for the above sample values.
(b) Compute the 10% trimmed mean (10% trimmed from each side).
(c) Do a dot plot of the absorbency data.
(d) Using only the values of the mean, median, and trimmed mean, do you
have evidence of outliers in the data?
Interpolation
• Interpolation is widely used in engineering and other
sciences that rely on experiments or measurements, to fit the
collected data to a function curve.
• When the collected data are scattered (dispersed) and
highly heterogeneous (different), interpolation is used
to estimate the missing values between the known points.
• Extrapolation, by contrast, makes predictions in an area
outside the known points.
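A minimal sketch in Python, assuming linear interpolation between two known points (x0, y0) and (x1, y1): y = y0 + (y1 - y0) * (x - x0) / (x1 - x0). The same line gives an extrapolated value when x lies outside [x0, x1]. The function name and the sample points are illustrative.

```python
def linear_interpolate(x0, y0, x1, y1, x):
    """Value at x on the straight line through (x0, y0) and (x1, y1)."""
    if x1 == x0:
        raise ValueError("the two known points need different x values")
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Known measurements (assumed for illustration): (10, 20) and (20, 40).
print(linear_interpolate(10, 20, 20, 40, 15))  # interpolation: 30.0
print(linear_interpolate(10, 20, 20, 40, 25))  # extrapolation: 50.0
```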
Calculation of the median in the case
of a continuous variable
Here are the amounts paid by customers during a sales period
in store X:
xi (TL)            ni
[0, 100[           43
[100, 200[         50
[200, 300[         56
[300, 400[         34
[400, 500[         23
500 and more       13
Total             219
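For this table, the median can be estimated by linear interpolation inside the class that contains the (N/2)-th observation. The sketch below assumes the standard grouped-data formula, median = L + (N/2 - F) / f * w, where L is the lower bound of the median class, F the cumulative frequency before it, f its frequency, and w its width. The function name and the tuple layout are illustrative.

```python
def grouped_median(classes):
    """Median of grouped data given (lower, upper, frequency) tuples in order.

    upper may be None for an open-ended last class.
    """
    total = sum(freq for _, _, freq in classes)
    if total == 0:
        raise ValueError("no observations")
    half = total / 2
    cumulative = 0
    for lower, upper, freq in classes:
        if cumulative + freq >= half:
            if upper is None:
                raise ValueError("the median falls in an open-ended class")
            # Interpolate linearly within the median class.
            return lower + (half - cumulative) / freq * (upper - lower)
        cumulative += freq
    raise ValueError("median class not found")


sales = [
    (0, 100, 43),
    (100, 200, 50),
    (200, 300, 56),
    (300, 400, 34),
    (400, 500, 23),
    (500, None, 13),
]
# N = 219, N/2 = 109.5; cumulative counts are 43, 93, 149,
# so the median class is [200, 300[.
print(round(grouped_median(sales), 2))  # 200 + 16.5 / 56 * 100 = 229.46
```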