Data Analysis
Descriptive Statistics
Lesson Plan
What are descriptive statistics and how are they used
Understanding the different measures of central tendency
Understanding measures of variability and how to use them
Discussing and using Coefficient of Variation
Measures of Position
Understanding how to read and create a Box and Whisker Plot
Flowchart of Descriptive Statistics
Descriptive Statistics
The formula we use depends on whether the data are about a population or a sample
Summarizes and provides features of a dataset of
either an entire population or a sample:
▪ Characteristics of that dataset
Descriptive statistics
Measures of central tendency: the typical value within your dataset
Measures of variability: how spread out your data are from the average
Measures of Central Tendency
Indication of a typical value in a dataset
Mode: most frequent number in the dataset
Median: middle value in the dataset
Mean: average of the dataset (heavily affected by extreme values)
Measures of Central Tendency
Mode
What are the advantages and disadvantages?
Advantages
- Not affected by extreme values (outliers)
- Can be used on any level of measurement
Disadvantages
- It may not represent the centre of the dataset
Bimodal: two different modes; the graph has two equal bumps
Multimodal: three or more modes
No mode
Measures of Central Tendency
Steps to finding the median
1. Sort data from lowest to highest value
2. Find the middle number (in an odd-sized dataset); that's your median!
3. If the number of values is even, take the average of the two middle numbers; that's your median!
Advantages
- Resistant to extreme values
- Better measure of central tendency for skewed data
Disadvantages
- Cannot be used with all levels of measurement (categorical)
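The median steps above can be sketched in Python. This is a minimal illustration, not part of the original slides; the function name `median` and the two small datasets are made up for the example.

```python
def median(values):
    """Return the median of a list of numbers."""
    ordered = sorted(values)      # step 1: sort from lowest to highest
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty dataset is undefined")
    mid = n // 2
    if n % 2 == 1:
        # step 2: odd number of values, take the single middle value
        return ordered[mid]
    # step 3: even number of values, average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_example = median([7, 1, 5])       # hypothetical data, sorted: 1, 5, 7
even_example = median([7, 1, 5, 3])   # hypothetical data, sorted: 1, 3, 5, 7
print(odd_example, even_example)
```

Sorting first is what makes the median resistant: only the position of a value matters, not how extreme it is.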
Measures of Central Tendency
Steps in Calculating the Mean
1. Compute ∑X
2. Divide ∑X by the total number of data values
Advantages
- It looks at the sum of all x values (more representative)
- Used for continuous and discrete data
Disadvantages
- Affected by the extremes (outliers)
Sample mean: $\bar{x} = \frac{\sum x}{n}$, where $\bar{x}$ is the symbol for the sample mean and $n$ is the symbol for the sample size
Population mean: $\mu = \frac{\sum x}{N}$, where $\mu$ is the symbol for the population mean and $N$ is the symbol for the population size
Measures of Central Tendency
What is the mean of:
1 2 3 4 100
22
QUESTION: Does the mean represent the dataset?
What would 100 be in this dataset?
An outlier
What would be a better measure of central tendency?
The median, because it is not affected by outliers and there is no mode
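As a quick check of this example, the sketch below computes the mean and median of the same five values with Python's standard `statistics` module. The variable names are chosen only for illustration.

```python
from statistics import mean, median

data = [1, 2, 3, 4, 100]

data_mean = mean(data)       # sum of all values divided by the number of values
data_median = median(data)   # middle value of the sorted data

print("mean:", data_mean)
print("median:", data_median)
```

The single value of 100 pulls the mean far above four of the five observations, while the median stays at the middle of the sorted data.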
The influence of outliers on the mean
Outliers are usually the result of
data entry/experimental error,
sampling problems or natural
variation.
Big impact on the calculation of
the mean.
A statistic is resistant to extreme
values if it is not affected much
by extreme observations
(outliers), e.g. the mode and the median
Resistant Measures of Central
Tendency
A measure NOT affected by extreme values in the data
set.
Examples of resistant measures
Median
Trimmed Mean**
Resistant Measures: Trimmed Mean
Also called the truncated mean
Trimmed Mean steps:
1. Put data in order from smallest to biggest
2. Remove k% off the top and bottom
3. Calculate the mean with the remaining dataset
Practice!
Calculate a 5% trimmed mean for the following sample
14 24 20 20 25 23 30
43 40 70 30 22 35 23
21 32 35 33 31 27 44
Sorted: 14, 20, 20, 21, 22, 23, 23, 24, 25, 27, 30, 30, 31, 32, 33, 35, 35, 40, 43, 44, 70
5% x n = 0.05 x 21 = 1.05, which rounds to 1, so take 1 value off each side (14 and 70)
Add all the remaining numbers to get 558, then divide by the new n, which is 19, to get a trimmed mean of 29.37
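The trimmed mean steps can be sketched as follows, using the 21 values from this practice sample. The function name `trimmed_mean` is hypothetical, and rounding k% x n to the nearest whole number with Python's built-in `round` is an assumption about how to handle a fractional count; other textbooks and libraries may round down instead.

```python
def trimmed_mean(values, percent):
    """Mean after removing `percent`% of the values from each end."""
    ordered = sorted(values)                    # step 1: smallest to biggest
    k = round(len(ordered) * percent / 100)     # number of values to drop per side
    if 2 * k >= len(ordered):
        raise ValueError("trimming would remove every value")
    kept = ordered[k:len(ordered) - k]          # step 2: remove k values top and bottom
    return sum(kept) / len(kept)                # step 3: mean of what remains


sample = [14, 24, 20, 20, 25, 23, 30,
          43, 40, 70, 30, 22, 35, 23,
          21, 32, 35, 33, 31, 27, 44]

result = trimmed_mean(sample, 5)
print(round(result, 2))
```

With n = 21 and a 5% trim, one value is dropped from each end, leaving 19 values whose sum is 558.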
Weighted Mean
To assign more importance to certain numbers
$\bar{x} = \frac{\sum (w \cdot x)}{\sum w}$
where x is a data value
and w is the weight assigned to the value
Practice!
Suppose you had a midterm score of 83% and your
final exam score is 95%. Your midterm was worth 40%
and your final exam is worth 60%. Calculate your final
grade.
$\frac{(83 \times 40) + (95 \times 60)}{40 + 60} = \frac{9020}{100} = 90.2\%$
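The weighted mean for this grade example can be sketched in a few lines. The function name `weighted_mean` is made up for illustration; the scores and weights are the ones given in the practice question.

```python
def weighted_mean(values, weights):
    """Sum of (weight * value) divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("each value needs exactly one weight")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


scores = [83, 95]    # midterm and final exam scores, in percent
weights = [40, 60]   # how much each one is worth, in percent

final_grade = weighted_mean(scores, weights)
print(final_grade)
```

Dividing by the sum of the weights means the weights do not have to add up to 100 for the function to work.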
Central tendency and types of
measures
Mode – can be used with all levels of data: nominal,
ordinal, interval and ratio levels.
Median – may be used with ordinal, interval, or ratio
levels.
Mean – may be used with interval or ratio levels.
Mean, median, mode and
skewness
Right-skewed (positive skew): the mode is less than the mean and median
Left-skewed (negative skew): the mode is greater than the mean and median
Variance in Data
Practice!
For each of these datasets calculate the mean.
Does the mean tell us everything we need to
know about our dataset?
Set 1: -10, 0, 10, 20, 30 (mean of 10, x bar = 10)
Set 2: 8, 9, 10, 11, 12 (mean of 10, x bar = 10)
Measures of Variation
Allows you to see how close your dataset is to the measures of
Central Tendency
Range
• Measure of dispersion
• It shows us variation based on the position of our data
• Affected by outliers
• Range = (maximum value) - (minimum value)
Variance
• Dispersion of data around the mean
• Variance = (standard deviation)²
Standard Deviation (the one we use the most)
• Dispersion of data from the mean
• The square root of the variance
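To see why the mean alone is not enough, the sketch below compares the two practice sets (Set 1 and Set 2, both with a mean of 10) using the range and the standard deviation. Treating both sets as samples (so `statistics.stdev` divides by n - 1) is an assumption made for the illustration, and `data_range` is a hypothetical helper name.

```python
from statistics import mean, stdev

set_1 = [-10, 0, 10, 20, 30]
set_2 = [8, 9, 10, 11, 12]


def data_range(values):
    """Range = maximum value minus minimum value."""
    return max(values) - min(values)


for name, data in (("Set 1", set_1), ("Set 2", set_2)):
    print(name,
          "mean:", mean(data),
          "range:", data_range(data),
          "sample SD:", round(stdev(data), 2))
```

Both sets share the same mean, but Set 1 has a much larger range and standard deviation, which is exactly the information the mean hides.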
Range: Example
Average weight of a carton of blueberries in ounces
Range of 10
Mode 22
Range of 10
Mode 27
The second one has more variance
Standard Deviation and
Variance
Sample Variance: $S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$
Sample Standard Deviation: $S = \sqrt{S^2}$
n-1 is only for the sample variance and sample standard deviation
- Helps account for bias
Standard deviation will always be positive; that is why we square the deviations
Standard Deviation and
Variance
Population Variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
Population Standard Deviation: $\sigma = \sqrt{\sigma^2}$
where $\mu$ is the population mean and $N$ is the population size
Computation Formula (sample
statistics)
Sample Variance: $S^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1}$
Sample Standard Deviation: $S = \sqrt{S^2}$
Do NOT forget the order of operations
(BEDMAS)
Calculating Variance and
Standard Deviation
1. Find the mean
2. Calculate the deviation scores (each value minus the mean)
3. Square the deviation scores
4. Add all the squared deviation scores
5. Divide by n-1 (sample) or by N (population). This is
the Variance!
6. Take the square root of the variance to find the
Standard Deviation.
Practice!
Compute S² and S
2 3 3 8 10 10
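The six calculation steps can be followed one by one in code. This is a sketch for the practice data above, treated as a sample (dividing by n - 1); the function name `sample_variance` is chosen for illustration.

```python
import math


def sample_variance(values):
    """Sample variance, following the step-by-step procedure."""
    n = len(values)
    if n < 2:
        raise ValueError("a sample variance needs at least two values")
    mean = sum(values) / n                      # 1. find the mean
    deviations = [x - mean for x in values]     # 2. deviation scores
    squared = [d ** 2 for d in deviations]      # 3. square the deviation scores
    total = sum(squared)                        # 4. add the squared deviation scores
    return total / (n - 1)                      # 5. divide by n - 1 (sample)


data = [2, 3, 3, 8, 10, 10]
s2 = sample_variance(data)
s = math.sqrt(s2)                               # 6. square root gives the standard deviation
print("S^2 =", s2, "S =", round(s, 2))
```

For a population, step 5 would divide by N instead of n - 1; the rest of the procedure is unchanged.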
Application of standard
deviation
• Standard deviations are positive numbers
• The standard deviation is affected by outliers, because the mean is affected by outliers and because we include every single value in the set
• Units of standard deviation are the same as the original
data values
Practice!
Compute the standard deviation
2 3 3 8 10 20
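This practice set differs from the earlier one (2 3 3 8 10 10) only in its last value, so comparing the two shows how one extreme value changes the standard deviation. The sketch below assumes both are samples and uses `statistics.stdev` from the Python standard library; the variable names are illustrative.

```python
from statistics import stdev

original = [2, 3, 3, 8, 10, 10]
with_extreme = [2, 3, 3, 8, 10, 20]   # last value changed from 10 to 20

sd_original = stdev(original)         # sample standard deviation (n - 1)
sd_extreme = stdev(with_extreme)

print(round(sd_original, 2), round(sd_extreme, 2))
```

Changing a single value noticeably increases the standard deviation, because every value contributes a squared deviation from the mean.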
Comparing Standard Deviation
Coefficient of variation (CV)
• Used to compare two datasets with different scales
For Samples: $CV = \frac{s}{\bar{x}} \cdot 100$
For Population: $CV = \frac{\sigma}{\mu} \cdot 100$
Practice!
Below are the mean heights and weights for a sample of
Marianopolis first-year students registered in gym.
Compare the variation.
                     Heights      Weights
Mean                 68.34 in.    172.55 lb.
Standard deviation   3.02 in.     26.33 lb.
CV                   4.42%        15.26%
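The two coefficients of variation can be reproduced with a short sketch. The function name `coefficient_of_variation` is hypothetical; the means and standard deviations are the ones given for heights and weights.

```python
def coefficient_of_variation(std_dev, mean):
    """CV = (standard deviation / mean) * 100, expressed as a percentage."""
    if mean == 0:
        raise ValueError("the CV is undefined when the mean is zero")
    return std_dev / mean * 100


cv_height = coefficient_of_variation(3.02, 68.34)     # inches
cv_weight = coefficient_of_variation(26.33, 172.55)   # pounds

print(f"heights: {cv_height:.2f}%  weights: {cv_weight:.2f}%")
```

Because the units cancel in the division, the two percentages can be compared directly: the weights vary more relative to their mean than the heights do.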
Measures of Position
Remember!
Why is understanding variability in data
important?
It helps us see how close together our data are and how dispersed they are
What is a shortcoming associated with mean and
standard deviation
They do not tell us the shape of the distribution
Mean and standard deviation are
heavily affected by outliers!
Practice!
Below is a sample of 10 salaries (in thousands
of dollars) of recent graduates from
Concordia’s John Molson School of Business
42 30 35 105 65
40 35 58 55 75
Sorted: 30, 35, 35, 40, 42, 55, 58, 65, 75, 105
Range = 105 - 30 = 75
Median: (42 + 55)/2 = 48.5
- This is the middle point of your dataset
- The right side is more spread out
Q1 is 35
Q3 is 65
Percentiles
A percentile is a value such that P% of the data fall at or below it
and (100-P)% of the data fall at or above it.
Quartiles:
A type of percentile that divides data into fourths
Computing Quartiles
1. Order the data from smallest to largest
2. Find the median (this is Q2)
3. Find the first quartile Q1 by finding the median of the
data falling below the Q2 position (and not including Q2).
4. Find the third quartile Q3 by finding the median of the
data falling above the Q2 position (and not including Q2).
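The quartile steps can be sketched as below, using the 10 salaries from the practice sample. The function name `quartiles` is made up for illustration. It follows the median-of-halves method described in these steps; statistical software often uses other interpolation conventions, so library results may differ slightly.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)              # 1. order from smallest to largest
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values to compute quartiles")
    half = n // 2
    lower = ordered[:half]                # data below the Q2 position
    upper = ordered[half + (n % 2):]      # data above the Q2 position (skips Q2 itself if n is odd)
    q2 = median(ordered)                  # 2. the median is Q2
    q1 = median(lower)                    # 3. median of the lower half
    q3 = median(upper)                    # 4. median of the upper half
    return q1, q2, q3


salaries = [42, 30, 35, 105, 65, 40, 35, 58, 55, 75]   # thousands of dollars
q1, q2, q3 = quartiles(salaries)
print(q1, q2, q3)
```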
Interquartile Range
The Interquartile range (IQR) is obtained by
subtracting the first quartile from the third.
Five Number Summary
Five number summary includes:
the minimum value
the first quartile Q1
the median (or second quartile Q2)
the third quartile, Q3
the maximum value
Box and Whisker (Boxplot)
Diagram
A visual representation of the five number summary
Steps
• Draw a scale to include lowest and highest value
• Draw the box (Q1 and Q3)
• Draw a solid line for the median
• Draw the whiskers
How to find outliers (fences)
What are fences?
Lower fence: Q1 - (1.5 x IQR)
Upper fence: Q3 + (1.5 x IQR)
Values below the lower fence or above the upper fence are flagged as outliers
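A small sketch of the fence calculation, using the usual convention that the lower fence is Q1 - 1.5 x IQR and the upper fence is Q3 + 1.5 x IQR. The function name `outlier_fences` is hypothetical; Q1 = 35 and Q3 = 65 are the quartiles found for the salary sample.

```python
def outlier_fences(q1, q3):
    """Return (lower fence, upper fence) based on the interquartile range."""
    iqr = q3 - q1                         # interquartile range
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return lower_fence, upper_fence


salaries = [42, 30, 35, 105, 65, 40, 35, 58, 55, 75]   # thousands of dollars
lower_fence, upper_fence = outlier_fences(35, 65)

# Any value outside the fences is flagged as a potential outlier.
outliers = [x for x in salaries if x < lower_fence or x > upper_fence]
print(lower_fence, upper_fence, outliers)
```

With an IQR of 30, the largest salary (105) still falls inside the upper fence, so this rule flags no outliers in the sample.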
Interpreting the Boxplot
Practice!
Below is a sample of 10 salaries (in thousands
of dollars) of recent graduates from
Concordia’s John Molson School of Business
42 30 35 105 65
40 35 58 55 75
1. Determine the IQR
2. Calculate the five number summary & draw a Box &
Whisker diagram