Lecture 06-Describing Data Visual Information

Lecture material for CSULA ME3040

Uploaded by

khrisgriffis

Describing Data

Visual Information

Khris Griffis, Ph.D.

Lecture 06
CSULA: ME3040
Today's Objectives
🎯 Describe the shape, central tendency, and variability of data distributions
using statistical measures.
🎯 Compare measures of spread, such as range, interquartile range, variance,
and standard deviation.
🎯 Recognize data symmetry, skewness, and understand their impact on
interpreting data.
🎯 Key Focus: Using statistical measures to describe and interpret data
distributions.
1 Descriptive Statistics

Summarizing Data
Describing A Dataset
When describing a dataset, we generally consider the following three questions:
What is the general shape of the data?
Where are the data values centered?
How do the data vary?
These are all aspects of what we call the
distribution of the data.
We use some simple arithmetic, which depends (a little) on what the data distribution represents,
to describe these aspects of the data.
Shape
General Shape Of A Distribution
A symmetric distribution is one in which the left and right hand sides of the distribution are
roughly equally balanced.
A skewed (asymmetric) distribution is one in which there is no such equal balance. Right-skewness
refers to a longer right tail, while left-skewness corresponds to a longer left tail.
A uniform distribution is a specific symmetric distribution in which all outcomes are equally
likely.
Example: Distributions
[Figures: Internet access in the world; time between Old Faithful geyser eruptions]
Central Tendency
The Mean
The arithmetic mean is generally calculated as:
$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$$
Calculation:
$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\text{sum of all the observations}}{\text{number of observations}}$$
Read: "ex-bar".
We can generalize for unequal weights.
Let $w(x \mid a) = \sum_{i}^{n} [x_i = a]$ count the observations equal to value $a$ (a sum of indicators),
and let $\sum_{b \in S_x} w(b \mid S_x)$ be the sum of all possible weights in the set $S_x$.
Then $p(x) = \dfrac{w(x)}{\sum_{b \in S_x} w(b)}$ is the weight of value $x$.
Thus, the weighted mean of random variable $X$ is:
$$\bar{x}_w = \sum_{i=1}^{N} x_i \, p(x_i)$$
We can see that the arithmetic mean $\bar{x}$ is the weighted sum of a random variable when all values carry equal weights ($\frac{1}{N}$).
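A minimal sketch of the weighted mean, assuming the weights are the relative frequencies of the distinct values. The sample values and variable names below are made up for illustration and are not from the lecture.

```matlab
% Hypothetical sample (illustrative values, not from the lecture)
x = [2 2 3 5 5 5 8];

% Distinct values, and for each observation the index of its distinct value
[vals, ~, idx] = unique(x);

% w(a): number of observations equal to each distinct value a
w = accumarray(idx(:), 1).';

% p(a): weight of each distinct value, normalized so the weights sum to 1
p = w / sum(w);

% Weighted mean: sum over the distinct values of value times weight
xbar_w = sum(vals .* p);

% Arithmetic mean: every observation carries the same weight 1/N
xbar = sum(x) / numel(x);

fprintf('weighted mean = %.4f, arithmetic mean = %.4f\n', xbar_w, xbar);
```

With frequency weights the two results agree, which is the equal-weight special case described above.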

The Median
The median of an ordered set of data values is:
The average of the middle 2 values (for an even number of entries): the mean of the $\left(\frac{n}{2}\right)^{th}$ and $\left(\frac{n}{2} + 1\right)^{th}$ observations
The middle entry (for an odd number of entries): the $\left(\frac{n+1}{2}\right)^{th}$ observation
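The even/odd rule can be sketched as follows and cross-checked against the built-in median. The two small samples are illustrative values, not from the lecture.

```matlab
% Illustrative samples (hypothetical values)
x_odd  = [7 1 5 3 9];       % n = 5, odd
x_even = [7 1 5 3 9 11];    % n = 6, even

% Odd n: take the ((n+1)/2)-th observation of the sorted data
s = sort(x_odd);
n = numel(s);
med_odd = s((n + 1) / 2);

% Even n: average the (n/2)-th and (n/2 + 1)-th observations
s = sort(x_even);
n = numel(s);
med_even = (s(n / 2) + s(n / 2 + 1)) / 2;

% Each row: manual result, then built-in result
disp([med_odd, median(x_odd); med_even, median(x_even)]);
```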
The Mode
The mode is the value that occurs most often in the dataset. If no value in the dataset is repeated, then there
is no mode.
What is the mode in the dataset below?
4 5 9 5 11 7 5 3 7 8 6 5 12

Note: In a bimodal distribution, the taller peak is called the major mode and the shorter one is
referred to as the minor mode.
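One way to check the mode of the dataset above is the built-in mode function, which also reports how often the value occurs.

```matlab
% Dataset from the slide
data = [4 5 9 5 11 7 5 3 7 8 6 5 12];

% m: most frequent value, f: number of times it occurs
[m, f] = mode(data);

fprintf('mode = %d (occurs %d times)\n', m, f);
```

Note that when no value repeats, MATLAB's mode still returns the smallest value with f equal to 1, so checking f is needed to apply the "no mode" convention used here.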
Robustness
Robustness is related to the impact of outliers on a statistic. In general, we say that a statistic is
robust if it is relatively unaffected by extreme values.
The median and the mode are robust, while the mean is not.
Data: 2.5 3.2 3.5 3.9 4.0 4.4 5.3 5.9 6.1 6.4           Mean: 4.52   Median: 4.2
Same data plus the outlier 30:                           Mean: 6.84   Median: 4.4
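The effect of the single extreme value can be reproduced directly. This is a short sketch using the values from the slide; the variable names are arbitrary.

```matlab
% Values from the slide, without and with the outlier 30
x     = [2.5 3.2 3.5 3.9 4.0 4.4 5.3 5.9 6.1 6.4];
x_out = [x 30];

% The mean is pulled toward the outlier; the median barely moves
fprintf('mean:   %.2f -> %.2f\n', mean(x), mean(x_out));
fprintf('median: %.2f -> %.2f\n', median(x), median(x_out));
```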
Pick The Central Tendency Descriptor Wisely
Number of children per household in China (2012)
Mean: 1.55
Median: 1
→ More representative of the "typical" 2012 family (One Child Policy)
Averages Are Not “Truth”
In 1943, artist Abram Belskie and
obstetrician-gynecologist Robert Latou
Dickinson created sculptures of the
“average” American man and woman.
[Figure: the two sculptures, labeled Norman and Norma]
Dickinson averaged measurements from
15,000 men and women between the ages
of 21 and 25 to make idealized sculptures of
beauty.
In 1945, Cleveland Health Museum sought
to find a woman who matched Norma's
measurements. Of 4,000 entries only a
handful were similar, and an award of $100
was given to a waitress, who just kind of
matched.
Read Todd Rose's “Flaw of Averages” for more on this.
Central Variability:
"Spread" or "Variation" of Data Points
Variance and Standard Deviation
A common measure of data variability is the variance, which measures the squared distance each
data point is from the mean, on average.
Variance: $s^2$ for a sample, $\sigma^2$ for a population.
$$s^2 = \frac{\sum_i (x_i - \bar{x})^2}{n - 1}$$
$$s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_n - \bar{x})^2}{n - 1}$$
$$s^2 = \frac{\text{sum of observed squared distances from sample mean}}{\text{number of observations} - 1}$$
For standard deviation, we use: $s = \sqrt{s^2}$ and $\sigma = \sqrt{\sigma^2}$.
The standard deviation puts the variance into the same units as the data, providing a measure of
how large the typical distance from the center is.
Note: The use of 𝑛 − 1 in calculating variance helps to ensure that our estimate of the population variance is
unbiased and accounts for the extra uncertainty introduced by estimating the population mean from the
sample itself.
Bias
Sampling bias: when a sampling method systematically yields results that are either too
high or too low.
🚨 May be avoided by using good sampling technique (randomized).

Sampling variation: natural variation in results from one random sample to the next.
May be reduced by using a larger sample.

Standard Deviation
Let's consider 2 sets of data, both with a mean of 100.
Numbers                    Mean   SD
100, 100, 100, 100, 100    100    0
90, 90, 100, 110, 110      100    10
Set 1: all values are equal to the mean, so there is no variability at all.
Set 2: one value equals the mean and the other four values are about 10 points away from the mean.
So the average distance away from the mean is about 10.
Example, procedure for Set 2:
1. Calculate the sample mean: $\bar{x} = 100$
2. Calculate the difference between each value and the mean: [-10, -10, 0, 10, 10]
3. Square each difference in step 2: [100, 100, 0, 100, 100]
4. Sum the result of step 3 and divide by $(n - 1)$: $s^2 = \frac{400}{5 - 1} = \frac{400}{4} = 100$
5. Take the square root of the result in step 4: $s = \sqrt{100} = 10$
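The five-step procedure for Set 2 can be sketched as below. The step-by-step variables are named here for illustration, and the built-in var and std (which divide by n - 1 by default) serve as a cross-check.

```matlab
% Set 2 from the slide
x = [90 90 100 110 110];

xbar = mean(x);                    % step 1: sample mean
d    = x - xbar;                   % step 2: differences from the mean
d2   = d .^ 2;                     % step 3: squared differences
s2   = sum(d2) / (numel(x) - 1);   % step 4: sum, then divide by (n - 1)
s    = sqrt(s2);                   % step 5: square root

% Built-in var and std use the (n - 1) denominator by default
fprintf('manual:   s^2 = %g, s = %g\n', s2, s);
fprintf('built-in: s^2 = %g, s = %g\n', var(x), std(x));
```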
Use SD With Caution
Like the mean, the standard deviation does not cope well with skewed
distributions.
Why Are We Squaring Things?
$$s^2 = \frac{\sum_i (x_i - \bar{x})^2}{n - 1}$$
Variance is built from the squared distances from each data point to the model (here, the mean).
Squaring the distances ensures that they are never negative, so positive and negative deviations cannot cancel.
Squaring makes large deviations gigantic and small deviations minuscule, which places greater weight
on larger deviations; taking the square root afterwards doesn't undo this.
The Normal Distribution
The normal (or Gaussian) distribution is a continuous
probability distribution characterized by a symmetric, bell-shaped
curve.
PDF:
$$f(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$$
$\mu$ = Central Tendency ($E[X]$)
$\sigma$ = Spread ($\sqrt{E[(X - E[X])^2]}$)
$x$ = Specific value of the continuous variable
68% of the observations lie within 1 standard "distance" of the center
95% lie within 1.96 standard "distance" of the center
99% lie within 2.58 standard "distance" of the center

Often denoted $N(\mu, \sigma^2)$, the normal distribution is special because of the Central Limit Theorem:
suitably normalized sums of many independent, identically distributed variables with finite variance
converge to this form, regardless of their initial distribution.
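The coverage figures quoted above can be checked numerically. This sketch relies on the identity P(|X - mu| < z*sigma) = erf(z / sqrt(2)) for a normal distribution; the variable names are arbitrary.

```matlab
% Number of standard deviations from the center
z = [1 1.96 2.58];

% Probability that a normal variable falls within z standard deviations
% of its mean: 2*Phi(z) - 1 = erf(z / sqrt(2))
coverage = erf(z / sqrt(2));

for i = 1:numel(z)
    fprintf('within %.2f SD: %.2f%%\n', z(i), 100 * coverage(i));
end
```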
Let's Get MAD
MAD(𝑥) = median(|𝑥𝑖 − median(𝑥)|)
The median absolute deviation (MAD) is a measure of the variability of a dataset. It is calculated
by taking the median of the absolute differences between each data point and the median of the
dataset.
MAD is a robust measure of variability, meaning it is less affected by outliers than
measures such as the variance and standard deviation.
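A minimal sketch of the MAD formula, applied to the ten values from the robustness example with and without the outlier 30. The helper name madfun is hypothetical, chosen to avoid shadowing the built-in mad.

```matlab
% Values from the robustness example, without and with the outlier 30
x     = [2.5 3.2 3.5 3.9 4.0 4.4 5.3 5.9 6.1 6.4];
x_out = [x 30];

% MAD(x) = median(|x_i - median(x)|); madfun is a hypothetical helper name
madfun = @(v) median(abs(v - median(v)));

% The MAD changes little when the outlier is added; the SD changes a lot
fprintf('MAD: %.2f -> %.2f\n', madfun(x), madfun(x_out));
fprintf('SD:  %.2f -> %.2f\n', std(x), std(x_out));
```

The built-in mad(x) computes the mean absolute deviation by default; mad(x, 1) gives the median-based version used here.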
Coefficient of Variation (CV)
Introduced by Karl Pearson to compare relative variability
of different datasets, in an attempt to mitigate confusion
in interpreting standard deviation.
Mathematically, it is defined as:
$$\text{CV} = \frac{\text{standard deviation}}{\text{mean}} \times 100\%$$
Uses:
Neuroscience: Comparing variance-to-mean (Fano Factor) in spike
counts.
Engineering: Assessing the uniformity of processes or materials.
The CV is “crude”, subject to the same
problems as the mean, and should be
used with caution.
Note: CV is dimensionless, but it is often reported as a
percentage.
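As a small sketch of why the CV is dimensionless, the example below computes it for the values 90, 90, 100, 110, 110 and for the same values rescaled to a unit ten times larger. The helper name cv and the rescaling are assumptions made for illustration.

```matlab
% Coefficient of variation in percent (cv is a hypothetical helper name)
cv = @(v) std(v) / mean(v) * 100;

vals      = [90 90 100 110 110];   % SD 10, mean 100
vals_tens = vals / 10;             % same measurements in a unit 10x larger

cv_a = cv(vals);
cv_b = cv(vals_tens);

% Rescaling changes the SD and the mean by the same factor, so the CV is unchanged
fprintf('CV = %.1f%% and %.1f%%\n', cv_a, cv_b);
```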
2 Extended Measures Of Spread

Ways To Describe Intervals And Ranges

The Range
The range gives you the most basic information about the spread of a dataset. It is calculated as the
(arithmetic) difference between the highest and lowest data values.
Percentiles
The $k^{th}$ percentile is a value in the dataset that has $k\%$ of the data values at or below it and $(100 - k)\%$ of
the data values at or above it.

Note:
Here our dataset contains 40 data points. So each
data point corresponds to $\frac{100}{40} = 2.5\%$ of the data.
A "quantile" is simply the fractional position,
i.e., $\frac{k}{100}$.
Understanding Quantiles
Quantiles are values that mark where specific proportions of your data fall below.
The quantile function is generally defined as: $Q_i(p) = (1 - \gamma)\,x_j + \gamma\,x_{j+1}$,
where $j = \lfloor k \rfloor$ for $k = np + m$ and $\gamma = k - j$, and $x_j$ is the $j$-th smallest value.
For quantile method 8 (unbiased median), the quantile function is defined as
$Q_8(p) = (1 - \gamma)\,x_j + \gamma\,x_{j+1}$,
where $j = \lfloor k \rfloor$, $m = \frac{p + 1}{3}$, $k = np + m$, and $\gamma = k - j$.
Reference: Hyndman and Fan (1996) for a detailed explanation of quantile types and their applications.
Given dataset:
[−0.977, −0.151, −0.103, 0.4, 0.411, 0.95, 0.979, 1.764, 1.868, 2.241]
✔︎ $n = 10$
✔︎ $p = 0.5$
✔︎ $m = \frac{p + 1}{3} = \frac{0.50 + 1}{3} = \frac{1.5}{3} = 0.5$
✔︎ $k = np + m = 10 \times 0.50 + 0.5 = 5.5$
✔︎ $j = \lfloor k \rfloor = \lfloor 5.5 \rfloor = 5$
✔︎ $\gamma = k - j = 5.5 - 5 = 0.5$
Calculating the median from $Q_8(0.5)$:
$Q_8(0.5) = (1 - \gamma)\,x_5 + \gamma\,x_{5+1} = (1 - 0.5)\,0.411 + (0.5)\,0.95 = 0.6805$
Calculating the median by taking the mean of the middle 2 elements:
$\text{median} = \frac{0.411 + 0.95}{2} = 0.6805$
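The worked calculation can be sketched step by step as below. The variable g stands in for gamma to avoid shadowing MATLAB's gamma function, and the sketch assumes p is chosen so that both j and j + 1 are valid indices.

```matlab
% Dataset from the slide, sorted so that x(j) is the j-th smallest value
x = sort([-0.977 -0.151 -0.103 0.4 0.411 0.95 0.979 1.764 1.868 2.241]);

n = numel(x);     % n = 10
p = 0.5;          % requested quantile (the median)

m = (p + 1) / 3;  % offset for quantile method 8
k = n * p + m;    % fractional position in the sorted data
j = floor(k);     % index of the lower neighbor
g = k - j;        % interpolation weight (gamma on the slide)

% Linear interpolation between the j-th and (j+1)-th smallest values
Q8 = (1 - g) * x(j) + g * x(j + 1);

fprintf('k = %.1f, j = %d, gamma = %.1f, Q8(%.1f) = %.4f\n', k, j, g, p, Q8);
```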
Interquartile Range
The median divides the data into two equal halves (it is the 50th percentile). If we divide each of those
halves again, we obtain two additional statistics known as the first (Q1) and third (Q3) quartiles, which are
the 25th and 75th percentiles.
Interquartile range: IQR = $Q_3 - Q_1$
A value is considered an outlier if it is:
Smaller than $Q_1 - 1.5 \times \text{IQR}$
or
Larger than $Q_3 + 1.5 \times \text{IQR}$
MATLAB Code
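data = sort([-0.977, -0.151, -0.103, 0.4, 0.411, 0.95, 0.979, 1.764, 1.868, 2.241]);  % data(j) is the j-th smallest
n = numel(data);
% Method 8 computed explicitly (MATLAB's documented quantile 'Method' values are 'exact' and 'approximate'):
% k = n*p + (p+1)/3, interpolate linearly between x_j and x_(j+1), clamp k to [1, n]
Q8 = @(p) interp1(1:n, data, min(max(n*p + (p+1)/3, 1), n));
Q1 = Q8(0.25);
Q3 = Q8(0.75);
IQR = Q3 - Q1;
% Values beyond the 1.5*IQR fences are flagged as outliers
outliers = data(data < Q1 - 1.5 * IQR | data > Q3 + 1.5 * IQR);
disp(outliers);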
Outliers
An outlier is an observed value that is notably distinct from the other
values in a dataset. Usually, an outlier is much larger or much smaller than
the rest of the data values.
Displaying the data: Boxplot
A boxplot is a graphical display of the five number summary for a quantitative variable. It shows the general
shape of the distribution, identifies the middle 50% of the data, and highlights any outliers.

A boxplot includes:
A box stretching from Q1 to Q3
A line that divides the box, drawn at the median
A line from each quartile to the most extreme data value that is not an outlier (if there are no outliers, these are the minimum and maximum)
Each outlier plotted individually
Displaying the data: Boxplot v. Histogram
3 Visualizing Data

Graphical Exploration Of Quantitative

Information
Anscombe's Quartet
Data:
Dataset #1      Dataset #2      Dataset #3      Dataset #4
x    y          x    y          x    y          x    y
10   8.04       10   9.14       10   7.46       8    6.58
8    6.95       8    8.14       8    6.77       8    5.76
13   7.58       13   8.74       13   12.74      8    7.71
9    8.81       9    8.77       9    7.11       8    8.84
11   8.33       11   9.26       11   7.81       8    8.47
14   9.96       14   8.1        14   8.84       8    7.04
6    7.24       6    6.13       6    6.08       8    5.25
4    4.26       4    3.1        4    5.39       19   12.5
12   10.84      12   9.13       12   8.15       8    5.56
7    4.82       7    7.26       7    6.42       8    7.91
5    5.68       5    4.74       5    5.73       8    6.89

Summary statistics:
                  Dataset #1      Dataset #2      Dataset #3      Dataset #4
                  x    y          x    y          x    y          x    y
Mean              9    7.5        9    7.5        9    7.5        9    7.5
Variance          11   4.1        11   4.1        11   4.1        11   4.1
Correlation       0.816           0.816           0.816           0.816
Regression line   y=3+0.5x        y=3+0.5x        y=3+0.5x        y=3+0.5x
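The summary statistics of the four datasets can be recomputed from the table values. This is a sketch; the layout of the stats matrix is an arbitrary choice made here.

```matlab
% Anscombe's quartet, transcribed from the table
x123 = [10 8 13 9 11 14 6 4 12 7 5];   % shared x values for datasets 1-3
x4   = [8 8 8 8 8 8 8 19 8 8 8];       % x values for dataset 4
X = [x123; x123; x123; x4];
Y = [8.04 6.95  7.58 8.81 8.33 9.96 7.24  4.26 10.84 4.82 5.68;
     9.14 8.14  8.74 8.77 9.26 8.10 6.13  3.10  9.13 7.26 4.74;
     7.46 6.77 12.74 7.11 7.81 8.84 6.08  5.39  8.15 6.42 5.73;
     6.58 5.76  7.71 8.84 8.47 7.04 5.25 12.50  5.56 7.91 6.89];

% One row per dataset; columns: mean x, mean y, var x, var y, r, slope, intercept
stats = zeros(4, 7);
for d = 1:4
    x = X(d, :);
    y = Y(d, :);
    R = corrcoef(x, y);     % 2x2 correlation matrix; R(1,2) is the correlation
    c = polyfit(x, y, 1);   % least-squares line: c(1) slope, c(2) intercept
    stats(d, :) = [mean(x), mean(y), var(x), var(y), R(1, 2), c(1), c(2)];
end

disp(round(stats, 2));
```

The rows come out nearly identical even though the four scatter plots look nothing alike, which is the point of the quartet.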
Anscombe's Quartet
Release The Datasaurus!
[Figure: scatter plot of y against x, both axes from 0 to 100]
X Mean: 54.26
Y Mean: 47.83
X SD: 16.76
Y SD: 26.93
Correlation: -0.06
"Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing"
Justin Matejka and George Fitzmaurice,
ACM SIGCHI Conference on Human Factors in Computing Systems (2017)
Visualizing The Distribution
A dotplot is a common way to visualize the shape of a moderately sized dataset.
Species        Longevity   Species      Longevity   Species        Longevity   Species   Longevity   Species      Longevity
Baboon         20          Chimpanzee   20          Fox            7           Leopard   12          Rabbit       5
Black bear     18          Chipmunk     6           Giraffe        10          Lion      15          Rhinoceros   15
Grizzly bear   25          Cow          15          Goat           8           Monkey    15          Sea lion     12
Polar bear     20          Deer         8           Gorilla        20          Moose     12          Sheep        12
Beaver         5           Dog          12          Guinea Pig     4           Mouse     3           Squirrel     10
Buffalo        15          Donkey       12          Hippopotamus   25          Opossum   1           Tiger        16
Camel          12          Elephant     40          Horse          20          Pig       10          Wolf         5
Cat            12          Elk          15          Kangaroo       7           Puma      12          Zebra        15

Note:
For this particular dataset,
values are integers and can be
easily stacked.
From Dotplot To Histogram
A dotplot becomes hard to construct and read when many similar values produce overlapping dots;
a histogram can replace it. Histograms aggregate similar values into counts, effectively
displaying the data distribution.
Process to construct a histogram
1 Define "boundaries" (they form bins), e.g., 45, 50, 55, ..., 100, 105
2 Count the number of elements inside each bin
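The two steps can be sketched with histcounts, here applied to the animal longevity values from the dotplot table. The bin boundaries 0, 5, ..., 45 are an assumption chosen for illustration; the slide's own example uses different data.

```matlab
% Longevity values transcribed from the dotplot table (40 species)
longevity = [20 18 25 20  5 15 12 12 ...
             20  6 15  8 12 12 40 15 ...
              7 10  8 20  4 25 20  7 ...
             12 15 15 12  3  1 10 12 ...
              5 15 12 12 10 16  5 15];

% Step 1: define the boundaries; consecutive boundaries form the bins
edges = 0:5:45;

% Step 2: count the elements inside each bin
% (each bin includes its left edge; the last bin also includes its right edge)
counts = histcounts(longevity, edges);

disp([edges(1:end-1); edges(2:end); counts]);   % rows: left edge, right edge, count
```

Calling histogram(longevity, edges) draws the same counts as bars.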
Histogram Characteristics
Histograms can be sensitive to parameter choices!
In particular, the bin width and bin offset can drastically change the histogram's overall look.
[Figure: histogram with bin width 5 and bin offset 0; x-axis from 40 to 110, y-axis: Count]
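A small sketch of the bin-offset effect: the same values counted with the same bin width of 5 but with the boundaries shifted by half a bin. The data are the animal longevity values from the dotplot table; both sets of boundaries are assumptions made for illustration.

```matlab
% Longevity values transcribed from the dotplot table (40 species)
longevity = [20 18 25 20  5 15 12 12 ...
             20  6 15  8 12 12 40 15 ...
              7 10  8 20  4 25 20  7 ...
             12 15 15 12  3  1 10 12 ...
              5 15 12 12 10 16  5 15];

% Same bin width (5), two different offsets
edges_a = 0:5:45;         % boundaries at 0, 5, 10, ...
edges_b = -2.5:5:47.5;    % boundaries shifted by half a bin

counts_a = histcounts(longevity, edges_a);
counts_b = histcounts(longevity, edges_b);

% The tallest bin and its height depend on where the boundaries fall
disp(counts_a);
disp(counts_b);
```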
Bargraphs are evil
1) Part of the range covered by the bar might have never been observed in the sample
2) They conceal the variance and the underlying distribution of the data
Look the same? They're not!
3) They are associated with (usually not defined) error bars
Different types, different meanings:
Cumming, G. et al. (2007). 'Error bars in experimental biology'. J Cell Biol 177 (1): 7-11
Avoid Bargraphs!
To reveal the distribution of the data:
Display data in their raw form
A dot plot is a good start
Dynamite plunger plots conceal data
Check the pattern of distribution of the values
About Figure 1:
First set: Gaussian (or normal) distribution (symmetrically distributed)
Second set: right skewed, log-normal (few large values). This type of distribution of values is quite common.

Plunger plots only: who would know that the values were skewed [...] and that
the common statistical tests would be inappropriate?

"For better characterization of a sample, we prefer box, swarm, or violin plots for their ability to show the distribution of the data."
You've been warned before!
A Better Option: Dotplot
If the number of data points is relatively small, directly showing the raw data with the accompanying
mean/median is best.
A Better Option: Beeswarm
A Beeswarm is a dot plot that shows the distribution of data points in a way that avoids overlap.
[Figure: 10 random points (parameters shown: 50 and 5) on an axis from 45 to 54, displayed as a dotplot, with jitter, and as a beeswarm]
A Better Option: Boxplot
A boxplot is a graphical display of the five number summary for a quantitative variable. It shows the general
shape of the distribution, identifies the middle 50% of the data, and highlights any outliers.
Showing The Data Is Best

“Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing”
Justin Matejka and George Fitzmaurice,
ACM SIGCHI Conference on Human Factors in Computing Systems (2017)
