
Data Sets (2)

Measuring the Dispersion of Data


• Variance and standard deviation (sample: s, population: σ)
• Variance: (algebraic, scalable computation)
  Population variance:

    \sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2 = \frac{1}{N}\sum_{i=1}^{N}x_i^2 - \mu^2

  Sample variance:

    s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n}x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n}x_i\right)^2\right]

N: population size; n: sample size


• Standard deviation s (or σ) is the square root of the variance s² (or σ²)
    \sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}x_i^2 - \mu^2}

    s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2} = \sqrt{\frac{1}{n-1}\left[\sum_{i=1}^{n}x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n}x_i\right)^2\right]}
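The two forms of each formula are algebraically equivalent; the second (one-pass) form is what makes the computation scalable, since it needs only running sums. A minimal Python sketch checking this (the sample data is illustrative):

```python
import math

def sample_variance(xs):
    """Two-pass form: s^2 = (1/(n-1)) * sum((x - mean)^2)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def sample_variance_one_pass(xs):
    """Algebraic (scalable) form: s^2 = (1/(n-1)) * [sum(x^2) - (sum(x))^2 / n]."""
    n = len(xs)
    total, total_sq = sum(xs), sum(x * x for x in xs)
    return (total_sq - total * total / n) / (n - 1)

xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # illustrative sample
assert math.isclose(sample_variance(xs), sample_variance_one_pass(xs))
print(sample_variance(xs), math.sqrt(sample_variance(xs)))  # variance s^2, std dev s
```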
Properties of Normal Distribution Curve

• The normal (distribution) curve


• From μ−σ to μ+σ: contains about 68% of the measurements
  (μ: mean, σ: standard deviation)
• From μ−2σ to μ+2σ: contains about 95% of the measurements
• From μ−3σ to μ+3σ: contains about 99.7% of the measurements

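These coverage figures follow from the normal CDF; a small sketch using only the standard library's error function:

```python
import math

def normal_coverage(k):
    """P(mu - k*sigma < X < mu + k*sigma) for any normal distribution."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sigma: {normal_coverage(k):.4f}")  # 0.6827, 0.9545, 0.9973
```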
Graphic Displays of Basic Statistical Descriptions

• Boxplot: graphic display of five-number summary

• Histogram: the x-axis shows values; the y-axis shows frequencies (or frequencies per unit width)

• Scatter plot: each pair of values is treated as a pair of coordinates and plotted as a point in the plane

Histogram Analysis
• Histogram: graph display of tabulated frequencies, shown as bars
• It shows what proportion of cases fall into each of several categories
• Differs from a bar chart in that it is the area of the bar that denotes the
  value, not the height as in bar charts; a crucial distinction when the
  categories are not of uniform width
• The categories are usually specified as non-overlapping intervals of some
  variable. The categories (bars) must be adjacent

[Figure: histogram with x-axis values 10000–90000 and y-axis frequencies 0–40]
Histogram example: uneven width
Unit price ($)      | 40  | 43  | 47  | … | 74  | 75  | 78  | … | 115 | 117 | 120
Count of items sold | 275 | 300 | 250 | … | 360 | 515 | 540 | … | 320 | 270 | 350

[Figure: bar height = count of items sold / bin width, for the uneven bins
40–59, 60–99, and 100–120 (annotated totals 9000, 4350, and 2900); x-axis:
unit price, y-axis: frequency density]
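Because the bins have unequal widths, the bar height must be the frequency density (count ÷ bin width), so that the bar's area, not its height, represents the count. A minimal sketch; the bin totals here are the values annotated in the slide's figure, read off as an assumption:

```python
# Bin boundaries from the slide; totals as annotated in its figure (assumed).
bins = [(40, 59, 9000), (60, 99, 4350), (100, 120, 2900)]

for lo, hi, total in bins:
    width = hi - lo + 1        # number of unit prices the bin covers
    height = total / width     # frequency density: items sold per $1 of price
    print(f"{lo}-{hi}: width={width}, bar height={height:.1f}")
```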
Histogram example: even width
Unit price ($)      | 40  | 43  | 47  | … | 74  | 75  | 78  | … | 115 | 117 | 120
Count of items sold | 275 | 300 | 250 | … | 360 | 515 | 540 | … | 320 | 270 | 350
Histograms Often Tell More than Boxplots

◼ The two histograms shown on the left may have the same boxplot
  representation
  ◼ The same values for min, Q1, median, Q3, max
◼ But they have rather different data distributions

[Figure: two histograms with identical Q1, Q2 (median), and Q3 but different shapes]
Scatter plot
• Provides a first look at bivariate data, revealing clusters of points, outliers, etc.
• Each pair of values is treated as a pair of coordinates and plotted as a point in the plane

Positively and Negatively Correlated Data

• The left half fragment is positively correlated
• The right half is negatively correlated
Uncorrelated Data

Similarity and Dissimilarity
• Similarity
• Numerical measure of how alike two data objects are
• Value is higher when objects are more alike
• Often falls in the range [0,1]
• Dissimilarity (e.g., distance)
• Numerical measure of how different two data objects are
• Lower when objects are more alike
• Minimum dissimilarity is often 0
• Upper limit varies
• Proximity refers to a similarity or dissimilarity

Data Matrix and Dissimilarity Matrix
• Data matrix
  • n data points with p dimensions
  • Two modes

    \begin{bmatrix}
      x_{11} & \cdots & x_{1f} & \cdots & x_{1p} \\
      \cdots & \cdots & \cdots & \cdots & \cdots \\
      x_{i1} & \cdots & x_{if} & \cdots & x_{ip} \\
      \cdots & \cdots & \cdots & \cdots & \cdots \\
      x_{n1} & \cdots & x_{nf} & \cdots & x_{np}
    \end{bmatrix}

• Dissimilarity matrix
  • n data points, but registers only the distances
  • A triangular matrix
  • Single mode

    \begin{bmatrix}
      0      &        &        &        &   \\
      d(2,1) & 0      &        &        &   \\
      d(3,1) & d(3,2) & 0      &        &   \\
      \vdots & \vdots & \vdots &        &   \\
      d(n,1) & d(n,2) & \cdots & \cdots & 0
    \end{bmatrix}
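A minimal sketch of turning a data matrix into a lower-triangular dissimilarity matrix, assuming Euclidean distance as the measure (distance measures are defined on the following slides):

```python
import math

def euclidean(a, b):
    """L2 distance between two p-dimensional points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dissimilarity_matrix(data, dist=euclidean):
    """Row i holds d(i, 0), ..., d(i, i-1); the zero diagonal and the
    symmetric upper half are left implicit."""
    return [[dist(data[i], data[j]) for j in range(i)] for i in range(len(data))]

for row in dissimilarity_matrix([(1, 2), (3, 5), (2, 0)]):
    print([round(d, 2) for d in row])
```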
Proximity Measure for Nominal Attributes

• Can take two or more states, e.g., red, yellow, blue, green (a
  generalization of a binary attribute)
• Simple matching
  • m: # of matches, p: total # of variables

    d(i, j) = \frac{p - m}{p}

• Note: for each attribute, we assume all its values are equally important
  • Otherwise, higher weights should be given to 'more important' values
    (detail omitted)
Distance measure for nominal attributes
• Consider the following data set
Person id | Gender | Language | Hair color
1         | M      | English  | brown
2         | F      | English  | black
3         | M      | Spanish  | brown

 d(1,2) = (3 − 1)/3 = 0.67
 d(1,3) = (3 − 2)/3 = 0.33

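A minimal sketch of the simple matching computation on this table:

```python
def nominal_distance(i, j):
    """d(i, j) = (p - m) / p, where m is the number of matching attributes."""
    p = len(i)
    m = sum(1 for a, b in zip(i, j) if a == b)
    return (p - m) / p

# (Gender, Language, Hair color) rows from the table above.
persons = {1: ("M", "English", "brown"),
           2: ("F", "English", "black"),
           3: ("M", "Spanish", "brown")}
print(round(nominal_distance(persons[1], persons[2]), 2))  # 0.67
print(round(nominal_distance(persons[1], persons[3]), 2))  # 0.33
```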
Distance on Numeric Data: Minkowski Distance

• Minkowski distance: a general distance measure

    d(i, j) = \left( \sum_{f=1}^{p} |x_{if} - x_{jf}|^h \right)^{1/h}

  where i = (x_{i1}, x_{i2}, …, x_{ip}) and j = (x_{j1}, x_{j2}, …, x_{jp}) are
  two p-dimensional data objects, and h is the order (the distance so defined
  is also called the L-h norm)
• Properties
• d(i, j) > 0 if i ≠ j, and d(i, i) = 0 (Positive definiteness)
• d(i, j) = d(j, i) (Symmetry)
• d(i, j) ≤ d(i, k) + d(k, j) (Triangle Inequality)
• A distance that satisfies these properties is a metric

Metric
• Claim:
• Minkowski distance is a metric for any h
• The distance defined for nominal attributes is a
metric
• The distance defined for ordinal attributes is a
metric (later)
• The distance defined for mixed-type attributes is a metric (later)

Special Cases of Minkowski Distance
• h = 1: Manhattan (city block, L1 norm) distance
  • E.g., the Hamming distance: the number of bits that are different between
    two binary vectors

    d(i, j) = |x_{i1} - x_{j1}| + |x_{i2} - x_{j2}| + \cdots + |x_{ip} - x_{jp}|

• h = 2: Euclidean (L2 norm) distance

    d(i, j) = \sqrt{|x_{i1} - x_{j1}|^2 + |x_{i2} - x_{j2}|^2 + \cdots + |x_{ip} - x_{jp}|^2}

• h → ∞: "supremum" (L_max norm, L_∞ norm) distance
  • This is the maximum difference between any component (attribute) of the
    vectors:

    d(i, j) = \max_{f} |x_{if} - x_{jf}|
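A minimal sketch covering all three special cases (passing h = math.inf selects the supremum distance):

```python
import math

def minkowski(a, b, h):
    """L-h norm distance between two p-dimensional points."""
    if h == math.inf:  # supremum: max difference over any component
        return max(abs(x - y) for x, y in zip(a, b))
    return sum(abs(x - y) ** h for x, y in zip(a, b)) ** (1 / h)

x1, x2 = (1, 2), (3, 5)
print(minkowski(x1, x2, 1))            # Manhattan: 5
print(round(minkowski(x1, x2, 2), 2))  # Euclidean: 3.61
print(minkowski(x1, x2, math.inf))     # supremum: 3
```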
Example: Minkowski Distance
Dissimilarity Matrices

point | attribute 1 | attribute 2
x1    | 1           | 2
x2    | 3           | 5
x3    | 2           | 0
x4    | 4           | 5

Manhattan (L1)
L1 | x1   x2   x3   x4
x1 | 0
x2 | 5    0
x3 | 3    6    0
x4 | 6    1    7    0

Euclidean (L2)
L2 | x1    x2    x3    x4
x1 | 0
x2 | 3.61  0
x3 | 2.24  5.10  0
x4 | 4.24  1     5.39  0

Supremum (L∞)
L∞ | x1   x2   x3   x4
x1 | 0
x2 | 3    0
x3 | 2    5    0
x4 | 3    1    5    0
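As a sanity check, all three matrices can be reproduced with the minkowski sketch (and its math import) from the previous slide:

```python
points = [(1, 2), (3, 5), (2, 0), (4, 5)]  # x1..x4 from the table above
for label, h in (("L1", 1), ("L2", 2), ("Lsup", math.inf)):
    print(label)
    for i in range(len(points)):
        # Lower-triangular row: d(x_i, x_1), ..., d(x_i, x_i)
        print([round(minkowski(points[i], points[j], h), 2) for j in range(i + 1)])
```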
Ordinal Variables
• An ordinal variable has an order on its values
• Based on the order, we can assign a rank to each value
• Then we can treat it like an interval-scaled variable
  • For the ith object and fth attribute, replace x_{if} by its rank
    r_{if} \in \{1, \ldots, M_f\}
  • Map the rank of each variable onto [0, 1] by replacing r_{if} with

    z_{if} = \frac{r_{if} - 1}{M_f - 1}

  • Compute the dissimilarity using methods for interval-scaled variables

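A minimal sketch of this rank-and-normalize step (the level ordering is supplied by the caller):

```python
def ordinal_to_numeric(value, ordered_levels):
    """z_if = (r_if - 1) / (M_f - 1), with r_if taken from the level order."""
    rank = ordered_levels.index(value) + 1   # r_if in {1, ..., M_f}
    return (rank - 1) / (len(ordered_levels) - 1)

levels = ["very cold", "cold", "warm", "very warm"]
print([round(ordinal_to_numeric(v, levels), 2) for v in levels])
# [0.0, 0.33, 0.67, 1.0]  (the slide truncates 2/3 to 0.66)
```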
Ordinal Variables: example

 Consider the following data set:

day | temperature | rank | mapping
1   | very warm   | 4    | 1
2   | cold        | 2    | 0.33
3   | warm        | 3    | 0.66
4   | very cold   | 1    | 0

 Rank the four values:
   very cold: 1, cold: 2, warm: 3, very warm: 4
 Map to the [0, 1] interval:
   very cold: (1 − 1)/(4 − 1) = 0
   cold: (2 − 1)/(4 − 1) = 0.33
   warm: (3 − 1)/(4 − 1) = 0.66
   very warm: (4 − 1)/(4 − 1) = 1
 Dissimilarity matrix (on the mapped values):

   0
   0.67  0
   0.34  0.33  0
   1     0.33  0.66  0
Attributes of Mixed Type
• A database may contain all attribute types
  • Nominal, numeric, ordinal
• One may use a weighted formula to combine their effects:

    d(i, j) = \frac{\sum_{f=1}^{p} \delta_{ij}^{(f)} d_{ij}^{(f)}}{\sum_{f=1}^{p} \delta_{ij}^{(f)}}

• \delta_{ij}^{(f)} is the indicator, and d_{ij}^{(f)} the contribution, of
  attribute f to the distance between objects i and j
  • \delta_{ij}^{(f)} = 0 if one of x_{if} and x_{jf} is missing; \delta_{ij}^{(f)} = 1 otherwise
• f is nominal: d_{ij}^{(f)} = 0 if x_{if} = x_{jf}; d_{ij}^{(f)} = 1 otherwise
• f is numeric: use the normalized distance
• f is ordinal: compute ranks r_{if} and then z_{if}, then treat z_{if} as numeric
Mixed attribute types: example

           nominal   ordinal       nominal   numeric
Car id  |  color  |  age (year) |  model  |  price ($)
1       |  black  |  > 10       |  Honda  |  22,000
2       |  red    |  5 – 10     |  Honda  |  30,000
3       |  grey   |  > 10       |  Buick  |  40,000
4       |  red    |  < 5        |  Ford   |  25,000

 d(1,2) = \frac{\delta_{12}^{color} d_{12}^{color} + \delta_{12}^{age} d_{12}^{age} + \delta_{12}^{model} d_{12}^{model} + \delta_{12}^{price} d_{12}^{price}}{\delta_{12}^{color} + \delta_{12}^{age} + \delta_{12}^{model} + \delta_{12}^{price}}

 \delta_{12}^{color} = \delta_{12}^{age} = \delta_{12}^{model} = \delta_{12}^{price} = 1
 d_{12}^{color} = 1, d_{12}^{model} = 0
 For age:
   Rank the values: '< 5' → 1, '5 – 10' → 2, '> 10' → 3
   Normalize to [0, 1]: 1 → 0, 2 → 0.5, 3 → 1
   d_{12}^{age} = 1 − 0.5 = 0.5
 d_{12}^{price} = (30000 − 22000)/(40000 − 22000) ≈ 0.44
 d(1,2) = (1×1 + 1×0.5 + 1×0 + 1×0.44)/4 = 0.485
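A minimal sketch of this mixed-type computation, with the per-attribute handling hard-coded to the slide's four columns (no values are missing, so every δ is 1 and the denominator is simply 4):

```python
# (color, age, model, price) rows; age is already mapped to [0, 1] as on the slide.
cars = {1: ("black", 1.0, "Honda", 22000),
        2: ("red",   0.5, "Honda", 30000),
        3: ("grey",  1.0, "Buick", 40000),
        4: ("red",   0.0, "Ford",  25000)}
prices = [c[3] for c in cars.values()]
price_range = max(prices) - min(prices)   # normalizes the numeric attribute

def mixed_distance(i, j):
    ci, cj = cars[i], cars[j]
    contributions = [
        0.0 if ci[0] == cj[0] else 1.0,     # color: nominal
        abs(ci[1] - cj[1]),                 # age: ordinal, already in [0, 1]
        0.0 if ci[2] == cj[2] else 1.0,     # model: nominal
        abs(ci[3] - cj[3]) / price_range,   # price: normalized numeric
    ]
    return sum(contributions) / len(contributions)

print(round(mixed_distance(1, 2), 3))
# ~0.486 (the slide rounds the price term to 0.44, giving 0.485)
```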
Cosine Similarity
• Objects viewed as vectors
• Similarity measures emphasize direction

[Figure: two pairs of vectors; the pair O1, O2 with the smaller angle between
them is the more similar]
Cosine Similarity
 Directions of vectors can be measured by the angle α between them:

    sim(O_1, O_2) = \cos\alpha = \frac{O_1 \cdot O_2}{\lVert O_1 \rVert\,\lVert O_2 \rVert}

  where · indicates the vector dot product and ||O|| is the length of vector O

 Let O_1 = (x_1, \ldots, x_p) and O_2 = (y_1, \ldots, y_p); then
   O_1 \cdot O_2 = \sum_{i=1}^{p} x_i y_i
   \lVert O_1 \rVert = \sqrt{\sum_{i=1}^{p} x_i^2}
   \lVert O_2 \rVert = \sqrt{\sum_{i=1}^{p} y_i^2}
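A minimal sketch of the definition (undefined for a zero vector, since its length is 0):

```python
import math

def cosine_similarity(o1, o2):
    """sim(O1, O2) = (O1 . O2) / (||O1|| * ||O2||)."""
    dot = sum(x * y for x, y in zip(o1, o2))
    norm1 = math.sqrt(sum(x * x for x in o1))
    norm2 = math.sqrt(sum(y * y for y in o2))
    return dot / (norm1 * norm2)
```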
Cosine Similarity: example
• A document can be represented by thousands of attributes, each recording the
  frequency of a particular word (such as a keyword) or phrase in the document
 Other vector objects: gene features in micro-arrays, …
 Applications: information retrieval, biological taxonomy, gene feature
  mapping, ...
Cosine Similarity: example

• Find the similarity between documents d1 and d2, where
  d1 = (5, 0, 3, 0, 2, 0, 0, 2, 0, 0)
  d2 = (3, 0, 2, 0, 1, 1, 0, 1, 0, 1)
• sim(d1, d2) = cos(d1, d2) = (d1 · d2) / (||d1|| ||d2||)
  d1 · d2 = 5×3 + 0×0 + 3×2 + 0×0 + 2×1 + 0×1 + 0×0 + 2×1 + 0×0 + 0×1 = 25
  ||d1|| = (5² + 0² + 3² + 0² + 2² + 0² + 0² + 2² + 0² + 0²)^0.5 = 42^0.5 ≈ 6.481
  ||d2|| = (3² + 0² + 2² + 0² + 1² + 1² + 0² + 1² + 0² + 1²)^0.5 = 17^0.5 ≈ 4.123
  sim(d1, d2) ≈ 0.94

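The same number falls out of the cosine_similarity sketch above:

```python
d1 = (5, 0, 3, 0, 2, 0, 0, 2, 0, 0)
d2 = (3, 0, 2, 0, 1, 1, 0, 1, 0, 1)
print(round(cosine_similarity(d1, d2), 2))  # 0.94
```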
