Week 02.1 Chaptr002

Uploaded by

liyabi7540

Data Mining:

Concepts and Techniques

— Chapter 2 —

Arslan Anjum
[email protected]

Quartile Deviation
• A measure similar to the range is the inter-quartile range (Q). It is
  the difference between the third quartile (Q3) and the first quartile
  (Q1). Thus

  $Q = Q_3 - Q_1$

• In the sorted data, the quartiles lie at the positions

  $Q_1$: $(n+1)/4$   $Q_2$: $2(n+1)/4$   $Q_3$: $3(n+1)/4$

  where n is the number of observations.

• The inter-quartile range is frequently reduced to the semi-interquartile
  range, known as the quartile deviation (QD), by dividing it by 2. Thus

  $QD = (Q_3 - Q_1)/2$
Quartile Deviation
• Example: The wheat production (in kg) of 20 acres is given as:
  1120, 1240, 1320, 1040, 1080, 1200, 1440, 1360, 1680,
  1730, 1785, 1342, 1960, 1880, 1755, 1720, 1600, 1470,
  1750, and 1885.
• Find the quartile deviation.

• Solution:
• After arranging the observations in ascending order, we get
  1040, 1080, 1120, 1200, 1240, 1320, 1342, 1360, 1440,
  1470, 1600, 1680, 1720, 1730, 1750, 1755, 1785, 1880,
  1885, 1960.
Quartile Deviation
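The remaining steps of the wheat example can be sketched as follows. This assumes that a fractional position is resolved by linear interpolation between the two neighbouring observations; the slides give only the positions, so the interpolation rule is an assumption. With n = 20, Q1 lies at position 5.25 and Q3 at position 15.75, so Q1 = 1240 + 0.25(1320 - 1240) = 1260 and Q3 = 1750 + 0.75(1755 - 1750) = 1753.75, giving QD = (1753.75 - 1260)/2 = 246.875 kg. The helper name `quartile_at` is hypothetical and used only for this illustration.

```python
def quartile_at(sorted_values, k):
    """Return the k-th quartile using the position k*(n+1)/4 (1-based).

    Assumption: a fractional position is resolved by linear interpolation
    between the two neighbouring observations. Intended for n >= 3.
    """
    n = len(sorted_values)
    position = k * (n + 1) / 4
    lower = int(position)            # 1-based index of the observation below
    fraction = position - lower      # how far to move towards the next one
    upper = min(lower + 1, n)        # stay inside the data at the top end
    low_value = sorted_values[lower - 1]
    high_value = sorted_values[upper - 1]
    return low_value + fraction * (high_value - low_value)


wheat = [1120, 1240, 1320, 1040, 1080, 1200, 1440, 1360, 1680, 1730,
         1785, 1342, 1960, 1880, 1755, 1720, 1600, 1470, 1750, 1885]
ordered = sorted(wheat)

q1 = quartile_at(ordered, 1)             # 1260.0
q3 = quartile_at(ordered, 3)             # 1753.75
quartile_deviation = (q3 - q1) / 2       # 246.875

print(q1, q3, quartile_deviation)
```

Other quartile conventions exist, so software packages may report slightly different values for the same data.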
Boxplot Analysis
• Five-number summary of a distribution
  - Minimum, Q1, Median, Q3, Maximum
• Boxplot
  - Data is represented with a box
  - The ends of the box are at the first and third quartiles, i.e., the
    height of the box is the IQR
  - The median is marked by a line within the box
  - Whiskers: two lines outside the box extended to Minimum and Maximum
  - Outliers: points beyond a specified outlier threshold, plotted
    individually

Boxplot in MATLAB

>> d = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110];
>> boxplot(d);   % boxplot ships with the Statistics and Machine Learning Toolbox
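
The numbers behind such a plot can be sketched in Python. This is a minimal illustration rather than what MATLAB computes internally: it assumes the (n+1)-based quartile positions used by the default "exclusive" method of `statistics.quantiles`, and it assumes the common 1.5 x IQR outlier threshold, which the slides do not specify. MATLAB uses its own quartile convention, so its box edges may differ slightly.

```python
from statistics import quantiles


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(data)
    # Default method="exclusive" places quartile k at position k*(n+1)/4.
    q1, median, q3 = quantiles(ordered, n=4)
    return ordered[0], q1, median, q3, ordered[-1]


def outliers(data, k=1.5):
    """Values beyond k * IQR from the box (k = 1.5 is an assumed threshold)."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]


d = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]
summary = five_number_summary(d)   # (30, 47.75, 54.0, 68.25, 110)
flagged = outliers(d)              # [110]
print(summary)
print(flagged)
```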

Visualization of Data Dispersion: 3-D Boxplots

May 2, 2024 Data Mining: Concepts and Techniques 7

Graphic Displays of Basic Statistical Descriptions

• Boxplot: graphic display of the five-number summary
• Histogram: the x-axis shows values, the y-axis represents frequencies
• Quantile plot: each value $x_i$ is paired with $f_i$, indicating that
  approximately $100 f_i$% of the data are $\le x_i$
• Quantile-quantile (q-q) plot: graphs the quantiles of one univariate
  distribution against the corresponding quantiles of another
• Scatter plot: each pair of values is a pair of coordinates, plotted
  as a point in the plane
Histogram Analysis
• Histogram: graphical display of tabulated frequencies, shown as bars
• It shows what proportion of cases fall into each of several categories
• The categories are usually specified as non-overlapping intervals of
  some variable. The categories (bars) must be adjacent
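
Tabulating the frequencies can be sketched as follows, using the wheat production data from the quartile deviation example. The bin start (1000), bin width (200) and number of bins (5) are assumptions chosen only for illustration. Empty bins are kept so that the bars stay adjacent.

```python
def histogram(values, start, width, bins):
    """Count values in adjacent, non-overlapping intervals.

    Bin i covers [start + i*width, start + (i+1)*width).
    Values outside the covered range are ignored.
    """
    counts = [0] * bins
    for value in values:
        index = int((value - start) // width)
        if 0 <= index < bins:
            counts[index] += 1
    return [(start + i * width, start + (i + 1) * width, count)
            for i, count in enumerate(counts)]


wheat = [1120, 1240, 1320, 1040, 1080, 1200, 1440, 1360, 1680, 1730,
         1785, 1342, 1960, 1880, 1755, 1720, 1600, 1470, 1750, 1885]

table = histogram(wheat, start=1000, width=200, bins=5)
for left, right, count in table:
    print(f"[{left}, {right}): {'#' * count} {count}")
```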

Quantile Plot
• Displays all of the data (allowing the user to assess both the overall
  behavior and unusual occurrences)
• Plots quantile information
  - For data $x_i$ sorted in increasing order, $f_i$ indicates that
    approximately $100 f_i$% of the data are below or equal to the
    value $x_i$
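
The pairing of each value with its fraction can be sketched in Python. The slides do not say how $f_i$ is computed; the sketch assumes the common choice $f_i = (i - 0.5)/N$ for the i-th smallest of N values. The function name `quantile_plot_points` is hypothetical.

```python
def quantile_plot_points(data):
    """Pair each sorted value x_i with f_i = (i - 0.5) / N (assumed formula)."""
    ordered = sorted(data)
    n = len(ordered)
    return [((i - 0.5) / n, x) for i, x in enumerate(ordered, start=1)]


d = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]
points = quantile_plot_points(d)

# Each point is (f_i, x_i): plot f_i on the x-axis and x_i on the y-axis.
for f, x in points:
    print(f"{f:.3f}  {x}")
```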


Quantile-Quantile (Q-Q) Plot
• Graphs the quantiles of one univariate distribution against the
  corresponding quantiles of another
• View: Is there a shift in going from one distribution to another?
• Example shows unit price of items sold at Branch 1 vs. Branch 2 for
  each quantile. Unit prices of items sold at Branch 1 tend to be lower
  than those at Branch 2.
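
How the plotted points are formed can be sketched as follows. The prices are made-up values, not the data behind the figure, and only the three quartiles are paired to keep the output short. A point above the line y = x means the Branch 2 quantile is higher than the matching Branch 1 quantile.

```python
from statistics import quantiles


def qq_points(sample_a, sample_b, n=4):
    """Pair the matching quantiles of two samples (n=4 gives the quartiles)."""
    return list(zip(quantiles(sample_a, n=n), quantiles(sample_b, n=n)))


# Hypothetical unit prices; Branch 2 is Branch 1 shifted up by 5.
branch_1 = [40, 45, 52, 58, 60, 66, 72, 78, 90, 100, 110]
branch_2 = [price + 5 for price in branch_1]

points = qq_points(branch_1, branch_2)
for q1_price, q2_price in points:
    side = "above" if q2_price > q1_price else "on or below"
    print(q1_price, q2_price, side, "the line y = x")
```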

Scatter plot
• Provides a first look at bivariate data to see clusters of points,
  outliers, etc.
• Each pair of values is treated as a pair of coordinates and plotted
  as a point in the plane

Positively and Negatively Correlated Data

• The left half of the figure is positively correlated

• The right half is negatively correlated
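
The direction of a correlation can also be checked numerically. The sketch below uses the Pearson correlation coefficient, which these slides do not introduce, so treat it as one possible measure rather than the method behind the figures. The sample values are made up.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / sqrt(spread_x * spread_y)


x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]     # y grows with x: coefficient near +1
falling = [10, 8, 6, 4, 2]    # y shrinks as x grows: coefficient near -1

print(pearson(x, rising))
print(pearson(x, falling))
```

A coefficient close to 0 corresponds to the uncorrelated case.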

Uncorrelated Data

Chapter 2: Getting to Know Your Data

• Data Objects and Attribute Types

• Basic Statistical Descriptions of Data

• Data Visualization

• Measuring Data Similarity and Dissimilarity

• Summary

Data Visualization
• Why data visualization?
  - Gain insight into an information space by mapping data onto graphical
    primitives
  - Provide a qualitative overview of large data sets
  - Search for patterns, trends, structure, irregularities, and
    relationships among data
  - Help find interesting regions and suitable parameters for further
    quantitative analysis
  - Provide a visual proof of computer representations derived
• Categorization of visualization methods:
  - Pixel-oriented visualization techniques
  - Geometric projection visualization techniques
  - Icon-based visualization techniques
  - Hierarchical visualization techniques
  - Visualizing complex data and relations
Pixel-Oriented Visualization Techniques
• For a data set of m dimensions:
  - The m dimension values of a record are mapped to m pixels at the
    corresponding positions in the windows
  - The colors of the pixels reflect the corresponding values

[Figure panels: (a) income, (b) credit limit, (c) transaction volume, (d) age]
Geometric Projection Visualization Techniques

• Visualization of geometric transformations and projections of the data
• Methods
  - Direct visualization
  - Scatterplot and scatterplot matrices
  - Landscapes
  - Projection pursuit technique: helps users find meaningful
    projections of multidimensional data
  - Prosection views
  - Parallel coordinates

Landscapes
• Visualization of the data as a perspective landscape
• The data needs to be transformed into a (possibly artificial) 2D
  spatial representation which preserves the characteristics of the data

[Figure: news articles visualized as a landscape. Used by permission of B. Wright, Visible Decisions Inc.]
Parallel Coordinates
• n equidistant axes which are parallel to one of the screen axes and
  correspond to the attributes
• The axes are scaled to the [minimum, maximum] range of the
  corresponding attribute
• Every data item corresponds to a polygonal line which intersects each
  of the axes at the point which corresponds to the value for the
  attribute
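
The construction above can be sketched as follows: each attribute is rescaled to its own [minimum, maximum] range and every record becomes a list of (axis index, height) vertices. Placing a constant attribute at mid-height (0.5) is an assumption made here to avoid dividing by zero. The function name `parallel_coordinates` and the records are hypothetical.

```python
def parallel_coordinates(records):
    """Turn records into polylines of (axis_index, height in [0, 1])."""
    columns = list(zip(*records))            # one tuple of values per attribute
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    lines = []
    for record in records:
        line = []
        for axis, value in enumerate(record):
            span = highs[axis] - lows[axis]
            # Assumption: an attribute with a single value sits at mid-height.
            height = (value - lows[axis]) / span if span else 0.5
            line.append((axis, height))
        lines.append(line)
    return lines


records = [(1, 10, 100), (3, 20, 100), (2, 30, 100)]
for line in parallel_coordinates(records):
    print(line)
```

Drawing each list of vertices as a connected line over equally spaced vertical axes gives the plot.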

[Figure: k parallel axes labelled Attr. 1, Attr. 2, Attr. 3, ..., Attr. k]

http://support.sas.com/documentation/
Icon-Based Visualization Techniques

• Visualization of the data values as features of icons
• Typical visualization methods
  - Chernoff Faces
  - Stick Figures
• General techniques
  - Shape coding: use shape to represent certain information encoding
  - Color icons: use color icons to encode more information
  - Tile bars: use small icons to represent the relevant feature
    vectors in document retrieval
Chernoff Faces
• A way to display variables on a two-dimensional surface, e.g., let x be
  eyebrow slant, y be eye size, z be nose length, etc.
• The figure shows faces produced using 10 characteristics (head
  eccentricity, eye size, eye spacing, eye eccentricity, pupil size,
  eyebrow slant, nose size, mouth shape, mouth size, and mouth
  opening), each assigned one of 10 possible values, generated using
  Mathematica (S. Dickson)
• References:
  - Gonick, L. and Smith, W. The Cartoon Guide to Statistics. New York:
    Harper Perennial, p. 212, 1993
  - Weisstein, Eric W. "Chernoff Face." From MathWorld--A Wolfram Web
    Resource. mathworld.wolfram.com/ChernoffFace.html
Stick Figure
• A census data figure showing age, income, gender, education, etc.
• A 5-piece stick figure (1 body and 4 limbs with different
  angle/length)
• Two attributes are mapped to the axes; the remaining attributes are
  mapped to the angle or length of the limbs

[Figure used by permission of G. Grinstein, University of Massachusetts at Lowell]
Hierarchical Visualization Techniques

• Visualization of the data using a hierarchical partitioning into
  subspaces
• Methods
  - Worlds-within-Worlds
  - Dimensional Stacking
  - Tree-Map
  - Cone Trees

Worlds-within-Worlds
• Fix all other parameters at constant values, then draw other (1-, 2-,
  or 3-dimensional) worlds choosing these as the axes
• Software that uses this paradigm
  - N-vision: dynamic interaction through data, including rotation,
    scaling (inner) and translation (inner/outer)
  - Auto Visual

Dimensional Stacking

[Figure: nested grid with axes labelled attribute 1, attribute 2, attribute 3, attribute 4]

• Partitioning of the n-dimensional attribute space into 2-D subspaces,
  which are 'stacked' into each other
• Partitioning of the attribute value ranges into classes. The
  important attributes should be used on the outer levels.
• However, it is difficult to display more than nine dimensions
• It is important to map dimensions appropriately
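
One way to read the stacking is sketched below. It assumes each attribute has already been partitioned into classes numbered from 0, and that attributes alternate between the horizontal and vertical direction from the outermost level inward; the slides do not fix this ordering. The helper `stacked_cell` is hypothetical.

```python
def stacked_cell(classes, cardinalities):
    """Map class indices to a (column, row) cell in the stacked grid.

    classes:       class index per attribute, outermost attribute first
    cardinalities: number of classes per attribute, in the same order
    Assumption: attributes 1, 3, 5, ... nest along x and 2, 4, 6, ... along y.
    """
    column = row = 0
    for level, (index, count) in enumerate(zip(classes, cardinalities)):
        if level % 2 == 0:
            column = column * count + index   # refine the horizontal position
        else:
            row = row * count + index         # refine the vertical position
    return column, row


# Four attributes: the two outer ones have 2 classes, the two inner ones 3.
cardinalities = (2, 2, 3, 3)
cell = stacked_cell((1, 0, 2, 1), cardinalities)   # (5, 1)
print(cell)
```

The full grid here is 6 x 6 cells: each of the 2 x 2 outer cells holds a 3 x 3 inner grid.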

Tree-Map
• Screen-filling method which uses a hierarchical partitioning of
  the screen into regions depending on the attribute values
• The x- and y-dimensions of the screen are partitioned alternately
  according to the attribute values (classes)
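
The alternating partition can be sketched with a simple "slice-and-dice" layout. This is one basic layout, named here as an assumption since the slides do not specify one. Each leaf gets a rectangle whose area is proportional to its size. The tree format, the function names and the example sizes are hypothetical, and sizes are assumed to be positive.

```python
def size(node):
    """Total size of a node: a leaf is (label, number), else (label, [children])."""
    _, payload = node
    if isinstance(payload, list):
        return sum(size(child) for child in payload)
    return payload


def slice_and_dice(node, x, y, width, height, split_x=True):
    """Return (label, x, y, width, height) for every leaf under node."""
    label, payload = node
    if not isinstance(payload, list):
        return [(label, x, y, width, height)]
    total = size(node)
    rectangles = []
    offset = 0.0                      # fraction of the region already used
    for child in payload:
        share = size(child) / total
        if split_x:                   # cut along x at this level ...
            rectangles += slice_and_dice(
                child, x + offset * width, y, share * width, height, False)
        else:                         # ... and along y at the next one
            rectangles += slice_and_dice(
                child, x, y + offset * height, width, share * height, True)
        offset += share
    return rectangles


tree = ("root", [("A", 2), ("B", [("B1", 1), ("B2", 1)])])
layout = slice_and_dice(tree, 0, 0, 4, 2)
for rectangle in layout:
    print(rectangle)
```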

https://support.office.com/
InfoCube
• A 3-D visualization technique where hierarchical information is
  displayed as nested semi-transparent cubes
• The outermost cubes correspond to the top-level data, while the
  subnodes or the lower-level data are represented as smaller cubes
  inside the outermost cubes, and so on
