1152cs191 Data Visualization Unit III

Uploaded by

Abhinav Kumar

School of Computing

Department of Computer Science & Engineering

1152CS191 - Data Visualization

Category: Program Elective
UNIT-III

Course Handling Faculty: I.Farzhana

12/10/2024 Department of Computer Science & Engineering Data Visualization 1


Course Outcomes

CO No.: CO3
Course Outcome: Discuss the visualization techniques used for Multivariate Data and Hierarchical structures.
Level of learning domain (based on revised Bloom's taxonomy): K2

Outcome categories against which CO3 is mapped:
• Engineering Knowledge
• Problem Analysis
• Design / Development of solutions
• Conduct investigations of complex problems
• Modern Tool usage
• The Engineer and Society
• Environment & Sustainability
• Ethics
• Individual & Team Work
• Communication
• Project Management & Finance
• Life Long Learning
• Mathematical Concepts
• Software Development
• Transferring Skills

Correlation of COs with Student Outcomes (ABET EAC and CAC)

COs SO1 SO2 SO3 SO4 SO5 SO6 SO7

CO3 3 2 2

COs SO1 SO2 SO3 SO4 SO5 SO6

CO3 2 2

Course Content

UNIT III Visualization Techniques for Multivariate Data 9


Visualization Techniques for Multivariate Data
• Point-Based Techniques
• Line-Based Techniques
• Region-Based Techniques
• Combinations of Techniques

Visualization Techniques for Trees, Graphs, and Networks


• Displaying Hierarchical Structures
• Displaying Arbitrary Graphs/Networks
• Issues.

Types of Data

Univariate Data
• Data consists of only one variable. The analysis of univariate data is thus the simplest form of analysis, since the information deals with only one quantity that changes.
• Example: height.

Bivariate Data
• Data involves two different variables. The analysis of this type of data deals with causes and relationships, and is done to find out the relationship between the two variables.
• Example: temperature and ice cream sales in ...

Multivariate Data
• When data involves three or more variables, it is categorized as multivariate.
• Example: suppose an advertiser wants to compare the popularity of four advertisements on a website; their click rates could be measured for both men and women and relationships between ...
Point Based Techniques
• Point plots are introduced as visualizations that project records from an n-dimensional data space to an arbitrary k-dimensional display space, such that data records map to k-dimensional points.

• For each record, a graphical representation, mark, or other aesthetic entity is drawn at its associated k-dimensional point.

• Individual visualization techniques identified as point plots define appropriate data projections and specific visual representations.

• Point plots can be defined to display individual records or summary records, and can be structured by various projection techniques.

Scatterplots
A scatter plot (also called a scatter chart or scatter graph) uses dots to represent values for two different numeric variables. The position of each dot on the horizontal and vertical axes indicates the values for an individual data point.

The primary use of scatter plots is to observe and show relationships between two numeric variables.

The dots in a scatter plot not only report the values of individual data points, but also reveal patterns when the data are taken as a whole.

Scatterplots

• Dimension subsetting: allowing the user to select a subset of the dimensions to display, or to develop algorithms to find the dimensions containing the most useful information for the task at hand.

• Dimension reduction: using techniques such as principal component analysis or multidimensional scaling to transform the high-dimensional data to data of lower dimension, to preserve relationships among the data points.

• Dimension embedding: mapping dimensions to other graphical attributes besides position, such as color, size, and shape.

• Multiple displays: showing, either superimposed or juxtaposed, several plots, each of which contains some of the dimensions.

Scatterplot Matrices

A scatter plot matrix is a grid (or matrix) of scatter plots used to visualize
bivariate relationships between combinations of variables. Each scatter plot in
the matrix visualizes the relationship between a pair of variables, allowing many
relationships to be explored in one chart.
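The grid structure can be sketched in a few lines of Python. This is a minimal illustration, not a plotting routine: the function name `scatterplot_matrix_cells` and the variable names are hypothetical, and the sketch only works out which pair of variables each cell of the matrix would show.

```python
from itertools import product


def scatterplot_matrix_cells(variables):
    """Map each (row, col) cell of an n-by-n matrix to its (x, y) variable pair."""
    cells = {}
    for (row, y_var), (col, x_var) in product(enumerate(variables), repeat=2):
        # Rows share a y variable, columns share an x variable.
        cells[(row, col)] = (x_var, y_var)
    return cells


cells = scatterplot_matrix_cells(["height", "weight", "age"])
# Diagonal cells pair a variable with itself; the rest are true bivariate plots.
off_diagonal = [cell for cell, (x_var, y_var) in cells.items() if x_var != y_var]
```

For n variables the matrix has n * n cells, of which n * (n - 1) show a relationship between two different variables.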

Create a simple matrix of scatter plots

• Minitab Procedure
• Select Graph >> Matrix plot...
• Under Matrix of plots, select the Simple plot.
• In the box labeled Graph variables, specify the variables you want included in your plot.
• Select OK. A new graph window should appear containing the scatter plot matrix.
https://online.stat.psu.edu/stat501/lesson/create-simple-matrix-scatter-plots

Force-based Methods
Force-directed layout algorithms are graph drawing algorithms based only on information contained within the structure of the graph itself, rather than relying on contextual information.

Their purpose is to position the nodes of a graph in two-dimensional or three-dimensional space so that all the edges are of more or less equal length and there are as few crossing edges as possible. They do this by assigning forces among the set of edges and the set of nodes, based on their relative positions, and then using these forces either to simulate the motion of the edges and nodes or to minimize their energy.

https://vimeo.com/29209664
https://youtu.be/q8bY-1Lseh0
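A minimal sketch of the idea follows, assuming one common choice of forces (every pair of nodes repels with magnitude k^2/d, every edge attracts with magnitude d^2/k, where d is the current distance). The function name, the parameter defaults, the damping factor, and the shrinking step size are illustrative assumptions, not part of the slides.

```python
import math
import random


def force_directed_layout(nodes, edges, iterations=300, k=1.0, seed=0):
    """Place nodes (a list) in 2D by simulating simple attractive/repulsive forces."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for n in nodes}
    for it in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        # Every pair of nodes repels with magnitude k^2 / distance.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                push = k * k / dist
                force[a][0] += push * dx / dist
                force[a][1] += push * dy / dist
                force[b][0] -= push * dx / dist
                force[b][1] -= push * dy / dist
        # Each edge pulls its endpoints together with magnitude distance^2 / k.
        for a, b in edges:
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9
            pull = dist * dist / k
            force[a][0] -= pull * dx / dist
            force[a][1] -= pull * dy / dist
            force[b][0] += pull * dx / dist
            force[b][1] += pull * dy / dist
        # Move each node along its net force; damp and cap the step so it settles.
        max_step = 0.1 * (1.0 - it / iterations)
        for n in nodes:
            fx, fy = force[n]
            mag = math.hypot(fx, fy) or 1e-9
            move = min(max_step, 0.1 * mag)
            pos[n][0] += fx / mag * move
            pos[n][1] += fy / mag * move
    return pos


triangle = force_directed_layout(["a", "b", "c"], [("a", "b"), ("b", "c"), ("a", "c")])
```

With these forces, attraction and repulsion balance when connected nodes are a distance k apart, which is what pushes the edges toward equal length.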
Multidimensional Scaling
Multidimensional scaling (MDS) is an important class of dimension reduction
algorithms commonly used in statistical analysis and information visualization.

The basic structure of a typical MDS algorithm is as follows:

1. Given a data set with M records and N dimensions, create an M by M matrix, Ds, that contains similarity measures between each pair of data items. For example, one might use Euclidean distance as a measure of similarity.

2. Assuming that you are projecting the data into K dimensions (e.g., for display purposes, K is usually between 1 and 3), create an M by K matrix, L, to contain the locations for the projected points. These M locations can initially be randomly chosen, or techniques such as principal component analysis (PCA) can be used to create reasonable initial positions.

3. Compute an M by M matrix, Ls, that contains the similarities between all pairs of points in L.
4. Compute the value of stress, S, which is a measure of the differences between Ds and Ls. Many such stress measures exist; most assume that the coordinate systems have been normalized so that the maximum distance between points is 1.0.

5. If S is sufficiently small, or hasn't changed significantly in recent iterations, the algorithm terminates.

6. Otherwise, attempt to shift the positions of points in L in a direction that will reduce their individual stress levels. For example, this might be a weighted sum of displacements based on comparing the point with all other points, or perhaps only with its nearest neighbors. The displacement should be scaled such that points don't oscillate between positions.

7. Return to step 3.
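The loop can be sketched in plain Python as follows. This is one possible reading of the steps, under stated assumptions: Euclidean distance is the (dis)similarity measure, stress is the sum of squared differences, only Ds is normalized to a maximum of 1.0, and the names `mds`, `rate`, and `tol` are hypothetical.

```python
import math
import random


def pairwise_distances(points):
    return [[math.dist(p, q) for q in points] for p in points]


def stress(ds, ls):
    m = len(ds)
    return sum((ds[i][j] - ls[i][j]) ** 2 for i in range(m) for j in range(i + 1, m))


def mds(data, k=2, max_iter=500, rate=0.05, tol=1e-10, seed=0):
    rng = random.Random(seed)
    m = len(data)
    # Step 1: distance matrix Ds, scaled so the largest distance is 1.0.
    ds = pairwise_distances(data)
    largest = max(max(row) for row in ds) or 1.0
    ds = [[v / largest for v in row] for row in ds]
    # Step 2: random initial M-by-K layout L.
    layout = [[rng.random() for _ in range(k)] for _ in range(m)]
    previous = float("inf")
    for _ in range(max_iter):
        ls = pairwise_distances(layout)          # Step 3
        s = stress(ds, ls)                       # Step 4
        if s < tol or abs(previous - s) < tol:   # Step 5
            break
        previous = s
        # Step 6: nudge each point away from or toward every other point,
        # in proportion to how much too close or too far apart they are.
        shifted = [p[:] for p in layout]
        for i in range(m):
            for j in range(m):
                if i == j or ls[i][j] == 0:
                    continue
                scale = rate * (ds[i][j] - ls[i][j]) / ls[i][j]
                for d in range(k):
                    shifted[i][d] += scale * (layout[i][d] - layout[j][d])
        layout = shifted                         # Step 7: loop back to step 3
    return layout, s


layout, final_stress = mds([(0, 0, 0), (3, 0, 0), (0, 4, 0)])
```

The small `rate` plays the role of the scaling mentioned in step 6: it keeps each displacement modest so points settle instead of oscillating.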

Iris data set projected using MDS.

RadViz
RadViz is a force-driven point layout technique that is based on Hooke’s Law for
equilibrium.
For an N-dimensional data set, N anchor points are placed on the circumference of a circle to represent the fixed ends of the N springs attached to each data point.
To simplify computations and provide an intuitive feel for the algorithm, these
anchors are most commonly placed on a circle of radius 1.0 centered on the
origin.

RadViz is a multivariate data visualization algorithm that plots each feature
dimension uniformly around the circumference of a circle then plots points on the
interior of the circle such that the point normalizes its values on the axes from the
center to each arc.
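A minimal sketch of the placement rule, assuming each record's values have already been normalized to the range 0 to 1 and act as the spring constants. Under Hooke's Law the point comes to rest where the spring forces cancel, which is the weighted average of the anchor positions. The function names are hypothetical.

```python
import math


def radviz_anchors(n):
    """Place n anchors evenly on a unit circle centered on the origin."""
    return [(math.cos(2 * math.pi * j / n), math.sin(2 * math.pi * j / n))
            for j in range(n)]


def radviz_point(values, anchors):
    """Equilibrium position of a record whose normalized values are the spring constants."""
    total = sum(values)
    if total == 0:
        return (0.0, 0.0)  # No spring pulls: the point stays at the center.
    x = sum(v * ax for v, (ax, _) in zip(values, anchors)) / total
    y = sum(v * ay for v, (_, ay) in zip(values, anchors)) / total
    return (x, y)


anchors = radviz_anchors(4)
pulled_to_first_anchor = radviz_point([1.0, 0.0, 0.0, 0.0], anchors)
balanced = radviz_point([0.5, 0.5, 0.5, 0.5], anchors)
```

A record dominated by one dimension lands near that dimension's anchor, while a record with equal values in every dimension lands at the center.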

Class-discrimination Algorithm
• The figure below shows a class-discrimination algorithm that selects the dimensions providing the most spread in the data in a RadViz display.
• By placing the genes expressed in acute lymphoblastic leukemia (ALL) patients close to each other and the genes expressed in acute myeloblastic leukemia (AML) patients close to each other, the AML patients were separated from the ALL patients.

Vectorized RadViz
Vectorized RadViz, or VRV, constructs multiple dimensions from individual
dimensions by a flattening process, breaking each dimension into many.

For example, the dimension representing the number of cylinders can be broken
down into 5 new dimensions: having 1 or 2 cylinders, having 3 or 4 cylinders,
having 5 or 6, having 7, or having 8. The number of new dimensions can be
determined algorithmically or manually.

This is similar to identifying bins in data (such as the grouping of low, medium,
and high for prices of cars).

Each original dimension is thus represented by a vector of new dimensions, with each new coordinate in that vector having the value 0 or 1, namely, whether the record has the value corresponding to that dimension or not.

Thus, for each record, each new vector of dimensions has exactly one dimension
with the value 1, and all the others have value zero.
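The flattening step can be sketched as follows. The cylinder bins follow the example above; the price thresholds and all names are hypothetical and chosen only for illustration. The sketch assumes integer values so that membership in a `range` works as expected.

```python
# Bins for the "number of cylinders" dimension, following the example above.
CYLINDER_BINS = [range(1, 3), range(3, 5), range(5, 7), range(7, 8), range(8, 9)]
# Hypothetical low / medium / high price bins (thresholds are assumptions).
PRICE_BINS = [range(0, 15000), range(15000, 30000), range(30000, 10**9)]


def vectorize(value, bins):
    """Return a 0/1 vector with a single 1 marking the bin that holds value."""
    vector = [1 if value in b else 0 for b in bins]
    if sum(vector) != 1:
        raise ValueError(f"{value!r} does not fall in exactly one bin")
    return vector


def vectorize_record(record, bins_per_dimension):
    """Flatten a whole record into the concatenation of its per-dimension vectors."""
    flat = []
    for value, bins in zip(record, bins_per_dimension):
        flat.extend(vectorize(value, bins))
    return flat


four_cylinders = vectorize(4, CYLINDER_BINS)
car = vectorize_record((8, 20000), [CYLINDER_BINS, PRICE_BINS])
```

The flattened record can then be laid out with ordinary RadViz, using one anchor per new binary dimension.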

Vectorized RadViz, formed by splitting each dimension into multiple dimensions to create a binary representation for each data record. In this case, each cluster set is separated into multiple dimensions, where each dimension represents a cluster in each cluster set.
Line-Based Techniques
• In line-based methods, points corresponding to a particular record or dimension are linked together with straight or curved lines.

• These lines not only reinforce the relationships among the data values, but also convey perceivable features of the data via slopes, curvature, crossings, and other line patterns.

• Line graphs: a line graph is a univariate visualization technique where the vertical axis represents the range of values for the variable and the horizontal axis represents some ordering of the records in the data set.

Line Graph

Four versions of line graphs for a subset of the AAUP data set: superimposed, stacked, ordered superimposed, and ordered stacked. Ordering is based on the first dimension, which represents salaries of full professors.
Parallel Coordinates
Parallel coordinates, also called ||-coords and PCP (for parallel coordinates plot), were first introduced by Inselberg in 1985 as a mechanism for studying high-dimensional geometry.

Example of a 7-dimensional data set visualized with parallel coordinates. A single data point is represented as the darkened polyline.
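How a record becomes a polyline can be sketched as below. The sketch assumes evenly spaced vertical axes at x = 0, 1, 2, ... and min-max scaling of each dimension to the range 0 to 1; the function name and the sample numbers are hypothetical.

```python
def polyline(record, minima, maxima):
    """Return the (axis index, normalized height) vertices of one record's polyline."""
    vertices = []
    for axis, (value, lo, hi) in enumerate(zip(record, minima, maxima)):
        # Scale the value to 0..1 along its axis; a constant dimension sits mid-axis.
        height = 0.5 if hi == lo else (value - lo) / (hi - lo)
        vertices.append((axis, height))
    return vertices


# Three dimensions with ranges 0..10, 10..20, and 100..200.
vertices = polyline([5, 10, 200], minima=[0, 10, 100], maxima=[10, 20, 200])
```

Drawing straight segments between consecutive vertices gives the polyline for that record.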

Capabilities of parallel coordinates
• Hierarchical parallel coordinates that show data clusters rather than the original data

• Semi-transparent lines to reveal clusters in large data sets

• Clustering, reordering, and spacing of axes based on correlation

• Reordering axes to reduce visual clutter

• Grouping data into cluster bands with special treatment of outliers

• Incorporating histograms into the axes to better convey univariate distributions

• Fitting curves to the intersection points to better convey continuity across axes

A set of points is selected in the parallel coordinates plot. Selected points are colored dark red, and the subspace containing them is shown in grey.

Andrews Curves
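Another line-based visualization for multivariate data is the Andrews curve, developed by David F. Andrews in 1972. Each multivariate data point D = (d1, d2, ..., dN) is used to create a curve of the form

f(t) = d1/sqrt(2) + d2 sin(t) + d3 cos(t) + d4 sin(2t) + d5 cos(2t) + ...

where t ranges from -pi to pi.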
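A minimal sketch that evaluates such a curve, assuming the standard Andrews form: a constant term d1/sqrt(2), followed by alternating sine and cosine terms of increasing frequency. The function name and the sample record are hypothetical.

```python
import math


def andrews_curve(record, t):
    """Evaluate the Andrews curve of one record at parameter t (radians)."""
    total = record[0] / math.sqrt(2)
    for i, d in enumerate(record[1:], start=1):
        harmonic = (i + 1) // 2  # 1, 1, 2, 2, 3, 3, ...
        # Odd positions use sine, even positions use cosine.
        wave = math.sin(harmonic * t) if i % 2 == 1 else math.cos(harmonic * t)
        total += d * wave
    return total


# Sample the curve for one record at 101 points between -pi and pi.
samples = [andrews_curve([1.0, 2.0, 3.0, 4.0], -math.pi + 2 * math.pi * s / 100)
           for s in range(101)]
```

Plotting one such curve per record lets records with similar values appear as curves of similar shape.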

Radial Axis Techniques
A circular line graph is one in which the plotted lines are offset from a circular base. A long graph can be nested by dividing it up into equal-size segments and mapping each to a base of different radius.

An example of a circular line graph. (Image courtesy http://www.cemframework.com/img/PolarPlot1.png.)

Variants on circular line graphs include radar and star graphs.

• Polar graphs: point plots using polar coordinates;

• Circular bar charts: like circular line graphs, but plotting bars on the base line;

• Circular area graphs: like a line graph, but with the area under the line filled in with a color or texture;

• Circular bar graphs: with bars that are circular arcs with a common center point and base line (note the difference between these and circular bar charts: in one, the bar is straight and the base is curved, but vice versa for the other).

An example of a spiral layout for a bar graph generated by SpiralGlyphics.

Region-Based Techniques

In region-based techniques, filled polygons are used to convey values, based on their size, shape, color, or other attributes.

Bar Charts/Histograms

Examples of 2D bar graphs for showing multivariate data.


Examples of 3D visualizations for showing multivariate data: (a) bar graphs; (b) cityscape.

Tabular Displays

Dimensional Stacking
• Dimensional stacking is a method, developed by LeBlanc, for mapping data from a discrete N-dimensional space to a two-dimensional image in a manner that minimizes the occlusion of data while preserving much of the spatial information.

• The mapping is performed as follows: begin with data of dimension 2N + 1 (for an even number of dimensions there would be an additional implicit dimension of cardinality one).

• Select a finite cardinality/discretization for each dimension.

• Choose one of the dimensions to be the dependent variable.

• The rest will be considered independent.
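One way to picture the mapping is as nested grids: the first pair of independent dimensions selects a cell in a coarse grid, the next pair selects a cell inside that cell, and so on, with the dependent variable drawn at the final position. The sketch below assumes the independent dimensions are already discretized to bin indices and are ordered outermost pair first as (x1, y1, x2, y2, ...); that ordering and the function name are assumptions made for illustration.

```python
def stacked_position(indices, cardinalities):
    """Map the bin indices of 2N independent dimensions to one (x, y) image cell."""
    if len(indices) % 2 != 0 or len(indices) != len(cardinalities):
        raise ValueError("expected an even number of indices, one cardinality each")
    x = y = 0
    for pos in range(0, len(indices), 2):
        # Each new pair subdivides the cell chosen by the pairs before it.
        x = x * cardinalities[pos] + indices[pos]
        y = y * cardinalities[pos + 1] + indices[pos + 1]
    return x, y


# Four independent dimensions: an outer 2 x 2 grid whose cells each hold a 3 x 3 grid.
cell = stacked_position((1, 0, 2, 1), (2, 2, 3, 3))
```

In this example the full image is 6 by 6 cells, and the value of the dependent variable would determine what is drawn in the selected cell.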

Combinations of Techniques
Glyphs and Icons

Mappings from data attributes to the graphical attributes of a glyph can be classified as:
• one-to-one mappings, where each data attribute maps to a distinct and different graphical attribute;
• one-to-many mappings, where redundant mappings are used to improve the accuracy and ease with which a user can interpret data values; and
• many-to-one mappings, where several or all data attributes map to a common type of graphical attribute, separated in space, orientation, or other transformation.

Examples of multivariate glyphs.

Visualization Techniques for Trees, Graphs, and Networks

• Trees or hierarchies are one of the most common structures to hold relational information.

• For this reason, many visualization techniques have been developed for the display of such information.

• We can divide these techniques into two classes of algorithms: space-filling and non-space-filling.

Tree Representations

Space Filling Method

• Space-filling techniques make maximal use of the display space.

• This is accomplished by using juxtaposition to imply relations, as opposed to, for example, conveying relations with edges joining data objects.

• The two most common approaches to generating space-filling hierarchies are rectangular and radial layouts.

• Treemaps and their many variants are the most popular form of rectangular space-filling layout.

• In the basic treemap, a rectangle is recursively divided into slices, alternating horizontal and vertical slicing, based on the populations of the subtrees at a given level.


Pseudocode for drawing a hierarchy using a treemap
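The pseudocode figure itself is not reproduced here. As a stand-in, the following is a minimal Python sketch of a basic slice-and-dice treemap, assuming a hierarchy stored as nested dictionaries with hypothetical `name`, `size`, and `children` keys and strictly positive leaf sizes; it is not the slide's own pseudocode.

```python
def subtree_size(node):
    """Total size of the leaves below a node (a leaf counts its own size)."""
    children = node.get("children")
    return sum(subtree_size(c) for c in children) if children else node["size"]


def slice_and_dice(node, x, y, w, h, horizontal=True, out=None):
    """Return (name, x, y, width, height) rectangles for the leaves of the hierarchy."""
    out = [] if out is None else out
    children = node.get("children")
    if not children:
        out.append((node["name"], x, y, w, h))
        return out
    total = subtree_size(node)
    offset = 0.0
    for child in children:
        share = subtree_size(child) / total
        if horizontal:
            # Cut the rectangle left to right, then slice the child the other way.
            slice_and_dice(child, x + offset * w, y, share * w, h, False, out)
        else:
            slice_and_dice(child, x, y + offset * h, w, share * h, True, out)
        offset += share
    return out


tree = {"name": "root", "children": [
    {"name": "A", "size": 1},
    {"name": "B", "children": [{"name": "C", "size": 1}, {"name": "D", "size": 1}]},
]}
rectangles = slice_and_dice(tree, 0.0, 0.0, 3.0, 2.0)
```

Each leaf ends up with an area proportional to its size, and the leaf rectangles together tile the whole drawing area.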

Tree Map Display

Pseudocode for drawing a hierarchy using a
sunburst display
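The pseudocode figure itself is not reproduced here. As a stand-in, the following is a minimal Python sketch of a radial space-filling layout, assuming each node occupies the ring given by its depth and an angular span proportional to the number of leaves below it. The dictionary keys and function names are hypothetical, and this is not the slide's own pseudocode.

```python
def leaf_count(node):
    """Number of leaves below a node (a leaf counts as one)."""
    children = node.get("children")
    return sum(leaf_count(c) for c in children) if children else 1


def sunburst(node, start=0.0, end=360.0, depth=0, out=None):
    """Return (name, ring, start_angle, end_angle) wedges, angles in degrees."""
    out = [] if out is None else out
    out.append((node["name"], depth, start, end))
    children = node.get("children") or []
    total = sum(leaf_count(c) for c in children)
    angle = start
    for child in children:
        # Each child gets a share of its parent's span, one ring further out.
        span = (end - start) * leaf_count(child) / total
        sunburst(child, angle, angle + span, depth + 1, out)
        angle += span
    return out


tree = {"name": "root", "children": [
    {"name": "A"},
    {"name": "B", "children": [{"name": "C"}, {"name": "D"}]},
]}
wedges = sunburst(tree)
```

The root fills the central disc, and every child wedge lies directly outside the part of its parent's arc that it was allotted.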

Sunburst display

Non Space Filling Method
• The most common representation used to visualize tree or hierarchical relationships is a node-link diagram.

• Organizational charts, family trees, and tournament pairings are just some of the common applications for such diagrams.

• The drawing of such trees is influenced the most by two factors: the fan-out degree (e.g., the number of children a parent node can have) and the depth (e.g., the distance of the furthest node from the root).

• Trees that are significantly constrained in one or both of these aspects, such as a binary tree or a tree with only three or four levels, tend to be much easier to draw than those with fewer constraints.

When designing an algorithm for drawing any node-link diagram (not just trees), one must consider three categories of often-contradictory guidelines: drawing conventions, constraints, and aesthetics.

• Conventions may include restricting edges to be either a single straight line, a series of rectilinear lines, polygonal lines, or curves.

• Other conventions might be to place nodes on a fixed grid, or to have all sibling nodes share the same vertical position.


Aesthetics often have a significant impact on the interpretability of a tree or graph drawing, yet often result in conflicting guidelines. Some typical aesthetic rules include:
• minimize line crossings
• maintain a pleasing aspect ratio
• minimize the total area of the drawing
• minimize the total length of the edges
• minimize the number of bends in the edges
• minimize the number of distinct angles or curvatures used
• strive for a symmetric structure


For trees, especially balanced ones, it is relatively easy to design algorithms that adhere to many, if not most, of these guidelines. For example, a simple tree drawing procedure is given below:

1. Slice the drawing area into equal-height slabs, based on the depth of the tree.
2. For each level of the tree, determine how many nodes need to be drawn.
3. Divide each slice into equal-sized rectangles based on the number of nodes at that level.
4. Draw each node in the center of its corresponding rectangle.
5. Draw a link between the center-bottom of each node to the center-top of its child node(s).

Many enhancements can be made to this rather basic algorithm in order to improve space utilization and move child nodes closer to their parents.
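The basic procedure can be sketched as follows, assuming the tree is given as a dictionary from each parent to its list of children (a hypothetical representation). As a simplification, links are returned between node centers instead of from center-bottom to center-top, since node sizes are not modeled.

```python
def layout_tree(children, root, width, height):
    """Return node centers and parent-child links for a simple layered tree drawing."""
    # Collect the nodes level by level, starting from the root.
    levels = [[root]]
    while True:
        below = [c for n in levels[-1] for c in children.get(n, [])]
        if not below:
            break
        levels.append(below)
    slab = height / len(levels)                  # Step 1: equal-height slabs
    centers = {}
    for depth, nodes in enumerate(levels):       # Step 2: nodes per level
        cell = width / len(nodes)                # Step 3: equal-sized rectangles
        for i, n in enumerate(nodes):
            centers[n] = ((i + 0.5) * cell, (depth + 0.5) * slab)  # Step 4
    # Step 5 (simplified): one link per parent-child pair, center to center.
    links = [(centers[p], centers[c]) for p in children for c in children[p]]
    return centers, links


centers, links = layout_tree({"r": ["a", "b"], "a": ["c"]}, "r", width=4, height=3)
```

Because every level is spread across the full width, a child can end up far from its parent, which is the kind of space use the enhancements mentioned above try to improve.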

Improving Space Utilization
• Rather than using even spacing and centering, divide each level based on the number of terminal nodes belonging to each subtree.

• Spread terminal nodes evenly across the drawing area and center parent nodes above them.

• Add some buffer space between adjacent nonsibling nodes to emphasize relationships.

• If possible, reorder the subtrees of a node to achieve more symmetry and balance.

• Position the root node in the center of the display and lay out child nodes radially, rather than vertically.

Cone Tree Display

Displaying Arbitrary Graphs/Networks
• A tree is a connected, unweighted, acyclic graph.

• There are many other possibilities, including graphs with weighted edges, undirected graphs, graphs with cycles, disconnected graphs, and so on.

• Here the graph is assumed to be undirected, though some of the techniques presented are easily extended to directed graphs.

• Two distinct graph drawing approaches are considered: node-link diagrams (building on the material from the previous section) and matrix displays.
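For the matrix approach, the underlying data structure is an adjacency matrix. A minimal sketch for an undirected, unweighted graph follows; the function name and the sample graph are hypothetical.

```python
def adjacency_matrix(nodes, edges):
    """Build a symmetric 0/1 matrix with one row and one column per node."""
    index = {n: i for i, n in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for a, b in edges:
        # Undirected graph: mark the edge in both directions.
        matrix[index[a]][index[b]] = 1
        matrix[index[b]][index[a]] = 1
    return matrix


matrix = adjacency_matrix(["a", "b", "c"], [("a", "b"), ("b", "c")])
for row in matrix:
    print(" ".join(str(v) for v in row))
```

A matrix display colors cell (i, j) when nodes i and j are connected, so it never suffers from edge crossings, although paths are harder to follow than in a node-link diagram.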


• A face is a partition of the plane isolated by a set of connected vertices.
• A neighbor set is a counter-clockwise listing of the vertices incident to a particular vertex.
• A planar embedding is a class of planar graph drawings with the same neighbor sets for each vertex. A planar graph can have an exponential number of such embeddings.
• A cutvertex is any node that causes the graph to be disconnected if it is removed.
• A biconnected graph is one without a cutvertex.
• A block is a maximal biconnected subgraph of a graph.
• A separating pair means two vertices whose removal causes a biconnected graph to become disconnected.
• A triconnected graph is one without a separating pair. A planar triconnected graph has a unique embedding.
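The cutvertex and biconnected definitions can be checked directly with a brute-force sketch: remove each vertex in turn and test whether the rest of the graph is still connected. The graph is assumed to be an adjacency dictionary (a hypothetical representation); faster linear-time methods exist but are not shown here.

```python
def is_connected(adj, removed=None):
    """True if all vertices other than `removed` can reach each other."""
    nodes = [n for n in adj if n != removed]
    if not nodes:
        return True
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:
        current = stack.pop()
        for neighbor in adj[current]:
            if neighbor != removed and neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return len(seen) == len(nodes)


def cut_vertices(adj):
    """Vertices whose removal disconnects the graph."""
    return [n for n in adj if not is_connected(adj, removed=n)]


def is_biconnected(adj):
    """A connected graph with no cutvertex."""
    return is_connected(adj) and not cut_vertices(adj)


path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
triangle = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
```

In the path a-b-c the middle vertex is a cutvertex, while the triangle has none and is therefore biconnected.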
Biconnected Graph

• Given a biconnected graph G and a separating cycle C:

1. Compute all the pieces of G with respect to C.
2. For each piece P that is not a simple path (e.g., that contains a cycle):
(a) Create graph G' consisting of P plus C.
(b) Create cycle C' consisting of a path through P plus the section of C joining the ends.
(c) Apply the algorithm to (G', C'). If the result is nonplanar, G is nonplanar.


3. Compute the interlacement graph I of the pieces of G.

4. If I is not bipartite, G is nonplanar; else G is planar.
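The bipartite check in step 4 can be sketched with a standard two-coloring search. Building the interlacement graph itself is not shown; the sketch assumes it is already available as an adjacency dictionary, and the function name is hypothetical.

```python
from collections import deque


def is_bipartite(adj):
    """True if the vertices can be split into two sets with no edge inside either set."""
    color = {}
    for start in adj:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for neighbor in adj[current]:
                if neighbor not in color:
                    # Give each neighbor the opposite color.
                    color[neighbor] = 1 - color[current]
                    queue.append(neighbor)
                elif color[neighbor] == color[current]:
                    return False  # Two adjacent vertices share a color.
    return True


square = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [1, 3]}
triangle = {1: [2, 3], 2: [1, 3], 3: [1, 2]}
```

A cycle of even length, such as the square, is bipartite; an odd cycle, such as the triangle, is not.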

If a graph is nonplanar, we can make it planar using the following strategy:
1. Determine the largest planar subgraph of the graph.
2. For the remaining vertices, place each within a face that minimizes the number of edge crossings.
3. For each edge crossing, break the edges into two parts each, and connect the broken ends to a new dummy vertex.


Six different ways for networks to display complexity:


• structural complexity (edges are tangled),
• network evolution (the network evolves over time),
• connection diversity (weights/directions/signs of edges),
• dynamical complexity (node states can vary with time),
• node diversity (different types of nodes),
• meta-complication.

Matrix Representation
