1152cs191 Data Visualization Unit i

Uploaded by

Abhinav Kumar

School of Computing
Department of Computer Science & Engineering

1152CS191 - Data Visualization

Category: Program Elective
Course Handling Faculty: I. Farzhana, Assistant Professor
12/10/2024 Department of Computer Science & Engineering DATA VISUALIZATION
Course Outcomes
(CO number, level of learning domain based on revised Bloom’s taxonomy, course outcome)
CO1 (K2): Identify the type of data and explain the visualization process
CO2 (K2): Illustrate the visualization techniques used for Spatial and Geospatial data
CO3 (K2): Discuss the visualization techniques used for Multivariate data and Hierarchical structures
CO4 (K2): Identify the visualization techniques for Text and documents
CO5 (K2): Explore different visualization tools for various applications
Correlation of COs with Program Outcomes and Program Specific Outcomes
CO Nos.: CO4, CO3, CO2, CO1

Program Outcomes
Engineering Knowledge: 3, 3, 3, 3
Problem Analysis: 2, 2
Design / Development of solutions: 2, 2
Conduct investigations of complex problems: 2, 2
Modern Tool usage:
The Engineer Society:
Environment & Sustainability: 2
Ethics:
Individual & Team Work:
Communication:
Project Management & Finance:
Life Long Learning: 2, 2

Program Specific Outcomes
Mathematical Concepts: 3, 3
Software Development: 2
Transferring Skills:
Correlation of COs with Student Outcomes
ABET EAC

COs SO1 SO2 SO3 SO4 SO5 SO6 SO7

CO1 3 2 2

CO2 3 2 2

CO3 3 2 2

CO4 3 2 2

CO5 2 2 2 2

Correlation of COs with Student Outcomes
ABET CAC

COs SO1 SO2 SO3 SO4 SO5 SO6

CO1 2 2 2

CO2 2 2 2

CO3 2 2

CO4 2 2

CO5 3 2 2 2

Course Content
UNIT I Introduction of Visualization 9
Relationship between Visualization and Other Fields - The Visualization Process - The
Scatterplot - The Role of the User - Types of Data - Structure within and between Records -
Data Preprocessing - Data Sets - Human Perception and Information Processing - Perception -
Physiology - Perceptual Processing - Perception in Visualization - Metrics - Visualization
Foundations.

UNIT II Visualization Techniques for Spatial Data and Geospatial Data 9


Visualization Techniques for Spatial Data - One-Dimensional Data - Two-Dimensional Data -
Three-Dimensional Data - Dynamic Data. Visualization Techniques for Geospatial Data -
Visualizing Spatial Data - Visualization of Point Data -Visualization of Line Data - Visualization
of Area Data - Issues in Geospatial Data Visualization.

UNIT III Visualization Techniques for Multivariate Data 9


Visualization Techniques for Multivariate Data - Point-Based Techniques - Line-Based
Techniques - Region-Based Techniques - Combinations of Techniques - Visualization
Techniques for Trees, Graphs, and Networks - Displaying Hierarchical Structures - Displaying
Arbitrary Graphs/Networks - Issues.
UNIT IV Text and Document Visualization 9
Levels of Text Representations - The Vector Space Model - Single Document
Visualizations - Document Collection Visualizations - Extended Text
Visualizations - Designing Effective Visualizations - Steps in Designing
Visualizations - Problems - Comparing and Evaluating Visualization Techniques.

UNIT V Data Visualization Tools 9


Trends in Data Visualization and Other Tools - Tableau - Data Wrangler, Python,
D3.js, R and Shiny - Visualization for Genetic Network Reconstruction -
Reconstruction, Visualization and Analysis of Medical Images - Exploratory
Graphics of a Financial Dataset - Graphical Data Representation in Bankruptcy
Analysis - Visualization Tools for Insurance Risk Processes

Text Books
1. Matthew Ward, Georges Grinstein, Daniel Keim, “Interactive Data Visualization: Foundations, Techniques, and Applications”, 2nd Edition, A K Peters, Ltd., Natick, Massachusetts, 2015. (UNIT I-IV)

2. Chun-houh Chen, Wolfgang Härdle, Antony Unwin, “Handbook of Data Visualization”, 2nd Edition, Springer, 2016. (UNIT V)

Reference Books

1. Donabel Santos, “Tableau 10 Business Intelligence Cookbook”, Packt Publishing, ISBN 1786465639, 9781786465634, 2016.

Learning Resources:
1. https://rstudio.github.io/r2d3/
2. https://www.analyticsvidhya.com/blog/2016/10/creating-interactive-data-visualization-using-shiny-app-in-r-with-examples/
3. https://medium.com/opex-analytics/when-what-data-viz-part-ii-b4f76f5d0a29

Visualization
Related to vision (seeing through eyes), one of the major human senses to interact with the world.

Visualization Process
Process of forming a visual image of things that can be materialized and seen through eyes (or imagined in human mind).

What can be Visualized
– Visible reality: person, animal, building, mountain
– Hidden reality: earth core, blood vessel, universe
– Invisible reality: wind, air, heat, electron, sound, smell, magnetic fields
– Abstract entity: data, information, idea, hierarchy, process, relationship

Types/features of visualization forms
– 2D vs. 3D
– Static vs. motion
– Virtual vs. materialized
– Realistic vs. abstract

How to visualize them?
The visualization process (and the result) may change the original form of things (or create a new form) for better understanding and communication.
A Visualization of “Visualization”

Traditional “Visualization”

Dr. John Snow’s Cholera Map of London (1854)

Graphical Display – Large Datasets
Total cancer deaths, 1950-1969 (top: white women; bottom: white men)
Visualization in Everyday Life

• A graph of stock market activities that might indicate an upswing (or downturn) in the economy.
• A plot comparing the effectiveness of your pain killer to that of the leading brand.
• A 3D reconstruction of your injured knee, as generated from a CT scan.
• An instruction manual for putting together a bicycle, with views specific to each part as it is added.
• A highway sign indicating a curve, merging of lanes, or an intersection.
Visualization in Areas of Employment

• The result of a financial and stock market analysis.
• A mechanical and civil engineering rotary bridge design and systems analysis.
• A breast cancer MRI for diagnosis and therapy.
• A comet path data and trend analysis.
• The analysis of a simulation of a physical system.
• The study of actuarial data for confirming and guiding quantitative analysis.
Image Processing, Computer Graphics and Visualization
• Computer graphics outputs an image.
• Visualization may employ graphics to generate images.
• Visualization may employ image processing to study images.
• Visualization usually works with 3D or n-D data (n >= 3), employs data transformation to enhance the meaning of the data, and is usually interactive and requires human intervention.
Types of Visualization
• Column Chart • Pie Chart
• Bar Graph • Waterfall Chart
• Stacked Bar Graph • Bubble Chart
• Stacked Column Chart • Scatter Plot Chart
• Area Chart • Bullet Graph
• Dual Axis Chart • Funnel Chart
• Line Graph • Heat Map
• Mekko Chart
Scientific and Information Visualization

•Transformation of data or information into pictures (visual outputs)
•Note this does not necessarily imply the use of computers
•Classical visualization used hand-drawn figures and illustrations (2D means for visualization)
•Modern visualization is primarily 3D (digital images for 3D visualization)
•In both cases, the ultimate goal is to gain important insights about the data through visual means
•In visualization, how we get the picture matters less than what picture we get
•The technical ways to arrive at visual outputs depend mainly on computer graphics techniques
Why is Visualization Important
Importance of Data Visualization
•Helping decision makers understand how the business data is being interpreted to determine business decisions.
•Revealing previously unnoticed key points about the data sources to help decision makers compose data analysis reports.
•Visualizing business data to manage growth and converting trends into business strategies by making sense of your information.
•Leading the target audience to focus on business insights to discover areas that require attention.
•Handling large amounts of data in a pictorial format to provide a summary of unseen patterns in the data, revealing insights and the story behind the data to establish a business goal.
Benefits of Data Visualization

Better analysis
Helps business stakeholders analyze reports regarding sales, marketing strategies, and product interest. Based on the analysis, they can focus on the areas that require attention to increase profits, which in turn makes the business more productive.

Quick action
The human brain grasps visuals more easily than table reports. Data visualizations allow decision makers to be notified quickly of new data insights and take necessary actions for business growth.

Finding errors
Helps quickly identify any errors in the data. If the data tends to suggest the wrong actions, visualizations help identify erroneous data sooner so that it can be removed from analysis.

Identifying patterns
Large amounts of complicated data can provide many opportunities for insights when we visualize them. Visualization allows business users to recognize relationships between the data, providing greater meaning to it. Exploring these patterns helps users focus on specific areas that require attention in the data, so that they can identify the significance of those areas to drive their business forward.

Exploring business insights
Finding data correlations using visual representations is key to identifying business insights. Exploring these insights is important for business users or executives to set the right path to achieving the business’s goals.

Understanding the story
Helps the target audience grasp the story at a single glance. Always be sure to convey the story in the simplest way, without excessively complicated visuals.
A Visualization Process

Computer Graphics Pipeline

Modeling.
• A three-dimensional model, consisting of planar polygons defined by vertices and surface properties, is generated using a world coordinate system.

Viewing.
• A virtual camera is defined at a location in world coordinates, along with a direction and orientation (generally given as vectors). All vertices are transformed into a viewing coordinate system based on the camera parameters.

Clipping.
• By specifying the bounds of the desired image (usually given by corner positions on a plane of projection placed in front of the camera), objects out of view can be removed, and those that are partially visible can be clipped.

Hidden surface removal.
• Polygons facing away from the camera, or those obscured by others, are removed or clipped. This process may be integrated into the projection process.

Projection.
• Three-dimensional polygons are projected onto the two-dimensional plane of projection, usually using a perspective transformation.
• The results may be in a normalized 2D coordinate system or device/screen coordinates.

Rendering.
• The actual color of the pixels associated with a visible polygon depends on a number of factors, including the material properties being synthesized, the type(s), location(s), color, and intensity of the light source(s), the degree of occlusion from direct light exposure, and the amount and color of light being reflected off of other objects onto the polygon.
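
The projection stage can be illustrated with a minimal sketch. It assumes a camera at the origin looking along the positive z-axis and a plane of projection at distance d in front of it; the function name project_point and the sample triangle are hypothetical and not taken from the slides.

```python
def project_point(x, y, z, d=1.0):
    """Perspective-project a camera-space point onto the plane z = d.

    By similar triangles, a point at depth z lands at (d*x/z, d*y/z).
    """
    if z <= 0:
        # Points at or behind the camera should already have been clipped.
        raise ValueError("point is not in front of the camera")
    return (d * x / z, d * y / z)


# A made-up triangle in viewing coordinates (x, y, z).
triangle = [(1.0, 1.0, 2.0), (-1.0, 1.0, 4.0), (0.0, -2.0, 8.0)]
projected = [project_point(*vertex) for vertex in triangle]
print(projected)
```

Vertices that are farther from the camera (larger z) move closer to the center of the image, which is the foreshortening effect a perspective transformation produces.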
Visualization Pipeline

Data modeling.
The data to be visualized, whether from a file or a database, has to be structured to facilitate its visualization.

Data selection.
Data selection involves identifying the subset of the data that will be potentially visualized.

Data to visual mappings.
The heart of the visualization pipeline is performing the mapping of data values to graphical entities or their attributes.

Scene parameter setting (view transformations).
The user must specify several attributes of the visualization that are relatively independent of the data.

Rendering or generation of the visualization.
The specific projection or rendering of the visualization objects varies according to the mapping being used; shading or texture mapping may be involved.
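
The pipeline stages can be sketched end to end in a few lines. This is only an illustration under stated assumptions: the tiny CSV table, the field names hp and mpg, the function names, and the choice of SVG as output are all hypothetical.

```python
import csv
import io

RAW = "name,hp,mpg\nA,100,30\nB,200,20\nC,150,25\n"  # made-up data


def model(raw_text):
    """Data modeling: structure raw text into numeric records."""
    rows = csv.DictReader(io.StringIO(raw_text))
    return [{"name": r["name"], "hp": float(r["hp"]), "mpg": float(r["mpg"])}
            for r in rows]


def select(records, min_hp):
    """Data selection: keep only the subset to be visualized."""
    return [r for r in records if r["hp"] >= min_hp]


def scale(value, lo, hi):
    """Map value linearly from [lo, hi] to [0, 1]."""
    return 0.5 if hi == lo else (value - lo) / (hi - lo)


def map_to_visual(records):
    """Data-to-visual mapping: hp -> x, mpg -> y, both in the unit square."""
    hps = [r["hp"] for r in records]
    mpgs = [r["mpg"] for r in records]
    return [(scale(r["hp"], min(hps), max(hps)),
             scale(r["mpg"], min(mpgs), max(mpgs))) for r in records]


def render(points, width=200, height=100):
    """Scene parameters (width, height) and rendering to SVG markup."""
    circles = "".join(
        f'<circle cx="{x * width:.1f}" cy="{(1 - y) * height:.1f}" r="3"/>'
        for x, y in points)
    return f'<svg width="{width}" height="{height}">{circles}</svg>'


points = map_to_visual(select(model(RAW), min_hp=120))
svg = render(points)
print(svg)
```

Keeping each stage as a separate function mirrors the point that scene parameters such as image size are relatively independent of the data.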

Knowledge Discovery Pipeline

Proposed Visualization Model

Role of Perception

How many legs does this elephant have?
(Image from http://www.ilusa.com/gallery/elephant-illusion.jpg.)

The strength of the eye’s saccadic movement is hard to overcome.

Figure: A display showing one distractor (red) in a sea of blue-colored points. It is preattentively distinguished.
Figure: A display where orientation is the key perceptual factor explored.
Role of Cognition

• Cognition is the process of acquiring knowledge and understanding through thought, experience, and the senses.
• What does the user see in a visualization?
• What information gets understood, missed, remembered?
• For how long can such information be remembered?
• Each of these questions requires us to look beyond perception and into cognition.
Pseudocode Conventions

• data: The working data table. This data table is assumed to contain only numeric values. In practice, dimensions of the original data table that contain non-numeric values must be somehow converted to numeric values.

• m: The number of dimensions (columns) in the working data table. Dimensions are typically iterated over using j as the running dimension index.

• Normalize(record, dimension), Normalize(record, dimension, min, max): A function that maps the value for the given record and dimension in the working data table to a value between min and max, or between zero and one if min and max are not specified.

• Color(color): A function that sets the color state of the graphics environment to the specified color (whose type is assumed to be an integer containing RGB values).

• MapColor(record, dimension): A function that sets the color state of the graphics environment to be the color derived from applying the global color map to the normalized value of the given record and dimension in the working data table.

• Circle(x, y, radius): A function that fills a circle centered at the given (x, y)-location, with the given radius, with the color of the graphics environment. The plotting space for all visualizations is the unit square. In practice, this function must map the unit square to a square in pixel coordinates.

• Polyline(xs, ys): A function that draws a polyline (many connected line segments) from the given arrays of x- and y-coordinates.

• Polygon(xs, ys): A function that fills the polygon defined by the given arrays of x- and y-coordinates with the color of the current color state.

For geographic visualizations, the following functions are assumed to exist in the environment:

• GetLatitudes(record), GetLongitudes(record): Functions that retrieve the arrays of latitude and longitude coordinates, respectively, of the geographic polygon associated with the given record. For example, these polygons could be outlines of the countries of the world.

• ProjectLatitudes(lats, scale), ProjectLongitudes(longs, scale): Functions that project arrays of latitude values to arrays of y values, and arrays of longitude values to arrays of x values, respectively.

For graph and 3D surface data sets, the following is provided:

• GetConnections(record): A function that retrieves an array of record indices to which the given record is connected.
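
As a rough sketch of how two of these conventions might look in Python: the function below takes the data table as an explicit argument rather than using a global, and the 0xRRGGBB packing order in pack_rgb is an assumption, since the conventions only say that a color is an integer containing RGB values.

```python
def normalize(data, record, dimension, lo=0.0, hi=1.0):
    """Map data[record][dimension] linearly into [lo, hi] using the column's range."""
    column = [row[dimension] for row in data]
    col_min, col_max = min(column), max(column)
    if col_max == col_min:
        # Constant column: no range to scale by, so place it mid-range.
        return (lo + hi) / 2
    t = (data[record][dimension] - col_min) / (col_max - col_min)
    return lo + t * (hi - lo)


def pack_rgb(r, g, b):
    """Pack three 8-bit channels into one integer (assumed layout 0xRRGGBB)."""
    return (r << 16) | (g << 8) | b


# A made-up working data table with three records and m = 2 dimensions.
data = [[10.0, 1.0], [20.0, 3.0], [30.0, 2.0]]
m = len(data[0])

print(normalize(data, 1, 0))            # record 1, dimension 0, default range
print(normalize(data, 1, 1, 2.0, 4.0))  # record 1, dimension 1, range [2, 4]
print(hex(pack_rgb(255, 128, 0)))
```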
Scatterplot
• The scatterplot is one of the earliest and most widely used visualizations developed. It is based on the Cartesian coordinate system.

• The following pseudocode renders a scatterplot of circles.

• Records are represented in the scatterplot as circles of varying location, color, and size.

• The x- and y-axes represent data from dimension numbers xDim and yDim, respectively.

• The color of the circles is derived from dimension number cDim.

• The radius of the circles is derived from dimension number rDim, as well as from the upper and lower bounds for the radius, rMin and rMax.
Scatterplot(xDim, yDim, cDim, rDim, rMin, rMax)
  for each record i                          // For each record,
    do x ← Normalize(i, xDim)                // derive the location,
       y ← Normalize(i, yDim)
       r ← Normalize(i, rDim, rMin, rMax)    // radius,
       MapColor(i, cDim)                     // and color, then
       Circle(x, y, r)                       // draw the record as a circle.
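
The pseudocode above translates fairly directly into Python. The sketch below is one possible reading, not the textbook's implementation: it returns a list of circle descriptions instead of drawing, the blue-to-red color map is a hypothetical stand-in for the global color map, and the data values are made up.

```python
def normalize(data, i, dim, lo=0.0, hi=1.0):
    """Map data[i][dim] linearly into [lo, hi] using the column's range."""
    col = [row[dim] for row in data]
    cmin, cmax = min(col), max(col)
    t = 0.5 if cmax == cmin else (data[i][dim] - cmin) / (cmax - cmin)
    return lo + t * (hi - lo)


def map_color(data, i, dim):
    """Hypothetical color map: blue for low values, red for high values."""
    t = normalize(data, i, dim)
    return (round(255 * t), 0, round(255 * (1 - t)))


def scatterplot(data, x_dim, y_dim, c_dim, r_dim, r_min, r_max):
    """Return one (x, y, radius, rgb) tuple per record, in unit-square coordinates."""
    circles = []
    for i in range(len(data)):
        x = normalize(data, i, x_dim)
        y = normalize(data, i, y_dim)
        r = normalize(data, i, r_dim, r_min, r_max)
        circles.append((x, y, r, map_color(data, i, c_dim)))
    return circles


# Columns: horsepower, city MPG, class code, weight (made-up values).
data = [
    [100.0, 30.0, 0.0, 1000.0],
    [200.0, 20.0, 1.0, 2000.0],
    [150.0, 25.0, 2.0, 1500.0],
]
circles = scatterplot(data, 0, 1, 2, 3, 0.01, 0.05)
for circle in circles:
    print(circle)
```

A drawing backend would still have to map the unit square to pixel coordinates, as the conventions note for Circle.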

Figure: A scatterplot of horsepower versus city MPG for Toyota vehicles. The vehicle class is mapped to color.
The Role of the User
The role of the visualization can have significant impact on the types of user
involvement. It is useful to categorize visualizations based on the purpose they
serve. These include the following:

Exploration

1. Do the trends defined for 28 records apply to the whole data set, or if not, what specific subsets do they apply to?
2. How many records have missing values, and for what fields (attributes)?
3. What can we say about the missing data?
4. What can we say about the data overall?

Confirmation

Interactive Presentation

Types of Data
Nominal
Nominal: nominal is from the Latin nominalis, which means “pertaining to
names”. It’s another name for a category.

Examples:
•Gender: Male, Female, Other.
•Hair Color: Brown, Black, Blonde, Red,
Other.
•Type of living accommodation: House,
Apartment, Trailer, Other.
•Genotype: Bb, bb, BB, bB.
•Religious preference: Buddhist, Mormon,
Muslim, Jewish, Christian, Other.

Ordinal
Ordinal: means in order. Includes “First,” “second” and “ninety ninth.”

Examples:
•High school class ranking: 1st, 9th, 87th…
•Socioeconomic status: poor, middle class,
rich.
•The Likert Scale: strongly disagree, disagree,
neutral, agree, strongly agree.
•Level of Agreement: yes, maybe, no.
•Time of Day: dawn, morning, noon,
afternoon, evening, night.
•Political Orientation: left, center, right.
Interval
Interval: has values of equal intervals that mean something. For example,
a thermometer might have intervals of ten degrees.

Examples:
•Celsius Temperature.
•Fahrenheit Temperature.
•IQ (intelligence scale).
•SAT scores.
•Time on a clock with hands.

Ratio
Ratio: the same as the interval scale except that zero on the scale is a
true zero: it means the quantity is absent. For example, a weight of zero
means no weight at all, so a statement such as “twice as heavy” is
meaningful. Temperature (with the exception of Kelvin) is not a ratio
scale, because its zero is an arbitrary reference point (zero on the
Celsius scale is just the freezing point of water; it does not mean that
there is no temperature).

Examples:
•Age.
•Weight.
•Height.
•Sales Figures.
•Ruler measurements.
•Income earned in a week.
•Years of education.
•Number of children.
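
The difference between interval and ratio scales can be checked with a short worked example. The temperatures below are arbitrary illustrative values, and the helper name celsius_to_kelvin is introduced only for this sketch.

```python
def celsius_to_kelvin(c):
    """Convert a Celsius reading to Kelvin, which has a true zero."""
    return c + 273.15


a_c, b_c = 10.0, 20.0  # two made-up Celsius readings

# Differences are meaningful on an interval scale and survive the conversion.
diff_c = b_c - a_c
diff_k = celsius_to_kelvin(b_c) - celsius_to_kelvin(a_c)

# Ratios are not: 20 C is not "twice as hot" as 10 C.
naive_ratio = b_c / a_c
true_ratio = celsius_to_kelvin(b_c) / celsius_to_kelvin(a_c)

print(diff_c, round(diff_k, 6))
print(naive_ratio, round(true_ratio, 4))
```

The difference is 10 degrees on both scales, but the ratio drops from 2.0 in Celsius to roughly 1.035 in Kelvin, which is why only the Kelvin ratio is physically meaningful.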

Structure within and between Records

Data sets have structure, both in terms of the means of representation (syntax), and
the types of interrelationships within a given record and between records (semantics).

Scalars, Vectors, and Tensors


•An individual number in a data record is often referred to as a scalar. Scalar
values, such as the cost of an item or the age of an individual, are often the focus for
analysis and visualization.

•Vectors found in typical data sets include position (2 or 3 spatial values), color
(a triplet of red, green, and blue components), and phone number. While each
component of a vector might be examined individually, it is most common to treat
the vector as a whole.

Tensor
•Scalars and vectors are simple variants on a more general structure known as a tensor.
•A tensor is defined by its rank and by the dimensionality of the space within which it is defined.
•A scalar can be designated a tensor of rank zero.
•A vector can be designated a tensor of rank one.
•A rank-three tensor is represented with a cubic matrix, with components coming out of your computer screen.

Figure: second-order Cauchy stress tensor
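
To make the notion of rank concrete, the sketch below treats nested Python lists as tensors and counts the nesting depth. It assumes regular, non-empty nesting; the function name tensor_rank and the sample values are hypothetical.

```python
def tensor_rank(value):
    """Return the nesting depth of a regular, non-empty nested list.

    A bare number has rank 0, a flat list rank 1, a list of lists rank 2, ...
    """
    rank = 0
    while isinstance(value, list):
        rank += 1
        value = value[0]  # assumes every level is non-empty
    return rank


scalar = 42.0                                  # e.g. the cost of an item
vector = [0.2, 0.5, 0.9]                       # e.g. an RGB color triplet
stress = [[1.0, 0.1, 0.0],                     # a 3 x 3 matrix, the shape of a
          [0.1, 2.0, 0.3],                     # second-order stress tensor
          [0.0, 0.3, 3.0]]                     # (values are made up)

print(tensor_rank(scalar), tensor_rank(vector), tensor_rank(stress))
```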
Geometry and Grids
One way to incorporate geometric structure in a data set is to have explicit
coordinates for each data record.
In modeling of 3D objects, the geometry constitutes the majority of the data,
with coordinates given for each vertex.

For example, a data set of temperature readings from across the country might
include the longitude and latitude associated with the sensors, as well as the
sensor values.
Structured Data
MRI (magnetic resonance imagery). Density (scalar), with three spatial attributes, 3D grid connectivity;

CFD (computational fluid dynamics). Three dimensions for displacement, with one temporal and three spatial attributes, 3D grid connectivity (uniform or nonuniform);

Financial. No geometric structure, n possibly independent components, nominal and ordinal, with a temporal attribute;

CAD (computer-aided design). Three spatial attributes with edge and polygon connections, and surface properties;

Remote sensing. Multiple channels, with two or three spatial attributes, one temporal attribute, and grid connectivity;

Census. Multiple fields of all types, spatial attributes (e.g., addresses), temporal attribute, and connectivity implied by similarities in fields;

Social Network. Nodes consisting of multiple fields of all types, with various connectivity attributes that could be spatial, temporal, or dependent
Data Preprocessing
It is a data mining technique that involves transforming raw data into an
understandable format.
•Data in the real world is:
– Incomplete: lacking values, certain attributes of interest, etc.
– Noisy: containing errors or outliers
– Inconsistent: lack of compatibility or similarity between two or
more facts.
• No quality data, no quality mining results!
– Quality decisions must be based on quality data
– Data warehouse needs consistent integration of quality data
Data Cleaning

Data in the real world is dirty: lots of potentially incorrect data, e.g., faulty instruments, human or computer error, transmission error.
• Incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate data
  – e.g., Occupation=“ ” (missing data)
• Noisy: containing noise, errors, or outliers
  – e.g., Salary=“−10” (an error)
• Inconsistent: containing discrepancies in codes or names, e.g.,
  – Age=“42”, Birthday=“03/07/2010”
  – Was rating “1, 2, 3”, now rating “A, B, C”
  – discrepancy between duplicate records
• Intentional (e.g., disguised missing data)
  – Jan. 1 as everyone’s birthday?
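
The kinds of dirty data listed above can be detected with simple rule-based checks. The sketch below is illustrative only: the record layout, the function name check_record, the reading of “03/07/2010” as 7 March 2010, and the reference date are all assumptions.

```python
from datetime import date


def check_record(rec, today):
    """Return a list of data-quality problems found in one record."""
    problems = []
    if not rec.get("occupation", "").strip():
        problems.append("incomplete: occupation missing")
    if rec["salary"] < 0:
        problems.append("noisy: negative salary")
    born = rec["birthday"]
    # Age in whole years; subtract one if the birthday has not occurred yet.
    had_birthday = (today.month, today.day) >= (born.month, born.day)
    age_from_birthday = today.year - born.year - (0 if had_birthday else 1)
    if age_from_birthday != rec["age"]:
        problems.append("inconsistent: age does not match birthday")
    if (born.month, born.day) == (1, 1):
        problems.append("suspicious: Jan. 1 birthday may be a disguised missing value")
    return problems


record = {"occupation": " ", "salary": -10, "age": 42,
          "birthday": date(2010, 3, 7)}
issues = check_record(record, today=date(2024, 10, 12))
for issue in issues:
    print(issue)
```

Flagged records would then be corrected, imputed, or removed during cleaning; which action is appropriate depends on the analysis.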
Segmentation

The data can be separated into contiguous regions, where each region corresponds
to a particular classification of the data.

A typical problem with segmentation is that


the results may not coincide with regions
that are semantically homogeneous
(undersegmented), or may consist of large
numbers of tiny regions (oversegmented).
One solution to this problem is to follow the
initial segmentation process with an iterative
split-and-merge refinement stage.

similarThresh = similarity measure indicating two regions have similar characteristics
homogeneousThresh = uniformity measure indicating a region is too nonhomogeneous
do {
    changeCount = 0
    for each region {
        compare region with neighboring regions to find most similar
        if most similar neighbor is within similarThresh of current region {
            merge two regions
            changeCount++
        }
        evaluate homogeneity of region
        if region homogeneity is less than homogeneousThresh {
            split region into two
            changeCount++
        }
    }
} until changeCount == 0
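
A one-dimensional version of the split-and-merge loop can be sketched as follows. Several choices here are assumptions made for illustration, not part of the pseudocode: regions are half-open index ranges over a list of values, similarity is the difference of region means, inhomogeneity is measured as the spread (max minus min) so a region is split when the spread exceeds the threshold, splits happen at the largest jump, thresholds are non-negative, and a pass limit guards against thresholds that would make the loop oscillate.

```python
def mean(values, region):
    start, end = region
    return sum(values[start:end]) / (end - start)


def spread(values, region):
    start, end = region
    return max(values[start:end]) - min(values[start:end])


def split_at_largest_jump(values, region):
    """Split a region (length >= 2) where consecutive values differ most."""
    start, end = region
    cut = max(range(start + 1, end), key=lambda k: abs(values[k] - values[k - 1]))
    return [(start, cut), (cut, end)]


def split_and_merge(values, regions, similar_thresh, homogeneous_thresh, max_passes=100):
    regions = list(regions)
    for _ in range(max_passes):
        change_count = 0
        i = 0
        while i < len(regions):
            # Find the neighbouring region whose mean is closest to this one.
            neighbours = [j for j in (i - 1, i + 1) if 0 <= j < len(regions)]
            if neighbours:
                here = mean(values, regions[i])
                j = min(neighbours, key=lambda n: abs(mean(values, regions[n]) - here))
                if abs(mean(values, regions[j]) - here) <= similar_thresh:
                    lo, hi = min(i, j), max(i, j)
                    regions[lo:hi + 1] = [(regions[lo][0], regions[hi][1])]
                    i = lo
                    change_count += 1
            # Split the (possibly merged) region if it is too nonhomogeneous.
            if spread(values, regions[i]) > homogeneous_thresh:
                regions[i:i + 1] = split_at_largest_jump(values, regions[i])
                change_count += 1
                i += 1  # skip the second half produced by the split
            i += 1
        if change_count == 0:
            break
    return regions


values = [1, 1, 1, 9, 9, 9]
initial = [(0, 2), (2, 4), (4, 6)]  # a deliberately poor initial segmentation
result = split_and_merge(values, initial, similar_thresh=1.0, homogeneous_thresh=2.0)
print(result)
```

On this toy input the middle region, which straddles the jump from 1 to 9, is split and its halves are absorbed by their neighbours.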

Data Sets

1. djia-100.xls. A univariate, nonspatial data set consisting of 100+ years of daily Dow Jones Industrial Averages.
Source: http://www.analyzeindices.com/dow-jones-history.shtml
Format: Excel spreadsheet. After the header, each entry is of the form YYMMDD followed by the closing value.
Code: file can be viewed with Excel.

https://serc.carleton.edu/NAGTWorkshops/visualize04/tool_examples/opendx.html

2. colorado elev.vit. A two-dimensional, uniform grid, scalar field representing the elevation of a square region of Colorado.
Source: included with the distribution of OpenDX (http://www.opendx.org/).
Format: binary file with a 268-byte header followed by a 400 × 400 array of 1-byte elevations.
Code: file can be rendered with TopoSurface, a Processing program
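
Given the stated layout (a 268-byte header followed by a 400 × 400 array of 1-byte elevations), the grid could be read along the following lines. This is a sketch, not the TopoSurface code: it assumes the elevations are unsigned and stored row by row, and it is exercised on synthetic bytes rather than the real file.

```python
HEADER_BYTES = 268
SIZE = 400


def parse_elevation(raw):
    """Skip the header and return SIZE rows of SIZE one-byte elevations."""
    body = raw[HEADER_BYTES:HEADER_BYTES + SIZE * SIZE]
    if len(body) != SIZE * SIZE:
        raise ValueError("input is shorter than header + 400 x 400 bytes")
    # Assumed row-major order: row r occupies bytes r*SIZE .. (r+1)*SIZE - 1.
    return [list(body[r * SIZE:(r + 1) * SIZE]) for r in range(SIZE)]


# Synthetic stand-in for the file contents (not real elevation data).
fake = bytes(HEADER_BYTES) + bytes((r + c) % 256
                                   for r in range(SIZE) for c in range(SIZE))
grid = parse_elevation(fake)
print(len(grid), len(grid[0]), grid[1][2])
```

For the real file, the bytes would come from opening it in binary mode, for example `open(path, "rb").read()`, where path is wherever the data set is stored.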


3. uvw.dat. A three-dimensional uniform grid vector field representing a simulated flow field. The data shows one frame of the unsteady velocity field in a turbulent channel flow, computed by a finite volume method. The streamwise velocity (u) is much larger than the secondary velocities in the transverse direction (v and w).
Source: Data courtesy of Drs. Jiacai Lu and Gretar Tryggvason, ME Department, Worcester Polytechnic Institute (http://www.me.wpi.edu/Tryggvason).
Format: plain text. After the header, each entry is a set of 6 float values, 3 for position, 3 for displacement. There is roughly a 20:1:1 ratio between the 3 displacements.
Code: file can be rendered with FlowSlicer, a Processing program, and FlowView, a Java program (also need Voxel.java) included in Appendix C and on the book’s web site.


4. city temp.xls. A two-dimensional, nonuniform, geo-spatial, scalar data set containing the average January temperature for 56 U.S. cities.
Source: Peixoto, J.L. (1990), “A Property of Well-Formulated Polynomial Regression Models.” American Statistician, 44, 26–30. Also found in: Hand, D.J., et al. (1994) A Handbook of Small Data Sets, London: Chapman & Hall, 208–210. Downloaded from http://lib.stat.cmu.edu.

Human Perception and Information Processing

Perception deals with the human senses that generate signals from the environment through sight, hearing, touch, smell, and taste.
Role of Perception
•Perception (from the Latin perceptio, percipio) is the organization, identification and
interpretation of sensory information in order to represent and understand the environment.
•Perception is the process by which people Select, Organize, Interpret and Respond to
Information from the world around them.
•For example, vision involves light striking the retinas of the eyes, smell is mediated by odor
molecules and hearing involves pressure waves.

Electromagnetic Spectrum

Visible Spectrum

Anatomy of Visual System

Department of Computer Science & Engineering DATA


12/10/2024 VISUALIZATION
Lens System and Muscles

• The six muscles are generally considered as motion controllers, providing the ability to look at objects in the scene.

• The muscles tend to maintain the eye level with the horizon when the head is not perfectly vertical.

• The optical system of the eye is similar in characteristic to a double-lens camera system.

• The first component is the cornea, the exterior cover of the front of the eye.

• Acting as a protective mechanism against physical damage to the internal structure, it also serves as one lens, focusing the light from the surrounding scene onto the main lens.

• From the cornea, light passes through the pupil, a circular hole in the iris, similar in function to an aperture stop on a photographic camera.
• The iris is a colored annulus containing radial muscles for changing the size of the pupil opening.

• The pupil determines how much light will enter the rest of the internal chamber of the eye.

• The third major component is the lens, whose crystalline structure is similar to onion skin.
Retina

• The retina of the human eye contains the photoreceptors responsible for the visual perception of our external world.

• It consists of two types of photosensitive cells: rods and cones.

• Rods are primarily responsible for intensity perception, and cones for color perception.

• Rods are typically ten times more sensitive to light than cones.

• There is a small region at the center of the visual axis known as the fovea.
Rods and Cones

Rods

Rods are the most sensitive type of photoreceptor cell in the retina; consequently, they are associated with scotopic (night) vision, operating in clusters for increased sensitivity in very low light conditions.

As these cells are thought to be achromatic, we tend to see objects at night in shades of gray.

Rods operate within the visible spectrum, between approximately 400 and 700 nm.

During daylight levels of illumination, rods become hyperpolarized, or completely saturated, and thus do not contribute to vision.
Cones

• Cones provide photopic vision, i.e., they are responsible for day vision. They also perform with a high degree of acuity, since they generally operate individually.

• There are three types of cones in the human eye, sensitive to S (short), M (medium), and L (long) wavelengths.

• These three types have been associated with the color primaries B (blue), G (green), and R (red), respectively.

• The long-wavelength cones exhibit a spectral peak at 560 nm, the medium-wavelength cones peak at 530 nm, and the short-wavelength cones peak at around 420 nm.
Perceptual Processing
• Perception can be intrinsic and uncontrolled (preattentive) or controlled (attentive).

• Automatic or preattentive perception is fast and is performed in parallel, often within 250 ms.

• Some effects pop out and are the result of preconscious visual processes.

• Attentive processes (or perception) transform these early vision effects into structured objects.

• Attentive perception is slower and uses short-term memory.

• It is selective and often represents aggregates of what is in the scene.
Preattentive Processing

• Low-level attributes are rapidly perceived and then converted to higher-level structured ones for performing various tasks, such as finding a door in an emergency.

• The focus is first on low-level attributes, then on higher-level ones, and finally on putting it all together with memory models.

How do we make things pop out?

Preattentive processing refers to the body’s processing of sensory information (ambient temperature, light levels, etc.) that occurs before the conscious mind starts to pay attention to any specific objects in its vicinity.

E.g., when a person walks out of their home, the first things noticed are the temperature and whether it is day or night; only then does the mind start to process the events occurring in the area.
Preattentive Features

To find the highest and lowest values in a table, a full scan of the rows and columns is required.

Tabular Data with a Color for Negative Values

Tabular Data with Sales and Profitability by Color Gradients

Visual Data with Sales by Bar Length, Profitability by Color

Preattentive Processing Theories

01 Texton Theory: Statistical analysis of texture patterns.

02 Feature Integration Theory: Different parts of the brain automatically gather information about basic features (colors, shape, movement) that are found in the visual field.

03 Similarity Theory: Investigates conjunction searches.

04 Guided Search Theory: Visual search; hypothesizes an activation map based on both bottom-up and top-down information.

Feature Integration Theory
• Anne Treisman was one of the original researchers to document the area of preattentive processing. She provided important insight into this phenomenon by studying two important problems.

• First, she tried to determine which visual properties are detected preattentively. She called these properties "preattentive features".

• Second, she formulated a hypothesis about how the human visual system performs preattentive processing.

• Treisman ran experiments using target and boundary detection to classify preattentive features.

• For target detection, subjects had to determine whether a target element was present or absent in a field of background distractor elements.

• Boundary detection involved placing a group of target elements with a unique visual feature within a set of distractors to see if the boundary could be preattentively detected.
An example of boundary detection from Treisman's experiments: (a) a boundary defined by a unique feature, hue (red circles and red squares on the top, blue circles and blue squares on the bottom), is preattentively classified as horizontal; (b) a boundary defined by a conjunction of features (red circles and blue squares on the left, blue circles and red squares on the right) cannot be preattentively classified as vertical.

Feature Integration Model

Treisman and other researchers measured preattentive task performance in two different ways: by response time and by accuracy.

In the response time model, viewers are asked to complete the task (e.g., target detection) as quickly as possible while still maintaining a high level of accuracy.

The number of distractors in a scene is repeatedly increased. If task completion time is relatively constant and below some chosen threshold, independent of the number of distractors, the task is said to be preattentive.

In the accuracy model, the display is shown for a small, fixed exposure duration and then removed. If viewers can complete the task accurately, regardless of the number of distractors, the feature used to define the target is assumed to be preattentive.

A common exposure duration threshold is 200 to 250 msec, since this allows subjects only "one look" at the scene.
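
The response time criterion can be sketched in a few lines of Python. This is only an illustration, not Treisman's procedure: the function names, the slope tolerance (`max_slope_ms`), and the time threshold (`max_time_ms`) are assumptions chosen for the example, since the slides only say "relatively constant" and "some chosen threshold".

```python
def search_slope(set_sizes, mean_times_ms):
    """Least-squares slope: extra response time (ms) per additional distractor."""
    n = len(set_sizes)
    mean_x = sum(set_sizes) / n
    mean_y = sum(mean_times_ms) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(set_sizes, mean_times_ms))
    sxx = sum((x - mean_x) ** 2 for x in set_sizes)
    return sxy / sxx


def looks_preattentive(set_sizes, mean_times_ms,
                       max_slope_ms=10.0, max_time_ms=250.0):
    """Hypothetical test: nearly flat slope AND every time below a threshold.

    Both default thresholds are assumptions made for this sketch.
    """
    slope = search_slope(set_sizes, mean_times_ms)
    return abs(slope) <= max_slope_ms and max(mean_times_ms) <= max_time_ms


# Made-up measurements: number of distractors and mean response time (ms).
distractors = [4, 8, 16, 32]
flat_times = [210, 212, 209, 214]      # barely changes with set size
rising_times = [300, 500, 900, 1700]   # grows by 50 ms per distractor

print(looks_preattentive(distractors, flat_times))    # flat case
print(looks_preattentive(distractors, rising_times))  # rising case
```

A flat curve passes the check, while a curve that grows with the number of distractors does not, which mirrors the parallel versus serial distinction described above.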

Treisman proposed a model of low-level human vision made up of a set of feature maps and a master map of locations.

Each feature map registers activity in response to a specific visual feature.

Treisman suggested a manageable number of feature maps, including one for each of the opponent colour primaries green, red, yellow, and blue, as well as separate maps for orientation, shape, texture, and other preattentive features.

Treisman has since expanded her strict dichotomy of features being detected either in parallel or in serial.

Treisman has also extended feature integration to explain certain cases where conjunction search is preattentive, in particular conjunction search tasks involving motion, depth, colour, and orientation.

Texton Theory
• Julész's initial investigations focused on statistical analysis of texture patterns.

• The goal was to determine whether variations in a particular order statistic were seen (or not seen) by the low-level visual system.

• Examples of variations in order statistics include contrast (a variation in a texture's first-order statistic), orientation and regularity (a variation of the second-order statistic), and curvature (a variation of the third-order statistic).

• The early visual system detects a group of features called textons.

• Textons can be classified into three general categories:
  ◦ Elongated blobs (e.g., line segments, rectangles, ellipses) with specific properties such as hue, orientation, and width.
  ◦ Terminators (ends of line segments).
  ◦ Crossings of line segments.
An example of textons: (a, b) two textons (A and B) that appear different in isolation, but have the same size, number of terminators, and join points; (c) a target group of B-textons is difficult to detect in a background of A-textons when a random rotation is applied.

Similarity Theory

(a) produces an average search time increase of 4.5 msec per additional distractor; (b) produces an average search time increase of 54.5 msec per additional distractor.

Example of N-N similarity affecting search efficiency for a target shaped like the letter L: (a) high N-N (nontarget-nontarget) similarity allows easy detection of the target L; (b) low N-N similarity increases the difficulty of detecting the target L.
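
To make the two slopes concrete, the short sketch below multiplies each reported increase (4.5 and 54.5 msec per additional distractor) by a number of added distractors. Treating the increase as strictly linear is an assumption of this example, and the function name is hypothetical.

```python
# Slopes reported for the two displays (msec per additional distractor).
HIGH_NN_SLOPE_MS = 4.5    # display (a): nontargets resemble each other
LOW_NN_SLOPE_MS = 54.5    # display (b): nontargets differ from each other


def added_search_time_ms(slope_ms_per_distractor, extra_distractors):
    """Extra average search time, assuming a purely linear increase."""
    return slope_ms_per_distractor * extra_distractors


for extra in (5, 10, 20):
    high = added_search_time_ms(HIGH_NN_SLOPE_MS, extra)
    low = added_search_time_ms(LOW_NN_SLOPE_MS, extra)
    print(f"+{extra} distractors: high N-N +{high} ms, low N-N +{low} ms")
```

Under this linear reading, ten extra distractors cost about 45 ms in display (a) but about 545 ms in display (b).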

Search time is based on two criteria: T-N similarity and N-N similarity. T-N similarity is the amount of similarity between the targets and nontargets. N-N similarity is the amount of similarity within the nontargets themselves.

These two factors affect search time as follows:

• as T-N similarity increases, search efficiency decreases and search time increases,
• as N-N similarity decreases, search efficiency decreases and search time increases, and
• T-N similarity and N-N similarity are related; decreasing N-N similarity has little effect if T-N similarity is low; increasing T-N similarity has little effect if N-N similarity is high.
Duncan and Humphreys proposed a three-step theory of visual selection:
• The visual field is segmented into structural units.
• Access to visual short-term memory is limited.
• Units are grouped in a hierarchy.

Given these three steps, T-N and N-N similarity affect search efficiency.

Increased T-N similarity means more structural units match the template, so competition for visual short-term memory access increases.

Decreased N-N similarity means the visual system cannot efficiently reject large numbers of strongly grouped structural units, so resource allocation time and search time increase.

Guided Search Theory

• Wolfe hypothesized that an activation map based on both bottom-up and top-down information is constructed during visual search.
• Attention is drawn to peaks in the activation map that represent areas in the image with the largest combination of bottom-up and top-down influence.
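
A toy version of the activation map idea is sketched below. The slides do not say how bottom-up and top-down information are combined, so the weighted sum, the weights `w_bu` and `w_td`, and the small 2-by-2 grids are all assumptions made purely for illustration.

```python
def activation_map(bottom_up, top_down, w_bu=1.0, w_td=1.0):
    """Combine two same-sized 2D grids cell by cell (assumed: weighted sum)."""
    return [
        [w_bu * b + w_td * t for b, t in zip(b_row, t_row)]
        for b_row, t_row in zip(bottom_up, top_down)
    ]


def attention_order(activation):
    """Return (row, col) locations sorted from highest to lowest activation."""
    cells = [
        (value, r, c)
        for r, row in enumerate(activation)
        for c, value in enumerate(row)
    ]
    cells.sort(key=lambda cell: -cell[0])
    return [(r, c) for _, r, c in cells]


# Hypothetical inputs: how much each location stands out from its
# neighbours (bottom-up) and how well it matches the sought target (top-down).
bottom_up = [[0.1, 0.9],
             [0.2, 0.3]]
top_down = [[0.0, 0.2],
            [0.8, 0.1]]

order = attention_order(activation_map(bottom_up, top_down))
print(order)  # locations in the order attention would visit them
```

In this sketch attention goes first to the location where the combined influence peaks, then works down through the remaining locations.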

Postattentive Vision

• Wolfe argues that if multiple objects are recognized simultaneously in the low-level visual system, it would involve a search for links between the objects and their representation in long-term memory (LTM).
• Wolfe designed targets with two critical properties:
  ◦ The targets were formed from a conjunction of features (i.e., they could not be detected preattentively).
  ◦ The targets were arbitrary combinations of colours and shapes (i.e., they were not objects that could be semantically recognized and remembered on the basis of familiarity).
Postattentive Vision - Search Types

• Traditional search: Text on a blank screen was shown to identify the target. This was followed by a display containing 4, 5, 6, 7, or 8 potential target objects in a 3-by-3 array.

• Postattentive search: The display to be searched was shown to the user for a specific duration (up to 300 milliseconds). Text identifying the target was then inserted into the scene.

• Repeated search: Viewers are asked to search the same display five times for five different targets.

• Repeated search with letters: Searched in a manner identical to repeated search, but with displays containing letters rather than combinations of colours and shapes.

• Repeated search versus memory search: Viewers are to search a group of five letters 350 times for a target letter.

Perception in Visualization

(a) a visualization of intelligent agents; (b) a visualization of a CT scan; (c) a painterly visualization of weather conditions

Color
Color is a common feature used in many visualization designs. Examples of simple color scales include the rainbow spectrum, red-blue or red-green ramps, and the grey-red saturation scale.

• Distinguishable
• Perceptual Balance
• Flexibility

(a) a nonphotorealistic visualization using simulated brush strokes to display
the underlying data; (b) a traditional visualization of the same data using
triangular glyphs

Texture
• Although texture is often viewed as a single visual feature, it can be decomposed into a collection of fundamental perceptual dimensions.

• Researchers in computer vision have used properties like regularity, directionality, contrast, size, and coarseness to perform automatic texture segmentation and classification.

• These texture features were derived both from statistical analysis and through experimental study.

• A texture pattern changes its visual appearance based on data in the underlying dataset.
Examples of a natural brick texture applied to an underlying 3D object, oriented to follow different properties of the surface at a per-pixel level: (a) orientation follows a default "up" direction; (b) orientation follows the first principal direction; (c) orientation follows the second principal direction.

Motion
• Motion is a third visual feature that is known to be perceptually salient.

• The use of motion is common in certain areas of visualization, for example, the animation of particles, dye, or glyphs to represent the direction and magnitude of a vector field (e.g., fluid flow visualization).

• Motion transients are also used to highlight changes in a dataset across a user-selected data axis (e.g., over time for a temporal dataset, or along the scanning axis for a set of CT or MRI slices).

• As with color and texture, our interest is in identifying the perceptual dimensions of motion and applying them in an effective manner.

• Three motion properties have been studied extensively by researchers in psychophysics: flicker, direction of motion, and velocity of motion.

https://vimeo.com/221683963

Nonphotorealism
Nonphotorealistic renderings are often considered more effective, more appropriate, or even more expressive than an equivalent photograph.

Examples of nonphotorealistic enhancements for volume illustration: (a) original greyscale image of an abdominal CT scan; (b) the same image with tone enhancement applied; (c) with volumetric boundary sketching; (d) original colour image of the same abdominal CT scan; (e) with halos and boundary and silhouette enhancement; (f) with tone shading and boundary and silhouette enhancement.
Memory Issues

Sensory memory
• Sensory memory is high-capacity information storage.
• It is effectively preattentive eye filters. Large quantities of information are processed very fast (less than 200 msec).

Short-term memory
• Short-term memory analyzes information from both sensory and long-term storage.
• It has limited information capacity.
• It occurs at a high level of processing, but the time span is limited, typically to less than 30 seconds.

Long-term memory
• Long-term memory is complex and theoretically limitless, much like a data warehouse.
• This storage is multicoded, redundantly stored, and organized in a complex network structure.
• Information retrieval is a key problem, and access is unreliable and slow.

Metrics
• When designing a visualization, it is important to factor in human limitations to
avoid generating images with ambiguous, misleading, or difficult-to-interpret
information.
• How many distinct line lengths and orientations can humans accurately perceive?
• How many different sound pitches or volumes can we distinguish without error?
• What is our “channel capacity” when dealing with color, taste, smell, or any other of
our senses?
• How are humans capable of recognizing hundreds of faces and thousands of spoken
words?
• What graphical entities can be accurately measured by humans?
• How many distinct entities can be used in a visualization without confusion?
• With what level of accuracy do we perceive various primitives?
• How do we combine primitives to recognize complex phenomena?
• How should color be used to present information?
Visualization Foundation

Semiology of Graphical Symbols

Symbols and Visualizations

Features of Graphics


Rules of a graphic:

1. The aim of a graphic is to discover groups or orders in x, and groups or orders in y, that are formed on z-values;

2. (x, y, z)-construction enables in all cases the discovery of these groups;

3. Within the (x, y, z)-construction, permutations and classifications solve the problem of the upper level of information;

4. Every graphic with more than three factors that differs from the (x, y, z)-construction destroys the unity of the graphic and the upper level of information; and

5. Pictures must be read and understood by the human.


Major Tasks in Data Preprocessing

Data Cleaning
Data in the real world is dirty: there is lots of potentially incorrect data, e.g., due to faulty instruments, human or computer error, or transmission error.
• Incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate data
  ◦ e.g., Occupation=“ ” (missing data)
• Noisy: containing noise, errors, or outliers
  ◦ e.g., Salary=“−10” (an error)
• Inconsistent: containing discrepancies in codes or names, e.g.,
  ◦ Age=“42”, Birthday=“03/07/2010”
  ◦ Was rating “1, 2, 3”, now rating “A, B, C”
  ◦ Discrepancy between duplicate records
• Intentional (e.g., disguised missing data)
  ◦ Jan. 1 as everyone’s birthday?
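
The examples above can be turned into simple automated checks. The sketch below is a minimal illustration, not a complete cleaning routine: the field names, the reference date, and the reading of 03/07/2010 as 7 March 2010 are assumptions made for the example.

```python
from datetime import date


def age_on(birthday, today):
    """Whole years elapsed between birthday and today."""
    years = today.year - birthday.year
    if (today.month, today.day) < (birthday.month, birthday.day):
        years -= 1
    return years


def find_problems(record, today):
    """Flag incomplete, noisy, and inconsistent values in one record."""
    problems = []
    if not (record.get("occupation") or "").strip():
        problems.append("incomplete: occupation is missing")
    if record.get("salary", 0) < 0:
        problems.append("noisy: salary is negative")
    if age_on(record["birthday"], today) != record["age"]:
        problems.append("inconsistent: age does not match birthday")
    return problems


# Record built from the slide's examples; the date reading is assumed.
dirty = {"occupation": " ", "salary": -10, "age": 42,
         "birthday": date(2010, 3, 7)}
reference_day = date(2024, 1, 1)  # hypothetical "today"

for problem in find_problems(dirty, reference_day):
    print(problem)
```

Intentional problems such as a default birthday of Jan. 1 are harder to catch per record; they typically show up only when the distribution of the whole attribute is inspected.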

Incomplete (Missing) Data
• Data is not always available
  ◦ E.g., many tuples have no recorded value for several attributes, such as customer income in sales data
• Missing data may be due to
  ◦ Equipment malfunction
  ◦ Data that was inconsistent with other recorded data and thus deleted
  ◦ Data not entered due to misunderstanding
  ◦ Certain data not being considered important at the time of entry
  ◦ History or changes of the data not being registered
• Missing data may need to be inferred

Handling Missing Data
• Ignore the tuple: usually done when the class label is missing (when doing classification); not effective when the % of missing values per attribute varies considerably
• Fill in the missing value manually: tedious and often infeasible
• Fill it in automatically with
  ◦ A global constant, e.g., “unknown” (effectively a new class)
  ◦ The attribute mean
  ◦ The attribute mean for all samples belonging to the same class: smarter
  ◦ The most probable value: inference-based, such as a Bayesian formula or decision tree
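
Two of the automatic strategies listed above, filling with the attribute mean and filling with the per-class attribute mean, can be sketched as follows. The record layout (a list of dictionaries with `None` for missing values), the function names, and the sample data are assumptions made for this example.

```python
from statistics import mean


def fill_with_attribute_mean(rows, attr):
    """Replace missing values (None) with the mean of the known values."""
    fill = mean(r[attr] for r in rows if r[attr] is not None)
    return [dict(r, **{attr: fill}) if r[attr] is None else dict(r)
            for r in rows]


def fill_with_class_mean(rows, attr, label):
    """Replace missing values with the mean of the same class.

    Falls back to the overall mean if a class has no known values.
    """
    overall = mean(r[attr] for r in rows if r[attr] is not None)
    by_class = {}
    for r in rows:
        if r[attr] is not None:
            by_class.setdefault(r[label], []).append(r[attr])
    class_means = {cls: mean(vals) for cls, vals in by_class.items()}
    return [
        dict(r, **{attr: class_means.get(r[label], overall)})
        if r[attr] is None else dict(r)
        for r in rows
    ]


# Made-up sales data: income is missing for two customers.
customers = [
    {"income": 30, "segment": "A"},
    {"income": None, "segment": "A"},
    {"income": 50, "segment": "A"},
    {"income": 100, "segment": "B"},
    {"income": None, "segment": "B"},
]

print(fill_with_attribute_mean(customers, "income"))
print(fill_with_class_mean(customers, "income", "segment"))
```

The class-based version fills the two gaps with different values, one per segment, which is why the slide calls it the smarter option.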
Handling Noisy Data
• Binning
  ◦ First sort data and partition into (equal-frequency) bins
  ◦ Then one can smooth by bin means, smooth by bin medians, smooth by bin boundaries, etc.
• Regression
  ◦ Smooth by fitting the data into regression functions
• Clustering
  ◦ Detect and remove outliers
• Combined computer and human inspection
  ◦ Detect suspicious values and check by human (e.g., deal with possible outliers)
Binning

Sorted data for price (in dollars): 4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34
* Partition into equal-frequency (equi-depth) bins:
- Bin 1: 4, 8, 9, 15
- Bin 2: 21, 21, 24, 25
- Bin 3: 26, 28, 29, 34
* Smoothing by bin means:
- Bin 1: 9, 9, 9, 9
- Bin 2: 23, 23, 23, 23
- Bin 3: 29, 29, 29, 29
* Smoothing by bin boundaries:
- Bin 1: 4, 4, 4, 15
- Bin 2: 21, 21, 25, 25
- Bin 3: 26, 26, 26, 34
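
The worked example above can be reproduced with a short Python sketch. The function names are hypothetical, the partitioning assumes the number of values divides evenly into the bins, bin means are rounded to the nearest integer as on the slide, and ties in boundary smoothing go to the lower boundary (an assumption, since the slide's data contains no ties).

```python
def equal_frequency_bins(sorted_values, n_bins):
    """Split sorted data into n_bins bins of equal size (assumes even split)."""
    size = len(sorted_values) // n_bins
    return [sorted_values[i * size:(i + 1) * size] for i in range(n_bins)]


def smooth_by_means(bins):
    """Replace every value in a bin with the rounded bin mean."""
    return [[round(sum(b) / len(b))] * len(b) for b in bins]


def smooth_by_boundaries(bins):
    """Replace every value with the closer of the bin's min and max."""
    smoothed = []
    for b in bins:
        low, high = b[0], b[-1]
        smoothed.append([low if v - low <= high - v else high for v in b])
    return smoothed


prices = [4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34]
bins = equal_frequency_bins(prices, 3)

print(bins)
print(smooth_by_means(bins))
print(smooth_by_boundaries(bins))
```

Note that the exact means of the second and third bins are 22.75 and 29.25; the slide shows them rounded to 23 and 29.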

Noisy Data
• Noise: random error or variance in a measured variable

• Incorrect attribute values may be due to
  ◦ Faulty data collection instruments
  ◦ Data entry problems
  ◦ Data transmission problems
  ◦ Technology limitation
  ◦ Inconsistency in naming convention

• Other data problems which require data cleaning
  ◦ Duplicate records
  ◦ Incomplete data
  ◦ Inconsistent data
Data Cleaning as a Process
• Data discrepancy detection
  ◦ Use metadata (e.g., domain, range, dependency, distribution)
  ◦ Check field overloading
  ◦ Check uniqueness rule, consecutive rule, and null rule
  ◦ Use commercial tools
    - Data scrubbing: use simple domain knowledge (e.g., postal code, spell-check) to detect errors and make corrections
    - Data auditing: analyze data to discover rules and relationships and to detect violators (e.g., correlation and clustering to find outliers)

• Data migration and integration
  ◦ Data migration tools: allow transformations to be specified
  ◦ ETL (Extraction/Transformation/Loading) tools: allow users to specify transformations through a graphical user interface

• Integration of the two processes
  ◦ Iterative and interactive (e.g., Potter’s Wheel)
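
As a rough illustration of discrepancy detection, the sketch below checks two of the rules named above. It assumes the uniqueness rule means that no two records share a value for the given attribute, and that the null rule lists the markers (blank, question mark, and so on) that stand for a missing value; the function names, the marker set, and the sample records are hypothetical.

```python
def null_rule_violations(rows, attr, null_markers=(None, "", "?")):
    """Indexes of records whose value is one of the assumed null markers."""
    return [i for i, r in enumerate(rows) if r.get(attr) in null_markers]


def uniqueness_violations(rows, attr):
    """Pairs (first_index, duplicate_index) that share the same value."""
    first_seen = {}
    duplicates = []
    for i, r in enumerate(rows):
        value = r.get(attr)
        if value in first_seen:
            duplicates.append((first_seen[value], i))
        else:
            first_seen[value] = i
    return duplicates


# Made-up customer records with one blank postal code and one repeated id.
records = [
    {"id": 1, "postal_code": "632014"},
    {"id": 2, "postal_code": ""},
    {"id": 1, "postal_code": "600001"},
]

print(null_rule_violations(records, "postal_code"))
print(uniqueness_violations(records, "id"))
```

Checks like these only flag candidates; deciding how to correct them still belongs to the scrubbing and auditing steps listed above.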
References

1. https://www3.cs.stonybrook.edu/~qin/courses/geometry/fundamental-techniques-graphics-visualization.pdf
2. https://web.fe.up.pt/~tavares/downloads/publications/artigos/IJI_Manuscript_DA_JT.pdf
3. Understanding the Perception and Its Role in Management of Organization
4. https://www.csc2.ncsu.edu/faculty/healey/PP/
5. https://help.tableau.com/current/pro/desktop/en-us/multiple_measures.htm#Dual
