1152cs191 Data Visualization Unit i
CO4
CO3
CO2
CO1
Engineering Knowledge
3
3
3
3
12/10/2024
Problem Analysis
2
2
Design / Development of solutions
2
2
Conduct investigations of complex
2
2
problems
2
Ethics
VISUALIZATION
Individual & Team Work
Communication
Mathematical Concepts
3
3
and Program Specific Outcomes
Software Development
2
Transferring Skills
Correlation of COs with Program Outcomes
Correlation of COs with Student Outcomes
ABET EAC
CO1 3 2 2
CO2 3 2 2
CO3 3 2 2
CO4 3 2 2
CO5 2 2 2 2
CO1 2 2 2
CO2 2 2 2
CO3 2 2
CO4 2 2
CO5 3 2 2 2
Learning Resources:
1. https://rstudio.github.io/r2d3/
2. https://www.analyticsvidhya.com/blog/2016/10/creating-interactive-data-visualization-using-shiny-app-in-r-with-examples/
3. https://medium.com/opex-analytics/when-what-data-viz-part-ii-b4f76f5d0a29
Visualization
Related to vision (seeing through eyes), one of the major human senses to interact with the world.
Visualization Process
Process of forming a visual image of things that can be seen through eyes (or imagined in human mind).
What can be Visualized
– Visible reality: person, animal, building, mountain
– Hidden reality: earth core, blood vessel, universe
– Invisible reality: wind, air, heat, electron, sound, smell, magnetic fields
– Abstract entity: data, information, idea, hierarchy, process, relationship
Types/features of visualization forms
– 2D vs. 3D
– Static vs. motion
– Virtual vs. materialized
– Realistic vs. abstract
How to visualize them?
The visualization process (and the result) may change the original form of things (or create a new form) for better understanding and communication.
•The human brain grasps visuals more easily than table reports. Data visualizations allow decision makers to be notified quickly of new data insights and take necessary actions for business growth.
•Helps quickly identify any errors in the data. If the data tends to suggest the wrong actions, visualizations help identify erroneous data sooner so that it can be removed from analysis.
•Helps business stakeholders analyze reports regarding sales, marketing strategies, and product interest. Based on the analysis, they can focus on the areas that require attention to increase profits, which in turn makes the business more productive.
•Cognition: the process of acquiring knowledge and understanding through thought, experience, and the senses.
• What does the user see in a visualization?
• What information gets understood, missed, remembered?
• For how long can such information be remembered?
• Each of these questions requires us to look beyond perception and into cognition.
Department of Computer Science & Engineering DATA VISUALIZATION
12/10/2024
Pseudocode Conventions
data—The working data table. This data table is assumed to contain only
numeric values. In practice, dimensions of the original data table that contain
non-numeric values must be somehow converted to numeric values.
Circle(x, y, radius)—A function that fills a circle centered at the given (x, y)-
location, with the given radius, with the color of the graphics environment.
The plotting space for all visualizations is the unit square. In practice, this
function must map the unit square to a square in pixel coordinates.
Polygon(xs, ys)—A function that fills the polygon defined by the given arrays
of x- and y-coordinates with the color of the current color state.
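The mapping from the unit square to pixel coordinates mentioned for Circle can be sketched as follows. This is a minimal illustration, not the course's implementation: the function names, the `size` and `margin` parameters, and the decision to flip the y-axis (because screen coordinates usually grow downward) are all assumptions.

```python
def unit_to_pixel(x, y, size, margin=0):
    """Map a point in the unit square to pixel coordinates.

    Assumes a square canvas of `size` x `size` pixels with an optional
    margin, and flips y because screen y usually grows downward.
    """
    span = size - 2 * margin
    px = margin + x * span
    py = margin + (1.0 - y) * span
    return round(px), round(py)


def circle_in_pixels(x, y, radius, size, margin=0):
    """Return (center_x, center_y, radius) in pixels for a unit-square circle."""
    px, py = unit_to_pixel(x, y, size, margin)
    pr = round(radius * (size - 2 * margin))
    return px, py, pr


# The center of the unit square lands at the center of a 200-pixel canvas.
center = circle_in_pixels(0.5, 0.5, 0.1, 200)
```

A real Circle function would pass these pixel values to whatever drawing call the graphics environment provides.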
For geographic visualizations, the following functions are assumed to exist in
the environment:
The x- and y-axes represent data from dimension numbers xDim and
yDim, respectively.
The radius of the circles is derived from dimension number rDim, as well
as from the upper and lower bounds for the radius, rMin and rMax.
Scatterplot
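A scatterplot built from the parameters described above (xDim, yDim, rDim, rMin, rMax) might be sketched as follows. The slides do not give the pseudocode itself, so this is an assumed reading: each chosen dimension is normalized linearly to [0, 1], and the radius is interpolated between rMin and rMax. The helper names and the sample table are hypothetical.

```python
def normalize(value, lo, hi):
    """Linearly map value from [lo, hi] to [0, 1]; constant columns map to 0.5."""
    if hi == lo:
        return 0.5
    return (value - lo) / (hi - lo)


def scatterplot_circles(data, x_dim, y_dim, r_dim, r_min, r_max):
    """Return one (x, y, radius) triple per record, all in unit-square units."""
    columns = list(zip(*data))
    bounds = [(min(c), max(c)) for c in columns]
    circles = []
    for record in data:
        x = normalize(record[x_dim], *bounds[x_dim])
        y = normalize(record[y_dim], *bounds[y_dim])
        t = normalize(record[r_dim], *bounds[r_dim])
        circles.append((x, y, r_min + t * (r_max - r_min)))
    return circles


# Hypothetical numeric data table: three records, three dimensions.
data = [
    [1.0, 10.0, 5.0],
    [2.0, 30.0, 15.0],
    [3.0, 20.0, 10.0],
]
circles = scatterplot_circles(data, x_dim=0, y_dim=1, r_dim=2, r_min=0.01, r_max=0.05)
```

Each triple could then be handed to a Circle(x, y, radius) function as described in the pseudocode conventions.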
Nominal
Examples:
•Gender: Male, Female, Other.
•Hair Color: Brown, Black, Blonde, Red,
Other.
•Type of living accommodation: House,
Apartment, Trailer, Other.
•Genotype: Bb, bb, BB, bB.
•Religious preference: Buddhist, Mormon,
Muslim, Jewish, Christian, Other.
Ordinal
Examples:
•High school class ranking: 1st, 9th, 87th…
•Socioeconomic status: poor, middle class,
rich.
•The Likert Scale: strongly disagree, disagree,
neutral, agree, strongly agree.
•Level of Agreement: yes, maybe, no.
•Time of Day: dawn, morning, noon,
afternoon, evening, night.
•Political Orientation: left, center, right.
Interval
Interval: has values of equal intervals that mean something. For example,
a thermometer might have intervals of ten degrees.
Examples:
•Celsius Temperature.
•Fahrenheit Temperature.
•IQ (intelligence scale).
•SAT scores.
•Time on a clock with hands.
Ratio
Examples:
•Age.
•Weight.
•Height.
•Sales Figures.
•Ruler measurements.
•Income earned in a week.
•Years of education.
•Number of children.
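Assuming the four groups of examples above correspond to the usual nominal, ordinal, interval, and ratio scales, the sketch below illustrates which summaries are meaningful for each. The sample values are made up for illustration, and the choice of summaries follows the common textbook convention rather than anything stated on the slides.

```python
import statistics

# Nominal: categories have no order, so only counting (e.g., the mode) makes sense.
hair_colors = ["Brown", "Black", "Brown", "Red"]
most_common_color = statistics.mode(hair_colors)

# Ordinal: categories are ordered, so a median is meaningful, but a mean is not.
LIKERT = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]
RANK = {label: position for position, label in enumerate(LIKERT)}
responses = ["agree", "neutral", "strongly agree", "agree", "disagree"]
median_response = LIKERT[statistics.median_low(RANK[r] for r in responses)]

# Interval: differences are meaningful, ratios are not (0 degrees Celsius is arbitrary).
difference_celsius = 20.0 - 10.0
ratio_in_kelvin = (20.0 + 273.15) / (10.0 + 273.15)  # far from "twice as hot"

# Ratio: a true zero exists, so ratios are meaningful.
weight_ratio = 80.0 / 40.0
```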
Data sets have structure, both in terms of the means of representation (syntax), and
the types of interrelationships within a given record and between records (semantics).
•Vectors found in typical data sets include position (2 or 3 spatial values), color (a triplet of red, green, and blue components), and phone number. While each component of a vector might be examined individually, it is most common to treat the vector as a whole.
Tensor
•Scalars and vectors are simple variants on a more
general structure known as a tensor.
•A tensor is defined by its rank and by the
dimensionality of the space within which it is
defined.
• A scalar can be designated a tensor of rank zero.
• A vector can be designated a tensor of rank one.
• A rank-three tensor is represented with a cubic matrix, with components coming out of your computer screen.
[Figure: second-order Cauchy stress tensor]
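The idea that scalars, vectors, and matrices are tensors of rank zero, one, and two can be illustrated with nested Python lists, where the rank is the nesting depth. This is only a sketch: the `tensor_rank` helper is hypothetical, it assumes regular, non-empty nesting, and the numeric values are placeholders.

```python
def tensor_rank(value):
    """Return the nesting depth of a regular, non-empty nested list or tuple."""
    rank = 0
    while isinstance(value, (list, tuple)):
        rank += 1
        value = value[0]
    return rank


temperature = 21.5                       # scalar: rank 0
color = (255, 128, 0)                    # vector (red, green, blue): rank 1
stress = [[1.0, 0.2, 0.0],               # 3 x 3 matrix: rank 2
          [0.2, 1.5, 0.1],
          [0.0, 0.1, 0.8]]
cube = [[[0, 1], [2, 3]],                # 2 x 2 x 2 "cubic matrix": rank 3
        [[4, 5], [6, 7]]]

ranks = [tensor_rank(t) for t in (temperature, color, stress, cube)]
```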
Geometry and Grids
Geometric structure can be incorporated in a data set by giving explicit
coordinates for each data record.
In modeling of 3D objects, the geometry
constitutes the majority of the data, with
coordinates given for each vertex.
Data in the Real World Is Dirty: lots of potentially incorrect data, e.g., faulty instruments, human or computer error, transmission error
•Incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate data
e.g., Occupation=“ ” (missing data)
•Noisy: containing noise, errors, or outliers
e.g., Salary=“−10” (an error)
•Inconsistent: containing discrepancies in codes or names, e.g.,
– Age=“42”, Birthday=“03/07/2010”
– Was rating “1, 2, 3”, now rating “A, B, C”
– Discrepancy between duplicate records
•Intentional (e.g., disguised missing data): Jan. 1 as everyone’s birthday?
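The kinds of dirty data listed above can be checked for with simple rules. The sketch below is illustrative only: the record layout, the `find_problems` helper, and the reference date are assumptions, and the birthday 03/07/2010 is read as 7 March 2010, which the slide does not specify.

```python
from datetime import date


def find_problems(record, today):
    """Return a list of data-quality problems found in one record."""
    problems = []
    # Incomplete: a blank or missing attribute value.
    if not (record.get("occupation") or "").strip():
        problems.append("incomplete: occupation is missing")
    # Noisy: a value that cannot be right, such as a negative salary.
    salary = record.get("salary")
    if salary is not None and salary < 0:
        problems.append("noisy: salary is negative")
    # Inconsistent: the stated age disagrees with the birthday.
    age, birthday = record.get("age"), record.get("birthday")
    if age is not None and birthday is not None:
        had_birthday = (today.month, today.day) >= (birthday.month, birthday.day)
        expected_age = today.year - birthday.year - (0 if had_birthday else 1)
        if expected_age != age:
            problems.append("inconsistent: age does not match birthday")
    return problems


record = {
    "occupation": " ",
    "salary": -10,
    "age": 42,
    "birthday": date(2010, 3, 7),
}
problems = find_problems(record, today=date(2024, 1, 1))  # hypothetical reference date
```

Disguised missing data, such as Jan. 1 recorded as everyone's birthday, is harder to catch with per-record rules and usually needs a look at the distribution of values across records.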
Segmentation
The data can be separated into contiguous regions, where each region corresponds
to a particular classification of the data.
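As a minimal one-dimensional sketch of segmentation, the example below classifies each sample with a threshold and then groups neighbouring samples with the same class into contiguous regions. The threshold classifier, the "low"/"high" labels, and the sample values are assumptions chosen for illustration; real segmentation methods are typically more elaborate.

```python
from itertools import groupby


def segment(values, threshold):
    """Split a sequence into contiguous regions of equal classification.

    Returns (label, first_index, last_index) for each region.
    """
    labels = ["high" if v >= threshold else "low" for v in values]
    regions = []
    start = 0
    for label, run in groupby(labels):
        length = len(list(run))
        regions.append((label, start, start + length - 1))
        start += length
    return regions


regions = segment([1, 2, 8, 9, 7, 3, 2, 9], threshold=5)
```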
•The first component is the cornea, the exterior cover of the front of
the eye.
• From the cornea, light passes through the pupil, a circular hole in
the iris, similar in function to an aperture stop on a photographic
camera.
Lens System and Muscles
•The pupil determines how much light will enter the rest of the
internal chamber of the eye.
•Rods are typically ten times more sensitive to light than cones.
•There is a small region at the center of the visual axis known as
the fovea.
Rods and Cones
•Cones provide photopic vision, i.e., are responsible for day vision. Also, they
perform with a high degree of acuity, since they generally operate individually.
•There are three types of cones in the human eye: S (short), M (medium), and L
(long) wavelengths.
•These three types have been associated with color combinations using R (red), G
(green), and B (blue).
•The long wavelength cones exhibit a spectrum peak at 560 nm, the medium
wavelength cones peak at 530 nm, and the short wavelength cones peak at around 420
nm.
Perceptual Processing
•Perception can be intrinsic and uncontrolled
(preattentive) or controlled (attentive).
E.g., when a person walks out of their home, the first things
noticed are the temperature and whether it is day or night; then
the mind starts to process the events that are occurring in the
area.
Preattentive Processing
01 Texton Theory: statistical analysis of texture patterns.
02 Feature Integration Theory: different parts of the brain automatically gather information about basic features (colors, shape, movement) that are found in the visual field.
03 Similarity Theory: investigates conjunction searches.
04 Guided Search Theory: visual search; hypothesized an activation map based on both bottom-up and top-down information.
• First, Treisman tried to determine which visual properties are detected preattentively. She
called these properties "preattentive features".
• Second, she formulated a hypothesis about how the human visual system performs
preattentive processing.
• For target detection, subjects had to determine whether a target element was present
or absent in a field of background distractor elements.
• Boundary detection involved placing a group of target elements with a unique visual
feature within a set of distractors to see if the boundary could be preattentively
detected.
Feature Integration Theory
In the response time model, viewers are asked to complete the task (e.g., target
detection) as quickly as possible while still maintaining a high level of accuracy.
A common exposure duration threshold is 200 to 250 msec, since this allows
subjects only "one look" at the scene.
Treisman has expanded her strict dichotomy of features being detected either in
parallel or in serial.
Treisman has also extended feature integration to explain certain cases where
conjunction search is preattentive, in particular conjunction search tasks
involving motion, depth, colour, and orientation.
• Determines whether variations in a particular order statistic were seen (or not
seen) by the low-level visual system.
An example of textons: (a,b) two textons (A and B) that appear different in
isolation, but have the same size, number of terminators, and join points; (c) a
target group of B-textons is difficult to detect in a background of A-textons when
a random rotation is applied
Example of N-N similarity affecting search efficiency for a target shaped like the letter L:
(a) high N-N (nontarget-nontarget) similarity allows easy detection of target L; (b)
low N-N similarity increases the difficulty of detecting the target L
Search time is based on two criteria: T-N similarity and N-N similarity.
T-N similarity is the amount of similarity between the targets and
nontargets. N-N similarity is the amount of similarity within the
nontargets themselves.
Increased T-N similarity means more structural units match the template, so
competition for visual short-term memory access increases.
Postattentive Vision
[Figure panels: repeated search; repeated search versus memory search; repeated search with letters]
Perception in Visualization
Perceptual
Balance
(a) a nonphotorealistic visualization using simulated brush strokes to display
the underlying data; (b) a traditional visualization of the same data using
triangular glyphs
https://vimeo.com/221683963
Rules of a graphic.
4. Every graphic with more than three factors that differs from the (x, y, z)-
construction destroys the unity of the graphic and the upper level of
information; and
Regression
Smooth by fitting the data into regression functions
Clustering
Detect and remove outliers
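Smoothing by regression can be sketched with an ordinary least-squares line: fit the line to the data, then replace each observed value with the fitted value. The slides do not say which regression function is used, so the straight-line model, the helper names, and the sample data here are assumptions.

```python
def linear_fit(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def smooth_by_regression(xs, ys):
    """Replace each y with the value predicted by the fitted line."""
    slope, intercept = linear_fit(xs, ys)
    return [slope * x + intercept for x in xs]


# Hypothetical data: roughly y = 2x + 1, with a noisy third value.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 6, 7, 9]
smoothed = smooth_by_regression(xs, ys)
```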
Sorted data for price (in dollars): 4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34
* Partition into equal-frequency (equi-depth) bins:
- Bin 1: 4, 8, 9, 15
- Bin 2: 21, 21, 24, 25
- Bin 3: 26, 28, 29, 34
* Smoothing by bin means:
- Bin 1: 9, 9, 9, 9
- Bin 2: 23, 23, 23, 23
- Bin 3: 29, 29, 29, 29
* Smoothing by bin boundaries:
- Bin 1: 4, 4, 4, 15
- Bin 2: 21, 21, 25, 25
- Bin 3: 26, 26, 26, 34
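The binning example above can be reproduced with a short sketch. It assumes the number of values divides evenly into the bins, rounds bin means to the nearest integer as the example does, and sends a value that is equally close to both boundaries to the lower one; the function names are illustrative.

```python
def equal_frequency_bins(sorted_values, n_bins):
    """Partition sorted values into n_bins bins of equal size."""
    size = len(sorted_values) // n_bins
    return [sorted_values[i * size:(i + 1) * size] for i in range(n_bins)]


def smooth_by_means(bins):
    """Replace every value in a bin with the (rounded) bin mean."""
    return [[round(sum(b) / len(b))] * len(b) for b in bins]


def smooth_by_boundaries(bins):
    """Replace every value in a bin with the closer of the bin's min and max."""
    smoothed = []
    for b in bins:
        lo, hi = b[0], b[-1]
        smoothed.append([lo if v - lo <= hi - v else hi for v in b])
    return smoothed


prices = [4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34]
bins = equal_frequency_bins(prices, 3)
by_means = smooth_by_means(bins)
by_boundaries = smooth_by_boundaries(bins)
```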
Data scrubbing: use simple domain knowledge (e.g., postal code, spell-check)
to detect errors and make corrections
Data auditing: analyze data to discover rules and relationships, and
detect violators (e.g., correlation and clustering to find outliers)
1. https://www3.cs.stonybrook.edu/~qin/courses/geometry/fundamental-techniques-graphics-visualization.pdf
2. https://web.fe.up.pt/~tavares/downloads/publications/artigos/IJI_Manuscript_DA_JT.pdf
3. Understanding the Perception and Its Role in Management of Organization
4. https://www.csc2.ncsu.edu/faculty/healey/PP/
5. https://help.tableau.com/current/pro/desktop/en-us/multiple_measures.htm#Dual