
Lecture 1 1

This document introduces a course on data visualization. It provides an overview of the topics that will be covered, including defining visualization, its purposes, types of visualization techniques, and case studies exploring specific examples. Key concepts that will be discussed are open exploration, confirmation, and communication of data through visualization. Visualization is presented as a process of transforming abstract data into interactive graphical representations to aid in tasks like exploration, confirmation, and presentation.

Uploaded by

Fizza Mubeen

CSE 5544: Introduction to Data Visualization
Raghu Machiraju

[xkcd]
Many Thanks to
– Alex Lex, University of Utah
– Torsten Moeller, University of Vienna
– Tamara Munzner, University of British Columbia
– Hanspeter Pfister, Harvard University
Me

Houle et al., Nature Reviews Genetics, 2010


vi·su·al·i·za·tion
1. Formation of mental visual images
2. The act or process of interpreting in visual terms or of putting into visible form

The American Heritage Dictionary
Logographic
Systems
Visualization Definition
Visualization is the process that transforms (abstract) data into interactive graphical representations for the purpose of exploration, confirmation, or presentation.
Diffusion
Tensor Imaging
Visualization Definition
Computer-based visualization systems provide visual representations of datasets designed to help people carry out tasks more effectively.
Purpose of Visualization
• Ingest Data
• Transform, derive & represent data
• Ask questions
• Once (communicate) or many times (explore)
• Iterate
Taxonomy
[Obama Administration]

Open Exploration
Confirmation
Communication
Communicate

http://www.nytimes.com/interactive/2014/10/19/upshot/peyton-manning-breaks-touchdown-passing-record.html?_r=0 [New York Times]


Tell Stories [New York Times]
Confirm

http://www.boredpanda.com/find-the-panda-illustrated-puzzles-openlist/
Once is not enough!
Well-formed questions:
How did unemployment and the labor force develop over the last several years?

When questions are not well defined:


Exploration
Which combination of genes causes cancer?
Which drug can help patient X?

[New York Times]


Exploration through iteration

Finding Cancer Subtypes


iGPSE
Explore What?

www.google.com ☺
Why not?
Plain Statistics?
      I             II            III            IV
   x      y      x      y      x      y      x      y
  10   8.04     10   9.14     10   7.46      8   6.58
   8   6.95      8   8.14      8   6.77      8   5.76
  13   7.58     13   8.74     13  12.74      8   7.71
   9   8.81      9   8.77      9   7.11      8   8.84
  11   8.33     11   9.26     11   7.81      8   8.47
  14   9.96     14   8.10     14   8.84      8   7.04
   6   7.24      6   6.13      6   6.08      8   5.25
   4   4.26      4   3.10      4   5.39     19  12.50
  12  10.84     12   9.13     12   8.15      8   5.56
   7   4.82      7   7.26      7   6.42      8   7.91
   5   5.68      5   4.74      5   5.73      8   6.89

Mean x: 9 y: 7.50
Variance x: 11 y: 4.122
Correlation x – y: 0.816
Linear regression: y = 3.00 + 0.500x
Anscombe’s Quartet

Mean x: 9 y: 7.50
Variance x: 11 y: 4.122
Correlation x – y: 0.816
Linear regression: y = 3.00 + 0.500x
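
The summary statistics above can be checked with a few lines of code. The following is a minimal sketch, assuming the standard published values of Anscombe's quartet; the names `QUARTET`, `summarize`, and `SUMMARIES` are made up for this illustration and are not part of the lecture.

```python
from math import sqrt

# Anscombe's quartet, standard published values (assumed here).
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return mean, sample variance, correlation and least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "var_x": sxx / (n - 1),
        "var_y": syy / (n - 1),
        "corr": sxy / sqrt(sxx * syy),
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


SUMMARIES = {name: summarize(x, y) for name, (x, y) in QUARTET.items()}

for name, stats in SUMMARIES.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

The four printed rows are nearly identical, although the four scatterplots look nothing alike. That is the argument for plotting the data rather than relying on plain statistics.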
Why not ?
But …
And unknown parametric space …
K in K-Means …

https://datasciencelab.files.wordpress.com/2013/12/p_n2000_k15_.gif
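
To make the "K in K-Means" problem concrete, here is a small sketch of Lloyd's algorithm on made-up one-dimensional data. The data, the deterministic initialization, and the names `kmeans_1d`, `POINTS`, and `INERTIAS` are assumptions for illustration only. It shows that the within-cluster error keeps shrinking as K grows, so the number alone does not say which K is right.

```python
def kmeans_1d(points, k, iterations=20):
    """Plain Lloyd's algorithm on 1D data with a deterministic start."""
    data = sorted(points)
    n = len(data)
    # Start from k values spread evenly through the sorted data.
    centers = [data[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda j: abs(x - centers[j]))
            clusters[nearest].append(x)
        # Move each center to the mean of its cluster (keep it if empty).
        centers = [
            sum(members) / len(members) if members else centers[j]
            for j, members in enumerate(clusters)
        ]
    # Inertia: sum of squared distances to the nearest center.
    inertia = sum(min((x - c) ** 2 for c in centers) for x in data)
    return centers, inertia


# Hypothetical data with three obvious groups around 1, 5 and 9.
POINTS = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2, 8.8, 9.0, 9.2]

# Inertia for K = 1, 2, 3, 4.
INERTIAS = [kmeans_1d(POINTS, k)[1] for k in range(1, 5)]

for k, inertia in enumerate(INERTIAS, start=1):
    print(f"K={k}: inertia={inertia:.2f}")
```

The error drops sharply up to K=3 and only slightly afterwards. Spotting that bend, or simply looking at the points, is a visual judgment, which is where visualization complements the algorithm.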
Not for the faint-hearted
Missing in action
• Domain knowledge
• Models of interaction
• Hypotheses or questions
• Optimal parametric values
Visualization

http://undsci.berkeley.edu/images/us101/butterfly_example.gif
Why Pictures?

http://www.boredpanda.com/find-the-panda-illustrated-puzzles-openlist/
Why Pictures?
Pixels are richer; provide more information with less clutter and in less space.
Pixels provide the gestalt effect: they give an overview; make structure more visible.
Pixels are more accessible, easier to understand, faster to grasp, more comprehensible, more memorable, more fun, and less formal.
list adapted from: [Stasko et al. 1998]
New Yorker, posted by Alberto Cairo
Something real cool
Consider

London Subway Map, 1927


Take Liberty

Harry Beck, 1933


The Ability Space
Limits of Cognition

Daniel J. Simons and Daniel T. Levin, Failure to detect changes to people during a real world interaction, 1998
Why Use Computers?
Scale
Drawing by hand infeasible
How to draw an MRI scan?

[Bruckner 2007]
Why Use Computers?
Scale
Interaction allows users to “drill down” into data
Integration with algorithms

[Sunburst by John Stasko, Implementation in Caleydo by Christian Partl]


Why Use Computers?
Efficiency
Re-use charts / methods for different datasets
Quality
Precise data driven rendering
Storytelling
Use time
Good Data Visualization
… makes data accessible
… combines strengths of humans and computers
… enables insight
… communicates
Visualization is like Design
https://vimeo.com/90355541
http://csis.pace.edu/ctappert/dps/d891b-14/Agile4.pdf

https://en.wikipedia.org/wiki/Design_thinking
http://www.designkit.org/
When not to visualize?
When to automate?
Well-defined question on a well-defined dataset
Which gene is most frequently mutated in this set of patients?
What is the current unemployment rate?

Decisions needed in minimal time


High-frequency stock market trading: which stock to buy/sell?
Manufacturing: is the bottle broken?
Case Study
Find Patterns

John Snow, 1854


E. Tufte, Visual Explanations, 1997
Process Design
Mapping
Discovery
Case Study
The Tube

London Subway Map, 1927


Clean it up

Harry Beck, 1933


The Design

• What are the questions that the reader will want to have answered?
• What tasks will the visualization help with?
• What are the benefits of the angular design vs the one true to geography?
New York Times, 2010
A Map

T. Fradet
Map + Tube
Other Arrangements

Jerome Cukier, D3
Writeup About the Map
Case Study
Developing Mouse Brain
Allen Mouse Brain Atlas
High-density “whole brain” microarrays (~1k regions, 64k probes)

Hawrylycz et al., 2012 Nature


Data - Gene Expression-Structure Matrix
[Figure: expression matrix of 2692 Structures × 2105 Genes across Stages; tick labels 2, 3, 5, 8]
The Hierarchical Orientation Tree

[Figure axis label: Increasing Stages]
Temporal Clustering
Bi-clustered Gene Expression Flow Matrix
(BGEFM)

[Figure labels: ~10%, ~19%, ~9%, ~14%, ~6%, ~11%, ~31%]
Case Study
A Solution: Scatterplot Matrix (SPLOM)
Look Closely
Make Glyphs
Glyph SPLOM
Explore complex dependencies visually
Top left: expression of 4 genes in the human brain.
Bottom right: synthetic data in different patterns.
Cluster
To Summarize A SPLOM
Scagnostics

Characterize 2D
scatterplots
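
As a rough illustration of what "characterizing" a scatterplot with a number can mean, the sketch below computes one scagnostics-style measure. It assumes the "monotonic" measure is the squared Spearman rank correlation, as it is commonly described for scagnostics. The other measures rely on graph structures over the points and are not attempted here. The function names and sample data are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def monotonic(x, y):
    """Squared Spearman correlation: near 1 for monotone trends, near 0 otherwise."""
    return pearson(average_ranks(x), average_ranks(y)) ** 2


XS = [-2, -1, 0, 1, 2]
CUBIC = monotonic(XS, [v ** 3 for v in XS])     # steadily increasing pattern
PARABOLA = monotonic(XS, [v ** 2 for v in XS])  # symmetric U-shaped pattern

print("cubic:", round(CUBIC, 3), "parabola:", round(PARABOLA, 3))
```

Computing a handful of such numbers per panel turns every scatterplot in a SPLOM into a point in "measure space", which can then itself be plotted or clustered.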
Scagnostics Clusters

mRNA

protein
Scagnostics Differentiate
How did we get here?
Record

Konya town map, Turkey, c. 6200 BC
Anaximander of Miletus, c. 550 BC

Milestones Project
Record

William Curtis (1746-1799)

Leonardo Da Vinci, ca. 1500

Galileo Galilei, 1616


Donald Norman The History of Visual Communication
The Galileo Project, Rice University
Record

E. J. Muybridge, 1878
Analyze

Planetary Movement Diagram, c. 950

Halley’s Wind Map, 1686

First time to show changing values graphically


First weather map
Analyze - Playfair

W. Playfair, 1786

wikipedia.org W. Playfair, 1801


Analyze - Playfair

Bar graph: exports and imports of Scotland in 1781

wikipedia.org
Analyze - Playfair

Pie chart: proportions of the Turkish Empire in Africa, Europe and Asia

wikipedia.org W. Playfair, 1801


Epidemiology - Snow

John Snow, 1854


E. Tufte, Visual Explanations, 1997
Flow Map - Minard

• the army's location and direction, showing where units split off and rejoined
• the declining size of the army (note e.g. the crossing of the Berezina river on the retreat)
• the low temperatures during the retreat
C.J. Minard, 1869 E. Tufte, Writings, Artworks, News
Interact - Sutherland

Ivan Sutherland, Sketchpad, 1963


Doug Engelbart, 1968
Breaking the mold - Rosling

Hans Rosling, TED 2006


Big Data

15 Exabytes in Punch Cards: 4.5 km over New England

2010: 1,200 exabytes, largely unstructured


Google stores ~10 exabytes (2013)
Hard disk industry ships ~8 exabytes/year
http://onesecond.designly.com/
“The ability to take data—to be able to
understand it, to process it, to extract value
from it, to visualize it, to communicate it—
that’s going to be a hugely important skill in the
next decades, … because now we really do
have essentially free and ubiquitous data.”
Hal Varian, Google’s Chief Economist
The McKinsey Quarterly, Jan 2009
Revisiting

https://en.wikipedia.org/wiki/John_Tukey
https://www.stat.berkeley.edu/~brill/Papers/life.pdf

http://statweb.stanford.edu/~donoho/Lectures/AMS2000/MathChallengeSlides2*2.pdf
Visualization
“Visualization is really about external cognition, that is,
how resources outside the mind can be used to boost
the cognitive capabilities of the mind.”

Stuart Card
IEEE Transactions on Visualization and Computer
Graphics 12(6), p. 1363-1372 (2006)
Graphics meets vis in bigdataland

https://en.wikipedia.org/wiki/Pat_Hanrahan
Structure & Goals
Course Goals
Evaluate and critique visualization designs
Implement interactive data visualizations
Apply fundamental principles & techniques
Design visual data analysis solutions
Develop a substantial visualization project
No Device Policy
No computers, tablets, or phones in the lecture hall except when used for exercises
Switch off, mute, flight mode
Why?
It’s better to take notes by hand
Notifications are designed to grab your attention
Applies to theory lectures; coding along in technical lectures is encouraged
Course Components
Lecture
Reading
Discussion
Theory
Sections
D3 reading
Design Lecture Self-study
Design Studios Office hours

Design Skills Coding Skills


Required Books
Programming
Is this course for me?
One parallel
Prerequisites
Some Programming experience
C, C++, Java, Python, etc.
Willingness to learn new software & tools
This can be time consuming
You will need to build skills by yourself!
Engineering vs Computer Science
