Lecture 1
Introduction to Data Visualization
Raghu Machiraju
[xkcd]
Many Thanks to
Open Exploration
Confirmation Communication
Communicate
http://www.boredpanda.com/find-the-panda-illustrated-puzzles-openlist/
Once is not enough!
Well formed questions:
How did unemployment and the labor force develop over the past several years?
www.google.com ☺
Why not?
Plain Statistics?
    I           II          III          IV
   x     y     x     y     x     y     x     y
  10   8.0    10   9.1    10   7.4     8   6.5
   8   6.9     8   8.1     8   6.7     8   5.7
  13   7.5    13   8.7    13   12.     8   7.7
   9   8.8     9   8.7     9   7.1     8   8.8
  11   8.3    11   9.2    11   7.8     8   8.4
  14   9.9    14   8.1    14   8.8     8   7.0
   6   7.2     6   6.1     6   6.0     8   5.2
   4   4.2     4   3.1     4   5.3    19   12.
  12   10.    12   9.1    12   8.1     8   5.5
   7   4.8     7   7.2     7   6.4     8   7.9
   5   5.6     5   4.7     5   5.7     8   6.8

The same summary statistics hold for all four datasets:
Mean x: 9 y: 7.50
Variance x: 11 y: 4.122
Correlation x – y: 0.816
Linear regression: y = 3.00 + 0.500x
Anscombe’s Quartet
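The claim that the four datasets share the same plain statistics can be checked directly. The sketch below is one way to do it, using only the Python standard library. It assumes the standard published two-decimal values of Anscombe's quartet, which carry more precision than the truncated values shown on the slide; the helper name `summarize` is made up for this example.

```python
from statistics import mean, variance

# Assumption: standard published quartet values (the slide shows them truncated).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return the summary statistics listed on the slide for one dataset."""
    mx, my = mean(xs), mean(ys)
    vx, vy = variance(xs), variance(ys)  # sample variances (n - 1 denominator)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    slope = cov / vx                     # least-squares slope
    return {
        "mean_x": mx, "mean_y": my,
        "var_x": vx, "var_y": vy,
        "r": cov / (vx * vy) ** 0.5,     # Pearson correlation
        "slope": slope,
        "intercept": my - slope * mx,
    }


summaries = {name: summarize(xs, ys) for name, (xs, ys) in quartet.items()}
for name, s in summaries.items():
    print(name, {key: round(value, 3) for key, value in s.items()})
```

The four printed rows agree to about two decimal places, yet the scatterplots look nothing alike. That gap is the argument for plotting the data before trusting the summary.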
Why not?
But …
And unknown parametric space …
K in K-Means …
https://datasciencelab.files.wordpress.com/2013/12/p_n2000_k15_.gif
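To make the "K in K-Means" problem concrete, here is a minimal sketch of Lloyd's algorithm run for several values of k on a tiny hand-made dataset. Everything in it is an assumption for illustration: the twelve points, the deterministic farthest-first initialization, and the helper names `sq_dist`, `farthest_first`, and `kmeans`. The algorithm happily returns a clustering for every k; it does not tell you which k is right.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two 2D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def farthest_first(points, k):
    """Deterministic initialization: repeatedly pick the point farthest from all chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; returns (centers, inertia)."""
    centers = farthest_first(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        new_centers = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    inertia = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, inertia


# Hypothetical data: three well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 0), (10, 1), (11, 0), (11, 1),
          (5, 10), (5, 11), (6, 10), (6, 11)]

inertia_by_k = {k: kmeans(points, k)[1] for k in range(1, 5)}
for k, inertia in inertia_by_k.items():
    print(f"k={k}: within-cluster sum of squares = {inertia:.2f}")
```

On this toy data the within-cluster sum of squares drops sharply up to k = 3 and barely changes afterwards, so the "elbow" is easy to see. Real data rarely gives such a clean signal, which is why choosing k usually needs domain knowledge or visual inspection.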
Not for the faint-hearted
Missing in action
• Domain knowledge
• Models of interaction
• Hypotheses or questions
• Optimal parametric values
Visualization
http://undsci.berkeley.edu/images/us101/butterfly_example.gif
Why Pictures?
http://www.boredpanda.com/find-the-panda-illustrated-puzzles-openlist/
Why Pictures?
Pixels are richer; provide more
information with less clutter and in less
space.
Pixels provide the gestalt effect: they give
an overview; make structure more visible.
Pixels are more accessible, easier to
understand, faster to grasp, more
comprehensible, more memorable, more
fun, and less formal.
list adapted from: [Stasko et al. 1998]
New Yorker, posted by Alberto Cairo
Something real cool
Consider
[Bruckner 2007]
Why Use Computers?
Scale
Interaction allows users to “drill down” into data
Integration with algorithms
https://en.wikipedia.org/wiki/Design_thinking
http://www.designkit.org/
When not to visualize?
When to automate?
Well-defined question on a well-defined dataset
Which gene is most frequently mutated in this set of patients?
What is the current unemployment rate?
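A well-defined question such as the gene example above can be answered by a direct computation, with no visualization needed. The sketch below assumes a hypothetical list of (patient, gene) mutation records; the patient IDs and gene labels are invented placeholders, not real data.

```python
from collections import Counter

# Hypothetical mutation records: (patient_id, mutated_gene).
mutations = [
    ("p1", "GENE_A"), ("p1", "GENE_B"),
    ("p2", "GENE_A"),
    ("p3", "GENE_C"), ("p3", "GENE_A"),
    ("p4", "GENE_B"),
]

# Deduplicate so each gene counts at most once per patient, then count patients per gene.
patients_per_gene = Counter(gene for _, gene in set(mutations))
gene, count = patients_per_gene.most_common(1)[0]

n_patients = len({patient for patient, _ in mutations})
print(f"{gene} is mutated in {count} of {n_patients} patients")
```

When the question and the dataset are this precise, a single number is the answer and automation is the right tool. Visualization earns its place when the question itself is still open.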
• What are the questions that the reader will want to have answered?
• What tasks will the visualization help with?
• What are the benefits of the angular design vs the one true to geography?
New York Times, 2010
A Map
T. Fradet
Map + Tube
Other Arrangements
Jerome Cukier, D3
Writeup About the Map
Case Study
Developing Mouse Brain
Allen Mouse Brain Atlas
High-density “whole brain” microarrays (~1k regions, 64k probes)
2105 Genes
The Hierarchical Orientation Tree
Increasing Stages
Temporal Clustering
Bi-clustered Gene Expression Flow Matrix
(BGEFM)
~10% ~19%
~9% ~14%
~6% ~11%
~31%
Case Study
A Solution: Scatterplot Matrix (SPLOM)
Look Closely
Make Glyphs
Glyph SPLOM
Explore complex
dependencies visually
Top left: expression of four genes in the human brain.
Bottom right: synthetic data in different patterns.
Cluster
To Summarize A SPLOM
Scagnostics
Characterize 2D
scatterplots
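As a rough illustration of summarizing a SPLOM with one number per panel, the sketch below computes a single scagnostics-style measure for every pair of variables. It uses the "monotonic" measure, which is commonly defined as the squared Spearman rank correlation. The other scagnostics measures rely on geometric graphs such as the minimum spanning tree and are not attempted here. The column names, data, and helper functions are made up for this example.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; assumes neither input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def monotonic(xs, ys):
    """Squared Spearman rank correlation of one scatterplot."""
    return pearson(average_ranks(xs), average_ranks(ys)) ** 2


# Hypothetical columns; each pair corresponds to one panel of the SPLOM.
columns = {
    "a": [-2, -1, 0, 1, 2],
    "cube": [-8, -1, 0, 1, 8],
    "square": [4, 1, 0, 1, 4],
}

splom_summary = {
    (u, v): monotonic(columns[u], columns[v]) for u, v in combinations(columns, 2)
}
for pair, score in splom_summary.items():
    print(pair, round(score, 3))
```

Each panel is reduced to a point in "measure space". With several such measures per panel, the panels themselves can be clustered or sorted, which is what makes a large SPLOM navigable.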
Scagnostics Clusters
mRNA
protein
Scagnostics Differentiate
How did we get here?
Record
Milestones Project
Record
E. J. Muybridge, 1878
Analyze
W. Playfair, 1786
wikipedia.org
Analyze - Playfair
Pie chart: proportions of the Turkish Empire in Africa, Europe, and Asia
* the army's location and direction, showing where units split off and rejoined
* the declining size of the army (note e.g. the crossing of the Berezina river on the retreat)
* the low temperatures during the retreat
C.J. Minard, 1869; E. Tufte, Writings, Artworks, News
Interact - Sutherland
Big Data
4.5 km over New England
https://en.wikipedia.org/wiki/John_Tukey
https://www.stat.berkeley.edu/~brill/Papers/life.pdf
http://statweb.stanford.edu/~donoho/Lectures/AMS2000/MathChallengeSlides2*2.pdf
Visualization
“Visualization is really about external cognition, that is,
how resources outside the mind can be used to boost
the cognitive capabilities of the mind.”
Stuart Card
IEEE Transactions on Visualization and Computer
Graphics 12(6), p. 1363-1372 (2006)
Graphics meets vis in bigdataland
https://en.wikipedia.org/wiki/Pat_Hanrahan
Structure & Goals
Course Goals
Evaluate and critique visualization designs
Implement interactive data visualizations
Apply fundamental principles & techniques
Design visual data analysis solutions
Develop a substantial visualization project
No Device Policy
No Computers, Tablets, Phones in lecture hall
except when used for exercises
Switch off, mute, flight mode
Why?
It’s better to take notes by hand
Notifications are designed to grab your attention
Applies to theory lectures; coding along in technical lectures is encouraged
Course Components
Lecture
Reading
Discussion
Theory
Sections
D3 reading
Design Lecture Self-study
Design Studios Office hours