Data Visualization with R and ggplot2
Karthik Ram
September 2, 2013
[Link]/karthikram/ggplot-lecture
Install some packages (make sure you also have recent copies of
reshape2 and plyr)
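A minimal sketch for checking which packages are still missing before you start. The package list is an assumption (ggplot2 plus the two packages named above); adjust it to your setup.

```r
# Packages assumed for this lecture (the exact list is an assumption)
pkgs <- c("ggplot2", "reshape2", "plyr")

# requireNamespace() returns TRUE when a package is installed and loadable
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

# Only report what is missing; installing is left to the user
if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
}
```

Anything reported as missing can then be installed with install.packages().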
Basics
ggplot - The main function where you specify the dataset and
variables to plot
geoms - geometric objects
geom_point(), geom_bar(), geom_density(), geom_line(),
geom_area()
aes - aesthetics
shape, transparency (alpha), color, fill, linetype.
scales - Define how your data will be plotted
continuous, discrete, log
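The four pieces above can be sketched in one plot. This is an illustrative example only: the choice of the built-in faithful dataset, the color, and the log scale are assumptions, not part of the original slides.

```r
library(ggplot2)

# ggplot(): the dataset plus the variables to plot, mapped through aes()
p <- ggplot(faithful, aes(x = eruptions, y = waiting)) +
  # geom: one point per row; alpha sets the transparency
  geom_point(alpha = 0.5, color = "steelblue") +
  # scale: draw the x axis on a log10 scale instead of the default continuous one
  scale_x_log10()
```

Typing p at the console (or calling print(p)) draws the plot.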
head(iris)
[Figures: four scatter plots of the iris data; the last two color the points by Species (setosa, versicolor, virginica)]
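A hedged sketch of how scatter plots like these could be built up step by step. The axis variables Sepal.Length and Sepal.Width are an assumption (they are not legible in this copy), and the point size is illustrative.

```r
library(ggplot2)

# Shared base: dataset and x/y mapping (column choice is an assumption)
base <- ggplot(iris, aes(Sepal.Length, Sepal.Width))

# Plain points
p_plain <- base + geom_point()

# Larger points, colored by Species (mapping inside aes())
p_color <- base + geom_point(aes(color = Species), size = 3)

# Color and shape both mapped to Species
p_shape <- base + geom_point(aes(color = Species, shape = Species), size = 3)
```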
[Figure: plot with carat on the x axis and a color legend with levels G, H, I, J]
Box plots
library(MASS)
ggplot(birthwt, aes(factor(race), bwt)) + geom_boxplot()
[Figure: box plot of bwt by factor(race)]
Histograms
[Figures: two histograms of waiting, with count on the y axis]
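A minimal sketch of two histograms of the waiting variable from the built-in faithful dataset. The bin widths and colors are assumptions chosen for illustration.

```r
library(ggplot2)

# Shared base plot: one variable on the x axis
h <- ggplot(faithful, aes(x = waiting))

# Wide bins give a coarse picture
p_wide <- h + geom_histogram(binwidth = 30, colour = "black")

# Narrower bins show more of the shape of the distribution
p_narrow <- h + geom_histogram(binwidth = 8, fill = "steelblue", colour = "black")
```

Setting binwidth explicitly avoids relying on the default number of bins.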
Line plots
[Figures: two line plots of Anomaly10y]
Modify the previous plot and change it such that there are
three lines instead of one with a confidence band.
[Figure: line plot of Anomaly10y]
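One possible approach to the exercise, sketched on placeholder data. The data frame below is synthetic and only stands in for the climate data, which is not included here; the column names Year and unc are hypothetical, and only Anomaly10y appears in the original plots.

```r
library(ggplot2)

# Synthetic stand-in for the climate data (NOT the real values)
set.seed(1)
climate <- data.frame(
  Year = 1900:2000,
  Anomaly10y = cumsum(rnorm(101, mean = 0.005, sd = 0.02)),
  unc = 0.1  # hypothetical uncertainty column
)

# One line with a confidence band drawn by geom_ribbon()
band <- ggplot(climate, aes(Year, Anomaly10y)) +
  geom_ribbon(aes(ymin = Anomaly10y - unc, ymax = Anomaly10y + unc),
              fill = "blue", alpha = 0.1) +
  geom_line(color = "steelblue")

# Three lines instead: the central line plus lower and upper bounds
three <- ggplot(climate, aes(Year, Anomaly10y)) +
  geom_line() +
  geom_line(aes(y = Anomaly10y - unc), linetype = "dashed") +
  geom_line(aes(y = Anomaly10y + unc), linetype = "dashed")
```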
Bar plots
[Figures: five bar plots: one of a single variable; two of value by variable (four levels); one of count filled by cut (Fair, Good, Very Good, Premium, Ideal); one of Anomaly10y filled by sign (FALSE, TRUE)]
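A hedged sketch of the two common kinds of bar plot, using the diamonds dataset that ships with ggplot2. The choice of clarity on the x axis and the mean-price summary are assumptions made for illustration.

```r
library(ggplot2)

# Counting rows: geom_bar() tallies observations per clarity level,
# with the bars for each cut placed side by side
p_count <- ggplot(diamonds, aes(clarity, fill = cut)) +
  geom_bar(position = "dodge")

# Pre-computed heights: stat = "identity" plots the values as given
mean_price <- aggregate(price ~ cut, data = diamonds, FUN = mean)
p_identity <- ggplot(mean_price, aes(cut, price)) +
  geom_bar(stat = "identity")
```

Use the first form when ggplot2 should do the counting, and the second when the data already holds one value per bar.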
Density Plots
[Figure: density plot of waiting]
ggplot(faithful, aes(waiting)) +
  geom_density(fill = "blue", alpha = 0.1)
[Figures: two further density plots of waiting]
aes(color = variable)
aes(color = "black")
# Or add it as a scale
scale_fill_manual(values = c("color1", "color2"))
library(RColorBrewer)
[Link]()
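A short sketch of the difference between mapping a color to a variable and setting a fixed color, followed by two ways to pick the palette. The iris columns and the color values are assumptions for illustration. Note that a constant color is usually passed outside aes().

```r
library(ggplot2)

base <- ggplot(iris, aes(Sepal.Length, Sepal.Width))

# Mapping: color depends on a variable, so it goes inside aes()
mapped <- base + geom_point(aes(color = Species))

# Setting: one fixed color for every point, passed outside aes()
fixed <- base + geom_point(color = "black")

# Choose the colors yourself with a manual scale
manual <- mapped +
  scale_color_manual(values = c("purple", "darkgreen", "orange"))

# Or use a ColorBrewer palette
brewer <- mapped + scale_color_brewer(palette = "Set1")
```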
[Figures: bar plot of value by variable (four levels); scatter plot split by Species (setosa, versicolor, virginica) into three stacked panels]
Refer to a color chart for beautiful visualizations
[Link]
Faceting
[Figure: scatter plot of the iris data faceted by Species into three stacked panels (setosa, versicolor, virginica)]
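A minimal sketch of faceting into stacked panels, one row per species. The axis variables Sepal.Length and Sepal.Width are an assumption, since they are not legible in this copy.

```r
library(ggplot2)

# Species on the left of the formula puts each level in its own row;
# the dot on the right means "no faceting across columns"
p_rows <- ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point() +
  facet_grid(Species ~ .)
```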
and along rows
ggplot(iris, aes([Link], [Link], color = Species)) +
  geom_point() +
  facet_grid(. ~ Species)
[Figure: scatter plot faceted by Species into three side-by-side panels]
or just wrap your panels
ggplot(iris, aes([Link], [Link], color = Species)) +
  geom_point() +
  facet_wrap(~ Species)
[Figure: scatter plot wrapped by Species into three side-by-side panels]
Adding smoothers
[Figures: two plots for the smoother examples, colored by Species (setosa, versicolor, virginica); the second is split into three side-by-side panels]
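A hedged sketch of adding a smoother per species. The linear model (method = "lm"), the point styling, and the iris columns are assumptions; other smoothing methods work the same way.

```r
library(ggplot2)

# Points plus one fitted line (with confidence band) per Species,
# because color = Species also defines the groups
p_smooth <- ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point(aes(shape = Species), size = 3) +
  geom_smooth(method = "lm", formula = y ~ x)

# The same plot with each species in its own panel
p_smooth_facet <- p_smooth + facet_grid(. ~ Species)
```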
Themes
+theme()
# see ?theme() for more options
[Figure: faceted scatter plot in three side-by-side panels, with the Species legend (setosa, versicolor, virginica) placed below the plot]
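A minimal sketch of adjusting individual theme elements. The specific settings (legend at the bottom, blank strip background, horizontal y-axis title) and the iris columns are assumptions chosen for illustration.

```r
library(ggplot2)

p_themed <- ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point() +
  facet_grid(. ~ Species) +
  theme(
    legend.position = "bottom",                  # move the legend below the panels
    legend.key = element_rect(fill = NA),        # no fill behind legend keys
    strip.background = element_rect(fill = NA),  # no fill behind panel titles
    axis.title.y = element_text(angle = 0)       # horizontal y-axis title
  )
```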
[Link]("ggthemes")
library(ggthemes)
# Then add one of these themes to your plot
+theme_stata()
+theme_excel()
+theme_wsj()
+theme_solarized()
Then just call your function to generate a plot. It's a lot easier to
fix one function than to do it over and over for many plots
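One way to set this up is to wrap your theme settings in a function. In this sketch the function name my_custom_theme and every setting inside it are hypothetical.

```r
library(ggplot2)

# Hypothetical reusable theme: start from a built-in theme and override a few elements
my_custom_theme <- function() {
  theme_bw() +
    theme(
      axis.title = element_text(size = 14),
      legend.position = "bottom",
      panel.grid.minor = element_blank()
    )
}

# Add the function call to any plot
p_custom <- ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point() +
  my_custom_theme()
```

Changing a setting inside my_custom_theme() then updates every plot that uses it.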
Scales
scale_fill_discrete(); scale_colour_discrete()
scale_fill_hue(); scale_color_hue()
scale_fill_manual(); scale_color_manual()
scale_fill_brewer(); scale_color_brewer()
scale_linetype(); scale_shape_manual()
[Figure: box plot of bwt by factor(race) with y-axis labels 1 Kg, 2 Kg, 3 Kg, 4 Kg]
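A hedged sketch of a continuous scale with custom labels, relabeling birth weight from grams to kilograms. The break positions and the box width are assumptions.

```r
library(ggplot2)
library(MASS)  # provides the birthwt dataset

p_kg <- ggplot(birthwt, aes(factor(race), bwt)) +
  geom_boxplot(width = 0.2) +
  # Breaks stay in the data's units (grams); only the labels change
  scale_y_continuous(
    breaks = seq(1000, 4000, by = 1000),
    labels = paste0(1:4, " Kg")
  )
```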
Another continuous scale with custom labels
[Figure: histogram of waiting with count on the y axis and x-axis breaks at 40, 60, 80, 100]
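A sketch of custom breaks and labels on a histogram's x axis. The bin width, colors, and label text are assumptions; waiting in the faithful dataset is measured in minutes.

```r
library(ggplot2)

x_breaks <- seq(40, 100, by = 20)

p_breaks <- ggplot(faithful, aes(waiting)) +
  geom_histogram(binwidth = 8, fill = "steelblue", colour = "black") +
  # Put ticks at 40, 60, 80, 100 and append the unit to each label
  scale_x_continuous(breaks = x_breaks, labels = paste(x_breaks, "min"))
```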
Specify a size
ggsave(file = "/path/to/figure/[Link]", width = 6,
       height = 4)
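A minimal sketch of saving a specific plot at a fixed size. The temporary output path, the file name, and the PDF format are assumptions; width and height are in inches by default.

```r
library(ggplot2)

p_save <- ggplot(faithful, aes(waiting)) +
  geom_density(fill = "blue", alpha = 0.1)

# Write to a temporary location; replace with your own path
out <- file.path(tempdir(), "figure.pdf")

# Passing plot = explicitly avoids depending on the last plot displayed
ggsave(filename = out, plot = p_save, width = 6, height = 4)
```

The file extension decides the output format, so changing .pdf to .png saves a PNG instead.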