Big Data - Lab 5
Fifth Section
Agenda
▪ Histogram
▪ Boxplot
▪ Bar plot
Histograms
▪ A diagram consisting of rectangles whose area is proportional to the
frequency of a variable
▪ The parameter breaks is key. It can be either:
- A single number, suggesting the number of categories (bins) to plot
- A vector specifying the breakpoints between categories
▪ The xlab, ylab, xlim, ylim options work as expected
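As a minimal sketch of how breaks behaves, the following computes two histograms without drawing them (plot = FALSE) and inspects the binning. The names w, h_auto and h_fixed are illustrative and not part of the lab. Note that R treats a single number only as a suggestion.

```r
# Compute histograms without drawing them, to inspect the binning.
w <- ChickWeight$weight

# A single number is only a suggestion; R picks "pretty" breakpoints near it.
h_auto <- hist(w, breaks = 10, plot = FALSE)

# A vector gives the exact breakpoints (it must span the range of the data).
h_fixed <- hist(w, breaks = seq(0, 400, by = 10), plot = FALSE)

length(h_auto$breaks) - 1   # number of bins R actually used
length(h_fixed$breaks) - 1  # 40 bins, each of width 10
sum(h_fixed$counts)         # every observation falls in exactly one bin
```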
Histograms
hist(ChickWeight$weight, col = "lightblue", xlab = "Weight",
     main = "Weight Histogram")
Histograms With Breaks
hist(ChickWeight$weight, col = "lightblue", xlab = "Weight",
     main = "Weight Histogram", breaks = seq(0, 400, by = 10))
Boxplot
▪ Generated by the boxplot() function
▪ Draws a plot summarizing
- Median
- Quartiles (Q1, Q3)
- Outliers: by default, observations more than 1.5 * (Q3 - Q1) distant
from the nearest quartile
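The outlier rule can be checked numerically. This sketch uses boxplot.stats(), the function that boxplot() relies on for its summary. It works with Tukey's hinges, which can differ slightly from the quartiles returned by quantile(). The names s, iqr, lower_fence, upper_fence and outliers are illustrative.

```r
w <- ChickWeight$weight

# Five numbers used to draw the box: lower whisker, lower hinge,
# median, upper hinge, upper whisker.
s <- boxplot.stats(w)   # default coef = 1.5
s$stats

# Box length (upper hinge minus lower hinge) and the 1.5 * IQR fences.
iqr <- s$stats[4] - s$stats[2]
lower_fence <- s$stats[2] - 1.5 * iqr
upper_fence <- s$stats[4] + 1.5 * iqr

# Observations outside the fences are the points drawn as outliers.
outliers <- w[w < lower_fence | w > upper_fence]
```

The same values are available directly as s$out.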
Boxplot
boxplot(ChickWeight, col = rainbow(4), ylab = "ChickWeight Boxplot")
Boxplot for Weight
▪ rug() can add a tick for each observation to
the side of a boxplot() and other plots.
> rug(ChickWeight$weight,side=2)
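The boxplot call for this slide is not shown in the text. A minimal sketch, assuming a plain boxplot of the weight column as the base plot, could look like this:

```r
# Assumed base plot: a single boxplot of all weights.
boxplot(ChickWeight$weight, ylab = "Weight")

# side = 2 puts one tick per observation along the left (y) axis.
rug(ChickWeight$weight, side = 2)
```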
Bar Plot | Vector
▪ Create barplots with the barplot(height) function, where height is a
vector or matrix.
▪ If height is a vector, the values determine the heights of the bars in the
plot.
▪ Bar plots need not be based on counts or frequencies.
▪ You can create bar plots that represent means, medians, standard
deviations, etc. Use the aggregate( ) function and pass the results to
the barplot( ) function.
▪ names.arg: a vector of names to be plotted below each bar.
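As one possible illustration of passing aggregate( ) results to barplot( ), this sketch plots the median weight per diet group. Grouping by Diet and the name med are choices made for this example, not part of the lab.

```r
# Median weight per diet group (formula interface of aggregate()).
med <- aggregate(weight ~ Diet, data = ChickWeight, FUN = median)

# One bar per diet; names.arg supplies the label under each bar.
barplot(med$weight,
        names.arg = as.character(med$Diet),
        xlab = "Diet", ylab = "Median weight",
        main = "Median Weight Per Diet")
```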
Bar Plot | Practice 1
> counts <- table(ChickWeight$weight)
# Creates a table of counts, one entry per distinct weight value
> barplot(counts, main = "Weight Distribution")
Bar Plot | Practice 2
# Create a barplot of the average weight per day for the ChickWeight data set.
> AvgWeightPerDay <- aggregate(ChickWeight[,1], list(ChickWeight$Time), mean)
> AvgWeightPerDay
> barplot(AvgWeightPerDay$x, names.arg = AvgWeightPerDay$Group.1, main = "Average Weight Per Day")
[Figure: bar plot titled "Average Weight Per Day"; x-axis: day (0 to 21), y-axis: average weight (0 to 200)]
Bar Plot | Matrix
▪ If height is a matrix and the option beside=FALSE then each bar
of the plot corresponds to a column of height, with the values in
the column giving the heights of stacked “sub-bars”.
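▪ If beside=TRUE, the values in each column are drawn as side-by-side
bars instead of being stacked.
▪ The legend option controls whether a legend (the key at the top right
of the plot) is shown. T means TRUE.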
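One way to see the difference between the two layouts is to look at what barplot() returns, namely the bar midpoints on the x-axis. This is a sketch using the built-in VADeaths matrix. The names mid_stacked and mid_beside are illustrative, and legend.text is the full name of the argument that legend matches.

```r
# Stacked: one bar per column, so barplot() returns one midpoint per column.
mid_stacked <- barplot(VADeaths, beside = FALSE, legend.text = TRUE)
length(mid_stacked)        # equals ncol(VADeaths)
colSums(VADeaths)          # total height of each stacked bar

# Side by side: one bar per cell, returned as a matrix of midpoints.
mid_beside <- barplot(VADeaths, beside = TRUE, legend.text = TRUE)
dim(mid_beside)            # same shape as VADeaths
```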
Bar Plot | Practice 3
>data(VADeaths)
>VADeaths
>barplot(VADeaths,col=rainbow(5),legend=T)
Bar Plot | Practice 3
>barplot(VADeaths,col=rainbow(5), legend=T,beside=T)
Customizing Plots
▪R provides a series of functions for adding text,
lines and points to a plot
▪We will illustrate some useful ones, but look at
demo(graphics) for more examples
▪ demo(graphics) pauses between plots; press <Return> (Enter) to see the next one
Drawing on a plot
▪To add additional data use
- points(x,y)
- lines(x,y)
▪For freehand drawing use
- polygon()
- rect()
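The four functions above can be combined on a single plot. The following is a rough sketch on a sine curve; the shaded region and the rectangle coordinates are arbitrary choices for illustration.

```r
x <- seq(0, 2 * pi, by = 0.1)
plot(x, sin(x), type = "l", ylab = "sin(x)")

# lines(): overlay a second curve on the existing plot.
lines(x, cos(x), lty = 2)

# points(): mark individual observations, here every 10th sample.
idx <- seq(1, length(x), by = 10)
points(x[idx], sin(x[idx]), pch = 19)

# polygon(): shade the area between the sine curve and the x-axis on [0, pi].
xs <- x[x <= pi]
polygon(c(xs, rev(xs)), c(sin(xs), rep(0, length(xs))),
        col = "grey80", border = NA)

# rect(xleft, ybottom, xright, ytop): draw a box around a region.
rect(4, -1, 5.5, -0.5)
```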
Text Drawing
▪Two commonly used functions:
- text() – writes inside the plot region; can be used to label
data points
- mtext() – writes on the margins
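A small sketch of the difference between the two: text() positions labels in data coordinates, while mtext() positions them by margin side and line. The data px and py are made up for this example.

```r
# Hypothetical data: five labelled points.
px <- 1:5
py <- c(2, 4, 3, 5, 1)
plot(px, py, ylim = c(0, 6), xlab = "", ylab = "")

# text(): coordinates are in data units; pos = 3 puts the label above the point.
text(px, py, labels = LETTERS[1:5], pos = 3)

# mtext(): side = 1 bottom, 2 left, 3 top, 4 right;
# line counts outward from the edge of the plot region.
mtext("Top margin title", side = 3, line = 1)
mtext("Bottom margin note", side = 1, line = 2)
```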
Plotting Two Data Series
> x <- seq(0,2*pi, by = 0.1)
> y <- sin(x)
> plot(x,y, col = "green", type = "l", lwd = 3)
> y1 <- cos(x)
> lines(x,y1, col = "red", lwd = 3)
> mtext("Sine and Cosine Plot", side = 3, line = 1)
Adding a Label & Rectangle
> rect(0,-1,2,0.5)
> text(1,0.6, "label here")
Plotting Functions
> f <- function(x) x * (x + 1) / 2
> x <- 1:20
> y <- f(x)
> plot(x, y)
> mtext("Plotting the expression", side = 3, line = 2.5)
> mtext(expression(y == sum(i,1,x)), side = 3, line = 0)
> mtext("The first variable", side = 1, line = 3)
> mtext("The second variable", side = 2, line = 3)
Symbolic Math Expressions
Multiple Plots on a Page
▪ R makes it easy to combine multiple plots into one overall graph,
using par( )
▪ With the par( ) function, you can set either the mfrow or the mfcol option
▪ Each takes a 2-element vector as its argument
- The first value specifies the number of rows
- The second specifies the number of columns
▪ The 2 options differ in the order in which the individual plots are drawn
- mfrow=c(nrows, ncols) creates a matrix of nrows x ncols plots that is
filled in by row
- mfcol=c(nrows, ncols) fills in the matrix by columns
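A minimal sketch of a 2 x 2 layout filled by row. The choice of the four plots is arbitrary, and saving the previous settings in old is a common habit rather than a requirement.

```r
# Save the old settings so they can be restored afterwards.
old <- par(mfrow = c(2, 2))   # 2 x 2 grid, filled row by row

hist(ChickWeight$weight, main = "1: top left")
boxplot(ChickWeight$weight, main = "2: top right")
plot(ChickWeight$Time, ChickWeight$weight, main = "3: bottom left")
barplot(table(ChickWeight$Diet), main = "4: bottom right")

par(old)                      # restore the previous layout
```

With mfcol = c(2, 2) instead, the second plot would land at the bottom left and the third at the top right.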
Multiple Plots on a Page
>par(mfcol = c(3,1))
>hist(ChickWeight$weight/1000,breaks = 10,
main = "Weight (in kg)", xlab = "Weight")
Thank You