
Big Data - Lab 5

The document discusses various data visualization techniques in R, including histograms, boxplots, and bar plots. It provides examples of how to create histograms with custom bins, boxplots that show the median, quartiles, and outliers, and bar plots from both vector and matrix data. It also discusses adding labels, lines, and points to customize plots.
Copyright © All Rights Reserved
Big Data

Fifth Section
Agenda
▪ Histogram
▪ Boxplot
▪ Bar plot
Histograms
▪ A diagram consisting of rectangles whose area is proportional to the
frequency of a variable
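▪ The parameter breaks is key. It can be:
- A single number: the suggested number of bins (R rounds it to "pretty" breakpoints, so the final count is approximate)
- A vector: the exact breakpoints between bins
▪ xlab and ylab set the axis labels, and xlim and ylim set the axis ranges, as in other base R plots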
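To see the difference between the two forms of breaks, a minimal sketch can compute the histograms with plot = FALSE and inspect the bins before drawing anything. It uses only ChickWeight and base R functions; the variable names (w, h_suggested, h_explicit) are illustrative.

```r
# ChickWeight ships with base R (package "datasets"), so no library() call is needed.
w <- ChickWeight$weight

# plot = FALSE computes the bins without drawing, so they can be inspected.
h_suggested <- hist(w, breaks = 10, plot = FALSE)                    # 10 is only a suggestion
h_explicit  <- hist(w, breaks = seq(0, 400, by = 10), plot = FALSE)  # exact breakpoints

h_suggested$breaks        # R picks "pretty" breakpoints, so the bin count may not be 10
length(h_explicit$counts) # 41 breakpoints from 0 to 400 give 40 bins of width 10

# Axis labels and ranges are set with xlab/ylab and xlim/ylim.
hist(w, breaks = seq(0, 400, by = 10), col = "lightblue",
     xlab = "Weight", ylab = "Frequency", xlim = c(0, 400),
     main = "Weight Histogram")
```

With a vector of breaks, every observation must fall inside the first and last breakpoint; ChickWeight weights lie well inside 0 to 400, so this range works here.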
Histograms

hist(ChickWeight$weight, col = "lightblue", xlab = "Weight",
     main = "Weight Histogram")
Histograms With Breaks
hist(ChickWeight$weight, col = "lightblue", xlab = "Weight",
     main = "Weight Histogram", breaks = seq(0, 400, by = 10))
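Boxplot
▪ Generated by the boxplot() function
▪ Draws a plot summarizing:
- Median
- Quartiles (Q1, Q3)*
- Outliers: by default, observations more than 1.5 * (Q3 - Q1), the interquartile range, beyond the nearest quartile
* Strictly, boxplot() uses Tukey's hinges, which are versions of Q1 and Q3 and can differ slightly from quantile() output.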
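The outlier rule can be checked directly: boxplot.stats() returns the numbers that boxplot() draws, and the sketch below recomputes the outliers by hand from fivenum(), which is where boxplot.stats() gets its hinges. The variable names are illustrative.

```r
w <- ChickWeight$weight

# stats = lower whisker, lower hinge, median, upper hinge, upper whisker
# out   = the points drawn individually as outliers
bs <- boxplot.stats(w)   # coef = 1.5 by default

# Recompute the outliers with the 1.5 * (Q3 - Q1) rule, using the hinges.
fn  <- fivenum(w)        # min, lower hinge, median, upper hinge, max
iqr <- fn[4] - fn[2]
manual_out <- w[w < fn[2] - 1.5 * iqr | w > fn[4] + 1.5 * iqr]

length(bs$out)                               # number of outliers in the weight data
identical(sort(manual_out), sort(bs$out))    # the hand-computed rule matches
```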
Boxplot
boxplot(ChickWeight, col = rainbow(4), ylab = "ChickWeight Boxplot")

Boxplot for Weight
▪ rug() adds a tick for each observation along one side of a boxplot() and other plots.
▪ The side parameter specifies where the tick marks are drawn: 1 = bottom, 2 = left, 3 = top, 4 = right.

> boxplot(ChickWeight$weight, col = "Red", ylab = "ChickWeight Boxplot")
> rug(ChickWeight$weight, side = 2)
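A short sketch of rug() on two kinds of plot and two different sides. Because the weights are whole grams with many ties, identical values share one tick; adding a little noise with jitter() is one optional way to separate them. The amount of 0.5 g is an arbitrary choice for illustration.

```r
w <- ChickWeight$weight
set.seed(1)  # jitter() is random; fix the seed so the ticks are reproducible

# Uniform noise of at most 0.5 g, so a tick never crosses into a neighbouring whole-gram value.
w_jit <- jitter(w, amount = 0.5)

hist(w, col = "lightblue", xlab = "Weight", main = "Weight Histogram")
rug(w_jit, side = 1)   # ticks along the bottom axis

boxplot(w, col = "red", ylab = "Weight")
rug(w_jit, side = 4)   # ticks along the right-hand axis
```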
Bar Plot | Vector
▪ Create barplots with the barplot(height) function, where height is a
vector or matrix.
▪ If height is a vector, the values determine the heights of the bars in the
plot.
▪ The heights are typically counts or frequencies, for example the output of table().
▪ You can create bar plots that represent means, medians, standard
deviations, etc. Use the aggregate( ) function and pass the results to
the barplot( ) function.
▪ names.arg: a vector of names to be plotted below each bar.
Bar Plot | Practice 1
> counts <- table(ChickWeight$weight)
# A one-way table: the number of observations for each distinct weight value
> barplot(counts, main = "Weight Distribution")
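A one-way table is a vector of counts named by the distinct values; the "two columns" view appears only after converting it to a data frame. The sketch below shows both forms; the column names Var1 and Freq are the defaults that as.data.frame() produces for an unnamed table.

```r
counts <- table(ChickWeight$weight)

head(counts)                     # counts, named by weight value
sum(counts) == nrow(ChickWeight) # every observation is counted exactly once

# Two-column view: weight value and its count
counts_df <- as.data.frame(counts)
names(counts_df)                 # "Var1" "Freq"
```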
Bar Plot | Practice 2
# Create a bar plot of the average weight per day (Time) for the ChickWeight data set.
# aggregate() returns a data frame with columns Group.1 (the day) and x (the mean weight).
> AvgWeightPerDay <- aggregate(ChickWeight[, 1], list(ChickWeight$Time), mean)
> AvgWeightPerDay
> barplot(AvgWeightPerDay$x, names.arg = AvgWeightPerDay$Group.1, main = "Average Weight Per Day")
Figure: bar plot "Average Weight Per Day", one bar per day from 0 to 21.
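The same per-day means can be cross-checked with tapply(), and barplot() invisibly returns the x coordinate of each bar's midpoint, which text() can use to print the values on the bars. This is a sketch; the names avg, avg_tapply and mp are illustrative.

```r
avg <- aggregate(ChickWeight$weight, by = list(ChickWeight$Time), FUN = mean)

# tapply() gives the same means as a named array, in the same day order.
avg_tapply <- tapply(ChickWeight$weight, ChickWeight$Time, mean)
all.equal(as.vector(avg_tapply), avg$x)

# Leave headroom above the tallest bar for the value labels.
mp <- barplot(avg$x, names.arg = avg$Group.1, ylim = c(0, max(avg$x) * 1.15),
              main = "Average Weight Per Day", xlab = "Day", ylab = "Mean weight")
text(mp, avg$x, labels = round(avg$x), pos = 3, cex = 0.8)  # label above each bar
```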
Bar Plot | Matrix
▪ If height is a matrix and the option beside=FALSE then each bar
of the plot corresponds to a column of height, with the values in
the column giving the heights of stacked “sub-bars”.
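▪ If beside=TRUE, the columns are drawn as groups of side-by-side bars instead of stacks.
▪ legend.text controls whether a legend is shown; when TRUE, barplot() builds it from the row names and places it at the top right. The examples write legend=T, which R matches to legend.text; T is shorthand for TRUE.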
Bar Plot | Practice 3
> data(VADeaths)
> VADeaths
> barplot(VADeaths, col = rainbow(5), legend = T)
Bar Plot | Practice 3
> barplot(VADeaths, col = rainbow(5), legend = T, beside = T)
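The stacked and side-by-side layouts can be told apart from what barplot() returns: one midpoint per column when stacked, and a matrix of midpoints, one per cell, when beside = TRUE. The sketch spells out the full argument name legend.text.

```r
dim(VADeaths)   # 5 age groups (rows) x 4 population groups (columns)

mp_stacked <- barplot(VADeaths, col = rainbow(5), legend.text = TRUE)
mp_beside  <- barplot(VADeaths, col = rainbow(5), legend.text = TRUE, beside = TRUE)

length(mp_stacked)  # 4: one stacked bar per column
dim(mp_beside)      # 5 x 4: one bar per cell, grouped by column
```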
Customizing Plots
▪ R provides a series of functions for adding text, lines and points to a plot
▪ We will illustrate some useful ones; run demo(graphics) for more examples
▪ While the demo runs, press Enter (Return) to advance to the next plot
Drawing on a plot
▪ To add additional data use
- points(x,y)
- lines(x,y)
▪ To draw shapes use
- polygon()
- rect()
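A minimal sketch combining the four drawing functions on one set of axes. The data (x and x squared) and the shape coordinates are arbitrary values chosen for illustration.

```r
x <- 1:10
y <- x^2

plot(x, y, type = "n", main = "Drawing on a plot")  # set up axes, draw no data yet
points(x, y, pch = 19, col = "blue")                # add the data as points
lines(x, y, col = "blue")                           # connect the same data with a line
rect(2, 20, 4, 60, border = "darkgreen")            # xleft, ybottom, xright, ytop
polygon(c(6, 8, 7), c(10, 10, 40), col = "orange")  # filled triangle from vertex coordinates
```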
Text Drawing
▪ Two commonly used functions:
- text() – writes inside the plot region; it can be used to label data points
- mtext() – writes in the margins
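A sketch contrasting the two: text() places labels at data coordinates inside the plot, while mtext() writes in a margin chosen by side. The random points and the labels P1 to P8 are hypothetical.

```r
set.seed(42)
x <- 1:8
y <- round(runif(8, 10, 50))
labels <- paste0("P", x)   # hypothetical point labels

plot(x, y, pch = 19, ylim = c(0, 60))
text(x, y, labels = labels, pos = 3)                  # inside the plot, above each point
mtext("Top margin (side = 3)", side = 3, line = 0.5)  # sides: 1 bottom, 2 left, 3 top, 4 right
mtext("Right margin (side = 4)", side = 4, line = 0.5)
```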
Plotting Two Data Series
> x <- seq(0,2*pi, by = 0.1)
> y <- sin(x)
> plot(x,y, col = "green", type = "l", lwd = 3)
> y1 <- cos(x)
> lines(x,y1, col = "red", lwd = 3)
> mtext("Sine and Cosine Plot", side = 3, line = 1)
Adding a Label & Rectangle
> rect(0,-1,2,0.5)
> text(1,0.6, "label here")
Plotting Functions
> f <- function(x) x * (x + 1) / 2
> x <- 1:20
> y <- f(x)
> plot(x, y)
> mtext("Plotting the expression", side = 3, line = 2.5)
> mtext(expression(y == sum(i, i == 1, x)), side = 3, line = 0)
> mtext("The first variable", side = 1, line = 3)
> mtext("The second variable", side = 2, line = 3)
Symbolic Math Expressions
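The plotted function is the closed form of 1 + 2 + ... + x, which a quick numeric check confirms; in plotmath, sum(term, lower, upper) draws a summation sign, and writing the lower limit as i == 1 shows the index explicitly.

```r
f <- function(x) x * (x + 1) / 2
x <- 1:20

# The closed form matches the running sum 1, 1 + 2, 1 + 2 + 3, ...
all.equal(f(x), as.numeric(cumsum(x)))

plot(x, f(x), ylab = "y")
mtext(expression(y == sum(i, i == 1, x)), side = 3, line = 0.5)
```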
Multiple Plots on a Page
▪ R makes it easy to combine multiple plots into one overall graph, using par( )
▪ With the par( ) function, you can set the mfrow or mfcol option
▪ Both take a two-element vector as an argument
- The first value specifies the number of rows
- The second specifies the number of columns
▪ mfrow=c(nrows, ncols) creates a matrix of nrows x ncols plots that are filled in by row
▪ mfcol=c(nrows, ncols) creates the same matrix but fills it in by column
▪ The two options differ only in the order in which the individual plots are placed
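The fill order can be observed with par("mfg"), which reports the row and column of the figure being drawn. The helper second_cell() below is hypothetical: it opens an off-screen device, applies a 2 x 2 layout, starts two empty plots, and returns where the second one landed. The last two lines use the fact that par() returns the previous settings, so the layout can be restored.

```r
# Hypothetical helper, for illustration only.
second_cell <- function(...) {
  pdf(NULL)                 # off-screen device: nothing is drawn to screen or disk
  on.exit(dev.off())
  par(...)                  # e.g. mfrow = c(2, 2) or mfcol = c(2, 2)
  plot.new()
  plot.new()
  par("mfg")[1:2]           # (row, column) of the figure just started
}

second_cell(mfrow = c(2, 2))  # filled by row: row 1, column 2
second_cell(mfcol = c(2, 2))  # filled by column: row 2, column 1

# par() returns the old values, which makes restoring the layout easy.
op <- par(mfrow = c(1, 2))
par(op)
```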
Multiple Plots on a Page
> par(mfcol = c(3, 1))
> hist(ChickWeight$weight * 1000, breaks = 10, main = "Weight (in mg)", xlab = "Weight")
> hist(ChickWeight$weight, breaks = 10, main = "Weight (in g)", xlab = "Weight")
> hist(ChickWeight$weight / 1000, breaks = 10, main = "Weight (in kg)", xlab = "Weight")
Thank You
