Week 11 Tasks and Solutions

This document sets out a series of tasks for practising data frames and visualisation in R. Task 1 loads and manipulates the iris dataset: selecting columns, creating a new column and filtering rows. Task 2 covers visualisation with ggplot2: loading the demographic data, exploring plot types and shapes, drawing dot plots of the USArrests dataset, and calculating sums, means and minimums from USArrests for display in bar plots and other graphs. Task 3 leaves any remaining time for assignment work.

Uploaded by

misxbeepics
R Practice Tasks

Task 1 Data Frame Practice

a) Save both the iris and demographic data to a folder in your M drive

b) Set your working directory to the same folder
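
A minimal sketch of setting and checking the working directory. The folder used here is a hypothetical temporary one so that the example runs anywhere; in practice you would pass the path of your own folder on the M drive to setwd().

```r
# Hypothetical practice folder; replace with the folder on your M drive.
practice_dir <- file.path(tempdir(), "r_practice")
dir.create(practice_dir, showWarnings = FALSE)

old_dir <- setwd(practice_dir)  # setwd() returns the previous directory
getwd()                         # confirm the working directory has changed
list.files()                    # files here can now be read by name alone

setwd(old_dir)                  # restore the original working directory
```

Once the working directory is set, a call such as read.csv("iris.csv") looks for the file in that folder.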

c) Create a variable named “dataset” to read the iris dataset


dataset <- read.csv("iris.csv")
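
If iris.csv is not to hand, a stand-in can be built from R's built-in iris data. This is only a sketch: it assumes the course file uses lower-case, underscore-separated column names (as the later code in this task suggests), and the temporary file path is purely illustrative.

```r
# Build a stand-in for iris.csv from the built-in iris data frame.
# Assumption: the course file uses these lower-case column names.
iris_copy <- iris
names(iris_copy) <- c("sepal_length", "sepal_width",
                      "petal_length", "petal_width", "species")

# Write to a temporary file (hypothetical location) and read it back.
csv_path <- file.path(tempdir(), "iris.csv")
write.csv(iris_copy, csv_path, row.names = FALSE)

dataset <- read.csv(csv_path)
str(dataset)  # check the column names and types
```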

d) Display the first 15 rows of the dataset


head(dataset, n=15)

e) Using the factor command, investigate how many different species of iris flowers there are
levels(factor(dataset$species))
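
levels() lists the distinct values, while nlevels() and table() give counts directly. The sketch below uses the built-in iris data as a stand-in for the course file, so the column is called Species rather than species.

```r
# Assumption: built-in iris stands in for the course CSV.
species <- factor(iris$Species)

levels(species)   # the distinct species names
nlevels(species)  # how many distinct species there are
table(species)    # number of flowers recorded for each species
```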

f) Create a dataframe “newdataset” to show all data for columns petal length and sepal length
only
newdataset <- data.frame(sepal_length = dataset$sepal_length,
                         petal_length = dataset$petal_length)

g) Install and load the dplyr package and run the following statement: what does this show?
install.packages("dplyr")
library(dplyr)
select(dataset, sepal_width, petal_width)
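
The select() call returns a data frame containing only the named columns, with every row kept. For comparison, here is a base R sketch of the same idea that needs no extra package. It assumes the built-in iris data, renamed to match the course file's column names.

```r
# Assumption: built-in iris with renamed columns stands in for the course CSV.
dataset <- iris
names(dataset) <- c("sepal_length", "sepal_width",
                    "petal_length", "petal_width", "species")

# Base R equivalent of select(dataset, sepal_width, petal_width):
# keep all rows, but only the two width columns.
width_cols <- dataset[, c("sepal_width", "petal_width")]
head(width_cols)
```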

h) Assign a variable name “widthonly” to the statement in question g, run the variable to
display the results
widthonly <- select(dataset, sepal_width, petal_width)
widthonly

i) Create a new column in the iris dataset to show the area for the petal of each flower (petal
length * petal width). Ignore any error messages, then display the dataset to check that it
shows the new column.

dataset$petal_area <- dataset$petal_width * dataset$petal_length
dataset

j) Show all petal areas which are greater than or equal to 8.64 – assign a variable and run it
petalfilter <- dataset$petal_area >= 8.64
dataset[petalfilter,]

k) Create a new variable to extend question j to display petal areas which are greater than or
equal to 8.64 and less than 12, run your new variable
petalfilter2 <- dataset$petal_area >= 8.64 & dataset$petal_area < 12
dataset[petalfilter2,]
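
The logical-vector approach in questions j and k can also be written in one step with subset(). This is a sketch under the same assumption as before: the built-in iris data, renamed to match the course file.

```r
# Assumption: built-in iris with renamed columns stands in for the course CSV.
dataset <- iris
names(dataset) <- c("sepal_length", "sepal_width",
                    "petal_length", "petal_width", "species")
dataset$petal_area <- dataset$petal_length * dataset$petal_width

# Keep rows whose petal area is at least 8.64 and below 12.
mid_area <- subset(dataset, petal_area >= 8.64 & petal_area < 12)

nrow(mid_area)               # how many flowers match
range(mid_area$petal_area)   # check that the values fall inside the limits
```

subset() evaluates the condition inside the data frame, so the column can be named without the dataset$ prefix.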

Task 2 – Visualisation

a) Install and load the ggplot2 package

install.packages("ggplot2")
library(ggplot2)

b) create a variable “mydata” to read the demographic dataset and display all data
mydata <- read.csv("Demographic-Data.csv")
mydata

c) Investigate the demographic data set using functions such as head, str, summary
head(mydata)
str(mydata)
summary(mydata)

d) Run each of the statements on slide 16 and 17 and observe the results

e) Follow and carry out the instructions from slides 18 – 23 – explore the different shapes
available.

f) You are now going to work with the built-in USArrests dataset. Display the contents of this
dataset
USArrests

g) You will notice that the first column does not have a column name. We can use tibble to
assign a column name to the index column (1st column) and then use that column to visualise
the data. Install the tidytext, tibble and rlang packages and load all of them.

# install the packages (as_tibble comes from the tibble package)
install.packages("tidytext")
install.packages("tibble")
install.packages("rlang")

# load the libraries
library(tidytext)
library(tibble)
library(rlang)

h) Run the following statement which will now apply the name state to the first column
USArrests <- tibble::as_tibble(USArrests, rownames = "State")
i) Check that the first column now shows State as its column name. To show all rows, use
print(USArrests, n = 50). To go back to the original format, use remove(USArrests): this
deletes the tibble copy from your workspace, so R uses the built-in dataset again.
USArrests
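
As an alternative sketch, the state names can be moved into a State column with base R alone. The copy is given a different name (arrests, chosen here for illustration) so that the built-in USArrests is not overwritten and there is nothing to remove afterwards.

```r
# Copy the built-in data, turning the row names into a proper State column.
# datasets::USArrests always refers to the original, unmodified dataset.
arrests <- data.frame(State = rownames(datasets::USArrests),
                      datasets::USArrests,
                      row.names = NULL)

head(arrests)    # State is now the first column
names(arrests)
```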

j) Use the following code to draw a dot plot of the assault arrests (recorded per 100,000
residents) for each state. We will be doing more with ggplot next week.
ggplot(USArrests, aes(x = Assault, y = reorder(State, Assault))) +
  geom_point(color = "red") +
  labs(title = "Assaults by State") +
  theme(plot.title = element_text(hjust = 0.5, face = "bold")) +
  theme(plot.subtitle = element_text(hjust = 0.5))
k) Change the code above to draw a dot plot of the rape arrests for each state
ggplot(USArrests, aes(x = Rape, y = reorder(State, Rape))) +
  geom_point(color = "red") +
  labs(title = "Rapes by State") +
  theme(plot.title = element_text(hjust = 0.5, face = "bold")) +
  theme(plot.subtitle = element_text(hjust = 0.5))
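
For comparison, a similar ordered dot plot can be sketched with base R's dotchart(), which needs no extra packages and works directly from the row names. This is an alternative to the ggplot code above, not a replacement for it; the variable name by_assault is chosen here for illustration.

```r
# Sort the original data frame so the dots run from lowest to highest.
arrests <- datasets::USArrests
by_assault <- arrests[order(arrests$Assault), ]

# Base R dot plot: one labelled row per state.
dotchart(by_assault$Assault,
         labels = rownames(by_assault),
         color = "red", pch = 19, cex = 0.6,
         main = "Assaults by State",
         xlab = "Assault arrests per 100,000 residents")
```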

l) We can use built-in functions such as sum, min and mean to perform further calculations and
then visualise patterns. There are various ways these built-in functions can be used
within R code. We are going to do it in stages to help you understand.

1) We are going to create 4 variables that hold the column totals of the USArrests dataset.
Execute the following code

murder <- sum(USArrests$Murder)
rape <- sum(USArrests$Rape)
assault <- sum(USArrests$Assault)
urbanpop <- sum(USArrests$UrbanPop)

2) Run each variable to see the value

3) We are now going to create 2 vectors, one for the headings and one for the totals

arrest_type <- c("Murder", "Rape", "Assault", "Urban Pop")


arrest_total <- c(murder, rape, assault, urbanpop)

Display your vectors to check they have been created

4) Use the code below to create a simple bar chart to display the vector values

barplot(arrest_total, names.arg = arrest_type, main = "Arrest Types",
        xlab = "Arrest Type", ylab = "Arrest Total")

What conclusions can we draw from the graph?

5) We could speed up the process above by creating a data frame which includes the
headings and summed values. Write the code to create the data frame; look back over
the BMI example on slide 23 from last week's presentation (“introduction to R”)
arrest_types <- data.frame(types = c("Murder", "Rape", "Assault",
                                     "Urban Pop"),
  arrest_totals = c(sum(USArrests$Murder), sum(USArrests$Rape),
                    sum(USArrests$Assault), sum(USArrests$UrbanPop)))
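
Since all four columns are numeric, the totals can also be computed in a single call with colSums(). The sketch below uses the original built-in dataset; note that the rows come out in the dataset's own column order and use its column names (so "UrbanPop" rather than "Urban Pop").

```r
# Sum every column of the original dataset in one call.
totals <- colSums(datasets::USArrests)

# Turn the named vector into the same two-column layout as above.
arrest_types <- data.frame(types = names(totals),
                           arrest_totals = unname(totals))
arrest_types
```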

6) Display the dataframe, it should look like the one below:

arrest_types

# optional: round the totals to whole numbers
arrest_types <- data.frame(types = c("Murder", "Rape", "Assault",
                                     "Urban Pop"),
  arrest_totals = c(round(sum(USArrests$Murder)),
                    round(sum(USArrests$Rape)), round(sum(USArrests$Assault)),
                    round(sum(USArrests$UrbanPop))))

7) Using qplot, display your data frame from question l5) as a graph

qplot(data = arrest_types, x = types, y = arrest_totals, size = I(3),
      colour = I("Red"))
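
qplot() has been deprecated since ggplot2 3.4.0, so recent versions print a warning when it is used. A sketch of the same plot written with ggplot() follows; it rebuilds the data frame from the original dataset so that it runs on its own.

```r
library(ggplot2)

# Rebuild the totals from the original built-in dataset.
arrest_types <- data.frame(
  types = c("Murder", "Rape", "Assault", "Urban Pop"),
  arrest_totals = c(sum(datasets::USArrests$Murder),
                    sum(datasets::USArrests$Rape),
                    sum(datasets::USArrests$Assault),
                    sum(datasets::USArrests$UrbanPop))
)

# Fixed size and colour go outside aes(), so the I() wrapper is not needed.
p <- ggplot(arrest_types, aes(x = types, y = arrest_totals)) +
  geom_point(size = 3, colour = "red")
print(p)
```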

m) Find the lowest value for murder

min(USArrests$Murder)

n) Can you show the name of the State with the lowest murder rate? Research the function
‘which.min’

USArrests[which.min(USArrests$Murder), "State"]
# returns
# A tibble: 1 × 1
  State
  <chr>
1 North Dakota

USArrests[which.min(USArrests$Murder), ]
# returns
# A tibble: 1 × 5
  State        Murder Assault UrbanPop  Rape
  <chr>         <dbl>   <int>    <int> <dbl>
1 North Dakota    0.8      45       44   7.3

# With the tibble version, rownames() returns "34": the state names now live in
# the State column, and North Dakota is on row 34.
rownames(USArrests)[which.min(USArrests$Murder)]
[1] "34"

# After remove(USArrests) the original dataset is used again, and the same call
# returns "North Dakota".
rownames(USArrests)[which.min(USArrests$Murder)]
[1] "North Dakota"
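
which.min() returns the position of the first minimum only. The sketch below, which uses the original built-in dataset, shows that call alongside a comparison against min() that would list every state if several shared the lowest value.

```r
arrests <- datasets::USArrests

# Position of the first (here, the only) minimum murder value.
lowest <- which.min(arrests$Murder)
rownames(arrests)[lowest]

# All states whose murder value equals the minimum, in case of ties.
rownames(arrests)[arrests$Murder == min(arrests$Murder)]
```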

o) Write the code to find the average murder, assault, rape and urbanpop values from the
dataset, then display the results

arrest_types1 <- data.frame(types = c("Murder", "Rape", "Assault",
                                      "Urban Pop"),
  arrest_average = c(mean(USArrests$Murder), mean(USArrests$Rape),
                     mean(USArrests$Assault), mean(USArrests$UrbanPop)))
arrest_types1
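
The totals, averages and minimums from this task can also be gathered in one pass with sapply(), which applies a function to each column in turn. This is a sketch on the original built-in dataset; the row labels total, average and lowest are names chosen here for illustration.

```r
# Apply three summary functions to every column of the original dataset.
# The result is a matrix with one row per statistic and one column per variable.
arrest_stats <- sapply(datasets::USArrests, function(column) {
  c(total = sum(column), average = mean(column), lowest = min(column))
})

arrest_stats
arrest_stats["average", ]  # just the averages, as a named vector
```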

p) Display the results from question o) in a graph
qplot(data = arrest_types1, x = types, y = arrest_average,
      size = I(5), colour = I("blue"))

Task 3

Use any remaining time to work on your assignment
