Week 11 Tasks and Solutions
a) Save both the iris and demographic data to a folder in your M drive
e) Using the factor command, investigate how many different species of iris flowers there are
levels(factor(dataset$species))
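As a minimal sketch of the same idea, the built-in iris data frame is used here as a stand-in for the CSV file (an assumption: its column is named Species with a capital S, unlike the species column above). levels lists the distinct values, nlevels counts them and table shows how many rows fall in each.

```r
# Stand-in data: the built-in iris data frame, not the worksheet's CSV file.
species_factor <- factor(iris$Species)

levels(species_factor)   # the distinct species names
nlevels(species_factor)  # how many distinct species there are
table(species_factor)    # number of rows for each species
```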
f) Create a dataframe “newdataset” to show all data for columns petal length and sepal length
only
newdataset <- data.frame(dataset$sepal_length, dataset$petal_length)
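The data.frame call above builds its column names from the expressions passed in, so they come out with a "dataset." prefix. A possible alternative, sketched here with the built-in iris data frame as a stand-in (an assumption: its columns are Sepal.Length and Petal.Length rather than the CSV names), is to index by column name or to name the columns explicitly.

```r
# Stand-in data: the built-in iris data frame, not the worksheet's CSV file.

# Option 1: pick the two columns by name; the original names are kept.
newdataset <- iris[, c("Sepal.Length", "Petal.Length")]
head(newdataset)

# Option 2: build a data frame and choose the column names yourself.
named <- data.frame(sepal_length = iris$Sepal.Length,
                    petal_length = iris$Petal.Length)
names(named)
```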
g) Install and load the dplyr package and run the following statement: what does this show?
install.packages("dplyr")
library(dplyr)
select(dataset, sepal_width, petal_width)
h) Assign a variable name “widthonly” to the statement in question g, run the variable to
display the results
widthonly <- select(dataset, sepal_width, petal_width)
widthonly
i) Create a new column in the iris dataset to show the area for the petal of each flower (petal
length * petal width) – ignore any error messages, now display the results of the dataset
showing the new column, it should look like this:
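One way this could be done is sketched below on a tiny made-up data frame (the three rows are hypothetical values, not taken from the iris file). It assumes the columns are called petal_length and petal_width, and names the new column petal_area to match the code used in question j.

```r
# Hypothetical stand-in rows; in the worksheet, dataset is read from the iris file.
dataset <- data.frame(petal_length = c(1.4, 4.7, 6.0),
                      petal_width  = c(0.2, 1.4, 2.5))

# Assigning to a new name with $ adds the column, calculated row by row.
dataset$petal_area <- dataset$petal_length * dataset$petal_width

# Display the dataset, now including the new column.
dataset
```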
j) Show all petal areas which are greater than or equal to 8.64 – assign a variable and run it
petalfilter <- dataset$petal_area >= 8.64
dataset[petalfilter,]
k) Create a new variable to extend question j to display petal areas which are greater than or
equal to 8.64 and less than 12, run your new variable
petalfilter2 <- dataset$petal_area >= 8.64 & dataset$petal_area < 12
dataset[petalfilter2,]
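To see what the filter variable actually holds, the sketch below uses a small made-up petal_area column (hypothetical values chosen to sit on and around both boundaries). The filter is a vector of TRUE and FALSE values, one per row, and subset is shown as an equivalent base R way of writing the same condition.

```r
# Hypothetical petal areas, including both boundary values 8.64 and 12.
dataset <- data.frame(petal_area = c(0.28, 8.64, 10.5, 12, 15))

# One TRUE/FALSE per row: TRUE where the area is >= 8.64 and < 12.
in_range <- dataset$petal_area >= 8.64 & dataset$petal_area < 12
in_range

# Keep only the rows where the filter is TRUE.
dataset[in_range, , drop = FALSE]

# The same condition written with subset().
subset(dataset, petal_area >= 8.64 & petal_area < 12)
```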
Task 2 – Visualisation
b) Create a variable “mydata” to read the demographic dataset and display all data
mydata <- read.csv("Demographic-Data.csv")
mydata
c) Investigate the demographic data set using functions such as head, str, summary
head(mydata)
str(mydata)
summary(mydata)
d) Run each of the statements on slide 16 and 17 and observe the results
e) Follow and carry out the instructions from slides 18 – 23 – explore the different shapes
available.
f) You are now going to work with the USArrests built in dataset, display the results from this
dataset
USArrests
g) You will notice that the first column does not have a column name. We can use tibble to
assign a column name to the index column (1st column), which we can then use to visualise the
data. Install the tidytext, tibble and rlang packages and load all of them.
# install the packages (only needed once)
install.packages(c("tidytext", "tibble", "rlang"))
# load the libraries
library(tidytext)
library(tibble)
library(rlang)
h) Run the following statement which will now apply the name state to the first column
USArrests <- tibble::as_tibble(USArrests, rownames = "State")
i) Check that the first column now shows State as its column name. To remove this index, use
remove(USArrests); this deletes your modified copy, so the built-in dataset is visible again
in its original format. To show all rows, you can use the command print(USArrests, n = 50).
USArrests
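If the tibble package is not available, a similar result can be sketched in base R. This is an alternative, not the method used above: it copies the row names into an ordinary column called State. The name arrests_df is hypothetical and is used so that the built-in USArrests is left untouched.

```r
# Base R sketch: turn the row names of USArrests into a proper State column.
arrests_df <- data.frame(State = rownames(USArrests),
                         USArrests,
                         row.names = NULL)

names(arrests_df)  # State is now the first column
head(arrests_df)
```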
j) You are now going to use the following code to show a dot plot of the number of
assaults per state. We will be doing more with ggplot next week.
# ggplot2 must be installed and loaded before ggplot can be used
library(ggplot2)
ggplot(USArrests, aes(x = Assault, y = reorder(State, Assault))) +
  geom_point(color = "red") +
  labs(title = "Assaults by State") +
  theme(plot.title = element_text(hjust = 0.5, face = "bold")) +
  theme(plot.subtitle = element_text(hjust = 0.5))
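The reorder call in the plot sorts the states by their assault value, so the dots run from lowest to highest instead of alphabetically. A small sketch with three made-up states (hypothetical names and values) shows the effect on the factor levels, which is the order ggplot uses on the axis.

```r
# Hypothetical example data: three states and their assault counts.
states   <- c("A", "B", "C")
assaults <- c(30, 10, 20)

# reorder() returns a factor whose levels are sorted by the second argument.
ordered_states <- reorder(states, assaults)
levels(ordered_states)
```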
k) Change the code above to show a dot plot of the number of rapes per state
ggplot(USArrests, aes(x = Rape, y = reorder(State, Rape))) +
  geom_point(color = "red") +
  labs(title = "Rape by state") +
  theme(plot.title = element_text(hjust = 0.5, face = "bold")) +
  theme(plot.subtitle = element_text(hjust = 0.5))
l) We can use built-in functions such as sum, min, mean etc. to perform further calculations and
then visualise patterns. There are various ways in which these built-in functions can be used
within R code. We are going to do it in stages to help you understand
1) We are going to create 4 variables to calculate the total for each arrest type in the
USArrests dataset. Execute the following code
3) We are now going to create 2 vectors, one for the headings and one for the totals
4) Use the code below to create a simple bar chart to display the list values
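The code for steps 1, 3 and 4 is not reproduced in this sheet, so the following is only a sketch of what those steps might look like. All variable names (total_murder, headings, totals and so on) are hypothetical, and barplot from base R is assumed for the simple bar chart.

```r
# Step 1: four variables, one total per column of USArrests (names are hypothetical).
total_murder   <- sum(USArrests$Murder)
total_assault  <- sum(USArrests$Assault)
total_urbanpop <- sum(USArrests$UrbanPop)
total_rape     <- sum(USArrests$Rape)

# Step 3: two vectors, one for the headings and one for the totals.
headings <- c("Murder", "Assault", "Urban Pop", "Rape")
totals   <- c(total_murder, total_assault, total_urbanpop, total_rape)

# Step 4: a simple bar chart with one bar per heading.
barplot(totals, names.arg = headings, main = "Totals by column")
```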
5) We could quicken the process above by creating a data frame which includes the
headings and summed values. Write the code to create the data frame; look back over
the BMI example on slide 23 from last week's presentation (“introduction to R”)
arrest_types <- data.frame(
  types = c("Murder", "Rape", "Assault", "Urban Pop"),
  arrest_totals = c(sum(USArrests$Murder), sum(USArrests$Rape),
                    sum(USArrests$Assault), sum(USArrests$UrbanPop)))
arrest_types
# round the totals to whole numbers
arrest_types <- data.frame(
  types = c("Murder", "Rape", "Assault", "Urban Pop"),
  arrest_totals = c(round(sum(USArrests$Murder)), round(sum(USArrests$Rape)),
                    round(sum(USArrests$Assault)), round(sum(USArrests$UrbanPop))))
7) Using qplot, display the following graph using your data frame from question l5)
USArrests[which.min(USArrests$Murder), "State"]
returns
# A tibble: 1 × 1
  State
  <chr>
1 North Dakota
USArrests[which.min(USArrests$Murder), ]
# A tibble: 1 × 5
  State        Murder Assault UrbanPop  Rape
  <chr>         <dbl>   <int>    <int> <dbl>
1 North Dakota    0.8      45       44   7.3
o) Write the code to find the average murder, assault, rape and urbanpop from the dataset. It
should show the following results:
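One possible approach is sketched below using colMeans from base R on the built-in USArrests data. The data frame name arrest_types1 and the column names types and arrest_average are taken from the qplot call in question p; the helper vector arrest_cols is hypothetical.

```r
# Columns to average (arrest_cols is a hypothetical helper name).
arrest_cols <- c("Murder", "Assault", "Rape", "UrbanPop")

# colMeans() gives one average per column; unname() drops the labels so that
# the headings live only in the types column.
arrest_types1 <- data.frame(
  types = arrest_cols,
  arrest_average = unname(colMeans(USArrests[, arrest_cols]))
)
arrest_types1
```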
p) Display the results from question 2o in a graph. It should look similar to the one below:
qplot(data = arrest_types1, x = types, y = arrest_average,
      size = I(5), colour = I("blue"))
Task 3