Week 1-3
R
BUSAN302
Week 1
Learning Objectives
Demonstrate basic to advanced competencies in R and Power BI
○ R and RStudio
○ Entering commands
○ Object, data structure, mode and class
○ Functions and packages
○ Organisation
R
R is a programming language and open-source software
widely used for data analysis and data mining
R and RStudio
RStudio
• Popular Integrated Development Environment (IDE) for R
• Can use R with a separate script editor as an alternative
RStudio panes
○ Source: Create/open an R Script or R Markdown file to keep a record of your work
○ Console: For typing and executing R commands
○ Workspace/History: Shows objects created, history of commands, connections to data sources
○ Files/Plots/Packages: Shows files in the working directory, history of plots created, packages installed
Execute installation file > Choose default options for all questions
Installation on PC/Laptop
Then download and install RStudio:
https://posit.co/download/rstudio-desktop/
Entering Commands
Can type directly in the console window and hit Enter to
execute
○ 5+5
○ 2 ^ 10 + 50
○ plot(1:100)
Useful to know:
○ Display names of objects: ls()
○ Remove objects: rm()
○ Clear console: Ctrl+L
○ Clear environment or plots: Broom icon
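As a minimal sketch of a first console session, the lines below evaluate the arithmetic from this slide, store the results, and then use ls() and rm(). The object names total and power_sum are hypothetical and chosen only for illustration.

```r
# Arithmetic typed at the console is evaluated immediately
5 + 5          # 10
2 ^ 10 + 50    # 1074

# Store results in objects (hypothetical names)
total <- 5 + 5
power_sum <- 2 ^ 10 + 50

# Display the names of the objects created so far
ls()

# Remove one object, then confirm it is no longer listed
rm(total)
"total" %in% ls()   # FALSE
```

plot(1:100) is left out here because it opens a graphics window; it can be run in the same way from the console.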
Object
Good naming practices
○ Short and informative (e.g., emissions_data instead of df or ced)
○ Use an underscore (_) if it is a name with multiple words (e.g., emissions_data instead of emissions.data)
○ Cannot start with a number (e.g., 1year is invalid)
○ Avoid non-alphanumeric characters (e.g., +, ‘, -, $, !, @)
○ Do not use the same name as a built-in function (e.g., sum, min, mean)
○ Do not use reserved words (e.g., TRUE, FALSE, NA)
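One way to check some of these rules is sketched below. The helper is_valid_name() is hypothetical (it is not part of base R); it relies on the base function make.names(), which rewrites a string into a syntactically valid name, so a name is valid if it comes back unchanged.

```r
# Hypothetical helper: a name is syntactically valid if make.names() leaves it unchanged
is_valid_name <- function(x) {
  make.names(x) == x
}

is_valid_name("emissions_data")   # TRUE
is_valid_name("1year")            # FALSE: starts with a number
is_valid_name("emissions-data")   # FALSE: contains a hyphen
is_valid_name("TRUE")             # FALSE: reserved word

# Syntactically valid, but still discouraged because it masks a built-in function
is_valid_name("sum")              # TRUE
```

The check only covers syntax; whether a name is short and informative remains a judgment call.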
Data Structures
Vector and Scalar
Vector: Series of values that all contain the same type of information (i.e., same mode). Example: a five-element vector.
Scalar: Vector with a single element.
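A short sketch of vectors and scalars follows. The values and the names scores, pass_mark and mixed are made up for illustration; the last lines show what happens when different types are combined in one vector.

```r
# A five-element numeric vector (hypothetical values)
scores <- c(72, 85, 90, 64, 58)
length(scores)     # 5
mode(scores)       # "numeric"

# A scalar is a vector of length one
pass_mark <- 50
length(pass_mark)  # 1

# Mixing types forces every element to a single mode
mixed <- c(1, "a", TRUE)
mode(mixed)        # "character"
mixed              # "1" "a" "TRUE"
```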
Data Structures
Matrix and Array
Matrix: Two-dimensional vector that contains the same type of information. Example: a 3 x 4 matrix.
Array: Like a matrix, but can have more than two dimensions.
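The sketch below builds a 3 x 4 matrix and, as an assumed illustration of the array part of this slide, a three-dimensional array with array(). The object names m and a are hypothetical.

```r
# 3 x 4 matrix; values are filled column by column by default
m <- matrix(1:12, nrow = 3, ncol = 4)
dim(m)        # 3 4
m[2, 3]       # 8 (row 2, column 3)

# An array with three dimensions: 2 rows, 3 columns, 4 layers
a <- array(1:24, dim = c(2, 3, 4))
dim(a)        # 2 3 4
a[1, 2, 3]    # 15

# Both hold a single type of information
mode(m)       # "numeric"
mode(a)       # "numeric"
```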
Data Structures
List and Data Frame
List: Ordered collection of objects (each can be a vector, matrix, scalar, data frame or another list)
Created using list()
○ mylist <- list(c(10:5), "Difficult", matrix(1:4, nrow = 2))
Data frame: Collection of vectors with the same length; each column can contain different types of information
(Figure: data frame with 4 variables and 6 observations)
○ testdata <- data.frame(ID, score, gender)
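The two calls above can be tried as follows. The slide does not give the contents of ID, score and gender, so the values below are invented for illustration (six observations, three variables).

```r
# A list can hold objects of different structures
mylist <- list(c(10:5), "Difficult", matrix(1:4, nrow = 2))
length(mylist)    # 3
mylist[[2]]       # "Difficult"
mylist[[1]][1]    # 10

# Hypothetical vectors of equal length, one per column
ID <- 1:6
score <- c(72, 85, 90, 64, 58, 77)
gender <- c("F", "M", "F", "M", "F", "M")

# Each column keeps its own type of information
testdata <- data.frame(ID, score, gender)
dim(testdata)     # 6 3
str(testdata)
testdata$score
```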
Mode and Class
The mode describes how an object is stored and determines the type of information
found within an object
The class determines how an object will be treated by generic functions (functions that
can be applied to different inputs and will produce results depending on the type of input)
○ The class of a vector is usually the same as its mode (an integer vector is an exception: class integer, mode numeric). Other classes include matrix, data.frame, function
□ c(class(testdata), mode(testdata))
□ c(class(score), mode(score))
□ summary(testdata)
□ summary(score)
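To see the difference in practice, the sketch below rebuilds a small testdata with invented values (the slide does not show its contents) and runs the four calls listed above. summary() is a generic function, so its output depends on the class of its input.

```r
# Hypothetical data, recreated so the example runs on its own
ID <- 1:6
score <- c(72, 85, 90, 64, 58, 77)
gender <- c("F", "M", "F", "M", "F", "M")
testdata <- data.frame(ID, score, gender)

c(class(testdata), mode(testdata))   # "data.frame" "list"
c(class(score), mode(score))         # "numeric" "numeric"

# Same generic function, different results for different classes
summary(testdata)   # one summary per column
summary(score)      # minimum, quartiles, mean and maximum of one vector

# Class and mode can differ for vectors too
c(class(1:3), mode(1:3))             # "integer" "numeric"
```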
Function and Package
Function: Series of instructions that performs a specific task
○ Executing a function is called ‘calling’ a function or a ‘function call’
○ Always followed by round brackets
○ Usually requires one or more arguments that are specified by the user or take on a default value if omitted
○ Do not need to name arguments if providing arguments in the intended order
□ matrix(1:12, 3, 4, TRUE)
□ matrix(ncol = 4, data = 1:12, nrow = 3)
□ matrix(1:12, nrow = 3)
data.frame() source code (excerpt):
    function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,
        fix.empty.names = TRUE, stringsAsFactors = FALSE)
    {
        data.row.names <- if (check.rows && is.null(row.names))
            function(current, new, i) {
                if (is.character(current))
                    new <- as.character(new)
                if (is.character(new))
                    current <- as.character(current)
                if (anyDuplicated(new))
                    return(current)
                if (is.null(current))
                    return(new)
                if (all(current == new) || all(current == ""))
                    return(new)
                stop(gettextf("mismatch of row names in arguments of 'data.frame', item %d",
                    i), domain = NA)
            }
        else function(current, new, i) {
            if (is.null(current)) {
                if (anyDuplicated(new)) {
                    warning(gettextf("some row.names duplicated: %s --> row.names NOT used",
                        paste(which(duplicated(new)), collapse = ",")),
                        domain = NA)
                    current
                }
                else new
            }
            else current
        }
        object <- as.list(substitute(list(...)))[-1L]
        mirn <- missing(row.names)
        mrn <- is.null(row.names)
        x <- list(...)
        n <- length(x)
        if (n < 1L) {
            if (!mrn) {
                if (is.object(row.names) || !is.integer(row.names))
                    row.names <- as.character(row.names)
                if (anyNA(row.names))
                    stop("row names contain missing values")
    [… 169 lines in total]
Package: Collection of R functions, datasets and documentation that extends the capabilities of base R
○ Once downloaded and installed, need to load in session to use
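The three matrix() calls listed under Function can be compared directly. In this sketch the object names by_position, by_name and defaults are hypothetical; note that the fourth positional argument of matrix() is byrow, so the first call fills the matrix row by row while the other two use the default column-wise fill.

```r
# Arguments matched by position: data, nrow, ncol, byrow
by_position <- matrix(1:12, 3, 4, TRUE)

# Arguments matched by name, so the order does not matter
by_name <- matrix(ncol = 4, data = 1:12, nrow = 3)

# Omitted arguments take default values (ncol is worked out from the data)
defaults <- matrix(1:12, nrow = 3)

identical(by_name, defaults)      # TRUE
identical(by_position, by_name)   # FALSE, because byrow = TRUE changes the fill order

by_position[1, ]   # 1 2 3 4
by_name[1, ]       # 1 4 7 10
```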
Function and Package
○ Install: install.packages()
○ Update: update.packages()
○ Load: library()
Can call a function directly without using library() by stating [package name]::[function name]
○ cowsay::say("Hello world!", "random")
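A possible workflow is sketched below. The install and update lines are commented out because they need internet access and only have to be run occasionally; the cowsay call is guarded with requireNamespace() so the script still runs if the package is not installed. The stats package ships with R and is used here only to illustrate the two ways of calling a function.

```r
# install.packages("cowsay")   # run once to download and install
# update.packages()            # run occasionally to update installed packages

# Call a function without loading the package, if cowsay is available
if (requireNamespace("cowsay", quietly = TRUE)) {
  cowsay::say("Hello world!", "random")
}

# Load a package for the session, then call its functions by name
library(stats)
values <- c(1, 5, 9)   # hypothetical data
median(values)         # 5

# The same call using the package::function form
stats::median(values)  # 5
```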
(Figure: project folder structure showing processed, data, figures, documents and scripts folders)
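A folder layout like this can also be created from R. The sketch below is an assumption about how the folders might be arranged (all at the top level of one project folder); it uses a temporary directory and the hypothetical project name busan302_project so that nothing is written to your own files.

```r
# Hypothetical project location inside R's temporary directory
project_dir <- file.path(tempdir(), "busan302_project")

# Folder names taken from the slide
folders <- c("data", "processed", "figures", "documents", "scripts")

# Create each folder, including the project folder itself if needed
for (folder in folders) {
  dir.create(file.path(project_dir, folder), recursive = TRUE, showWarnings = FALSE)
}

# Check what was created
list.files(project_dir)
```

For a real project, replace tempdir() with the location where you keep your coursework.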
Organisation
Documentation, Style and Other Points
Always write your code in a script and get into the habit of writing easy-to-read code
○ Style guide: https://style.tidyverse.org/index.html
○ What matters is consistency and readability!
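As a small illustration of readability, the two versions below do the same thing. The second follows common tidyverse style conventions (informative snake_case names, <- for assignment, spaces around operators, one statement per line); the data values are invented.

```r
# Harder to read: short names, no spacing, everything on one line
x=c(72,85,90);m=mean(x);if(m>80){print("high")}else{print("low")}

# Easier to read: informative names, consistent spacing, a comment on intent
scores <- c(72, 85, 90)      # hypothetical test scores
mean_score <- mean(scores)

# Report whether the average is above the (assumed) threshold of 80
if (mean_score > 80) {
  print("high")
} else {
  print("low")
}
```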