
Week 1-3

The document discusses R and RStudio. R is a programming language widely used for data analysis and RStudio is a popular integrated development environment for R. The document then covers topics such as installing R and RStudio, entering commands, objects, data structures, functions, and packages in R.

Uploaded by

jasmyne

© 2023 University of Waikato

Pei-Chi Kelly Hsiao

R
BUSAN302
Week 1
Learning Objectives
Demonstrate basic to advanced competencies in R and Power BI

○ R and RStudio
○ Entering commands
○ Object, data structure, mode and class
○ Functions and packages
○ Organisation

R
R is a programming language and open-source software
widely used for data analysis and data mining

Benefits of using R over other ‘point and click’ statistical software:
○ Clarifies the analysis process
○ Results can be reproduced and automatically updated
○ More than 10,000 packages to extend its base capabilities
○ Saves time

R and RStudio
• RStudio: Popular Integrated Development Environment (IDE) for R
• Can use R with a separate script editor as an alternative
• Only need RStudio open to use R (i.e., no need to separately run R as well)

RStudio panes:
Source: Create/open an R Script or R Markdown file to keep a record of your work (File > New File > R Script or R Markdown)
Console: For typing and executing R commands
Environment/History: Shows objects created, history of commands, connect to data sources
Files/Plots/Packages: Shows files in working directory, history of plots created, packages installed

Installation on PC/Laptop
Download and install R first: https://cran.rstudio.com/

Select and click the link based on your operating system (R for Windows or R for macOS)

Execute installation file > Choose default options for all questions

Installation on PC/Laptop
Then download and install RStudio:
https://posit.co/download/rstudio-desktop/

If the automatic recommendation is incorrect, scroll down to see ‘all installers’
and download the appropriate file for your OS

Execute installation file > Choose default options for all questions

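Once both installers have finished, one quick way to confirm that R is working is to print its version in the RStudio console. This is a minimal sketch using only base R; the object name version_text is chosen here for illustration.

```r
# R.version.string is a built-in character string describing the installed R
version_text <- R.version.string
print(version_text)

# sessionInfo() additionally reports the operating system and attached packages
sessionInfo()
```

If a version string is printed, R is installed and RStudio has found it.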
Entering Commands
Can type directly in the console window and hit Enter to execute
○ 5+5
○ 2 ^ 10 + 50
○ plot(1:100)

For commands written in a script, execute by running the code
○ To run one line: Ctrl+Enter
○ To run entire script: Ctrl+Shift+Enter

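The console commands can also be kept in a script, with each result stored in an object so it can be reused later. A small sketch; the object names sum_result and power_result are assumptions made for this example.

```r
# Store each result in an object instead of only printing it
sum_result <- 5 + 5            # addition
power_result <- 2 ^ 10 + 50    # ^ is evaluated before +, so this is 1024 + 50

print(sum_result)
print(power_result)
```

Place the cursor on a line and press Ctrl+Enter to run just that line.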
Object
R operates on objects; everything in R is an object

Create an object by giving it a name and assigning something to it using the assignment operator (<- or =)
○ a <- 5
  b <- 10
  c <- "bad apple"

Can perform operations on objects
○ b/a + b
  c + a (returns an error: c holds text, not a number)

Useful to know:
Display names of objects     ls()
Remove objects               rm()
Clear console                Ctrl+L
Clear environment or plot    Broom icon

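The object operations can be tried as a short script. This is a sketch rather than part of the original slides: the name fruit replaces c so that the built-in c() function is not shadowed, and try() is added so the script keeps running after the deliberate error.

```r
# Create objects with the assignment operator
a <- 5
b <- 10
fruit <- "bad apple"   # hypothetical name; note the plain double quotes

# Arithmetic on numeric objects: division is evaluated before addition
result <- b / a + b    # 10 / 5 + 10

# Adding a number to text is an error; try() captures it instead of stopping
attempt <- try(fruit + a, silent = TRUE)
failed <- inherits(attempt, "try-error")

ls()          # names of the objects created so far
rm(attempt)   # remove one object from the environment
```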
Object
Good naming practices
○ Short and informative (e.g., emissions_data instead of df or ced)
○ Use an underscore (_) if it is a name with multiple words (e.g., emissions_data
instead of emissions.data)
○ Cannot start with a number (e.g., 1year is invalid)
○ Avoid non-alphanumeric characters (e.g., +, ‘, -, $, !, @)
○ Do not use the same name as a built-in function (e.g., sum, min, mean)
○ Do not use reserved words (e.g., TRUE, FALSE, NA)
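Some of these rules can be checked in R itself. The sketch below relies on base R's make.names(), which converts any string into a syntactically valid name, so a string that comes back unchanged was already valid. The candidate names are examples only, and the check does not cover the style rules (short, informative, underscores).

```r
# Candidate object names to test (illustrative values)
candidates <- c("emissions_data", "1year", "bad-name", "TRUE")

# A name is syntactically valid if make.names() leaves it unchanged
is_valid <- make.names(candidates) == candidates
names(is_valid) <- candidates
is_valid

# exists() shows whether a name is already taken, e.g. by a built-in function
name_taken <- exists("mean", mode = "function")
```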

Data Structures
Vector and Scalar
Vector: Series of values that contains the same type of information (i.e., same mode).

Assigned using c()
○ ID <- c(1:4)
  score <- c(93, 67, 63, 76)
  gender <- c("Female", "Male", "Female", "Male")

str() provides an overview of the structure of an object and length() returns the number of elements in a vector
○ str(ID)
○ length(ID)

Access individual elements or subset using []
○ gender[2]
○ score[2:4]

Scalar: One-element vector (i.e., vector with length of one)

[Figure: a five-element vector; score as a four-element vector (93, 67, 63, 76); a one-element vector (scalar)]

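Put together, the vector commands look like the sketch below. The last subsetting line (selecting by a condition) goes slightly beyond the slide and is included as an assumption about what is useful next.

```r
ID <- c(1:4)
score <- c(93, 67, 63, 76)
gender <- c("Female", "Male", "Female", "Male")

str(score)                      # overview of the structure of score
n_scores <- length(score)       # number of elements

second_gender <- gender[2]      # a single element
last_three <- score[2:4]        # a subset by position
above_70 <- score[score > 70]   # a subset by condition (logical indexing)

# A scalar is simply a vector of length one
is_scalar <- length(second_gender) == 1
```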
Data Structures
Matrix and Array
Matrix: Two-dimensional vector that contains the same type of information

Created using matrix()
○ matrix(1:12, nrow = 3, ncol = 4)
○ elem <- c(5, 10, 15, 20)
  rname <- c("R1", "R2")
  cname <- c("C1", "C2")
  matrix(elem, nrow = 2, ncol = 2, byrow = TRUE, dimnames = list(rname, cname))

Array: Multidimensional matrix that contains the same type of information

Created using array()
○ array(1:12, c(2, 4, 3))

[Figure: a 3 x 4 matrix and a 3 x 4 x 2 array]

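Elements of matrices and arrays are accessed with one index per dimension. The sketch below reuses the matrix examples and adds a 3 x 4 x 2 array filled with 1:24; that choice of values is an assumption made so that every cell is distinct.

```r
m <- matrix(1:12, nrow = 3, ncol = 4)   # filled column by column by default
dim(m)                                   # 3 rows, 4 columns
m[2, 3]                                  # row 2, column 3

elem <- c(5, 10, 15, 20)
named <- matrix(elem, nrow = 2, ncol = 2, byrow = TRUE,
                dimnames = list(c("R1", "R2"), c("C1", "C2")))
named["R2", "C1"]                        # index by row and column name

arr <- array(1:24, dim = c(3, 4, 2))     # 3 rows, 4 columns, 2 layers
arr[1, 2, 2]                             # row 1, column 2, layer 2
```

With byrow = TRUE the values fill each row in turn; without it they fill each column in turn.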
Data Structures
List and Data Frame
List: Ordered collection of objects

Created using list()
○ mylist <- list(c(10:5), "Difficult", matrix(1:4, nrow = 2))

Data frame: Collection of vectors with the same length and each column
can contain different types of information

Created using data.frame()
○ testdata <- data.frame(ID, score, gender)

[Figure: a list containing a vector, matrix, scalar, data frame and another list; a data frame with 4 variables and 6 observations]

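A short sketch of how components of a list and columns of a data frame can be reached. The vectors are redefined here so the block runs on its own; the double-bracket and $ notation are standard R but are not shown on the slide.

```r
mylist <- list(c(10:5), "Difficult", matrix(1:4, nrow = 2))
length(mylist)         # three components
mylist[[1]]            # double brackets return the component itself
mylist[[3]][2, 1]      # then index into that component

ID <- c(1:4)
score <- c(93, 67, 63, 76)
gender <- c("Female", "Male", "Female", "Male")
testdata <- data.frame(ID, score, gender)

dim(testdata)          # 4 observations (rows), 3 variables (columns)
testdata$score         # one column as a vector
females <- testdata[testdata$gender == "Female", ]   # rows meeting a condition
```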
Mode and Class
The mode describes how an object is stored and determines the type of information
found within an object

Mode        Description                                                   Check
character   Character (e.g., "O") or string values (e.g., "Orange")       mode("O"), mode("Orange")
numeric     Includes integer (whole number, e.g., 777) and double         mode(777), mode(7.77),
            (floating-point number, e.g., 7.77)                           is.integer(7.77), is.double(7.77)
logical     Takes on TRUE, FALSE or NA                                    mode(TRUE)

The class determines how an object will be treated by generic functions (functions that
can be applied to different inputs and will produce results depending on the type of input)
○ The class of a vector is usually the same as its mode (an exception is an integer vector, which has class "integer" and mode "numeric"). Other classes include matrix, data.frame, function
  □ c(class(testdata), mode(testdata))
    c(class(score), mode(score))
    summary(testdata)
    summary(score)

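The difference between mode and class is easiest to see side by side. A minimal sketch with the objects redefined so it runs on its own; the 777L line uses R's L suffix for integers, which the slide does not mention.

```r
score <- c(93, 67, 63, 76)
ID <- c(1:4)
testdata <- data.frame(ID, score)

mode(score); class(score)         # "numeric" and "numeric"
mode(ID); class(ID)               # "numeric" and "integer"
mode(testdata); class(testdata)   # "list" and "data.frame"

is.integer(777)     # FALSE: a number typed without L is stored as a double
is.integer(777L)    # TRUE: the L suffix requests an integer

# summary() is generic: its output depends on the class of the input
summary(score)
summary(testdata)
```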
Function and Package
Function: Series of instructions that performs a specific task
○ Executing a function is called ‘calling’ a function or ‘function call’
○ Always followed by round brackets
○ Usually require one or more arguments that are specified by the user or take on a default value if omitted
○ Do not need to name arguments if providing arguments in the intended order
  □ matrix(1:12, 3, 4, TRUE)
  □ matrix(ncol = 4, data = 1:12, nrow = 3)
  □ matrix(1:12, nrow = 3)

Package: Collection of R functions, datasets and documentation that extends the capabilities of base R
○ Once downloaded and installed, need to load in session to use

data.frame() source code:
function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,
    fix.empty.names = TRUE, stringsAsFactors = FALSE)
{
    data.row.names <- if (check.rows && is.null(row.names))
        function(current, new, i) {
            if (is.character(current))
                new <- as.character(new)
            if (is.character(new))
                current <- as.character(current)
            if (anyDuplicated(new))
                return(current)
            if (is.null(current))
                return(new)
            if (all(current == new) || all(current == ""))
                return(new)
            stop(gettextf("mismatch of row names in arguments of 'data.frame', item %d",
                i), domain = NA)
        }
    else function(current, new, i) {
        if (is.null(current)) {
            if (anyDuplicated(new)) {
                warning(gettextf("some row.names duplicated: %s --> row.names NOT used",
                  paste(which(duplicated(new)), collapse = ",")),
                  domain = NA)
                current
            }
            else new
        }
        else current
    }
    object <- as.list(substitute(list(...)))[-1L]
    mirn <- missing(row.names)
    mrn <- is.null(row.names)
    x <- list(...)
    n <- length(x)
    if (n < 1L) {
        if (!mrn) {
            if (is.object(row.names) || !is.integer(row.names))
                row.names <- as.character(row.names)
            if (anyNA(row.names))
                stop("row names contain missing values")
[… 169 lines in total]

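The argument rules can be checked directly: a call with arguments in the intended order and a call with named arguments in any order give the same result. The function rescale() below is a hypothetical example written for this sketch, not a base R function.

```r
# Positional arguments: data, nrow, ncol, byrow (in that order)
by_position <- matrix(1:12, 3, 4, TRUE)

# Named arguments can be given in any order
by_name <- matrix(ncol = 4, byrow = TRUE, data = 1:12, nrow = 3)

identical(by_position, by_name)   # both calls build the same matrix

# A hypothetical user-defined function with a default argument value
rescale <- function(x, max_value = 100) {
  x / max_value
}
default_call <- rescale(c(93, 67))                    # uses max_value = 100
custom_call <- rescale(c(93, 67), max_value = 10)     # overrides the default
```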
Function and Package
Install    install.packages()
Update     update.packages()
Load       library()

Install and use the cowsay package:
○ install.packages("cowsay")
  update.packages(oldPkgs = "cowsay")
○ library(cowsay)
○ say("Hello world!")
  say("Hello world!", "random")

Can call a function directly without using library() by stating [package name]::[function name]
cowsay::say("Hello world!", "random")

Help file directly in IDE: ?say OR help(say)

Can load, detach or remove packages from the ‘Packages’ tab

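The :: notation also works for packages that ship with R, which makes it easy to try without installing anything. The sketch below additionally uses requireNamespace() to check whether cowsay is available before calling it; that guard is an addition for safety, not something the slide requires.

```r
# Packages that ship with R can be called with :: without library()
first_three <- utils::head(letters, 3)
middle <- stats::median(c(93, 67, 63, 76))

# requireNamespace() reports whether a package is installed, without attaching it
has_cowsay <- requireNamespace("cowsay", quietly = TRUE)
if (has_cowsay) {
  cowsay::say("Hello world!")
} else {
  message("cowsay is not installed; run install.packages(\"cowsay\") first")
}
```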
Organisation
Working Directory, Projects and Folder Structure
The working directory is where R will look for your files and save files
Check current working directory getwd()
Set working directory* setwd() OR Session > Set Working Directory > Choose Directory
* Use forward slash (/) instead of backslash (\) for file paths. Backslash is an escape character in R. Can also use double backslash (\\)

Use RStudio Projects for self-contained and portable projects
File > New Project > New Directory > New Project

Be organised and have a clear folder structure. E.g.:
Project #1
    data
        raw data
        processed data
    figures
    documents
    scripts

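A folder structure like this can be created from R. The sketch below works inside a temporary folder so nothing on your machine is changed; the project name project_1 and the exact folder names are assumptions based on the example structure.

```r
# Build the project under R's temporary directory (hypothetical project name)
project_dir <- file.path(tempdir(), "project_1")

# file.path() joins folder names with forward slashes
folders <- c("data/raw", "data/processed", "figures", "documents", "scripts")
for (f in folders) {
  dir.create(file.path(project_dir, f), recursive = TRUE, showWarnings = FALSE)
}

old_wd <- getwd()          # remember the current working directory
setwd(project_dir)         # make the project folder the working directory
created <- list.files()    # top-level folders just created
setwd(old_wd)              # restore the original working directory
```

Inside an RStudio Project the working directory is set to the project folder automatically, so setwd() is rarely needed.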
Organisation
Documentation, Style and Other Points
Always write your code in a script and get into the habit of writing easy-to-read code
○ Style guide: https://style.tidyverse.org/index.html
○ What matters is consistency and readability!

Other points to note:
○ R is case sensitive
○ Commands are separated by a new line or a semicolon (;)
○ Use a hash symbol (#) to add comments, as anything that follows it on the same line is ignored by R
○ If a plus sign (+) appears in the console, it is a prompt that your code is incomplete
○ Use the escape key (ESC) to terminate current operations

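Three of these points can be seen in a few lines. A small sketch with illustrative object names:

```r
# R is case sensitive: score and Score are two different objects
score <- 76
Score <- 93
different <- score != Score   # TRUE

# Two commands on one line, separated by a semicolon
a <- 5; b <- 10

# Everything after a hash symbol is ignored by R
total <- a + b  # this comment does not affect the result
```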
