Introduction to RStudio
MACC7006 Accounting Data and Analytics
Keri Hu
Faculty of Business and Economics
Today’s objective
By the end of today’s lab, you should be able to:
• Locate and identify the essential parts of the RStudio interface
• Create, edit, and save .R and .RData files
• Generate objects and differentiate between datasets, numbers, strings,
and functions
RStudio interface
Figure 1: RStudio interface
Arithmetic operations
R can be used as a calculator:
5 + 3
## [1] 8
5 / 3
## [1] 1.666667
5 ^ 3
## [1] 125
• The [1] is the index of the first element printed on that line of output (it is not a row number).
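To see what the bracketed number means, it can help to print something longer than one line. The sketch below is only an illustration; the object name long_vector is hypothetical, and where the lines break depends on the width of your console.

```r
# Print a vector long enough to wrap onto several lines of console output.
long_vector <- seq(101, 140, 1)   # 40 consecutive numbers
print(long_vector)
# Each printed line begins with a bracketed number such as [1]: the position
# of the first element shown on that line.
```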
An “object-oriented” programming language
Objects, any pieces of information stored by R, can be:
• A dataset (e.g. “WHO”)
• A subset of a dataset (e.g. only the even observations of “WHO”)
• A number (e.g. 2π + 1)
• A text string (e.g. “HKU is awesome”)
• A function (e.g. a function that takes in x and gives you x^2 + 8)
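A minimal sketch of the number, text string, and function examples from the list above, stored as objects. The object names (my_number, my_string, add_eight_to_square) are hypothetical and chosen only for illustration; class() reports what kind of object each one is.

```r
# A number: 2 * pi + 1 (pi is a built-in constant in R)
my_number <- 2 * pi + 1

# A text string
my_string <- "HKU is awesome"

# A function that takes in x and returns x^2 + 8
# (the name add_eight_to_square is hypothetical)
add_eight_to_square <- function(x) {
  x^2 + 8
}

class(my_number)            # "numeric"
class(my_string)            # "character"
class(add_eight_to_square)  # "function"
add_eight_to_square(3)      # 3^2 + 8 = 17
```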
Create objects
R can store objects with a name of our choice. Use <- as an assignment
operator for objects.
object_1 <- 5 + 3
object_1
## [1] 8
If we assign a new value to the same object name, then we will overwrite
this object (so be careful when doing so!)
object_1 <- 5 - 3
object_1
## [1] 2
Objects (cont.)
R can also represent other types of values as objects, such as strings of
characters:
MySchool <- "HKU"
MySchool
## [1] "HKU"
A vector stores information in a given order
We use the function c(), which stands for “concatenate,” to enter a data
vector (with commas separating elements of the vector):
vector.1 <- c(93, 92, 83, 99, 96, 97)
vector.1
## [1] 93 92 83 99 96 97
• Note: a vector has no dimensions of its own, but R treats it as a column vector (n × 1) when it is converted to a matrix.
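One way to check the shape of a vector is to inspect its dimensions before and after converting it to a matrix. This is a small sketch; the name column_form is hypothetical.

```r
vector.1 <- c(93, 92, 83, 99, 96, 97)

dim(vector.1)          # NULL: a plain vector carries no dimensions
length(vector.1)       # 6

# Converting to a matrix gives one column with one row per element.
column_form <- as.matrix(vector.1)
dim(column_form)       # 6 1
```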
seq function
An easy way to create a long sequence of numbers is the seq function.
• The sequence starts at the first argument, ends at the second
argument, and jumps in increments defined by the third argument.
seq(0, 20, 5)
## [1] 0 5 10 15 20
• If you have 1000 data points, and you want to number them from 1 to
1000, you can use seq(1, 1000, 1).
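A short sketch of equivalent ways to build such a sequence. The colon operator and the named arguments from, to, and by are standard R; the object names are hypothetical.

```r
by_one    <- seq(1, 1000, 1)                # start, end, increment
shorthand <- 1:1000                         # colon operator: same values
named     <- seq(from = 0, to = 20, by = 5) # arguments spelled out by name

length(by_one)                 # 1000
all(by_one == shorthand)       # TRUE
named                          # 0 5 10 15 20
```

For a step of 1, the colon form is the one you will see most often in R code.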
Retrieve part of a vector
To access specific elements of a vector, we use square brackets
[ ]. This is called indexing:
vector.1[2]
## [1] 92
vector.1[c(2, 4)]
## [1] 92 99
vector.1[-4]
## [1] 93 92 83 96 97
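A few more indexing patterns, offered as a sketch. Selecting with a range or with several negative indices follows directly from the examples above; putting a logical condition inside the brackets goes slightly beyond this slide, but is a common way to filter a vector.

```r
vector.1 <- c(93, 92, 83, 99, 96, 97)

vector.1[2:4]              # elements 2 through 4: 92 83 99
vector.1[-c(1, 6)]         # drop the first and last: 92 83 99 96
vector.1[vector.1 > 95]    # keep values above 95: 99 96 97
```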
Multiply a vector by a number
Since each element of this vector is a numeric value, we can apply
arithmetic operations to it:
vector.1 * 1000
## [1] 93000 92000 83000 99000 96000 97000
Element-wise operations of vectors
vec1 <- c(1, 2, 3)
vec2 <- c(3, 3, 3)
vec1 + vec2
## [1] 4 5 6
vec1 * vec2
## [1] 3 6 9
vec1 / vec2
## [1] 0.3333333 0.6666667 1.0000000
Functions
A function takes input object(s) and returns an output object. In R, a
function generally runs as funcname(input). Some basic functions useful
for summarizing data include:
• length(): length of a vector (number of elements)
• min(): minimum value
• max(): maximum value
• range(): minimum and maximum of the data
• mean(): mean
• sd(): standard deviation
• sum(): sum
Try these with vector.1
Functions (cont.)
length(vector.1)
## [1] 6
min(vector.1)
## [1] 83
max(vector.1)
## [1] 99
range(vector.1)
## [1] 83 99
Functions (cont.)
mean(vector.1)
## [1] 93.33333
sd(vector.1)
## [1] 5.680376
sum(vector.1)
## [1] 560
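As a check on what mean() and sd() compute, the sketch below rebuilds both from sum() and length(). Note that sd() is the sample standard deviation, so the squared deviations are divided by n - 1. The names n, manual_mean, and manual_sd are hypothetical.

```r
vector.1 <- c(93, 92, 83, 99, 96, 97)

n <- length(vector.1)
manual_mean <- sum(vector.1) / n
# Sample standard deviation: divide the squared deviations by n - 1
manual_sd <- sqrt(sum((vector.1 - manual_mean)^2) / (n - 1))

all.equal(manual_mean, mean(vector.1))  # TRUE
all.equal(manual_sd, sd(vector.1))      # TRUE
```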
R script
• A text file containing a set of commands and comments
Why use an R script? Instead of re-entering code each time you execute a
set of commands, . . .
• Reproducibility
• Anyone, anywhere, with the data and the R script can reproduce the results.
• Big time savings when repeating an analysis on data
Create an R script
Specify a working directory in R
Working directory: the default location where R searches for files and
where it saves files
• Use the function setwd() to change the working directory
setwd("/Users/Keri/MACC7006")
• Use the function getwd() to display the current working directory.
getwd()
## [1] "/Users/Keri/MACC7006"
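A sketch of switching the working directory and then switching back. It uses tempdir(), R's per-session scratch folder, only so that the example runs on any machine; in the lab you would use your own course folder instead. The name original_dir is hypothetical.

```r
original_dir <- getwd()        # remember where we started

setwd(tempdir())               # move to R's per-session scratch folder
list.files()                   # files visible from the new working directory

setwd(original_dir)            # restore the original working directory
```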
Loading data from working directory
Dataset WHO.csv: recent statistics about 194 countries from the World
Health Organization (WHO)
• For CSV files:
WHO <- read.csv("WHO.csv")
• For RData files (load() restores each object under its saved name, so no assignment is needed):
load("WHO.RData")
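The two loading functions behave differently, which the sketch below tries to show with a small stand-in data frame written to temporary files. The data frame toy and its values are made up for illustration; WHO.csv itself is not needed to run it.

```r
# A small stand-in data frame (the values are made up for illustration)
toy <- data.frame(Country = c("A", "B", "C"), Population = c(10, 20, 30))

csv_path   <- tempfile(fileext = ".csv")
rdata_path <- tempfile(fileext = ".RData")
write.csv(toy, csv_path, row.names = FALSE)
save(toy, file = rdata_path)

# read.csv() returns the data frame, so we assign it to a name.
from_csv <- read.csv(csv_path)

# load() restores objects under their saved names and returns those names.
rm(toy)
restored_names <- load(rdata_path)
restored_names        # "toy"
exists("toy")         # TRUE
```

Assigning the result of load() to an object stores only the character vector of names, not the data itself.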
Data frames
A data frame is a tabular data structure (we can think of it as an Excel
spreadsheet). Useful functions for data frames include:
• str(): examine structure of the object
• names(): return a vector of variable names
• nrow(): return the number of rows
• ncol(): return the number of columns
• dim(): combine nrow() and ncol() into a vector (rows first, then columns)
• summary(): provide a statistical summary
• head(): display the first six observations (by default)
• tail(): display the last six observations (by default)
• View(): display the spreadsheet of the entire data frame
Load WHO.csv, assign it to an object called WHO (as we did on the previous
slide), and try the above functions on this newly created data frame.
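If WHO.csv is not at hand, the same functions can be tried on a small stand-in data frame. In the sketch below the object toy, its column names, and its values are all hypothetical.

```r
# Stand-in data frame; column names and values are hypothetical
toy <- data.frame(
  Country        = c("A", "B", "C", "D"),
  Population     = c(10, 20, 30, 40),
  LifeExpectancy = c(70, 75, 80, 65)
)

str(toy)       # structure of the object: 4 observations of 3 variables
names(toy)     # "Country" "Population" "LifeExpectancy"
nrow(toy)      # 4
ncol(toy)      # 3
dim(toy)       # 4 3
head(toy, 2)   # first two observations
summary(toy)   # statistical summary of each variable
```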
Example of a data frame
               Variable 1                            Variable 2
Observation 1  Variable 1’s value of Observation 1
Observation 2
Observation 3
Retrieve part of a data frame: using []
We can retrieve specified observations and variables using brackets [ ]
with a comma in the form [rows, columns]:
WHO[1:3, "Country"]
## [1] "Afghanistan" "Albania" "Algeria"
WHO[1:4, 1]
## [1] "Afghanistan" "Albania" "Algeria" "Andorra"
Observe that “Country” is the first variable in the “WHO” data frame.
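The [rows, columns] form can be sketched on a small stand-in data frame. Leaving one side of the comma empty selects everything on that side; a logical condition in the row position keeps only the matching rows. The object toy and its values are hypothetical.

```r
toy <- data.frame(
  Country    = c("A", "B", "C", "D"),
  Population = c(10, 20, 30, 40)
)

toy[2, ]                  # second row, all columns
toy[, "Population"]       # all rows, one column: 10 20 30 40

# Rows where Population exceeds 15, keeping only the Country column
big <- toy[toy[, "Population"] > 15, "Country"]
big                       # "B" "C" "D"
```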
Retrieve part of a data frame: using $
The $ operator is another way to access variables from a data frame:
head(WHO$Country, 5)
## [1] "Afghanistan" "Albania" "Algeria" "Andorra" "Angola"
Note: the “5” after the comma specifies how many observations to display.
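A sketch of the $ operator on a stand-in data frame: it returns the same vector as bracket indexing by name, can be passed straight to summary functions, and can also create a new variable. The object toy and the column PopulationDoubled are hypothetical.

```r
toy <- data.frame(
  Country    = c("A", "B", "C", "D"),
  Population = c(10, 20, 30, 40)
)

identical(toy$Population, toy[, "Population"])  # TRUE: same vector either way
mean(toy$Population)                            # 25

# Assigning to a new name after $ adds a variable to the data frame
toy$PopulationDoubled <- toy$Population * 2
names(toy)   # "Country" "Population" "PopulationDoubled"
```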
Save R script
Save objects
When you quit RStudio, you will be asked whether you would like to save
the workspace. You should answer no in general.
• To export CSV:
write.csv(WHO, file = "WHO.csv")
• To export RData:
save(WHO, file = "WHO.RData")
Go ahead and export your data frame as RData.
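One detail worth knowing when exporting CSV: by default write.csv() also writes the row names as an extra first column. The sketch below compares the default with row.names = FALSE, using temporary files and a hypothetical data frame toy so that nothing in your working directory is overwritten.

```r
toy <- data.frame(Country = c("A", "B"), Population = c(10, 20))

csv_default <- tempfile(fileext = ".csv")
csv_clean   <- tempfile(fileext = ".csv")

write.csv(toy, file = csv_default)                   # row names included
write.csv(toy, file = csv_clean, row.names = FALSE)  # row names left out

names(read.csv(csv_default))   # "X" "Country" "Population"
names(read.csv(csv_clean))     # "Country" "Population"
```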
Here are the commands/operators we covered today:
• <-
• c(), seq()
• vector[]
• length(), min(), max(), range(), mean(), sd(), sum()
• setwd(), getwd()
• read.csv(), load()
• str(), names(), nrow(), ncol(), dim(), summary(),
head(), tail(), View()
• write.csv(), save()
• $