Basic R
Arithmetic with R
In its most basic form, R can be used as a simple calculator. Consider the following
arithmetic operators:
Addition: +
Subtraction: -
Multiplication: *
Division: /
Exponentiation: ^
Modulo: %% (returns the remainder of dividing one number by another)
The ^ operator raises the number to its left to the power of the number to its
right: for example 3^2 is 9.
The modulo returns the remainder of the division of the number to the left by
the number on its right, for example 5 modulo 3 or 5 %% 3 is 2.
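As a quick illustration of the operators listed above, the following sketch evaluates each one in turn. The variable names power_result and modulo_result are only illustrative and do not come from the exercise.

```r
# One expression per arithmetic operator
5 + 5        # addition
5 - 5        # subtraction
3 * 5        # multiplication
(5 + 5) / 2  # division
3^2          # exponentiation: 3 to the power of 2
5 %% 3       # modulo: remainder of 5 divided by 3

# Store two of the results (illustrative names) so they can be reused
power_result <- 3^2
modulo_result <- 5 %% 3
```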
Instructions
Submit the answer and have a look at the R output in the console.
Note how the # symbol is used to add comments to the R code.
Variable assignment
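A basic concept in (statistical) programming is the variable. A variable stores a value
(for example 4) or an object under a name, and you can use that name later to retrieve it.
For example, the following assigns the value 4 to my_var:
my_var <- 4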
Type the following code in the editor: my_apples <- 5. This will assign the
value 5 to my_apples.
Type: my_apples below the second comment. This will print out the value
of my_apples.
Submit your answer, and look at the output: you see that the number 5 is
printed. So R now links the variable my_apples to the value 5.
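The steps above can be sketched as follows. The variables my_oranges and my_fruit are hypothetical additions, included only to show that a stored value can be reused in later calculations.

```r
# Assign the value 5 to the variable my_apples
my_apples <- 5

# Print out the value of my_apples
my_apples

# Hypothetical follow-up: reuse the stored value in a calculation
my_oranges <- 6
my_fruit <- my_apples + my_oranges
my_fruit
```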
Note how the quotation marks in the editor indicate that "some text" is a string.
Instructions
Set the my_numeric variable to 42.
Set the my_character variable to "universe". Note that the quotation marks indicate
that "universe" is a character string.
Set the my_logical variable to FALSE.
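A minimal sketch of these three assignments is shown below. The calls to class() are an addition that is not part of the instructions; they are one way to check which data type R has given each variable.

```r
# One variable per basic data type
my_numeric <- 42
my_character <- "universe"
my_logical <- FALSE

# class() reports the data type of each variable
class(my_numeric)    # "numeric"
class(my_character)  # "character"
class(my_logical)    # "logical"
```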
Create a vector
Do you still remember what you have learned in the first chapter? Assign the
value "Go!" to the variable vegas. Remember: R is case sensitive!
On your way from rags to riches, you will make extensive use of vectors. Vectors are
one-dimensional arrays that can hold numeric data, character data, or logical data. In
other words, a vector is a simple tool to store data. For example, you can store your
daily gains and losses in the casinos.
In R, you create a vector with the combine function c(). You place the vector
elements separated by a comma between the parentheses. For example:
numeric_vector <- c(1, 2, 3)
character_vector <- c("a", "b", "c")
boolean_vector <- c(TRUE, FALSE, TRUE)
Once you have created these vectors in R, you can use them to do calculations.
A Boolean (logical) vector can be used to extract the elements of a vector that meet a
certain criterion.
Instructions
Complete the code such that boolean_vector contains the three
elements: TRUE, FALSE and TRUE (in that order).
Before doing a first analysis, you decide to first collect all the winnings and losses for
the last week:
For poker_vector:
For roulette_vector:
You only played poker and roulette, since there was a delegation of mediums that
occupied the craps tables. To be able to use this data in R, you decide to create the
variables poker_vector and roulette_vector.
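The daily amounts are not listed in this text, so the sketch below uses placeholder figures (in dollars, one per day from Monday to Friday) purely to show the shape of the two vectors. Replace them with your own results.

```r
# Placeholder daily results, Monday to Friday (assumed values, not from the text)
# Positive numbers are winnings, negative numbers are losses
poker_vector <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)

# Each vector holds one element per day
length(poker_vector)
length(roulette_vector)
```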
Naming a vector
As a data analyst, it is important to have a clear view of the data that you are using.
Understanding what each element refers to is therefore essential.
In the previous exercise, we created a vector with your winnings over the week. Each vector
element refers to a day of the week but it is hard to tell which element belongs to which day.
It would be nice if you could show that in the vector itself.
You can give a name to the elements of a vector with the names() function. Have a look at
this example:
some_vector <- c("John Doe", "poker player")
names(some_vector) <- c("Name", "Profession")
This code first creates a vector some_vector and then gives the two elements a name. The
first element is assigned the name Name, while the second element is labeled Profession.
Printing the contents to the console yields the following output:
      Name     Profession
"John Doe" "poker player"
Instructions
The code in the editor names the elements in poker_vector with the days of the
week. Add code to do the same thing for roulette_vector.
In the previous exercises you probably experienced that it is boring and frustrating to
type and retype information such as the days of the week. However, when you look
at it from a higher perspective, there is a more efficient way to do this, namely, to
assign the days of the week vector to a variable!
Just like you did with your poker and roulette returns, you can also create a variable
that contains the days of the week. This way you can use and re-use it.
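A sketch of this idea follows. The name days_vector and the assumption of a five-day week (Monday to Friday) are illustrative, and the daily amounts are placeholders rather than figures from the text.

```r
# Placeholder daily results (assumed values)
poker_vector <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)

# Store the day names once (days_vector is an illustrative name)
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

# Reuse the same variable to name both vectors
names(poker_vector) <- days_vector
names(roulette_vector) <- days_vector

poker_vector
```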
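Now that your poker and roulette winnings are stored as named vectors, you want to answer
questions such as:
What was your overall profit or loss per day of the week?
Have you lost money over the week in total?
Are you winning or losing money on poker or on roulette?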
It is important to know that if you sum two vectors in R, it takes the element-wise sum. For
example, the following three statements are completely equivalent:
c(1, 2, 3) + c(4, 5, 6)
c(1 + 4, 2 + 5, 3 + 6)
c(5, 7, 9)
You can also do the calculations with variables that represent vectors:
a <- c(1, 2, 3)
b <- c(4, 5, 6)
c <- a + b
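The same element-wise rule gives the overall profit or loss per day. In the sketch below the daily amounts are placeholders and total_daily is an illustrative variable name.

```r
# Placeholder daily results, Monday to Friday (assumed values)
poker_vector <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)

# Element-wise sum: poker plus roulette for each day
total_daily <- poker_vector + roulette_vector
total_daily
```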
A function that helps you to answer this question is sum(). It calculates the sum of all
elements of a vector. For example, to calculate the total amount of money you have lost/won
with poker you do:
total_poker <- sum(poker_vector)
Instructions
Calculate the total amount of money that you have won/lost with roulette and assign
it to the variable total_roulette.
Now that you have the totals for roulette and poker, you can easily
calculate total_week (which is the sum of all gains and losses of the week).
Print out total_week.
# Total winnings with poker
total_poker <- sum(poker_vector)
# Total winnings with roulette
total_roulette <- sum(roulette_vector)
# Total winnings overall
total_week <- total_poker + total_roulette
# Print out total_week
total_week
After a short brainstorm in your hotel's jacuzzi, you realize that a possible
explanation might be that your skills in roulette are not as well developed as your
skills in poker. So maybe your total gains in poker are higher (or > ) than in roulette.
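One way to check this hunch is to compare the two totals with the > operator, which returns TRUE or FALSE. The daily amounts in this sketch are placeholders, not figures from the text.

```r
# Placeholder daily results (assumed values)
poker_vector <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)

# Totals per game
total_poker <- sum(poker_vector)
total_roulette <- sum(roulette_vector)

# Is the poker total higher than the roulette total?
total_poker > total_roulette
```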
To select multiple elements from a vector, add square brackets after its name and put a
vector of the positions you want between them. For example, to select the first and the
fifth day of the week, use the vector c(1, 5) between the square brackets. The code below
selects the first and fifth elements of poker_vector:
poker_vector[c(1, 5)]
Instructions
Assign the poker results of Tuesday, Wednesday and Thursday to the
variable poker_midweek.
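A possible solution is sketched below, assuming the week runs from Monday to Friday so that Tuesday, Wednesday and Thursday are elements 2, 3 and 4. The daily amounts are placeholders.

```r
# Placeholder daily results with day names (assumed values)
poker_vector <- c(140, -50, 20, -120, 240)
names(poker_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

# Select elements 2 to 4; 2:4 is shorthand for c(2, 3, 4)
poker_midweek <- poker_vector[2:4]
poker_midweek
```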
Just like you did in the previous exercise with numeric indices, you can also use the
element names to select multiple elements, for example:
poker_vector[c("Monday","Tuesday")]
As seen in the previous chapter, stating 6 > 5 returns TRUE. The nice thing about R is that
you can use these comparison operators also on vectors. For example:
c(4, 5, 6) > 5
[1] FALSE FALSE TRUE
This command tests for every element of the vector if the condition stated by the comparison
operator is TRUE or FALSE.
Instructions
Check which elements in poker_vector are positive (i.e. > 0) and assign this
to selection_vector.
Print out selection_vector so you can inspect it. The printout tells you whether you
won (TRUE) or lost (FALSE) any money for each day.
In the previous exercises you used selection_vector <- poker_vector > 0 to find
the days on which you had a positive poker return. Now, you would like to know not
only the days on which you won, but also how much you won on those days.
You can select the desired elements by putting selection_vector between the
square brackets that follow poker_vector:
poker_vector[selection_vector]
R knows what to do when you pass a logical vector in square brackets: it will
only select the elements that correspond to TRUE in selection_vector.
Instructions
Use selection_vector in square brackets to assign the amounts that you won on the
profitable days to the variable poker_winning_days.
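The two steps, building the logical vector and then using it as a selector, can be sketched together. The daily amounts and the Monday to Friday naming are placeholders.

```r
# Placeholder daily results with day names (assumed values)
poker_vector <- c(140, -50, 20, -120, 240)
names(poker_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

# TRUE for each day with a positive return, FALSE otherwise
selection_vector <- poker_vector > 0

# Keep only the elements where selection_vector is TRUE
poker_winning_days <- poker_vector[selection_vector]
poker_winning_days
```

The same pattern works for any condition that produces one TRUE or FALSE per element, for example poker_vector < 0 for the losing days.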
Advanced selection
Just like you did for poker, you also want to know those days where you realized a
positive return for roulette.
Instructions
Create the variable selection_vector, this time to check on which days you made a
profit with roulette.
Assign the amounts that you made on the days that you ended positively for
roulette to the variable roulette_winning_days. This vector thus contains the
positive winnings of roulette_vector.
Chapter 3: Matrix
What's a matrix?
In R, a matrix is a collection of elements of the same data type (numeric, character, or
logical) arranged into a fixed number of rows and columns. Since you are only working with
rows and columns, a matrix is called two-dimensional.
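You can construct a matrix with the matrix() function, for example
matrix(1:9, byrow = TRUE, nrow = 3). In this call:
The first argument is the collection of elements that R will arrange into the rows and
columns of the matrix. Here, we use 1:9, which is a shortcut for c(1, 2, 3, 4, 5,
6, 7, 8, 9).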
The argument byrow indicates that the matrix is filled by the rows. If we want the
matrix to be filled by the columns, we just place byrow = FALSE.
The third argument nrow indicates that the matrix should have three rows.
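The effect of the three arguments described above is easiest to see by building the same matrix twice, once filled by rows and once by columns. The names by_row and by_col are illustrative.

```r
# Fill a 3-row matrix with the numbers 1 to 9, row by row
by_row <- matrix(1:9, byrow = TRUE, nrow = 3)
by_row

# Same elements, but filled column by column
by_col <- matrix(1:9, byrow = FALSE, nrow = 3)
by_col

# The first row differs: 1 2 3 when filled by rows, 1 4 7 when filled by columns
by_row[1, ]
by_col[1, ]
```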
In the editor, three vectors are defined. Each one represents the box office numbers
from the first three Star Wars movies. The first element of each vector indicates the
US box office revenue, the second element refers to the Non-US box office (source:
Wikipedia).
In this exercise, you'll combine all these figures into a single vector. Next, you'll build
a matrix from this vector.
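A sketch of these two steps follows. The vector names new_hope, empire_strikes and return_jedi are assumptions, since the editor contents are not shown here; the figures are the box office numbers that appear later in these notes.

```r
# One vector per movie: US revenue first, non-US revenue second
# (vector names are assumed for illustration)
new_hope <- c(460.998, 314.4)
empire_strikes <- c(290.475, 247.900)
return_jedi <- c(309.306, 165.8)

# Combine the three vectors into a single vector
box_office <- c(new_hope, empire_strikes, return_jedi)

# Build a matrix with one row per movie, filled row by row
star_wars_matrix <- matrix(box_office, nrow = 3, byrow = TRUE)
star_wars_matrix
```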
Naming a matrix
To help you remember what is stored in star_wars_matrix, you would like to add the
names of the movies for the rows. Not only does this help you to read the data, but it
is also useful to select certain elements from the matrix.
Similar to vectors, you can add names for the rows and the columns of a matrix, using
the rownames() and colnames() functions.
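In base R, row and column names are set with rownames() and colnames(), which work much like names() does for vectors. A minimal sketch, using the box office figures from these notes:

```r
# Matrix with one row per movie: US revenue, then non-US revenue
star_wars_matrix <- matrix(c(460.998, 314.4, 290.475, 247.900, 309.306, 165.8),
                           nrow = 3, byrow = TRUE)

# Name the columns and the rows
colnames(star_wars_matrix) <- c("US", "non-US")
rownames(star_wars_matrix) <- c("A New Hope",
                                "The Empire Strikes Back",
                                "Return of the Jedi")

# Names can now be used to select elements
star_wars_matrix["A New Hope", "US"]
```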
To calculate the total box office revenue for the three Star Wars movies, you have to
take the sum of the US revenue column and the non-US revenue column.
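One way to take that sum is the base R function rowSums(), which adds up the values in each row of a matrix. In this sketch worldwide_vector is an illustrative name.

```r
# Named matrix of box office figures (US and non-US revenue per movie)
box_office <- c(460.998, 314.4, 290.475, 247.900, 309.306, 165.8)
star_wars_matrix <- matrix(box_office, nrow = 3, byrow = TRUE,
                           dimnames = list(c("A New Hope",
                                             "The Empire Strikes Back",
                                             "Return of the Jedi"),
                                           c("US", "non-US")))

# Sum across the columns of each row: US plus non-US per movie
worldwide_vector <- rowSums(star_wars_matrix)
worldwide_vector
```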
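# Construct star_wars_matrix
box_office <- c(460.998, 314.4, 290.475, 247.900, 309.306, 165.8)
region <- c("US", "non-US")
titles <- c("A New Hope",
            "The Empire Strikes Back",
            "Return of the Jedi")
# Fill the matrix row by row and attach the row and column names
star_wars_matrix <- matrix(box_office, nrow = 3, byrow = TRUE,
                           dimnames = list(titles, region))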
Adding a row
Just like every action has a reaction, every cbind() has an rbind(). (We admit, we
are pretty bad with metaphors.)
Your R workspace, where all the variables you defined 'live', has already been
initialized and contains two matrices: star_wars_matrix and star_wars_matrix2.
Explore these matrices in the console if you want to have a closer look. If you want
to check out the contents of the workspace, you can type ls() in the console.
Instructions
Use rbind() to paste together star_wars_matrix and star_wars_matrix2, in this
order. Assign the resulting matrix to all_wars_matrix.
Calculate the total revenue for the US and the non-US region and assign the
result to total_revenue_vector. You can use the colSums() function.
Print out total_revenue_vector to have a look at the results.
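These steps can be sketched as follows. The workspace matrices are not shown in this text, so the sketch rebuilds them: star_wars_matrix from the box office figures given earlier, and star_wars_matrix2 from the rounded values in the printed table further down, which is an assumption about its exact contents.

```r
region <- c("US", "non-US")

# First matrix, from the figures given earlier in these notes
star_wars_matrix <- matrix(c(460.998, 314.4, 290.475, 247.900, 309.306, 165.8),
                           nrow = 3, byrow = TRUE,
                           dimnames = list(c("A New Hope",
                                             "The Empire Strikes Back",
                                             "Return of the Jedi"), region))

# Second matrix, rebuilt from rounded table values (assumed contents)
star_wars_matrix2 <- matrix(c(474.5, 552.5, 310.7, 338.7, 380.3, 468.5),
                            nrow = 3, byrow = TRUE,
                            dimnames = list(c("The Phantom Menace",
                                              "Attack of the Clones",
                                              "Revenge of the Sith"), region))

# Stack the second matrix below the first
all_wars_matrix <- rbind(star_wars_matrix, star_wars_matrix2)

# Total revenue per region (sum of each column)
total_revenue_vector <- colSums(all_wars_matrix)
total_revenue_vector
```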
If you want to select all elements of a row or a column, no number is needed before or after
the comma, respectively.
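A small sketch of this rule, using a hypothetical matrix named my_matrix:

```r
# Hypothetical 3 x 3 matrix, filled row by row
my_matrix <- matrix(1:9, byrow = TRUE, nrow = 3)

# Nothing before the comma: all rows of column 1
first_column <- my_matrix[, 1]

# Nothing after the comma: all columns of row 1
first_row <- my_matrix[1, ]

# Numbers on both sides: rows 1 to 2 of columns 2 to 3
top_right <- my_matrix[1:2, 2:3]

first_column
first_row
top_right
```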
Back to Star Wars with this newly acquired knowledge! As in the previous
exercise, all_wars_matrix is already available in your workspace.
Instructions
Select the non-US revenue for all movies (the entire second column
of all_wars_matrix), store the result as non_us_all.
Use mean() on non_us_all to calculate the average non-US revenue for all movies.
Simply print out the result.
This time, select the non-US revenue for the first two movies in all_wars_matrix.
Store the result as non_us_some.
Use mean() again to print out the average of the values in non_us_some.
                           US non-US
A New Hope              461.0  314.4
The Empire Strikes Back 290.5  247.9
Return of the Jedi      309.3  165.8
The Phantom Menace      474.5  552.5
Attack of the Clones    310.7  338.7
Revenge of the Sith     380.3  468.5
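Using the rounded values from the table above as a stand-in for the workspace matrix (an assumption, since the exact workspace values are not shown), the selections from the exercise can be sketched like this:

```r
# Rebuild all_wars_matrix from the rounded table values (assumed contents)
all_wars_matrix <- matrix(c(461.0, 314.4,
                            290.5, 247.9,
                            309.3, 165.8,
                            474.5, 552.5,
                            310.7, 338.7,
                            380.3, 468.5),
                          nrow = 6, byrow = TRUE,
                          dimnames = list(c("A New Hope",
                                            "The Empire Strikes Back",
                                            "Return of the Jedi",
                                            "The Phantom Menace",
                                            "Attack of the Clones",
                                            "Revenge of the Sith"),
                                          c("US", "non-US")))

# Entire second column: non-US revenue for all movies
non_us_all <- all_wars_matrix[, 2]
mean(non_us_all)

# Second column, first two rows only
non_us_some <- all_wars_matrix[1:2, 2]
mean(non_us_some)
```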
The term factor refers to a statistical data type used to store categorical variables.
The difference between a categorical variable and a continuous variable is that a
categorical variable can belong to a limited number of categories. A continuous
variable, on the other hand, can correspond to an infinite number of values.
A good example of a categorical variable is sex. In many circumstances you can limit
the sex categories to "Male" or "Female". (Sometimes you may need different
categories. For example, you may need to consider chromosomal variation,
hermaphroditic animals, or different cultural norms, but you will always have a finite
number of categories.)
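To make the idea concrete, the sketch below turns a character vector into a factor with the base R function factor(). The vector sex_vector and its contents are hypothetical.

```r
# Hypothetical observations of a categorical variable
sex_vector <- c("Male", "Female", "Female", "Male", "Male")

# Convert the character vector to a factor
factor_sex_vector <- factor(sex_vector)
factor_sex_vector

# The categories (levels) are the distinct values, sorted alphabetically by default
levels(factor_sex_vector)

# Number of observations per category
summary(factor_sex_vector)
```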
Instructions
Assign to variable theory the value "factors".