R Software - Notes
Overview of R Language:
1. R is a computer language for statistical computing similar to the S language
developed at Bell Laboratories.
2. The R software was initially written by Ross Ihaka and Robert Gentleman in the
mid-1990s. Since 1997, the R project has been organized by the R Development
Core Team.
3. R is open-source software and is part of the GNU project. R is being developed
for the Unix, Macintosh, and Windows families of operating systems.
4. R is excellent software to use while first learning statistics. It provides a coherent,
flexible system for data analysis that can be extended as needed. The open-
source nature of R ensures its availability.
Starting R program:
Windows System:
To begin in Windows, we click on the R icon on the desktop, or find the program
under the start menu. A new window pops up with a command-line subwindow.
Linux System:
For Linux, R is often started simply by typing “R” at a command prompt. When R is
started, a command line and perhaps other things await our usage.
The command line, or console, is where we interact with R. It shows a > prompt and waits for input.
Creating Variables in R
Variables are containers for storing data values. R does not have a command for
declaring a variable. A variable is created the moment we first assign a value to it. To assign
a value to a variable, use the <- operator. To output (or print) the value of a variable, just type the
variable name:
Example:
> whales <- c(74, 122, 235, 111, 292, 111, 211, 133,156, 79)
> whales
[1] 74 122 235 111 292 111 211 133 156 79
Note 1:
‘=’ is also an assignment operator in R.
>x=2
>x
[1] 2
Assignment with = versus <-
Assignment can cause confusion if we try to read the syntax as a mathematical equation.
If we write x=2x+1 as a mathematical equation, it has a single solution: x = -1. In R, though,
the corresponding expression, x=2*x+1, assigns the value of 2*x+1 to x.
This replaces the previous value of x. So if x has a value of 2 before this line, it has a
value of 5 afterwards.
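The update described above can be checked directly. This is a minimal sketch that uses only the values from the text (x starting at 2):

```r
# Assignment stores a value; it is not an equation to be solved.
x <- 2          # x starts at 2
x <- 2 * x + 1  # the right-hand side is evaluated first: 2 * 2 + 1
print(x)        # 5
```

Writing the update with <- instead of = makes it harder to misread the line as an equation.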
Note 2:
Unlike the built-in constant pi, the variable e is not assigned by default.
Example:
> pi # pi is a built-in constant
[1] 3.141593
> e^2 # e is not
Error: object 'e' not found
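If Euler's number is needed, one option is to compute it with the built-in exp() function. In this sketch the name e is our own variable, not something R provides:

```r
# exp(1) returns Euler's number; we store it under the name e ourselves.
e <- exp(1)
print(e)     # 2.718282
print(e^2)   # same value as exp(2)
```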
Note 3:
A variable can have a short name (like x and y) or a more descriptive name (age, carname,
total_volume). Rules for R variable names are:
1. A variable name must start with a letter or a period (.) and can be a combination of letters,
digits, periods (.) and underscores (_). If it starts with a period, the period cannot be followed by a digit.
2. A variable name cannot start with a digit or an underscore (_).
3. Variable names are case-sensitive (age, Age and AGE are three different variables).
4. Reserved words cannot be used as variable names (TRUE, FALSE, NULL, if, ...).
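One way to test these rules is with the base function make.names(), which rewrites a string into a syntactically valid name. A candidate that comes back unchanged was already valid. The candidate strings and the helper name is_valid below are our own illustrative choices:

```r
# Candidate names to check against the rules above (illustrative values).
candidates <- c("age", "total_volume", ".hidden", "2abc", "_x", ".2y", "TRUE")

# make.names() returns a valid version of each string;
# a name is valid exactly when make.names() leaves it unchanged.
is_valid <- candidates == make.names(candidates)
names(is_valid) <- candidates
print(is_valid)

# Names are case-sensitive: these are three separate variables.
age <- 30
Age <- 31
AGE <- 32
print(c(age, Age, AGE))
```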
Data Types
In programming, data type is an important concept.
Variables can store data of different types, and different types can do different
things.
In R, variables do not need to be declared with any particular type, and can even
change type after they have been set:
Examples:
my_var <- 30 # my_var is of type numeric
my_var <- "Sally" # my_var is now of type character (string)
Basic Data Types
Basic data types in R can be divided into the following types:
numeric - (10.5, 55, 787)
integer - (1L, 55L, 100L, where the letter "L" declares this as an integer)
complex - (9 + 3i, where "i" is the imaginary part)
character (string) - ("k", "R is exciting", "FALSE", "11.5")
logical (boolean) - (TRUE or FALSE)
We can use the class() function to check the data type of a variable.
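As a quick check, class() can be applied to one value of each basic type. This sketch reuses sample values from the list above; the names values and types are our own:

```r
# One sample value per basic data type.
values <- list(10.5, 55L, 9 + 3i, "R is exciting", TRUE)

# Apply class() to each value and collect the results.
types <- sapply(values, class)
print(types)  # "numeric" "integer" "complex" "character" "logical"

# Without the L suffix, a whole number is still numeric.
print(class(55))  # "numeric"
```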
Numbers
There are three number types in R:
1. Numeric
A numeric data type is the most common type in R, and contains any number
with or without a decimal, like: 10.5, 55, 787
2. Integer
Integers are numeric data without decimals. This is used when you are certain
that you will never create a variable that should contain decimals. To create an
integer variable, you must use the letter L after the integer value.
3. Complex
A complex number is written with an "i" as the imaginary part.
Variables of number types are created when you assign a value to them:
Example
x <- 10.5 # numeric
y <- 10L # integer
z <- 1i # complex
Type Conversion
We can convert from one type to another with the following functions:
as.numeric()
as.integer()
as.complex()
Example
x <- 1L # integer
y <- 2 # numeric
# convert from integer to numeric:
a <- as.numeric(x)
# convert from numeric to integer:
b <- as.integer(y)
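The result of each conversion can be confirmed with class(). The sketch below repeats the example and adds two further cases of our own (2.9 and -2.9) to show that as.integer() truncates toward zero and does not round:

```r
x <- 1L  # integer
y <- 2   # numeric

a <- as.numeric(x)  # integer to numeric
b <- as.integer(y)  # numeric to integer

print(class(a))          # "numeric"
print(class(b))          # "integer"
print(as.complex(y))     # 2+0i

# as.integer() drops the fractional part; it does not round.
print(as.integer(2.9))   # 2
print(as.integer(-2.9))  # -2
```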
Math Functions in R
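A few of R's built-in math functions are sketched here as an illustrative selection. The particular functions and input values are our own choices:

```r
# Collect the results of some common math functions in a named vector.
results <- c(
  max     = max(5, 10, 15),      # largest value: 15
  min     = min(5, 10, 15),      # smallest value: 5
  sqrt    = sqrt(16),            # square root: 4
  abs     = abs(-4.7),           # absolute value: 4.7
  ceiling = ceiling(1.4),        # round up: 2
  floor   = floor(1.4),          # round down: 1
  round   = round(3.14159, 2)    # round to 2 decimal places: 3.14
)
print(results)
```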
Vectors
A vector is simply a sequence of items that are all of the same type. To combine items into a
vector, use the c() function and separate the items with commas.
Example:
>fruits
>numbers
[1] 1 2 3
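A minimal sketch of creating vectors with c(). The contents of fruits here are assumed for illustration, and the vector mixed is our own addition to show what happens when types differ:

```r
# A character vector and a numeric vector.
fruits <- c("banana", "apple", "orange")  # illustrative values
numbers <- c(1, 2, 3)

print(fruits)
print(numbers)          # 1 2 3
print(length(numbers))  # 3

# All items must share one type, so mixed input is coerced to a common type.
mixed <- c(1, "a", TRUE)
print(class(mixed))     # "character"
```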
Accessing vectors
Examples
> fruits <- c("banana", "apple", "orange", "mango", "lemon")
> fruits[1] # Access the first item (banana)
[1] "banana"
> fruits[c(1, 3)] # Access the first and third item (banana and orange)
[1] "banana" "orange"
> fruits[c(-1)] # Access all items except for the first item
[1] "apple"  "orange" "mango"  "lemon"
> fruits[1] <- "pear" # Change "banana" to "pear"
> fruits
[1] "pear"   "apple"  "orange" "mango"  "lemon"
Lists
A list in R can contain many different data types inside it. A list is a collection of data
which is ordered and changeable.
Examples:
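A minimal sketch of a list that mixes data types. The name thislist and its contents are illustrative assumptions:

```r
# A list can hold items of different types, including whole vectors.
thislist <- list("apple", 3.5, TRUE, c(1, 2, 3))

print(length(thislist))  # 4
print(thislist[[1]])     # "apple": double brackets return the item itself

# Lists are changeable: replace the second item.
thislist[[2]] <- 4.5
print(thislist[[2]])     # 4.5
```

Single brackets, as in thislist[1], return a shorter list and not the item itself, which is why double brackets are used above.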
Matrices
A matrix is a two dimensional data set with columns and rows. A column is a vertical
representation of data, while a row is a horizontal representation of data.
A matrix can be created with the matrix() function. Specify the nrow and ncol parameters to
set the number of rows and columns:
Examples:
> thismatrix <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, ncol = 2)
> thismatrix
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
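The output above shows that matrix() fills column by column. The sketch below accesses items with [row, column] indexing and, as an addition of our own, uses the byrow argument to fill row by row:

```r
thismatrix <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, ncol = 2)

print(dim(thismatrix))   # 3 2 (rows, columns)
print(thismatrix[2, 2])  # 5: row 2, column 2
print(thismatrix[1, ])   # 1 4: the whole first row
print(thismatrix[, 2])   # 4 5 6: the whole second column

# byrow = TRUE fills the matrix row by row.
byrow_matrix <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, ncol = 2, byrow = TRUE)
print(byrow_matrix[1, ]) # 1 2
```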
Arrays
Compared to matrices, arrays can have more than two dimensions.
We can use the array() function to create an array, and the dim parameter to specify the
dimensions.
Example:
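> multiarray <- array(c(1:24), dim = c(4, 3, 2))
> multiarray
, , 1

     [,1] [,2] [,3]
[1,]    1    5    9
[2,]    2    6   10
[3,]    3    7   11
[4,]    4    8   12

, , 2

     [,1] [,2] [,3]
[1,]   13   17   21
[2,]   14   18   22
[3,]   15   19   23
[4,]   16   20   24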
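Items of an array are accessed with one index per dimension. A minimal sketch using the same multiarray:

```r
multiarray <- array(1:24, dim = c(4, 3, 2))

print(dim(multiarray))      # 4 3 2: four rows, three columns, two slices
print(multiarray[2, 3, 2])  # 22: row 2, column 3 of the second slice
print(multiarray[, , 1])    # the whole first 4 x 3 slice
```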
Data Frames
A data frame is data displayed as a table. A data frame can hold
different types of data: the first column can be character while the second and
third are numeric or logical. However, all values within a single column must have the same type.
Example:
>Dat_Frame
1 Strength 100 60
2 Stamina 150 30
3 Other 120 45
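A data frame like the one printed above could be built with data.frame(). The column names Training, Pulse and Duration are assumptions made for this sketch, since the original header row is not shown:

```r
# Column names are hypothetical; the values match the printed rows.
Dat_Frame <- data.frame(
  Training = c("Strength", "Stamina", "Other"),
  Pulse    = c(100, 150, 120),
  Duration = c(60, 30, 45),
  stringsAsFactors = FALSE  # keep the text column as character
)
print(Dat_Frame)

# Each column has a single type.
print(sapply(Dat_Frame, class))  # "character" "numeric" "numeric"

# Access one column with $.
print(Dat_Frame$Pulse)           # 100 150 120
```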
Functions related to Data Frame:
Example:
>summary(Dat_Frame)
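A few base functions that are commonly used with data frames are sketched below. The column names in Dat_Frame are assumed for illustration, and the selection of functions is our own:

```r
# Hypothetical column names; the values follow the example above.
Dat_Frame <- data.frame(
  Training = c("Strength", "Stamina", "Other"),
  Pulse    = c(100, 150, 120),
  Duration = c(60, 30, 45),
  stringsAsFactors = FALSE
)

print(dim(Dat_Frame))      # 3 3: rows and columns
print(nrow(Dat_Frame))     # 3
print(ncol(Dat_Frame))     # 3
print(names(Dat_Frame))    # the column names
print(summary(Dat_Frame))  # per-column summary statistics
str(Dat_Frame)             # compact view of the structure
```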
Factors
Factors are used to categorize data, for example demography: Male/Female.
To create a factor, use the factor() function and pass a vector as the argument.
Example:
> music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"))
> music_genre
[1] Jazz    Rock    Classic Classic Pop     Jazz    Rock    Jazz
Levels: Classic Jazz Pop Rock
We can see from the example above that the factor has four levels (categories):
Classic, Jazz, Pop and Rock. To print only the levels, use the levels() function.
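A minimal sketch of inspecting the levels of the same factor. The use of nlevels() and table() is our own addition:

```r
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic",
                        "Pop", "Jazz", "Rock", "Jazz"))

print(levels(music_genre))   # "Classic" "Jazz" "Pop" "Rock"
print(nlevels(music_genre))  # 4

# table() counts how often each level occurs.
genre_counts <- table(music_genre)
print(genre_counts)          # Classic 2, Jazz 3, Pop 1, Rock 2
```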
R Graphics
Plot
The plot() function is used to draw points (markers) in a diagram. The function takes
parameters for specifying points in the diagram.
plot(x1, y1, type="", lwd=, lty=, main="", xlab="", ylab="", col="", cex= )
Example:
> plot(1, 3) # Draw one point in the diagram, at x = 1 and y = 3.
Parameter 1 specifies points on the x-axis.
Parameter 2 specifies points on the y-axis.
> plot(c(1, 2, 3, 4, 5), c(3, 7, 8, 9, 12)) #Multiple points
The plot() function also accepts other parameters, such as main, xlab and ylab, if we want
to customize the graph with a main title and different labels for the x-axis and y-axis.
Example:
> plot(1:10, main="My Graph", xlab="The x-axis", ylab="The y axis")
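The remaining parameters from the plot() template above can be combined in one call. This is a sketch with made-up data (x and y are our own) and arbitrary styling values:

```r
# Illustrative data: the squares of 1 to 10.
x <- 1:10
y <- x^2

plot(x, y,
     type = "b",          # draw both points and connecting lines
     lwd  = 2,            # line width
     lty  = 2,            # line type 2 is dashed
     col  = "blue",       # colour of points and lines
     cex  = 1.5,          # point size, relative to the default of 1
     main = "My Graph",
     xlab = "The x-axis",
     ylab = "The y-axis")
```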
Note:
To compare the plot with another plot, use the points() function.
Example:
>x1 <- c(5,7,8,7,2,2,9,4,11,12,9,6)
>y1 <- c(99,86,87,88,111,103,87,94,78,77,85,86)
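A minimal sketch of overlaying a second data set with points(). The vectors x2 and y2 are hypothetical values chosen to fall inside the axis range of the first plot, and the colours and labels are also our own:

```r
x1 <- c(5, 7, 8, 7, 2, 2, 9, 4, 11, 12, 9, 6)
y1 <- c(99, 86, 87, 88, 111, 103, 87, 94, 78, 77, 85, 86)

# Hypothetical second data set, used only for illustration.
x2 <- c(3, 5, 6, 10)
y2 <- c(105, 95, 90, 80)

# plot() draws the first data set and sets up the axes.
plot(x1, y1, col = "red", cex = 2,
     main = "Two data sets", xlab = "x", ylab = "y")

# points() adds the second data set to the existing plot.
points(x2, y2, col = "blue", cex = 2)
```

Because plot() fixes the axis limits from x1 and y1, any values in x2 or y2 outside that range would not be visible.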
Pie Charts
A pie chart is a circular graphical view of data.
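A pie chart can be drawn with the pie() function. The values, labels and colours below are invented for illustration:

```r
# Illustrative data: the size of each slice and its label.
slices <- c(10, 20, 30, 40)
slice_labels <- c("Apples", "Bananas", "Cherries", "Dates")

pie(slices,
    labels = slice_labels,
    main   = "Fruits",
    col    = c("red", "yellow", "pink", "brown"))
```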
barplot() parameters:
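A minimal sketch of a barplot() call with a few commonly used parameters. The data and the choice of parameters are our own and are not a complete list:

```r
# Illustrative data: one bar height per category.
heights <- c(3, 12, 5, 18)
bar_names <- c("A", "B", "C", "D")

# barplot() returns the x positions of the bar midpoints.
midpoints <- barplot(heights,
                     names.arg = bar_names,   # label under each bar
                     col       = "steelblue", # fill colour
                     main      = "My Bar Chart",
                     xlab      = "Category",
                     ylab      = "Value")
print(midpoints)
```

Setting horiz = TRUE in the same call would draw the bars horizontally.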
Regression Line:
When we say that variables x and y have a linear relationship in a mathematical sense, we mean
that y=mx+b, where m is the slope of the line and b is the intercept. We call x the independent variable
and y the dependent one.
Example:
> x <- c(1, 2, 3, 4, 5, 6)
> y <- c(1, 4, 27, 64, 625, 216)
> dt <- data.frame(x, y)
> res <- lm(y ~ x, data = dt) # Fit the line y=mx+b
> res
Call:
lm(formula = y ~ x, data = dt)
Coefficients:
(Intercept)            x
     -141.3         85.0
Here b = -141.3 (the intercept) and m = 85 (the slope).
> predict(res, data.frame(x = 7)) # Predicting a new value
       1
453.6667
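The prediction can be reproduced by hand from the fitted coefficients. In this sketch the names m, b, manual and fitted_value are our own, and the plot with abline() is an optional addition:

```r
x <- c(1, 2, 3, 4, 5, 6)
y <- c(1, 4, 27, 64, 625, 216)
dt <- data.frame(x, y)

# Fit the straight line y = m * x + b by least squares.
res <- lm(y ~ x, data = dt)

# Extract the intercept b and the slope m from the fitted model.
b <- coef(res)[["(Intercept)"]]
m <- coef(res)[["x"]]

# The prediction at x = 7 is the line evaluated at 7.
manual <- m * 7 + b
fitted_value <- predict(res, data.frame(x = 7))
print(manual)        # 453.6667
print(fitted_value)  # same value

# Draw the data and add the fitted line.
plot(x, y, main = "Regression line")
abline(res, col = "red")
```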