
R Software - Notes

R is a computer language for statistical computing, similar to the S language developed at Bell Laboratories. It was initially written by Ross Ihaka and Robert Gentleman in the mid-1990s. R is open-source software that runs on Unix, Macintosh, and Windows operating systems. Commands are typed at the command-line interface to perform calculations and analyze data. Basic operations and variable creation use familiar mathematical notation and assignment operators. R handles different data types, including numeric, integer, character, logical, and complex values.

Uploaded by

sample survey

R Software

Overview of R Language:
1. R is a computer language for statistical computing, similar to the S language
developed at Bell Laboratories.
2. The R software was initially written by Ross Ihaka and Robert Gentleman in the
mid-1990s. Since 1997, the R project has been organized by the R Development
Core Team.
3. R is open-source software and is part of the GNU project. R is developed
for the Unix, Macintosh, and Windows families of operating systems.
4. R is excellent software to use while first learning statistics. It provides a coherent,
flexible system for data analysis that can be extended as needed. The open-source
nature of R ensures its availability.

Starting R program:
Windows System:
To begin in Windows, we click on the R icon on the desktop, or find the program
under the start menu. A new window pops up with a command-line subwindow.
Linux System:
For Linux, R is often started simply by typing “R” at a command prompt. When R is
started, a command line and perhaps other things await our usage.
The command line, or console, is where we can interact with R. It looks something like this:

[Screenshot of the R console]

The command prompt, >, is where we type commands to be processed by R. This
happens when we hit the ENTER key.
Calculations in R:
The simplest usage of R is performing basic arithmetic, as we would do with a
calculator.
R uses familiar notation for math operations, such as +, -, *, and /. Powers are taken
with ^. As usual, we use parentheses to group operations.
Examples:
> 2 + 2
[1] 4
> 2^2
[1] 4
> (1 - 2) * 3
[1] -3
> 1 - 2 * 3
[1] -5
Note:
The answer to each “question” is printed starting with a [1].

Creating Variables in R
Variables are containers for storing data values. R does not have a command for
declaring a variable. A variable is created the moment you first assign a value to it. To assign
a value to a variable, use the <- operator. To output (or print) the variable value, just type the
variable name:
Example:

> whales <- c(74, 122, 235, 111, 292, 111, 211, 133,156, 79)
> whales
[1] 74 122 235 111 292 111 211 133 156 79
Note 1:
‘=’ is also an assignment operator in R.
> x = 2
> x
[1] 2
Assignment with = versus <-: assignment with = can cause confusion if we try to read
the syntax as a mathematical equation.
If we write x = 2x + 1 as a mathematical equation, we have a single solution: x = -1. In R, though,
the corresponding expression, x = 2*x + 1, assigns the value of 2*x + 1 to x.
This updates the previous value of x. So if x has a value of 2 prior to this line, it leaves with a
value of 5.
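The update rule described above can be checked at the console. This is a minimal sketch that uses the starting value 2 from the text and assumes nothing else:

```r
x <- 2          # starting value used in the text
x <- 2 * x + 1  # the right-hand side is evaluated first (2 * 2 + 1), then stored in x
x               # prints 5
```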
Note 2:
The variable e is not previously assigned, unlike the built-in constant pi. (The value of e can be obtained with exp(1).)
Example:
> pi # pi is a built-in constant
[1] 3.141593
> e^2 # e is not
Error: object 'e' not found
Note 3:
A variable can have a short name (like x and y) or a more descriptive name (age, carname,
total_volume). Rules for R variables are:
1. A variable name must start with a letter or a period (.) and can be a combination of letters, digits,
periods (.) and underscores (_). If it starts with a period (.), it cannot be followed by a digit.
2. A variable name cannot start with a number or underscore (_).
3. Variable names are case-sensitive (age, Age and AGE are three different variables).
4. Reserved words cannot be used as variables (TRUE, FALSE, NULL, if...).
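One way to try out these rules is to compare a candidate name with what base R's make.names() returns, since that function rewrites any name that is not syntactically valid. The helper is_valid_name below is a hypothetical name introduced only for this sketch:

```r
# Hypothetical helper: treat a name as valid if make.names() leaves it unchanged.
is_valid_name <- function(name) make.names(name) == name

is_valid_name("total_volume")  # TRUE: letters and an underscore
is_valid_name(".rate")         # TRUE: period followed by a letter
is_valid_name("2nd_value")     # FALSE: starts with a digit
is_valid_name("_count")        # FALSE: starts with an underscore
is_valid_name(".2x")           # FALSE: period followed by a digit
is_valid_name("TRUE")          # FALSE: reserved word
```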

Data Types
In programming, data type is an important concept.
Variables can store data of different types, and different types can do different
things.
In R, variables do not need to be declared with any particular type, and can even
change type after they have been set:
Examples:
my_var <- 30 # my_var is of type numeric
my_var <- "Sally" # my_var is now of type character (string)
Basic Data Types
Basic data types in R can be divided into the following types:
numeric - (10.5, 55, 787)
integer - (1L, 55L, 100L, where the letter "L" declares this as an integer)
complex - (9 + 3i, where "i" is the imaginary part)
character (string) - ("k", "R is exciting", "FALSE", "11.5")
logical (boolean) - (TRUE or FALSE)
We can use the class() function to check the data type of a variable.
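As a short illustration, class() can be applied to one value of each basic type listed above, and to a variable before and after it changes type. The values are taken from the examples in this section:

```r
class(10.5)             # "numeric"
class(55L)              # "integer"
class(9 + 3i)           # "complex"
class("R is exciting")  # "character"
class(TRUE)             # "logical"

my_var <- 30
class(my_var)           # "numeric"
my_var <- "Sally"
class(my_var)           # "character": the same variable now holds a string
```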
Numbers
There are three number types in R:
1. Numeric
A numeric data type is the most common type in R, and contains any number
with or without a decimal, like: 10.5, 55, 787
2. Integer
Integers are numeric data without decimals. This is used when you are certain
that you will never create a variable that should contain decimals. To create an
integer variable, you must use the letter L after the integer value.
3. Complex
A complex number is written with an "i" as the imaginary part.
Variables of number types are created when you assign a value to them:
Example
x <- 10.5 # numeric
y <- 10L # integer
z <- 1i # complex
Type Conversion
We can convert from one type to another with the following functions:
as.numeric()
as.integer()
as.complex()
Example
x <- 1L # integer
y <- 2 # numeric
# convert from integer to numeric:
a <- as.numeric(x)
# convert from numeric to integer:
b <- as.integer(y)
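To confirm that the conversions did what was intended, the result types can be inspected with class(). The sketch below repeats the example and adds two extra calls (as.integer(2.9) and as.complex(y)) purely for illustration:

```r
x <- 1L
y <- 2
a <- as.numeric(x)
b <- as.integer(y)
class(a)         # "numeric"
class(b)         # "integer"
as.integer(2.9)  # 2: the fractional part is dropped, not rounded
as.complex(y)    # 2+0i
```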
Math Functions in R

1. abs(x) - Returns the absolute value of x.
> x <- -4
> abs(x)
[1] 4

2. sqrt(x) - Returns the square root of x.
> x <- 4
> sqrt(x)
[1] 2

3. ceiling(x) - Returns the smallest integer that is larger than or equal to x.
> x <- 4.5
> ceiling(x)
[1] 5

4. floor(x) - Returns the largest integer that is smaller than or equal to x.
> x <- 2.5
> floor(x)
[1] 2

5. trunc(x) - Returns x truncated toward zero (the fractional part is dropped).
> x <- c(1.2, 2.5, 8)
> trunc(x)
[1] 1 2 8

6. round(x, digits = n) - Returns x rounded to n decimal places.
> x <- 4.56
> round(x, 1)
[1] 4.6

7. cos(x), sin(x), tan(x) - Return the cosine, sine and tangent of x (x in radians).
> x <- 4
> cos(x)
[1] -0.6536436
> sin(x)
[1] -0.7568025
> tan(x)
[1] 1.157821

8. log(x) - Returns the natural logarithm of x.
> x <- 4
> log(x)
[1] 1.386294

9. log10(x) - Returns the common (base-10) logarithm of x.
> x <- 4
> log10(x)
[1] 0.60206

10. exp(x) - Returns e raised to the power x.
> x <- 4
> exp(x)
[1] 54.59815
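The four rounding-style functions above are easiest to tell apart on a negative number. The value -2.7 is an arbitrary choice made for this sketch:

```r
x <- -2.7
floor(x)    # -3: largest integer not greater than x
ceiling(x)  # -2: smallest integer not less than x
trunc(x)    # -2: fractional part dropped, so the result moves toward zero
round(x)    # -3: nearest integer
```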
Statistical functions:
1. mean(x) - Returns the mean of x.
> a <- c(0:10, 40)
> mean(a)
[1] 7.916667

2. sd(x) - Returns the standard deviation of x.
> a <- c(0:10, 40)
> sd(a)
[1] 10.58694

3. median(x) - Returns the median of x.
> a <- c(0:10, 40)
> median(a)
[1] 5.5

4. quantile(x, probs) - Returns quantiles, where x is the numeric vector whose quantiles are desired and probs is a numeric vector of probabilities in [0, 1].
> x <- c(1:10)
> quantile(x, 0.5)
50%
5.5

5. range(x) - Returns the minimum and maximum of x.
> a <- c(0:10, 40)
> range(a)
[1] 0 40

6. sum(x) - Returns the sum of x.
> a <- c(0:10, 40)
> sum(a)
[1] 95

7. diff(x, lag = 1) - Returns the differences between elements, with lag indicating which lag to use.
> a <- c(0:10, 40)
> diff(a)
[1] 1 1 1 1 1 1 1 1 1 1 30

8. min(x) - Returns the minimum value.
> a <- c(0:10, 40)
> min(a)
[1] 0

9. max(x) - Returns the maximum value.
> a <- c(0:10, 40)
> max(a)
[1] 40
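The sample vector used in these examples contains one value (40) that lies far from the rest, which makes it a convenient way to see how mean() and median() react differently to an extreme value. The vector b, with 40 dropped, is introduced here only for comparison:

```r
a <- c(0:10, 40)  # same data as in the examples above
mean(a)           # 7.916667
median(a)         # 5.5
b <- a[a != 40]   # the same data without the extreme value
mean(b)           # 5
median(b)         # 5
```

The mean shifts by almost 3 when the single large value is included, while the median moves by only 0.5.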
Data Structures

Vectors

A vector is simply a list of items that are of the same type. To combine the items into a
vector, use the c() function and separate the items by commas.

Example:

> fruits <- c("banana", "apple", "orange") # vector of strings
> fruits
[1] "banana" "apple" "orange"
> numbers <- c(1, 2, 3) # vector of numerical values
> numbers
[1] 1 2 3

Functions related to vectors:

1. length() - Finds the number of items in a vector.
2. sort() - Sorts the items in a vector alphabetically or numerically.
3. rep() - Repeats vectors.
Example:
> repeat_each <- rep(c(1, 2, 3), each = 3)
> repeat_each
[1] 1 1 1 2 2 2 3 3 3
4. seq(from = n, to = m, by = k) - Makes a sequence from n to m in steps of k.
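The functions listed above can be tried together as follows. Note that R function names are case-sensitive, so they are written in lowercase. The input values are arbitrary examples chosen for this sketch:

```r
fruits <- c("banana", "apple", "orange")
length(fruits)                    # 3
sort(fruits)                      # "apple" "banana" "orange"
sort(c(13, 5, 7, 21, 2))          # 2 5 7 13 21
rep(c(1, 2, 3), times = 2)        # 1 2 3 1 2 3 (whole vector repeated)
seq(from = 0, to = 100, by = 20)  # 0 20 40 60 80 100
```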

Accessing vectors

Examples
> fruits <- c("banana", "apple", "orange", "mango", "lemon")
> fruits[1] # Access the first item (banana)
[1] "banana"
> fruits[c(1, 3)] # Access the first and third item (banana and orange)
[1] "banana" "orange"
> fruits[c(-1)] # Access all items except for the first item
[1] "apple" "orange" "mango" "lemon"
> fruits[1] <- "pear" # Change "banana" to "pear"
> fruits
[1] "pear" "apple" "orange" "mango" "lemon"

Lists

A list in R can contain many different data types inside it. A list is a collection of data
which is ordered and changeable.

The list() function is used to create a list.

Functions related to lists:

1. length() - Finds the total number of items in the list.
2. append() - Adds items to the list.
3. %in% - Checks whether a specified item is present in the list.

Examples:

> thislist <- list("apple", "banana", "cherry")
> thislist[1] # Accessing 1st item in the list
[[1]]
[1] "apple"
> thislist[1] <- "blackcurrant" # Changing 1st item in the list
> "apple" %in% thislist # Check if "apple" is present in the list
[1] FALSE
> append(thislist, "orange") # Add "orange" to the list (returns a new list; thislist itself is unchanged)
[[1]]
[1] "blackcurrant"
[[2]]
[1] "banana"
[[3]]
[1] "cherry"
[[4]]
[1] "orange"
> append(thislist, "orange", after = 2) # Add "orange" to the list after "banana" (index 2)
> newlist <- thislist[-1] # Removing the 1st item
> list1 <- list("a", "b", "c")
> list2 <- list(1, 2, 3)
> list3 <- c(list1, list2) # Combining two lists
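One detail worth checking is that append() returns a new list and does not change the list it is given. The sketch below stores the result under the illustrative name longer, and uses double brackets, [[ ]], to pull out a single item rather than a one-item list:

```r
thislist <- list("apple", "banana", "cherry")
longer <- append(thislist, "orange", after = 2)  # result must be assigned to be kept
length(thislist)  # 3: the original list is unchanged
length(longer)    # 4
longer[[3]]       # "orange": [[ ]] extracts the item itself
```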

Matrices
A matrix is a two dimensional data set with columns and rows. A column is a vertical
representation of data, while a row is a horizontal representation of data.
A matrix can be created with the matrix() function. Specify the nrow and ncol parameters to
set the number of rows and columns:
Examples:
> thismatrix <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, ncol = 2)
> thismatrix
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
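The printed matrix shows that values fill column by column. As a small sketch, the elements can be accessed with [row, column] indexing, and the byrow argument of matrix() switches to row-wise filling. The name byrow_matrix is introduced here for illustration:

```r
thismatrix <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, ncol = 2)
dim(thismatrix)     # 3 2
thismatrix[2, 1]    # 2: row 2, column 1
thismatrix[, 2]     # 4 5 6: the whole second column

byrow_matrix <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, ncol = 2, byrow = TRUE)
byrow_matrix[2, 1]  # 3: with byrow = TRUE the values fill row by row
```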
Arrays
Compared to matrices, arrays can have more than two dimensions.
We can use the array() function to create an array, and the dim parameter to specify the
dimensions.
Example:
> multiarray <- array(c(1:24), dim = c(4, 3, 2))
> multiarray
, , 1

     [,1] [,2] [,3]
[1,]    1    5    9
[2,]    2    6   10
[3,]    3    7   11
[4,]    4    8   12

, , 2

     [,1] [,2] [,3]
[1,]   13   17   21
[2,]   14   18   22
[3,]   15   19   23
[4,]   16   20   24
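Individual elements of the array can be read with one index per dimension, in the order given to dim (row, column, slice). A minimal sketch using the same array:

```r
multiarray <- array(c(1:24), dim = c(4, 3, 2))
dim(multiarray)      # 4 3 2
multiarray[2, 3, 1]  # 10: row 2, column 3 of the first slice
multiarray[2, 3, 2]  # 22: same position in the second slice
```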

Data Frames
A data frame displays data in table form. A data frame can hold different types of data:
while the first column can be character, the second and third can be numeric or logical.
However, all values within a single column must have the same type.

Use the data.frame() function to create a data frame.

Example:

> Dat_Frame <- data.frame(Training = c("Strength", "Stamina", "Other"),
                          Pulse = c(100, 150, 120), Duration = c(60, 30, 45))
> Dat_Frame
  Training Pulse Duration
1 Strength   100       60
2  Stamina   150       30
3    Other   120       45
Functions related to Data Frame:

summary() - Summarizes the data in a data frame.

Example:

> Dat_Frame <- data.frame(Training = c("Strength", "Stamina", "Other"),
                          Pulse = c(100, 150, 120), Duration = c(60, 30, 45))
> summary(Dat_Frame)
     Training     Pulse          Duration
 Other   :1   Min.   :100.0   Min.   :30.0
 Stamina :1   1st Qu.:110.0   1st Qu.:37.5
 Strength:1   Median :120.0   Median :45.0
              Mean   :123.3   Mean   :45.0
              3rd Qu.:135.0   3rd Qu.:52.5
              Max.   :150.0   Max.   :60.0

Note: The Training column above is summarized as a factor, which was the default for strings before R 4.0.0. In R 4.0.0 and later, data.frame() keeps strings as character unless stringsAsFactors = TRUE is given, and summary() then reports the length, class and mode of that column instead.

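rbind() - Adds new rows to a data frame. Passing the new row as a one-row data frame keeps the numeric columns numeric; a character vector such as c("Strength", 110, 110) would turn them into character columns.

Example: > New_row_DF <- rbind(Dat_Frame, data.frame(Training = "Strength", Pulse = 110, Duration = 110))

cbind() - Adds new columns to a data frame.

Example: > New_col_DF <- cbind(Dat_Frame, Steps = c(1000, 6000, 2000))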
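The effect of rbind() and cbind() can be checked with dim(). In this sketch the new row is passed as a one-row data frame (the name new_row is an assumption made here) so that the numeric columns keep their type:

```r
Dat_Frame <- data.frame(Training = c("Strength", "Stamina", "Other"),
                        Pulse = c(100, 150, 120), Duration = c(60, 30, 45))
new_row <- data.frame(Training = "Strength", Pulse = 110, Duration = 110)

New_row_DF <- rbind(Dat_Frame, new_row)
New_col_DF <- cbind(Dat_Frame, Steps = c(1000, 6000, 2000))

dim(New_row_DF)          # 4 3: one more row
dim(New_col_DF)          # 3 4: one more column
class(New_row_DF$Pulse)  # "numeric": the column type is preserved
Dat_Frame$Pulse          # 100 150 120: a single column, accessed with $
```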


Factors

Factors are used to categorize data. Examples of factors are:

Demography: Male/Female

Music: Rock, Pop, Classic, Jazz

Training: Strength, Stamina

To create a factor, use the factor() function and add a vector as argument.

Example:

> music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"))
> music_genre
[1] Jazz Rock Classic Classic Pop Jazz Rock Jazz
Levels: Classic Jazz Pop Rock

We can see from the example above that the factor has four levels (categories):
Classic, Jazz, Pop and Rock. To only print the levels, use the levels() function.
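A short sketch of levels() on the same factor follows. The calls to nlevels() and table() are additions for illustration, showing how many levels there are and how often each one occurs:

```r
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"))
levels(music_genre)   # "Classic" "Jazz" "Pop" "Rock"
nlevels(music_genre)  # 4
table(music_genre)    # counts per level: Classic 2, Jazz 3, Pop 1, Rock 2
```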
R Graphics
Plot
The plot() function is used to draw points (markers) in a diagram. The function takes
parameters for specifying points in the diagram.
plot(x1, y1, type="", lwd=, lty=, main="", xlab="", ylab="", col="", cex= )
type - plot type; "l" (lowercase L) draws a line
col - line color
lwd - line width
lty - line type
main - title of the graph
xlab - x-axis title
ylab - y-axis title
cex - size of the dots
Available parameter values for lty:
0 removes the line
1 displays a solid line
2 displays a dashed line
3 displays a dotted line
4 displays a "dot dashed" line
5 displays a "long dashed" line
6 displays a "two dashed" line
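As an illustration of how type, lty and lwd combine, the sketch below draws one dashed line and adds a dotted one. The data values are made up for this example:

```r
x <- 1:10
y <- c(2, 4, 3, 6, 7, 6, 8, 9, 12, 11)  # made-up values for illustration

# type = "l" draws a line, lty = 2 makes it dashed, lwd = 2 makes it thicker
plot(x, y, type = "l", lty = 2, lwd = 2, col = "blue",
     main = "Dashed line", xlab = "x", ylab = "y")

# a second series on the same plot, drawn dotted (lty = 3)
lines(x, y - 1, lty = 3, col = "red")
```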

Example:
> plot(1, 3) # Draw one point in the diagram, at x = 1 and y = 3
Parameter 1 specifies points on the x-axis.
Parameter 2 specifies points on the y-axis.
> plot(c(1, 2, 3, 4, 5), c(3, 7, 8, 9, 12)) # Multiple points
The plot() function also accepts other parameters, such as main, xlab and ylab, if you want
to customize the graph with a main title and different labels for the x- and y-axes.
Example:
> plot(1:10, main="My Graph", xlab="The x-axis", ylab="The y axis")

Note:
To compare the plot with another plot, use the points() function.
Example:
> x1 <- c(5,7,8,7,2,2,9,4,11,12,9,6)
> y1 <- c(99,86,87,88,111,103,87,94,78,77,85,86)
> x2 <- c(2,2,8,1,15,8,12,9,7,3,11,4,7,14,12)
> y2 <- c(100,105,84,105,90,99,90,95,94,100,79,112,91,80,85)
> plot(x1, y1, main="Observation of Cars", xlab="Car age", ylab="Car speed", col="red", cex=2)
> points(x2, y2, col="blue", cex=2)
To display more than one line in a graph, use the plot() function together with the lines()
function.
Example:
> line1 <- c(1,2,3,4,5,10)
> line2 <- c(2,5,7,8,9,10)
> plot(line1, type = "l", col = "blue")
> lines(line2, type = "l", col = "red")

Pie Charts
A pie chart is a circular graphical view of data.

The pie() function is used for drawing pie charts.

Example:
> x <- c(10, 20, 30, 40)
> pie(x)
Note:
- The size of each slice is determined by comparing the value with all the other values, using
this formula: the value divided by the sum of all values, x/sum(x).
- We can change the chart angle by using the init.angle parameter (e.g. pie(x, init.angle = 90)).
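The x/sum(x) formula can be computed directly and used to label the slices. In this sketch the names proportions and percent_labels are introduced for illustration only:

```r
x <- c(10, 20, 30, 40)
proportions <- x / sum(x)                                # 0.1 0.2 0.3 0.4
percent_labels <- paste0(round(100 * proportions), "%")  # "10%" "20%" "30%" "40%"
pie(x, labels = percent_labels, init.angle = 90)         # labelled chart starting at 90 degrees
```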
Example for detailed pie chart:
> x <- c(10, 20, 30, 40)
> mylabel <- c("Apples", "Bananas", "Cherries", "Dates") # Create a vector of labels
> colors <- c("blue", "yellow", "green", "black") # Create a vector of colors
> pie(x, labels = mylabel, main = "Pie Chart", col = colors) # Display the pie chart with colors
> legend("bottomright", mylabel, fill = colors) # Display the explanation box
Note:
Legend position values: bottomright, bottom, bottomleft, left, topleft, top, topright, right, center
Bar Charts
A bar chart uses rectangular bars to visualize data. Bar charts can be displayed horizontally
or vertically. The height or length of each bar is proportional to the value it represents.

The barplot() function is used to draw a vertical bar chart.

Example:
> x <- c("A", "B", "C", "D") # x-axis values
> y <- c(2, 4, 6, 8) # y-axis values
> barplot(y, names.arg = x)
Example Explained
- The x variable represents values on the x-axis (A, B, C, D)
- The y variable represents values on the y-axis (2, 4, 6, 8)
- Then we use the barplot() function to create a bar chart of the values
- names.arg defines the names of each observation on the x-axis

barplot() parameters:

col - To change the color of the bars
density - To change the bar texture
horiz - Set to TRUE to display the bars horizontally instead of vertically
width - To change the width of the bars
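The parameters listed above might be used as in the sketch below. In R they are written in lowercase (col, density, horiz, width). The colour and width values are arbitrary choices for illustration:

```r
x <- c("A", "B", "C", "D")
y <- c(2, 4, 6, 8)

barplot(y, names.arg = x, col = "red", density = 10)  # red bars filled with shading lines
barplot(y, names.arg = x, width = c(1, 2, 3, 4))      # bars of different widths
mids <- barplot(y, names.arg = x, horiz = TRUE)       # horizontal bars
mids                                                  # barplot() also returns the bar midpoints
```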

Regression Line:
When we say that variables x and y have a linear relationship in a mathematical sense, we mean
that y = mx + b, where m is the slope of the line and b the intercept. We call x the independent variable
and y the dependent one.
Example:
> x <- c(1, 2, 3, 4, 5, 6)
> y <- c(1, 4, 27, 64, 625, 216)
> lm(y ~ x)

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x
     -141.3         85.0
# y = mx + b (b = -141.3 and m = 85)

> res = lm(y ~ x) # assigning the fitted model to the variable res
> plot(x, y) # plot the scattered points
> abline(res) # used for plotting the regression line
> dt <- data.frame(x, y)
> predict(res, data.frame(x = 7)) # Predicting new values
       1
453.6667
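The predicted value can be reproduced by hand from the fitted coefficients, which makes the link to y = mx + b explicit. This sketch uses coef() to extract them, with m and b as illustrative variable names:

```r
x <- c(1, 2, 3, 4, 5, 6)
y <- c(1, 4, 27, 64, 625, 216)
res <- lm(y ~ x)

b <- coef(res)[["(Intercept)"]]  # intercept, about -141.3
m <- coef(res)[["x"]]            # slope, 85
m * 7 + b                        # about 453.6667, the same value predict() returns
predict(res, data.frame(x = 7))
```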
