R Language Lab Manual Lab 1
Prepared By:
Anju Godara
Assistant Professor
(Computer Science & Engg. Deptt.)
Introduction to R programming:
R is a programming language and free software environment developed by Ross Ihaka and Robert Gentleman in
1993. R offers an extensive catalog of statistical and graphical methods, including machine
learning algorithms, linear regression, time series and statistical inference, to name a few. Most R
libraries are written in R itself, but C, C++ and Fortran code is preferred for heavy computational
tasks. R is trusted not only in academia; many large companies also use the R programming
language, including Uber, Google, Airbnb and Facebook.
Discover: Investigate the data, refine your hypothesis and analyze it
Model: R provides a wide array of tools to capture the right model for your data
Communicate: Integrate codes, graphs, and outputs to a report with R Markdown or build Shiny
apps to share with the world
Statistical inference
Data analysis
Machine learning algorithms
Step – 1: With base R installed, let’s move on to installing RStudio. To begin, go to the
RStudio download page and click on the download button for RStudio Desktop.
Step – 2: Click on the link for the Windows version of RStudio and save the .exe file.
Enter or browse to the path of the installation folder and click Next to proceed.
Select the folder for the Start menu shortcut, or choose not to create shortcuts, and then
click Next.
Installing Packages:-
The most common place to get packages from is CRAN. To install packages from CRAN
you use install.packages("package name"). For instance, if you want to install the ggplot2
package, which is a very popular visualization package, you would type the following in the
console:-
Syntax:-
# install package from CRAN
install.packages("ggplot2")
Loading Packages:-
Once the package is downloaded to your computer you can access the functions
and resources provided by the package in two different ways:
# load the package to use in the current R session
library(packagename)
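Only one of the two ways appears above. A common second way, sketched here as an assumption about what was intended, is to call a single function through the package namespace with the :: operator, without attaching the whole package. The example uses the stats package that ships with R, so nothing needs to be installed.

```r
# Way 1: attach the package, then call its functions directly
library(stats)
m1 <- median(c(4, 8, 15))

# Way 2: call one function through the package namespace
# without attaching the package to the session
m2 <- stats::median(c(4, 8, 15))

print(m1)
print(m2)
```

The :: form is handy when only one function is needed or when two packages export functions with the same name.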
Assignment Operators:-
The first operator you’ll run into is the assignment operator. The assignment operator is
used to assign a value. For instance we can assign the value 3 to the variable x using the <-
assignment operator.
# assignment
x <- 3
Interestingly, R actually allows for five assignment operators:
# leftward assignment
x <- value
x = value
x <<- value
# rightward assignment
value -> x
value ->> x
The original assignment operator in R was <- and it has remained the preferred operator among
R users. The = assignment operator was added in 2001, primarily because it is the accepted
assignment operator in many other languages and beginners coming to R from those
languages were prone to use it.
The <<- operator is normally used only inside functions, which we will not cover in detail here.
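As a rough illustration of how these operators differ, the sketch below uses made-up variable names (x, y, z, counter, increment). It shows <<- changing a variable that lives outside the function that calls it.

```r
x <- 3        # leftward assignment
4 -> y        # rightward assignment
z = x + y     # = also assigns at the top level

counter <- 0
increment <- function() {
  # <<- assigns to the existing 'counter' in the enclosing environment
  # instead of creating a new local variable
  counter <<- counter + 1
}
increment()
increment()

print(z)        # 7
print(counter)  # 2
```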
Evaluation
We can then evaluate the variable by simply typing x at the command line, which will return
the value of x. Note that before the returned value you’ll see ## [1] in the command line.
The [1] indicates that the first value shown on that line is the first element of the output. Note that you can add
comments to your code by preceding the comment with the hash (#) symbol. Any
values, symbols, and text following # on that line will not be evaluated.
# evaluation
x
## [1] 3
Case Sensitivity
Lastly, note that R is a case sensitive programming language. Meaning all
variables, functions, and objects must be called by their exact spelling:
x <- 1
y <- 3
z <- 4
x*y*z
# [1] 12
x*Y*z
# Error in eval(expr, envir, enclos): object 'Y' not found
Basic Arithmetic
At its most basic, R can be used as a calculator. When applying basic arithmetic,
the PEMDAS order of operations applies: parentheses first, followed by exponentiation,
multiplication and division, and finally addition and subtraction.
8+9/5^2
## [1] 8.36
8+9/(5^2)
## [1] 8.36
8+(9/5)^2
# [1] 11.24
(8+9)/5^2
# [1] 0.68
By default R will display seven significant digits, but this can be changed using the
digits argument of options().
1/7
# [1] 0.1428571
options(digits = 3)
1/7
# [1] 0.143
pi
# [1] 3.14
options(digits = 22)
pi
# [1] 3.141592653589793115998
We can also perform integer division (%/%) and modulo (%%) operations. Integer division gives
the integer part of the quotient, while modulo gives the remainder.
42 / 4 # regular division
## [1] 10.5
42 %/% 4 # integer division
## [1] 10
42%%4 # modulo (remainder)
## [1] 2
Before we get rolling with the EDA (exploratory data analysis), we want to download our data set. For this
example, we are going to use the dataset produced by this recent science,
technology, art and math (STEAM) project.
Now that we have the data set loaded, it’s time to run some very simple
commands to preview the data set and its structure.
Head:-
To begin, we are going to run the head function, which shows the first 6 rows by default.
We are going to override the default and ask to preview the first 10 rows.
> head(df, 10)
Tail:-
The tail function shows the last n observations of a given data frame. The
default value of n is 6. The user can specify the value of n as required.
> tail(mtcars, n=5)
Next, we will run the dim function, which displays the dimensions of the table. The
output takes the form of row, column. Then we run the glimpse function from the
dplyr package. This will display a vertical preview of the dataset. It allows us to easily
preview the data type and sample data of each column.
dim(df)
# Displays the type and a preview of all columns as a row so that it's very easy to take in.
library(dplyr)
glimpse(df)
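The data set itself is not included here, so the sketch below uses R's built-in mtcars data as a stand-in for df (an assumption). It also uses base R's str() in place of glimpse() so that it runs without dplyr installed.

```r
# Stand-in data: the built-in mtcars data set (32 rows, 11 columns)
df <- mtcars

first_rows <- head(df, 10)     # first 10 rows instead of the default 6
last_rows  <- tail(df, n = 5)  # last 5 rows
print(first_rows)
print(last_rows)

print(dim(df))  # number of rows, then number of columns
str(df)         # compact base-R overview of column types and sample values
```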
In contrast to other programming languages like C and Java, variables in R are not declared with
a data type. Variables are assigned R-objects, and the data type of the R-object
becomes the data type of the variable. There are many types of R-objects. The frequently used
ones are –
R Objects:-
Vectors
Lists
Matrices
Arrays
Factors
Data Frames
Vectors:-
In R programming, the most basic data types are the R-objects called vectors, which hold
elements of one of six atomic classes: logical, numeric, integer, complex, character and raw.
Please note that in R the number of classes is not
confined to these six types. For example, we can combine atomic vectors to create
an array whose class will be array.
When you want to create a vector with more than one element, you should use the c() function, which
combines the elements into a vector.
# Create a vector.
apple <- c('red','green',"yellow")
print(apple)
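The following sketch, with made-up variable names, shows how class() reports the type of a vector and what happens when c() is given mixed types.

```r
# Each atomic vector holds elements of a single class
v_logical   <- c(TRUE, FALSE)
v_numeric   <- c(1.5, 2, 3.25)
v_integer   <- c(1L, 2L, 3L)
v_character <- c("red", "green", "yellow")

print(class(v_logical))    # "logical"
print(class(v_numeric))    # "numeric"
print(class(v_integer))    # "integer"
print(class(v_character))  # "character"

# Mixing types coerces every element to the most general class
mixed <- c(1, "a", TRUE)
print(class(mixed))        # "character"

# Elements are selected by position, starting at 1
print(v_character[2])      # "green"
```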
Lists:-
You can use subscripts to select specific components of a list.
> x <- list(1:3, TRUE, "Hello", list(1:2, 5))
Here x has 4 elements: a numeric vector, a logical, a string and another list.
> x[[3]]
[1] "Hello"
> x[c(1,3)]
[[1]]
[1] 1 2 3

[[2]]
[1] "Hello"
We can also name some or all of the entries in our list, by supplying argument names to list().
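A small sketch of a named list follows. The list person and its entries are hypothetical values chosen for illustration.

```r
# Name the entries by supplying argument names to list()
person <- list(name = "Asha", scores = c(82, 91, 77), passed = TRUE)

print(names(person))        # the entry names
print(person$name)          # access an entry by name with $
print(person[["scores"]])   # access an entry by name with [[ ]]
print(mean(person$scores))  # named entries behave like ordinary objects
```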
Matrices:-
Matrices are much used in statistics, and so play an important role in R. To create a matrix
use the function matrix(), specifying elements by column first:
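> matrix(1:12, nrow=3, ncol=4)
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
This is called column-major order. Of course, we need only give one of the dimensions:
> matrix(1:12, nrow=3)
If the vector is shorter than the matrix, its elements are recycled to fill it:
> matrix(1:3, nrow=3, ncol=4)
     [,1] [,2] [,3] [,4]
[1,]    1    1    1    1
[2,]    2    2    2    2
[3,]    3    3    3    3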
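> diag(3)
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1
> diag(1:3)
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    2    0
[3,]    0    0    3
> 1:5 %o% 1:5
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]    2    4    6    8   10
[3,]    3    6    9   12   15
[4,]    4    8   12   16   20
[5,]    5   10   15   20   25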
The last operator performs an outer product, so it creates a matrix with (i, j)-th entry x_i y_j.
The function outer() generalizes this to any function f on two arguments, to create a matrix
with entries f(x_i, y_j). (More on functions later.)
> outer(1:3, 1:4, "+")
     [,1] [,2] [,3] [,4]
[1,]    2    3    4    5
[2,]    3    4    5    6
[3,]    4    5    6    7
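The matrix constructions in this section can be tried together as follows. This is a sketch with made-up variable names, and the byrow = TRUE call is an addition that fills the matrix row by row instead of column by column.

```r
# Column-major fill (the default): the first column is 1, 2, 3
m <- matrix(1:12, nrow = 3, ncol = 4)
print(m[2, 3])        # row 2, column 3 -> 8

# Row-major fill: the first row is 1, 2, 3, 4
by_row <- matrix(1:12, nrow = 3, byrow = TRUE)
print(by_row[2, 3])   # row 2, column 3 -> 7

# Identity matrix and a diagonal matrix built from a vector
identity3 <- diag(3)
diag_123  <- diag(1:3)

# Outer product with %o%, and outer() with another function
times_table <- 1:5 %o% 1:5
sums <- outer(1:3, 1:4, "+")
print(times_table[4, 5])  # 4 * 5 -> 20
print(sums[3, 4])         # 3 + 4 -> 7
```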
Array:
If we have a data set consisting of more than two pieces of categorical information about each
subject, then a matrix is not sufficient. The generalization of matrices to higher
dimensions is the array. Arrays are defined much like matrices, with a call to the array()
command. Here is a 2 × 3 × 3 array:
> arr <- array(1:18, dim=c(2,3,3))
> arr
, , 1

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

, , 2

     [,1] [,2] [,3]
[1,]    7    9   11
[2,]    8   10   12

, , 3

     [,1] [,2] [,3]
[1,]   13   15   17
[2,]   14   16   18
Each 2-dimensional slice defined by the last co-ordinate of the array is shown as a 2 × 3
matrix. Note that we no longer specify the number of rows and columns separately, but use a
single vector dim whose length is the number of dimensions. You can recover this vector
with the dim() function.
> dim(arr)
[1] 2 3 3
> arr[1,2,3]
[1] 15
> arr[,2,]
     [,1] [,2] [,3]
[1,]    3    9   15
[2,]    4   10   16
> arr[,,1,drop=FALSE]
, , 1

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
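The effect of drop = FALSE is easiest to see by comparing dimensions. The sketch below assumes the array is filled with the values 1 to 18, which matches the output shown above.

```r
# A 2 x 3 x 3 array filled with 1..18 in column-major order
arr <- array(1:18, dim = c(2, 3, 3))
print(arr[1, 2, 3])                 # 15

# By default a dimension of length 1 is dropped, leaving a 2 x 3 matrix
slice <- arr[, , 1]
print(dim(slice))                   # 2 3

# drop = FALSE keeps the result three-dimensional
kept <- arr[, , 1, drop = FALSE]
print(dim(kept))                    # 2 3 1
```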
Factors:-
R has a special data structure to store categorical variables. It tells R that a variable is
nominal or ordinal by making it a factor.
data$x = as.factor(data$x)
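A short sketch of nominal and ordinal factors follows. The vector sizes and its values are made up for illustration.

```r
sizes <- c("small", "large", "medium", "small", "large")

# Nominal factor: levels are sorted alphabetically and have no order
nominal <- as.factor(sizes)
print(levels(nominal))   # "large" "medium" "small"

# Ordinal factor: state the level order explicitly
ordinal <- factor(sizes,
                  levels = c("small", "medium", "large"),
                  ordered = TRUE)
print(ordinal)
print(table(ordinal))           # counts per level, in level order
print(ordinal[1] < ordinal[2])  # TRUE, because small < large
```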
Data Frames:-
A data frame is a table or a two-dimensional array-like structure in which each column contains values of
one variable and each row contains one set of values from each column.
emp.data <- data.frame(
emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
salary = c(623.3,515.2,611.0,729.0,843.25)
)
The structure of the data frame can be seen by using str() function.
1. Add Column:- Just add the column vector using a new column name.
2. Add Row:- To add more rows permanently to an existing data frame, we need to bring in the new rows in the
same structure as the existing data frame and use the rbind() function.
v <- emp.data
print(v)
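The steps above can be sketched end to end as follows. The dept column and the two new employees are hypothetical values added for illustration.

```r
# Build the data frame and inspect its structure
emp.data <- data.frame(
  emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
  salary   = c(623.3, 515.2, 611.0, 729.0, 843.25),
  stringsAsFactors = FALSE
)
str(emp.data)

# 1. Add a column by assigning a vector to a new column name
emp.data$dept <- c("IT", "Operations", "IT", "HR", "Finance")

# 2. Add rows: build a data frame with the same columns, then rbind()
new.rows <- data.frame(
  emp_name = c("Rasmi", "Pranab"),
  salary   = c(578.0, 722.5),
  dept     = c("IT", "Operations"),
  stringsAsFactors = FALSE
)
emp.final <- rbind(emp.data, new.rows)
print(emp.final)
```

rbind() matches columns by name, so the new rows must carry exactly the same column names as the existing data frame.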
Charts
A bar chart is a way of summarizing a set of categorical data. It displays data using a number of rectangles
of the same width, each of which represents a particular category. The length of each rectangle is
proportional to the number of cases in the category it represents. Below, we will make a figure of the
question “To the best of your knowledge, what is the leading cause of death for Albertans under the age of
45?”, question g1 in the survey. You can make a bar chart in the R Commander by choosing Graphs → Bar
Graph from the R Commander menus.
That will open up a dialog box that looks like the following:
You can also choose different options for how the figure is constructed (e.g., frequencies or
percentages) by clicking on the percentages tab, which switches the dialog to the one below:
That should produce a bar plot that looks like the following:
Looping in R:-
“Looping”, “cycling”, “iterating” or just replicating instructions is an old practice that originated
well before the invention of computers. It is nothing more than automating a multi-step process by
organizing sequences of actions or ‘batch’ processes and by grouping the parts that need to be
repeated.
All modern programming languages provide special constructs that allow for the repetition
of instructions or blocks of instructions.
According to the R base manual, among the control flow commands, the loop constructs
are for, while and repeat, with the additional clauses break and next. Remember that control flow
commands are the commands that enable a program to branch between alternatives, or to “take
decisions”, so to speak.
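A minimal sketch of next, break and repeat follows. The variable names and the thresholds are arbitrary choices for illustration.

```r
# next skips the rest of the current iteration; break leaves the loop
odd_values <- c()
for (i in 1:10) {
  if (i %% 2 == 0) next   # skip even numbers
  if (i > 7) break        # stop once i exceeds 7
  odd_values <- c(odd_values, i)
}
print(odd_values)  # 1 3 5 7

# repeat has no condition of its own, so it runs until break is reached
n <- 1
repeat {
  n <- n * 2
  if (n >= 100) break
}
print(n)  # 128
```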
If the condition is not met and the resulting outcome is False, the loop is never executed. This is
indicated by the loose arrow on the right of the for loop structure. The program will then
execute the first instruction found after the loop block.
If the condition is verified, an instruction -or block of instructions- i1 is executed. And
perhaps this block of instructions is another loop. In such cases, you speak of a nested loop.
The initialization statement describes the starting point of the loop, where the loop variable is
initialized with a starting value. A loop variable or counter is simply a variable that controls
the flow of the loop. The test expression is the condition that determines whether the loop is repeated.
Syntax of for loop:-
for (val in sequence)
statement
Here, sequence is a vector and val takes on each of its values during the loop. In each
iteration, statement is evaluated.
x <- c(2,5,3,9,8,11,6)
count <- 0
for (val in x) {
if(val %% 2 == 0) count = count+1
}
print(count)
Output
[1] 3
In the above example, the loop iterates 7 times as the vector x has 7 elements.
In each iteration, val takes on the value of corresponding element of x.
We have used a counter to count the number of even numbers in x. We can see that x contains 3
even numbers.
R while Loop
Loops are used in programming to repeat a specific block of code. In R programming, a while
loop repeats a block of code for as long as a specified condition remains true.
while (test_expression)
statement
Here, test_expression is evaluated and the body of the loop is entered if the result is TRUE.
The statements inside the loop are executed and the flow returns to evaluate
the test_expression again.
This is repeated each time until test_expression evaluates to FALSE, in which case, the loop exits.
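As a small illustration, the while loop below adds up the integers from 1 to 5. The variable names i and total are arbitrary.

```r
i <- 1
total <- 0
while (i <= 5) {        # test_expression, checked before every pass
  total <- total + i    # body of the loop
  i <- i + 1            # without this update the loop would never end
}
print(total)  # 15
print(i)      # 6, the first value for which the test is FALSE
```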
A bar chart is a pictorial representation of categorical data that uses rectangular
bars whose heights or lengths are proportional to the values they represent. The data set
supplies the numerical value of each variable, which sets the height or length of its bar.
R uses the function barplot() to create bar charts. Both vertical and horizontal bars can be
drawn.
Syntax:
barplot(H, xlab, ylab, main, names.arg, col)
Parameters:
H: This parameter is a vector or matrix containing numeric values which are used in bar
chart.
xlab: This parameter is the label for x axis in bar chart.
ylab: This parameter is the label for y axis in bar chart.
main: This parameter is the title of the bar chart.
names.arg: This parameter is a vector of names appearing under each bar in bar chart.
col: This parameter is used to give colors to the bars in the graph.
Labels, a title and colors are properties of the bar chart that can be set by
passing the corresponding arguments.
Approach:
1. To add the title in bar chart.
barplot( A, main = title_name )
2. X-axis and Y-axis can be labeled in bar chart. To add the label in bar chart.
barplot( A, xlab= x_label_name, ylab= y_label_name)
3. To add the color in bar chart.
barplot( A, col=color_name)
Example :
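A minimal sketch combining the arguments described above follows. The vector A, the month labels and the axis texts are made-up values for illustration only.

```r
# Hypothetical data: one value per category
A <- c(17, 32, 8, 53, 1)
months <- c("Jan", "Feb", "Mar", "Apr", "May")

# Vertical bar chart with title, axis labels, bar names and color.
# barplot() also returns the x positions of the bar midpoints.
mids <- barplot(A,
                main = "Articles per month",
                xlab = "Month",
                ylab = "Articles",
                names.arg = months,
                col = "steelblue")
print(mids)

# Horizontal bars: the same call with horiz = TRUE
barplot(A, names.arg = months, col = "steelblue", horiz = TRUE)
```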
Histograms in R language:-
A histogram displays the distribution of a numeric variable as adjacent rectangles. The width of
each rectangle spans one of a series of successive numerical intervals, and its area is
proportional to the frequency of the values in that interval. In other words, it is a
graphical representation that groups data points into specified ranges. Unlike a bar graph, a
histogram shows no gaps between the bars, but it otherwise resembles a vertical bar graph. We
can create a histogram in the R programming language using the hist() function.
The vector v below is plotted as a simple histogram using hist().
Example:
v <- c(19, 23, 11, 5, 16, 21, 32, 14, 19, 27, 39)
hist(v)
Using histogram return values for labels using text()
m <- hist(v, breaks = 5)
# Setting labels
text(m$mids, m$counts, labels = m$counts, adj = c(0.5, -0.5))
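To see what hist() returns without drawing anything, the sketch below calls it with plot = FALSE and inspects the interval boundaries, the counts and the midpoints. The name h is an arbitrary choice.

```r
v <- c(19, 23, 11, 5, 16, 21, 32, 14, 19, 27, 39)

# plot = FALSE computes the histogram but does not draw it
h <- hist(v, plot = FALSE)

print(h$breaks)  # boundaries of the intervals
print(h$counts)  # number of values falling in each interval
print(h$mids)    # midpoint of each interval, useful as x positions for labels
```

There is always one more break than there are counts, and the counts add up to the number of values in v.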
Boxplots in R Language:-
A boxplot is a chart that displays the distribution of a data set, with one box drawn for
each group of values. The distribution is summarized by five numbers (minimum, first
quartile, median, third quartile, maximum).
Boxplots are created in R by using the boxplot() function.
'cyl')]
print(head(input))
Creating the Boxplot
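The code for this example is incomplete above, so the following is only a sketch of one way to draw it. It assumes the built-in mtcars data and the columns mpg and cyl. The cyl column appears in the fragment above, while mpg is an assumption.

```r
# Assumed input: mileage (mpg) and number of cylinders (cyl) from mtcars
input <- mtcars[, c("mpg", "cyl")]
print(head(input))

# One box per number of cylinders; boxplot() also returns the statistics
bp <- boxplot(mpg ~ cyl, data = input,
              xlab = "Number of Cylinders",
              ylab = "Miles Per Gallon",
              main = "Mileage Data")

# Each column holds the five values drawn for one box:
# lower whisker, first quartile, median, third quartile, upper whisker
print(bp$stats)
```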
Scatter plot :-
Scatterplots show many points plotted in the Cartesian plane. Each point represents the values of
two variables. One variable is chosen in the horizontal axis and another in the vertical axis.
The simple scatterplot is created using the plot() function.
Syntax
plot(x, y, main, xlab, ylab, xlim, ylim, axes)
Following is the description of the parameters used −
x is the data set whose values are the horizontal coordinates.
y is the data set whose values are the vertical coordinates.
main is the title of the graph.
xlab is the label in the horizontal axis.
ylab is the label in the vertical axis.
xlim is the limits of the values of x used for plotting.
ylim is the limits of the values of y used for plotting.
axes indicates whether both axes should be drawn on the plot.
Example
We use the data set "mtcars" available in the R environment to create a basic scatterplot. Let's use
the columns "wt" and "mpg" in mtcars.
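A minimal sketch of this scatterplot follows. The axis labels and the title are illustrative choices, and the axis limits are taken from the data so that every point is shown.

```r
# Weight (wt) and mileage (mpg) from the built-in mtcars data
input <- mtcars[, c("wt", "mpg")]

plot(x = input$wt, y = input$mpg,
     main = "Weight vs Mileage",
     xlab = "Weight",
     ylab = "Mileage",
     xlim = range(input$wt),
     ylim = range(input$mpg))

# A quick numeric check of the relationship visible in the plot
wt_mpg_cor <- cor(input$wt, input$mpg)
print(wt_mpg_cor)
```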
The lm() function estimates the intercept and slope coefficients for the linear model that it has
fit to our data.
Whether we can use our model to make predictions will depend on:
1. Whether we can reject the null hypothesis that there is no relationship between our
variables.
2. Whether the model is a good fit for our data.
We can view the output of our model using summary(). The model output will provide us with the
information we need to test our hypothesis and assess how well the model fits our data.
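The model and data for this step are not shown, so the sketch below assumes a simple linear regression of mpg on wt using the built-in mtcars data. The names fit, model_summary, p_value and r_squared are illustrative.

```r
# Fit a simple linear model: mileage as a function of weight
fit <- lm(mpg ~ wt, data = mtcars)
print(coef(fit))   # estimated intercept and slope

# summary() reports coefficients, p-values and goodness of fit
model_summary <- summary(fit)
print(model_summary)

# 1. p-value for the slope: a small value lets us reject the null
#    hypothesis of no relationship between wt and mpg
p_value <- model_summary$coefficients["wt", "Pr(>|t|)"]

# 2. R-squared: share of the variance in mpg explained by the model
r_squared <- model_summary$r.squared

print(p_value)
print(r_squared)
```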