R Installation and Overview
Objective: This course introduces students to R, a widely used statistical programming language.
Students will learn to manipulate data objects, produce graphics, analyse data using common statistical
methods, and generate reproducible statistical reports. Students will also learn data wrangling.
Credits: 01    L-T-P-J: 0-0-2-0
Module No.   Content                                              Lab Hours
I            Module 1: Introduction to R
             Introduction and installation of R and RStudio
             Data types, vectors, multidimensional arrays
             Functions and their use
             Visualization using ggplot2
             Word-Count program using Java
             Installation of VM-Ware and Cloudera
II           Module 2: Hands-On MongoDB, Cassandra                24
             Hands-On MongoDB: CRUD, Where, Aggregation
             Hands-On MongoDB: Projection, Aggregation
             Hands-On Cassandra DB: CRUD, Projection
             Hands-On PIG & HIVE
             Hands-On PIG
             Hands-On HIVE
             Twitter Data Fetching using Flume
Reference Books:
Paul Teetor. R Cookbook: Proven Recipes for Data Analysis, Statistics, and Graphics. O'Reilly Media, Inc., 2011.
Norman Matloff. The Art of R Programming: A Tour of Statistical Software Design. No Starch Press, 2011.
Winston Chang. R Graphics Cookbook. O'Reilly Media, Inc., 2012.
Hadley Wickham and Garrett Grolemund. R for Data Science. 2016.
Phil Spector. Data Manipulation with R. Springer Science & Business Media, 2008.
Outcome: At the end of the course, the student is able to:
CO1: Apply RStudio, read R documentation, and write R scripts.
CO2: Analyse data using the latest HDFS-based data analytics tools, such as Pig and Hive.
CO3: Implement aggregation and projection on datasets using Cassandra and MongoDB.
CO4: Implement the concepts of Pig and Hive using queries on real-world data.
Mapping of Course Outcomes (COs) with Program Outcomes (POs) and Program Specific Outcomes (PSOs):
COs    POs/PSOs
CO1    PO2, PO5 / PSO4
CO2    PO1, PO5 / PSO3
CO3    PO2, PO5 / PSO3
CO4    PO5 / PSO4
R - Overview
The core of R is an interpreted computer language that allows branching and looping as
well as modular programming using functions. For efficiency, R can integrate with procedures
written in the C, C++, .Net, Python or FORTRAN languages.
R is free software distributed under the GNU General Public License, a copyleft licence, and
pre-compiled binary versions are provided for operating systems such as Linux, Windows and Mac.
R is an official part of the GNU project and is sometimes called GNU S.
Evolution of R
R was initially written by Ross Ihaka and Robert Gentleman at the Department of
Statistics of the University of Auckland in Auckland, New Zealand. R made its first
appearance in 1993.
A large group of individuals has contributed to R by sending code and bug reports.
Since mid-1997 there has been a core group (the "R Core Team") who can modify the R
source code archive.
Features of R
Installation of RStudio on Windows:
1. To install R, go to cran.r-project.org
6. Click Next.
7. Select where you would like R to be installed. The default is the
Program Files folder on your C drive. Click Next.
10. Then specify whether you want to customize your startup options or just use the
defaults. Then click Next.
11. Then choose the Start Menu folder in which R should be saved, or keep the
default R folder that was created. Once you have finished, click Next.
You can also choose not to create a Start Menu folder, using the option at the bottom.
12. You can then select additional shortcuts if you would like. Click Next.
13. Click Finish.
16. Once the installer package has downloaded, the Welcome to RStudio Setup Wizard
will open. Click Next and go through the installation steps.
17. After the Setup Wizard finishes the installation, RStudio will open.
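Once RStudio has opened, a quick check in its console can confirm that R itself is working. The lines below are a minimal sketch using only base R; the variable name check is illustrative, and the version string printed depends on the release that was installed.

```r
# Print the version of R that RStudio is running.
print(R.version.string)

# A trivial computation to confirm the interpreter evaluates expressions.
check <- 1 + 1
print(check)
```

If both lines print without an error, the installation is usable for the exercises that follow.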
R - Data Types
Generally, when programming in any language, you need variables to store various kinds of
information. Variables are nothing but reserved memory locations to
store values. This means that, when you create a variable, you reserve some space in memory.
You may like to store information of various data types like character, wide character, integer,
floating point, double floating point, Boolean etc. Based on the data type of a variable, the
operating system allocates memory and decides what can be stored in the reserved memory.
In contrast to other programming languages like C and Java, in R the variables are not declared
as some data type. The variables are assigned R-objects, and the data type of the R-object
becomes the data type of the variable. There are many types of R-objects. The frequently used
ones are −
Vectors
Lists
Matrices
Arrays
Factors
Data Frames
The simplest of these objects is the vector object and there are six data types of these atomic
vectors, also termed as six classes of vectors. The other R-Objects are built upon the atomic
vectors.
Data Type: Logical
Examples: TRUE, FALSE
v <- TRUE
print(class(v))
it produces the following result −
[1] "logical"
Data Type: Numeric
Examples: 12.3, 5, 999
v <- 23.5
print(class(v))
it produces the following result −
[1] "numeric"
Data Type: Integer
Examples: 2L, 34L, 0L
v <- 2L
print(class(v))
it produces the following result −
[1] "integer"
Data Type: Complex
Examples: 3 + 2i
v <- 2+5i
print(class(v))
it produces the following result −
[1] "complex"
Data Type: Character
Examples: 'a', "good", "TRUE", '23.4'
v <- "TRUE"
print(class(v))
it produces the following result −
[1] "character"
Data Type: Raw
Examples: "Hello" is stored as 48 65 6c 6c 6f
v <- charToRaw("Hello")
print(class(v))
it produces the following result −
[1] "raw"
In R programming, the most basic data types are the R-objects called vectors, which hold
elements of the classes shown above. Please note that in R the number of classes is not
confined to the above six types. For example, we can combine atomic vectors to create
an array whose class will be array.
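As a small illustration of a class beyond the six atomic ones, the sketch below builds a three-dimensional array from a numeric vector and inspects both objects. The names v and arr are illustrative; only base R functions are used.

```r
# An atomic vector of eight numeric values.
v <- c(1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5)

# Arrange the same values into a 2 x 2 x 2 array.
arr <- array(v, dim = c(2, 2, 2))

print(class(v))     # class of the atomic vector
print(class(arr))   # class of the array built from it
print(typeof(arr))  # the underlying storage type is unchanged
```

The array reports its own class while typeof() shows that the elements are still stored as doubles.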
Vectors
When you want to create a vector with more than one element, you should use the c() function, which
combines the elements into a vector.
# Create a vector.
apple <- c('red','green',"yellow")
print(apple)
Vector Creation
Single Element Vector
Even when you write just one value in R, it becomes a vector of length 1 and belongs to one of the
above vector types.
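# Atomic vector of type character.
print("abc")
# Atomic vector of type double.
print(12.5)
# Atomic vector of type integer.
print(63L)
# Atomic vector of type logical.
print(TRUE)
# Atomic vector of type complex.
print(2+3i)
# Atomic vector of type raw.
print(charToRaw('hello'))
[1] "abc"
[1] 12.5
[1] 63
[1] TRUE
[1] 2+3i
[1] 68 65 6c 6c 6f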
# Creating a sequence from 5 to 13.
v <- 5:13
print(v)
# Creating a sequence from 6.6 to 12.6.
v <- 6.6:12.6
print(v)
# If the final element specified does not belong to the sequence then it is discarded.
v <- 3.8:11.4
print(v)
[1] 5 6 7 8 9 10 11 12 13
[1] 6.6 7.6 8.6 9.6 10.6 11.6 12.6
[1] 3.8 4.8 5.8 6.8 7.8 8.8 9.8 10.8
# Create vector with elements from 5 to 9 incrementing by 0.4.
print(seq(5,9,by=0.4))
[1] 5.0 5.4 5.8 6.2 6.6 7.0 7.4 7.8 8.2 8.6 9.0
The non-character values are coerced to character type if one of the elements is a character.
# The logical and numeric values are converted to characters.
s <- c('apple','red',5,TRUE)
print(s)
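To make the coercion visible, the following sketch prints both the values and the class of two mixed vectors. The variable n is illustrative; s repeats the vector used above.

```r
# Mixing logical and numeric values gives a numeric vector (TRUE becomes 1).
n <- c(5, TRUE, FALSE)
print(n)
print(class(n))

# Adding a single character element turns every element into a string.
s <- c('apple', 'red', 5, TRUE)
print(s)
print(class(s))
```

In the second vector the number 5 and the logical TRUE are stored as the strings "5" and "TRUE".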
# Accessing vector elements using position.
t <- c("Sun","Mon","Tue","Wed","Thurs","Fri","Sat")
u <- t[c(2,3,6)]
print(u)
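Positions are not the only way to index a vector. As a brief sketch, base R also accepts negative positions and logical vectors inside the square brackets; the names days, without_ends and keep are illustrative.

```r
days <- c("Sun","Mon","Tue","Wed","Thurs","Fri","Sat")

# Negative positions drop the elements at those positions.
without_ends <- days[c(-1, -7)]
print(without_ends)

# A logical vector of the same length keeps the elements marked TRUE.
keep <- c(TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE)
print(days[keep])
```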
Vector Manipulation
Vector arithmetic
Two vectors of the same length can be added, subtracted, multiplied or divided, giving the result as a
vector output.
# Create two vectors.
v1 <- c(3,8,4,5,0,11)
v2 <- c(4,11,0,8,1,2)
# Vector addition.
add.result <- v1+v2
print(add.result)
# Vector subtraction.
sub.result <- v1-v2
print(sub.result)
# Vector multiplication.
multi.result <- v1*v2
print(multi.result)
# Vector division.
divi.result <- v1/v2
print(divi.result)
[1] 7 19 4 13 1 13
[1] -1 -3 4 -3 -1 9
[1] 12 88 0 40 0 22
[1] 0.7500000 0.7272727 Inf 0.6250000 0.0000000 5.5000000
v1 <- c(3,8,4,5,0,11)
v2 <- c(4,11)
# V2 becomes c(4,11,4,11,4,11)
add.result <- v1+v2
print(add.result)
sub.result <- v1-v2
print(sub.result)
[1] 7 19 8 16 4 22
[1] -1 -3 0 -6 -4 0
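The recycling of the shorter vector in the example above can be written out explicitly. The sketch below uses base R's rep_len() to build the recycled vector and checks that it gives the same sums; the name v2.recycled is illustrative.

```r
v1 <- c(3, 8, 4, 5, 0, 11)
v2 <- c(4, 11)

# Spell out the recycling that R performs implicitly.
v2.recycled <- rep_len(v2, length(v1))
print(v2.recycled)

# The implicit and explicit forms give the same sums.
print(identical(v1 + v2, v1 + v2.recycled))
```

When the longer length is not a multiple of the shorter one, R still recycles the shorter vector but issues a warning.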
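v <- c(3,8,4,5,0,11,-9,304)
# Sort the elements of the vector, then sort them in the reverse order.
print(sort(v))
print(sort(v, decreasing = TRUE))
# Sorting a character vector works the same way.
v <- c("Red","Blue","yellow","violet")
print(sort(v))
print(sort(v, decreasing = TRUE))
[1] -9 0 3 4 5 8 11 304
[1] 304 11 8 5 4 3 0 -9
[1] "Blue" "Red" "violet" "yellow"
[1] "yellow" "violet" "Red" "Blue"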
Lists
A list is an R-object which can contain many different types of elements, such as vectors,
functions and even another list.
# Create a list.
list1 <- list(c(2,5,3),21.3,sin)
# Print the list.
print(list1)
[[1]]
[1] 2 5 3
[[2]]
[1] 21.3
[[3]]
function (x) .Primitive("sin")
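Individual elements of a list can be pulled out again. The sketch below, which rebuilds the same list1, contrasts double brackets (the element itself) with single brackets (a shorter list); it uses only base R.

```r
list1 <- list(c(2, 5, 3), 21.3, sin)

# Double brackets return the element itself.
print(list1[[1]])      # the numeric vector
print(list1[[3]](0))   # call the stored sin function at 0

# Single brackets return a shorter list, not the element.
print(class(list1[2]))
```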
Matrices
A matrix is a two-dimensional rectangular data set. It can be created using a vector input to the
matrix function.
# Create a matrix.
M <- matrix(c('a','a','b','c','b','a'), nrow = 2, ncol = 3, byrow = TRUE)
print(M)
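The byrow argument controls the order in which the input vector fills the matrix. As a sketch, the same six values are laid out both ways below; the names values, by.row and by.col are illustrative.

```r
values <- c('a','a','b','c','b','a')

# byrow = TRUE fills the first row, then the second.
by.row <- matrix(values, nrow = 2, ncol = 3, byrow = TRUE)
print(by.row)

# The default (byrow = FALSE) fills column by column instead.
by.col <- matrix(values, nrow = 2, ncol = 3)
print(by.col)

# dim() reports the number of rows, then the number of columns.
print(dim(by.row))
```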
Arrays
While matrices are confined to two dimensions, arrays can be of any number of dimensions.
The array function takes a dim attribute which creates the required number of dimensions. In the
example below we create an array with two elements, each of which is a 3x3 matrix.
# Create an array.
a <- array(c('green','yellow'),dim = c(3,3,2))
print(a)
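, , 1
     [,1]     [,2]     [,3]
[1,] "green"  "yellow" "green"
[2,] "yellow" "green"  "yellow"
[3,] "green"  "yellow" "green"
, , 2
     [,1]     [,2]     [,3]
[1,] "yellow" "green"  "yellow"
[2,] "green"  "yellow" "green"
[3,] "yellow" "green"  "yellow"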
Factors
Factors are created using the factor() function. The nlevels() function gives the count of levels.
# Create a vector.
apple_colors<- c('green','green','yellow','red','red','red','green')
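The vector above can be turned into a factor with the two functions just named. The following is a minimal sketch; the name factor_apple is illustrative.

```r
apple_colors <- c('green','green','yellow','red','red','red','green')

# Create a factor object from the character vector.
factor_apple <- factor(apple_colors)

# Printing shows the values followed by the distinct levels.
print(factor_apple)

# Count the distinct levels.
print(nlevels(factor_apple))
```

The seven values collapse to three levels, one for each distinct colour.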
Data Frames
Data frames are tabular data objects. Unlike in a matrix, each column of a data frame can contain
a different mode of data. The first column can be numeric while the second column can be
character and the third column can be logical. A data frame is a list of vectors of equal length.
# Create the data frame.
BMI <- data.frame(
gender= c("Male","Male","Female"),
height= c(152,171.5,165),
weight= c(81,93,78),
Age=c(42,38,26)
)
print(BMI)
When we execute the above code, it produces the following result −
  gender height weight Age
1   Male  152.0     81  42
2   Male  171.5     93  38
3 Female  165.0     78  26
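To see that a data frame really is a set of equal-length columns with their own types, the sketch below rebuilds BMI and inspects it with base R functions.

```r
BMI <- data.frame(
   gender = c("Male","Male","Female"),
   height = c(152,171.5,165),
   weight = c(81,93,78),
   Age = c(42,38,26)
)

# str() lists each column together with its type.
str(BMI)

# A single column is an ordinary vector.
print(BMI$height)
print(class(BMI$height))

# Every column has the same length: the number of rows.
print(nrow(BMI))
```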
R - Functions
A function is a set of statements organized together to perform a specific task. R has a large number
of in-built functions and the user can create their own functions.
In R, a function is an object so the R interpreter is able to pass control to the function, along with
arguments that may be necessary for the function to accomplish the actions.
The function in turn performs its task and returns control to the interpreter as well as any result
which may be stored in other objects.
Function Definition
An R function is created by using the keyword function. The basic syntax of an R function
definition is as follows −
function_name <- function(arg_1, arg_2, ...) {
   # function body
}
Function Components
The different parts of a function are −
Function Name − This is the actual name of the function. It is stored in the R environment as
an object with this name.
Arguments − An argument is a placeholder. When a function is invoked, you pass a value
to the argument. Arguments are optional; that is, a function may contain no arguments. Also
arguments can have default values.
Function Body − The function body contains a collection of statements that defines what
the function does.
Return Value − The return value of a function is the last expression in the function body to
be evaluated.
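The four components listed above can be inspected on a small function. In this sketch the function rect.area and its arguments are hypothetical; formals() and body() are base R functions.

```r
# Function name: rect.area; arguments: width and height (height has a default).
rect.area <- function(width, height = 1) {
   # Function body: the last evaluated expression is the return value.
   width * height
}

print(names(formals(rect.area)))  # the argument names
print(body(rect.area))            # the function body
print(rect.area(4))               # height falls back to its default
print(rect.area(4, 2.5))
```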
R has many in-built functions which can be called directly in a program without defining them
first. We can also create and use our own functions, referred to as user-defined functions.
Built-in Function
Simple examples of in-built functions are seq(), mean(), max(), sum() and paste(). They are
called directly by user-written programs.
# Create a sequence of numbers from 32 to 44.
print(seq(32,44))
# Find the mean of the numbers from 25 to 82.
print(mean(25:82))
# Find the sum of the numbers from 41 to 68.
print(sum(41:68))
[1] 32 33 34 35 36 37 38 39 40 41 42 43 44
[1] 53.5
[1] 1526
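The remaining two built-in functions named above, max() and paste(), can be tried in the same way. The values in marks are made up for illustration.

```r
marks <- c(67, 82, 45, 91)

# Largest value in the vector.
print(max(marks))

# paste() joins its arguments into one string, separated by a space by default.
label <- paste("Highest mark:", max(marks))
print(label)
```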
User-defined Function
We can create user-defined functions in R. They are specific to what a user wants and once created
they can be used like the built-in functions. Below is an example of how a function is created and
used.
Calling a Function
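# Create a function to print squares of numbers in sequence.
new.function <- function(a) {
   for (i in 1:a) {
      b <- i^2
      print(b)
   }
}
# Call the function, supplying 6 as an argument.
new.function(6)
# Call the function again, supplying 5 as an argument.
new.function(5)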
[1] 1
[1] 4
[1] 9
[1] 16
[1] 25
[1] 36
[1] 1
[1] 4
[1] 9
[1] 16
[1] 25
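# Create a function with arguments.
new.function <- function(a,b,c) {
   result <- a * b + c
   print(result)
}
# Call the function by position of arguments.
new.function(5,3,11)
# Call the function by names of the arguments.
new.function(a = 11, b = 5, c = 3)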
[1] 26
[1] 58
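# Create a function with default argument values.
new.function <- function(a = 3, b = 6) {
   result <- a * b
   print(result)
}
# Call the function without giving any argument.
new.function()
# Call the function, giving new values for the arguments.
new.function(9,5)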
[1] 18
[1] 45
# Create a function with arguments.
new.function <- function(a, b) {
   print(a^2)
   print(a)
   print(b)
}
# Evaluate the function without supplying one of the arguments.
new.function(6)
[1] 36
[1] 6
Error in print(b) : argument "b" is missing, with no default
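The run above fails only when print(b) is reached, because R evaluates function arguments lazily, at the point where the body first uses them. One way to tolerate an omitted argument is sketched below with base R's missing(); the name safe.function is hypothetical.

```r
safe.function <- function(a, b) {
   print(a^2)
   print(a)
   # missing(b) is TRUE when the caller did not supply b.
   if (missing(b)) {
      print("b was not supplied")
   } else {
      print(b)
   }
}

# Runs to completion even though b is omitted.
safe.function(6)

# Behaves like the original function when b is given.
safe.function(6, 2)
```

Giving b a default value in the function definition is another option when a sensible default exists.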