R Installation and Overview

This is the R software installation file

Uploaded by

dhruv.rishi.papa

BCSE0183: BIG DATA AND ANALYTICS LAB

Objective: This course introduces students to R, a widely used statistical programming language.
Students will learn to manipulate data objects, produce graphics, analyse data using common statistical
methods, and generate reproducible statistical reports. Students will also learn data munging.

Credits: 01 L-T-P-J: 0-0-2-0

Module No.   Content                                              Lab Hours

I            Module 1: Introduction to R
             • Introduction and installation of R and RStudio
             • Data types, vectors, multidimensional array
             • Functions and their use
             • Visualization using ggplot2
             • Word-Count program using Java
             • Installation of VMware and Cloudera

II           Module 2: Hands-On MongoDB, Cassandra                24
             • Hands-On MongoDB: CRUD, Where, Aggregation
             • Hands-On MongoDB: Projection, Aggregation
             • Hands-On Cassandra DB: CRUD, Projection

             Hands-On PIG & HIVE
             • Hands-On PIG
             • Hands-On HIVE
             • Twitter Data Fetching using Flume

Reference Books:

• Paul Teetor. R Cookbook: Proven Recipes for Data Analysis, Statistics, and Graphics. O'Reilly Media, Inc., 2011.
• Norman Matloff. The Art of R Programming: A Tour of Statistical Software Design. No Starch Press, 2011.
• Winston Chang. R Graphics Cookbook. O'Reilly Media, Inc., 2012.
• Hadley Wickham and Garrett Grolemund. R for Data Science. 2016.
• Phil Spector. Data Manipulation with R. Springer Science & Business Media, 2008.

Outcome: At the end of the course, the student is able to:
• CO1: Apply RStudio, read R documentation, and write R scripts.
• CO2: Analyse data using the latest data analytics tools based on HDFS, such as Pig and Hive.
• CO3: Implement aggregation and projection on datasets using Cassandra and MongoDB.
• CO4: Implement the concepts of PIG & HIVE using queries on real-world data.

Mapping of Course Outcomes (COs) with Program Outcomes (POs) and Program Specific Outcomes (PSOs):

COs    POs/PSOs
CO1    PO2, PO5 / PSO4
CO2    PO1, PO5 / PSO3
CO3    PO2, PO5 / PSO3
CO4    PO5 / PSO4

R - Overview

R is a programming language and software environment for statistical analysis, graphical
representation and reporting. R was created by Ross Ihaka and Robert Gentleman at the
University of Auckland, New Zealand, and is currently developed by the R Core Team.

The core of R is an interpreted computer language which allows branching and looping as
well as modular programming using functions. R allows integration with the procedures
written in the C, C++, .Net, Python or FORTRAN languages for efficiency.

R is freely available under the GNU General Public License, and pre-compiled binary
versions are provided for various operating systems like Linux, Windows and Mac.

R is free software distributed under a GNU-style copyleft, and is an official part of the GNU
project, where it is known as GNU S.

Evolution of R

R was initially written by Ross Ihaka and Robert Gentleman at the Department of
Statistics of the University of Auckland in Auckland, New Zealand. R made its first
appearance in 1993.

• A large group of individuals has contributed to R by sending code and bug reports.
• Since mid-1997 there has been a core group (the "R Core Team") who can modify the R
source code archive.

Features of R

As stated earlier, R is a programming language and software environment for statistical
analysis, graphical representation and reporting. The following are the important features of
R:

• R is a well-developed, simple and effective programming language which includes
conditionals, loops, user-defined recursive functions and input and output facilities.
• R has an effective data handling and storage facility.
• R provides a suite of operators for calculations on arrays, lists, vectors and matrices.
• R provides a large, coherent and integrated collection of tools for data analysis.
• R provides graphical facilities for data analysis and display, either directly on screen or
printed on paper.

In conclusion, R is one of the most widely used statistical programming languages. It is a popular
choice among data scientists and is supported by a vibrant community of contributors.
R is taught in universities and deployed in mission-critical business applications. This
tutorial teaches R programming with suitable examples in simple and easy
steps.

Installation of RStudio on Windows:

1. To install R, go to cran.r-project.org

2. Click Download R for Windows.

3. Click install R for the first time.

4. Click Download R for Windows. Open the downloaded file.

5. Select the language you would like to use during the installation. Then
click OK.

6. Click Next.
7. Select where you would like R to be installed. It will default to the
Program Files folder on your C drive. Click Next.

8. You can then choose which installation you would like.

9. (Optional) If your computer is 64-bit, you can choose the 64-bit User
Installation. Then click Next.

10. Then specify whether you want to customize your startup options or just use the
defaults. Then click Next.
11. Choose the Start Menu folder in which the R shortcuts will be saved, or keep the
default R folder that was created. You can also choose not to create a Start Menu
folder using the option at the bottom. Once you have finished, click Next.

12. You can then select additional shortcuts if you would like. Click Next.
13. Click Finish.

14. Next, download RStudio. Go to https://posit.co/downloads/

15. Click Download RStudio.

16. Once the installer has downloaded, open it and the Welcome to RStudio Setup Wizard
will appear. Click Next and go through the installation steps.
17. After the Setup Wizard finishes the installation, RStudio will open.
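
Once both installers have finished, a quick way to confirm that R itself works is to run a couple of lines in the RStudio console. The snippet below is only a minimal sketch: the variable name check is a hypothetical choice for illustration, and the version string printed will depend on the release that was installed.

```r
# Print the version string of the R installation that is running.
print(R.version.string)

# Evaluate a trivial expression to confirm the interpreter responds.
check <- sum(1:10)
print(check)
```

If the second command prints 55, the interpreter is evaluating code as expected.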
R - Data Types
Generally, while doing programming in any programming language, you need to use various
variables to store various information. Variables are nothing but reserved memory locations to
store values. This means that, when you create a variable you reserve some space in memory.

You may like to store information of various data types like character, wide character, integer,
floating point, double floating point, Boolean etc. Based on the data type of a variable, the
operating system allocates memory and decides what can be stored in the reserved memory.

In contrast to other programming languages like C and Java, in R the variables are not declared
as some data type. Variables are assigned R-objects, and the data type of the R-object
becomes the data type of the variable. There are many types of R-objects. The frequently used
ones are:

• Vectors
• Lists
• Matrices
• Arrays
• Factors
• Data Frames

The simplest of these objects is the vector, and there are six data types of atomic
vectors, also termed the six classes of vectors. The other R-objects are built upon atomic
vectors.

Each data type is listed below with example values and code to verify its class.

Data Type: Logical
Example: TRUE, FALSE
Verify:
v <- TRUE
print(class(v))
Result:
[1] "logical"

Data Type: Numeric
Example: 12.3, 5, 999
Verify:
v <- 23.5
print(class(v))
Result:
[1] "numeric"

Data Type: Integer
Example: 2L, 34L, 0L
Verify:
v <- 2L
print(class(v))
Result:
[1] "integer"

Data Type: Complex
Example: 3 + 2i
Verify:
v <- 2+5i
print(class(v))
Result:
[1] "complex"

Data Type: Character
Example: 'a', "good", "TRUE", '23.4'
Verify:
v <- "TRUE"
print(class(v))
Result:
[1] "character"

Data Type: Raw
Example: "Hello" is stored as 48 65 6c 6c 6f
Verify:
v <- charToRaw("Hello")
print(class(v))
Result:
[1] "raw"

In R programming, the very basic data types are the R-objects called vectors, each of which holds
elements of one of the classes shown above. Please note that in R the number of classes is not
confined to only the above six types. For example, we can use many atomic vectors and create
an array whose class will become array.
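
The point about classes beyond the six atomic ones can be sketched as follows. The variable names v and a are hypothetical; the example gives an integer vector a dim attribute with array(), after which class() reports array while the underlying storage type stays integer.

```r
# An atomic vector of integers.
v <- 1:8
print(class(v))

# The same values arranged as a 2 x 2 x 2 array.
a <- array(v, dim = c(2, 2, 2))
print(class(a))

# The underlying storage type is unchanged.
print(typeof(a))
```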

Vectors
When you want to create a vector with more than one element, you should use the c() function, which
combines the elements into a vector.

# Create a vector.
apple <- c('red','green',"yellow")
print(apple)

# Get the class of the vector.
print(class(apple))

When we execute the above code, it produces the following result −

[1] "red" "green" "yellow"


[1] "character"
R - Vectors
Vectors are the most basic R data objects and there are six types of atomic vectors. They are logical,
integer, double, complex, character and raw.

Vector Creation
Single Element Vector
Even when you write just one value in R, it becomes a vector of length 1 and belongs to one of the
above vector types.

# Atomic vector of type character.
print("abc")

# Atomic vector of type double.
print(12.5)

# Atomic vector of type integer.
print(63L)

# Atomic vector of type logical.
print(TRUE)

# Atomic vector of type complex.
print(2+3i)

# Atomic vector of type raw.
print(charToRaw('hello'))

When we execute the above code, it produces the following result −

[1] "abc"
[1] 12.5
[1] 63
[1] TRUE
[1] 2+3i
[1] 68 65 6c 6c 6f

Multiple Elements Vector


Using colon operator with numeric data

# Creating a sequence from 5 to 13.
v <- 5:13
print(v)

# Creating a sequence from 6.6 to 12.6.
v <- 6.6:12.6
print(v)

# If the final element specified does not belong to the sequence then it is
# discarded.
v <- 3.8:11.4
print(v)

When we execute the above code, it produces the following result −

[1] 5 6 7 8 9 10 11 12 13
[1] 6.6 7.6 8.6 9.6 10.6 11.6 12.6
[1] 3.8 4.8 5.8 6.8 7.8 8.8 9.8 10.8

Using the seq() function

# Create vector with elements from 5 to 9 incrementing by 0.4.
print(seq(5, 9, by = 0.4))

When we execute the above code, it produces the following result −

[1] 5.0 5.4 5.8 6.2 6.6 7.0 7.4 7.8 8.2 8.6 9.0

Using the c() function

The non-character values are coerced to character type if one of the elements is a character.

# The logical and numeric values are converted to characters.
s <- c('apple','red',5,TRUE)
print(s)

When we execute the above code, it produces the following result −

[1] "apple" "red" "5" "TRUE"


Accessing Vector Elements
Elements of a vector are accessed using indexing. The [ ] brackets are used for indexing. Indexing
starts with position 1. Giving a negative value in the index drops that element from the
result. Logical values (TRUE, FALSE) can also be used for indexing. Numeric 0 and 1 are treated as
positions, not as logical values: a 0 selects nothing and a 1 selects the first element.

# Accessing vector elements using position.
t <- c("Sun","Mon","Tue","Wed","Thurs","Fri","Sat")
u <- t[c(2,3,6)]
print(u)

# Accessing vector elements using logical indexing.
v <- t[c(TRUE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE)]
print(v)

# Accessing vector elements using negative indexing.
x <- t[c(-2,-5)]
print(x)

# Accessing vector elements using 0/1 indexing.
# The zeros select nothing, so only position 1 ("Sun") is returned.
y <- t[c(0,0,0,0,0,0,1)]
print(y)

When we execute the above code, it produces the following result −

[1] "Mon" "Tue" "Fri"


[1] "Sun" "Fri"
[1] "Sun" "Tue" "Wed" "Fri" "Sat"
[1] "Sun"
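
The last result is "Sun" rather than "Sat" because numeric indices are positions. The sketch below contrasts the two behaviours; the names days, mask, numeric_index and logical_index are hypothetical and used only for illustration.

```r
days <- c("Sun","Mon","Tue","Wed","Thurs","Fri","Sat")
mask <- c(0,0,0,0,0,0,1)

# Numeric index: zeros are dropped and 1 means "first position".
numeric_index <- days[mask]
print(numeric_index)

# Logical index: converting the mask to TRUE/FALSE selects the seventh position.
logical_index <- days[mask == 1]
print(logical_index)
```

Converting a 0/1 mask to logical values first is therefore the safer choice when the mask is meant to mark which elements to keep.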

Vector Manipulation
Vector arithmetic
Two vectors of the same length can be added, subtracted, multiplied or divided element by element,
giving the result as a vector.

# Create two vectors.
v1 <- c(3,8,4,5,0,11)
v2 <- c(4,11,0,8,1,2)

# Vector addition.
add.result <- v1+v2
print(add.result)

# Vector subtraction.
sub.result <- v1-v2
print(sub.result)

# Vector multiplication.
multi.result <- v1*v2
print(multi.result)

# Vector division.
divi.result <- v1/v2
print(divi.result)

When we execute the above code, it produces the following result −

[1] 7 19 4 13 1 13
[1] -1 -3 4 -3 -1 9
[1] 12 88 0 40 0 22
[1] 0.7500000 0.7272727 Inf 0.6250000 0.0000000 5.5000000

Vector Element Recycling


If we apply arithmetic operations to two vectors of unequal length, then the elements of the shorter
vector are recycled to complete the operations.

v1 <- c(3,8,4,5,0,11)
v2 <- c(4,11)
# v2 is recycled to c(4,11,4,11,4,11).

add.result <- v1+v2
print(add.result)

sub.result <- v1-v2
print(sub.result)

When we execute the above code, it produces the following result −

[1] 7 19 8 16 4 22
[1] -1 -3 0 -6 -4 0
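
Recycling also happens when the longer length is not a multiple of the shorter one, but R then issues a warning. The following is a small sketch of that case; v3, result and msg are hypothetical names, and tryCatch() is used only to capture the warning text for inspection.

```r
v1 <- c(3,8,4,5,0,11)
v3 <- c(1,2,3,4)

# v3 is recycled to c(1,2,3,4,1,2); the warning is suppressed here.
result <- suppressWarnings(v1 + v3)
print(result)

# Capture the warning message instead of the result.
msg <- tryCatch(v1 + v3, warning = function(w) conditionMessage(w))
print(msg)
```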

Vector Element Sorting


Elements in a vector can be sorted using the sort() function.

v <- c(3,8,4,5,0,11,-9,304)

# Sort the elements of the vector.
sort.result <- sort(v)
print(sort.result)

# Sort the elements in the reverse order.
revsort.result <- sort(v, decreasing = TRUE)
print(revsort.result)

# Sorting character vectors.
v <- c("Red","Blue","yellow","violet")
sort.result <- sort(v)
print(sort.result)

# Sorting character vectors in reverse order.
revsort.result <- sort(v, decreasing = TRUE)
print(revsort.result)

When we execute the above code, it produces the following result −

[1] -9 0 3 4 5 8 11 304
[1] 304 11 8 5 4 3 0 -9
[1] "Blue" "Red" "violet" "yellow"
[1] "yellow" "violet" "Red" "Blue"

Lists
A list is an R-object which can contain many different types of elements inside it like vectors,
functions and even another list inside it.

# Create a list.
list1 <- list(c(2,5,3),21.3,sin)

# Print the list.
print(list1)

When we execute the above code, it produces the following result −

[[1]]
[1] 2 5 3

[[2]]
[1] 21.3

[[3]]
function (x) .Primitive("sin")
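
To make the mixed contents concrete, the sketch below inspects the class of each element and pulls individual elements out with double brackets. The names element_classes, second and sine_of_zero are hypothetical.

```r
list1 <- list(c(2,5,3),21.3,sin)

# Class of each element in the list.
element_classes <- sapply(list1, class)
print(element_classes)

# Extract the second element as a plain value.
second <- list1[[2]]
print(second)

# The third element is a function and can be called directly.
sine_of_zero <- list1[[3]](0)
print(sine_of_zero)
```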
Matrices
A matrix is a two-dimensional rectangular data set. It can be created using a vector input to the
matrix function.

# Create a matrix.
M <- matrix(c('a','a','b','c','b','a'), nrow = 2, ncol = 3, byrow = TRUE)
print(M)

When we execute the above code, it produces the following result −

     [,1] [,2] [,3]
[1,] "a"  "a"  "b"
[2,] "c"  "b"  "a"

Arrays
While matrices are confined to two dimensions, arrays can have any number of dimensions.
The array() function takes a dim attribute which creates the required number of dimensions. In the
example below we create an array with two elements, each of which is a 3x3 matrix.

# Create an array.
a <- array(c('green','yellow'),dim = c(3,3,2))
print(a)

When we execute the above code, it produces the following result −

, , 1

     [,1]     [,2]     [,3]
[1,] "green"  "yellow" "green"
[2,] "yellow" "green"  "yellow"
[3,] "green"  "yellow" "green"

, , 2

     [,1]     [,2]     [,3]
[1,] "yellow" "green"  "yellow"
[2,] "green"  "yellow" "green"
[3,] "yellow" "green"  "yellow"
Factors
Factors are R-objects created from a vector. A factor stores the vector along with the
distinct values of its elements as labels. The labels are always character,
irrespective of whether the input vector is numeric, character, Boolean, etc. Factors are
useful in statistical modeling.

Factors are created using the factor() function. The nlevels() function gives the count of levels.

# Create a vector.
apple_colors <- c('green','green','yellow','red','red','red','green')

# Create a factor object.
factor_apple <- factor(apple_colors)

# Print the factor.
print(factor_apple)
print(nlevels(factor_apple))

When we execute the above code, it produces the following result −

[1] green  green  yellow red    red    red    green
Levels: green red yellow
[1] 3
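
The statement that labels are always character can be checked with a numeric input. This is a minimal sketch with hypothetical names (nums, f, codes, restored); it also shows why a numeric factor should go through as.character() before being converted back to numbers.

```r
nums <- c(30, 10, 20, 10)
f <- factor(nums)

# The labels are stored as character strings.
print(levels(f))
print(class(levels(f)))

# as.integer() returns the internal level codes, not the original numbers.
codes <- as.integer(f)
print(codes)

# Converting via character recovers the original values.
restored <- as.numeric(as.character(f))
print(restored)
```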

Data Frames
Data frames are tabular data objects. Unlike a matrix, each column of a data frame can contain
a different mode of data. The first column can be numeric while the second column can be
character and the third column can be logical. A data frame is a list of vectors of equal length.

Data frames are created using the data.frame() function.

# Create the data frame.
BMI <- data.frame(
  gender = c("Male","Male","Female"),
  height = c(152,171.5,165),
  weight = c(81,93,78),
  Age = c(42,38,26)
)
print(BMI)

When we execute the above code, it produces the following result −

  gender height weight Age
1   Male  152.0     81  42
2   Male  171.5     93  38
3 Female  165.0     78  26
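
The description of a data frame as a list of equal-length vectors can be verified directly. In this sketch the argument stringsAsFactors = FALSE is set explicitly, as an assumption, so that the gender column is character in every R version; column_classes and column_lengths are hypothetical names.

```r
BMI <- data.frame(
  gender = c("Male","Male","Female"),
  height = c(152,171.5,165),
  weight = c(81,93,78),
  Age = c(42,38,26),
  stringsAsFactors = FALSE
)

# A data frame is a list underneath.
print(is.list(BMI))

# Each column keeps its own class.
column_classes <- sapply(BMI, class)
print(column_classes)

# Every column has the same length.
column_lengths <- sapply(BMI, length)
print(column_lengths)
```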

R - Functions
A function is a set of statements organized together to perform a specific task. R has a large number
of in-built functions and the user can create their own functions.

In R, a function is an object so the R interpreter is able to pass control to the function, along with
arguments that may be necessary for the function to accomplish the actions.

The function in turn performs its task and returns control to the interpreter as well as any result
which may be stored in other objects.

Function Definition
An R function is created by using the keyword function. The basic syntax of an R function
definition is as follows −

function_name <- function(arg_1, arg_2, ...) {
   Function body
}

Function Components
The different parts of a function are:

• Function Name: This is the actual name of the function. It is stored in the R environment as
an object with this name.
• Arguments: An argument is a placeholder. When a function is invoked, you pass a value
to the argument. Arguments are optional; that is, a function may contain no arguments. Also,
arguments can have default values.
• Function Body: The function body contains a collection of statements that defines what
the function does.
• Return Value: The return value of a function is the last expression in the function body to
be evaluated.

R has many in-built functions which can be directly called in the program without defining them
first. We can also create and use our own functions, referred to as user-defined functions.
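
The four components can be seen together in a short sketch. The function name square_sum and its arguments are hypothetical; the point is that the value of the last evaluated expression becomes the return value without an explicit return() call.

```r
# Function name: square_sum; arguments: a and b (b has a default value).
square_sum <- function(a, b = 1) {
  # Function body: add the arguments, then square the total.
  total <- a + b
  total^2   # last expression evaluated, so this is the return value
}

result <- square_sum(2, 3)
print(result)

# Using the default value of b.
default_result <- square_sum(2)
print(default_result)
```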

Built-in Function
Simple examples of in-built functions are seq(), mean(), max(), sum() and paste(). They are
called directly by user-written programs.

# Create a sequence of numbers from 32 to 44.
print(seq(32,44))

# Find mean of numbers from 25 to 82.
print(mean(25:82))

# Find sum of numbers from 41 to 68.
print(sum(41:68))

When we execute the above code, it produces the following result −

[1] 32 33 34 35 36 37 38 39 40 41 42 43 44
[1] 53.5
[1] 1526

User-defined Function
We can create user-defined functions in R. They are specific to what a user wants and, once created,
they can be used like the built-in functions. Below is an example of how a function is created and
used.

# Create a function to print squares of numbers in sequence.
new.function <- function(a) {
   for (i in 1:a) {
      b <- i^2
      print(b)
   }
}

Calling a Function
# Create a function to print squares of numbers in sequence.
new.function <- function(a) {
   for (i in 1:a) {
      b <- i^2
      print(b)
   }
}

# Call the function new.function supplying 6 as an argument.
new.function(6)

When we execute the above code, it produces the following result −

[1] 1
[1] 4
[1] 9
[1] 16
[1] 25
[1] 36

Calling a Function without an Argument


# Create a function without an argument.
new.function <- function() {
   for (i in 1:5) {
      print(i^2)
   }
}

# Call the function without supplying an argument.
new.function()

When we execute the above code, it produces the following result −

[1] 1
[1] 4
[1] 9
[1] 16
[1] 25

Calling a Function with Argument Values (by position and by name)


The arguments to a function call can be supplied in the same sequence as defined in the function or
they can be supplied in a different sequence but assigned to the names of the arguments.

# Create a function with arguments.
new.function <- function(a, b, c) {
   result <- a * b + c
   print(result)
}

# Call the function by position of arguments.
new.function(5, 3, 11)

# Call the function by names of the arguments.
new.function(a = 11, b = 5, c = 3)

When we execute the above code, it produces the following result −

[1] 26
[1] 58

Calling a Function with Default Argument


We can define the value of the arguments in the function definition and call the function without
supplying any argument to get the default result. We can also call such functions by supplying
new values for the arguments to get a non-default result.

# Create a function with default argument values.
new.function <- function(a = 3, b = 6) {
   result <- a * b
   print(result)
}

# Call the function without giving any argument.
new.function()

# Call the function giving new values for the arguments.
new.function(9, 5)

When we execute the above code, it produces the following result −

[1] 18
[1] 45

Lazy Evaluation of Function


Arguments to functions are evaluated lazily, which means they are evaluated only when needed
by the function body.

# Create a function with arguments.
new.function <- function(a, b) {
   print(a^2)
   print(a)
   print(b)
}

# Evaluate the function without supplying one of the arguments.
new.function(6)

When we execute the above code, it produces the following result −

[1] 36
[1] 6
Error in print(b) : argument "b" is missing, with no default
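
The first two lines print before the error because b is only looked up when print(b) is reached. As a complementary sketch, an argument that the body never uses is never evaluated, so leaving it out causes no error. The names lazy.function and safe.function are hypothetical; missing() is the base R check for an argument that was not supplied.

```r
# b is declared but never used, so it is never evaluated.
lazy.function <- function(a, b) {
   a^2
}
lazy_result <- lazy.function(6)
print(lazy_result)

# missing() lets the body handle an omitted argument explicitly.
safe.function <- function(a, b) {
   if (missing(b)) {
      "b was not supplied"
   } else {
      a + b
   }
}
safe_without_b <- safe.function(6)
safe_with_b <- safe.function(6, 4)
print(safe_without_b)
print(safe_with_b)
```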
