Big Data File in R

Further experiments cover reading in and working with data from CSV files in R, fitting linear regression models, and saving plot files.

Uploaded by

Prabhu Goyal
Copyright
© All Rights Reserved

BIG DATA ANALYTICS

Laboratory File

Submitted to: Ms Rachna Bhel
Submitted by: Prabhu Goyal
15/FET/CS(L)/1006
Experiment-1

AIM: Introduction to R and to the different data types in R.

R is a language and environment for statistical computing and graphics.

R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical
tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly
extensible.

One of R's strengths is the ease with which well-designed publication-quality plots can be
produced, including mathematical symbols and formulae where needed.

R is an interpreted language; users typically access it through a command-line interpreter. If
a user types 2+2 at the R command prompt and presses enter, the computer replies with 4.

R supports procedural programming with functions and, for some functions, object-
oriented programming with generic functions. A generic function acts differently depending
on the classes of arguments passed to it. In other words, the generic
function dispatches the function (method) specific to that class of object.
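
The dispatch behaviour described above can be sketched with a small generic function. The name describe and its methods are hypothetical, made up for this illustration; only UseMethod() is part of base R.

```r
# 'describe' is a hypothetical generic, defined here only for illustration.
describe <- function(x, ...) UseMethod("describe")

# Methods are ordinary functions named <generic>.<class>.
describe.numeric   <- function(x, ...) "a number"
describe.character <- function(x, ...) "a string"
describe.default   <- function(x, ...) "something else"  # fallback method

r_num <- describe(3.14)   # dispatches to describe.numeric
r_chr <- describe("hi")   # dispatches to describe.character
r_lgl <- describe(TRUE)   # no logical method, so describe.default is used
print(c(r_num, r_chr, r_lgl))
```

The same mechanism is why built-in generics such as print() and summary() behave differently for vectors, data frames, and fitted models.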

The frequently used data types are:
Vectors
Lists
Matrices
Arrays
Factors

Vectors
To create a vector with more than one element, use the c() function, which combines the
elements into a vector.
Lists
A list is an R object that can contain elements of different types, such as vectors,
functions, and even other lists.
Matrices
A matrix is a two-dimensional rectangular data set. It can be created by passing a vector
to the matrix() function.
Arrays
While matrices are confined to two dimensions, arrays can have any number of
dimensions. The array() function takes a dim argument that gives the size of each
dimension.
Factors
Factors are R objects created from a vector. A factor stores the vector along with
the distinct values of its elements as labels (levels). The labels are always character
strings, whether the input vector is numeric, character, or logical. Factors
are useful in statistical modelling.
Factors are created using the factor() function. The nlevels() function gives the count of
levels.
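
As a minimal sketch of the five data types described above, the following creates one object of each kind. The variable names and values are made up for illustration.

```r
v <- c(1.5, 2.5, 4)                  # vector built with c()
l <- list("a", 1:3, TRUE)            # list holding elements of different types
m <- matrix(1:6, nrow = 2)           # 2 x 3 matrix filled from a vector
a <- array(1:24, dim = c(2, 3, 4))   # three-dimensional array
f <- factor(c("east", "west", "east", "north"))  # factor built from a character vector

print(length(v))   # number of elements in the vector
print(dim(m))      # rows and columns of the matrix
print(dim(a))      # size of each array dimension
print(levels(f))   # distinct values, stored as character labels
print(nlevels(f))  # count of levels
```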
Experiment-2
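AIM: Write a program to demonstrate elementary operations in R.
x <- 10
x
class(x)              # "numeric"
is.integer(x)         # FALSE: 10 is stored as a double
x <- as.integer(3.8)  # truncates to 3
x
class(x)              # "integer"
is.integer(x)         # TRUE
x = 1
y = 4
z = x > y             # logical comparison
class(z)              # "logical"
is.integer(z)         # FALSE
x = "vishesh"
class(x)              # "character"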
Outputs
Experiment-3
AIM: Write a program to demonstrate various control statements in R.

If else statement:
x = 10
if (x > 1) {
  print("x is greater than 1")
} else {
  print("x is not greater than 1")
}

For loop:
x = c(1,2,3,4,5)
# R indexes from 1, so loop over 1:5 (x[0] would be an empty vector)
for (i in 1:5) {
  print(x[i])
}

While loop:
x = 2.987
while (x <= 4.987) {
  x = x + 0.987
  print(c(x, x-2, x-1))
}

Repeat loop:
a = 1
repeat {
  print(a)
  a = a + 1
  if (a > 4)
    break
}
Experiment-4
AIM: Write a program to create list & vector in R.
List in R
list_data <- list("Red", "Green", c(21,32,11), TRUE, 51.23, 119.1)
print(list_data)

Vector in R:
# Atomic vector of type character.
print("abc")
# Atomic vector of type double.
print(12.5)
# Atomic vector of type integer.
print(63L)
# Atomic vector of type logical.
print(TRUE)
# Atomic vector of type complex.
print(2+3i)
# Atomic vector of type raw.
print(charToRaw('hello'))
Experiment-5
AIM: Create a vector from 1 to 5 in increments of 0.2 by using sequence.

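x = 1:30            # integers from 1 to 30
x
x = seq(1, 5, 0.2)  # from 1 to 5 in increments of 0.2
x
x = seq(2, 8, 0.5)  # from 2 to 8 in increments of 0.5
x
x = seq(2, 10, 2)   # from 2 to 10 in increments of 2
x
x = 5/0             # division by zero gives Inf
x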
Experiment-6

AIM: Generate a vector of 5000 random numbers from a normal distribution with
mean=3 and standard deviation=2. Use the functions mean and sd to compute the
sample mean and standard deviation of the values in the vector. Visualize this
distribution using hist (to generate the histogram).
n = rnorm(5000, 3, 2)  # 5000 draws with mean 3 and standard deviation 2
mean(n)
sd(n)
hist(n)
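
The code for this experiment draws its values with rnorm, which samples from a normal distribution. If a uniform distribution with the same mean and standard deviation is wanted instead, one possible sketch is shown below. It relies on the fact that a uniform distribution on [a, b] has mean (a + b)/2 and standard deviation (b - a)/sqrt(12); the seed and variable names are arbitrary choices for the illustration.

```r
set.seed(42)  # arbitrary seed so the run is repeatable

target_mean <- 3
target_sd   <- 2

# sd = (b - a) / sqrt(12), so half the interval width is sd * sqrt(3)
half_width <- target_sd * sqrt(3)
a <- target_mean - half_width
b <- target_mean + half_width

u <- runif(5000, min = a, max = b)
print(mean(u))  # close to 3
print(sd(u))    # close to 2
hist(u)         # roughly flat, unlike the bell shape produced by rnorm
```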
Experiment-7
AIM: To print a matrix, strings, and an array in R.
Matrix
A = matrix(
  c(2, 4, 3, 1, 5, 7), # the data elements
  nrow=2,              # number of rows
  ncol=3,              # number of columns
  byrow = TRUE)        # fill matrix by rows
A

String
a <- 'Start and end with single quote'
print(a)
b <- "Start and end with double quotes"
print(b)
c <- "single quote ' in between double quotes"
print(c)
d <- 'Double quotes " in between single quote'
print(d)
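Arrays
# Two vectors of different lengths supply the data.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)
# The 9 combined values are recycled to fill both 3*3 layers.
result <- array(c(vector1,vector2),dim = c(3,3,2))
print(result)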
Experiment-8
AIM: Use a command of the form X = matrix(v, 2, 4), where v is a data vector, to create the
matrix X.

v = c(20, 25, 34, 56, 99, 1006, 2009, 41113)  # creating a vector v
v
X = matrix(v, nrow=2, ncol=4)  # creating a matrix X filled column by column from v
X
Experiment-9
AIM: Use runif to construct a 5*5 matrix b of random numbers with a uniform
distribution between 0 and 1.
(a) Extract from it the second row, the second column, and the 3*3 matrix of the
values that are not at the margins.
(b) Use sequence to replace the values of the first row of b by 2, 5, 8, 11, 14.

b = matrix(runif(25, 0, 1), nrow=5, ncol=5)  # 25 values between 0 and 1
b
b[2,]       # second row
b[,2]       # second column
b[2:4,2:4]  # inner 3*3 matrix
b[1,] = seq(2, 14, 3)  # replace the first row with 2, 5, 8, 11, 14
b
Experiment-10
AIM: Write a program to extract data from a data frame.

# Each vector is one column, so each row describes one person.
df <- data.frame( c( 183, 175, 178), c( 85, 76, 79), c( 40, 35, 38 ))
names(df) <- c("Height", "Weight", "Age")

# Commands to extract data from the data frame

# All Rows and All Columns
df[,]
# First row and all columns
df[1,]
# First two rows and all columns
df[1:2,]
# First and third row and all columns
df[ c(1,3), ]
# First row and second and third columns
df[1, 2:3]
# First and second rows and second and third columns
df[1:2, 2:3]
# Just First Column with All rows
df[, 1]
# First and Third Column with All rows
df[,c(1,3)]
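
Besides numeric positions, rows and columns can be selected by column name or by a logical condition. The sketch below rebuilds a small data frame for this purpose, assuming one person per row; the thresholds are arbitrary values chosen for the illustration.

```r
# Assumed layout: one row per person, one column per attribute.
df <- data.frame(Height = c(183, 175, 178),
                 Weight = c(85, 76, 79),
                 Age    = c(40, 35, 38))

heights <- df$Height           # one column, by name, as a vector
weights <- df[["Weight"]]      # same idea using [[ ]]
older   <- df[df$Age > 36, ]   # rows where a condition holds
tall    <- subset(df, Height > 176, select = c(Height, Age))  # rows and columns together

print(heights)
print(older)
print(tall)
```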
Experiment-11
AIM: Write a program to implement various plots (histogram, scatterplot, barplot) in R.

HISTOGRAM
BMI<-rnorm(n=1000, mean=24.2, sd=2.2)
hist(BMI)

BARPLOT
BMI<-rnorm(n=1000, mean=24.2, sd=2.2)
barplot(BMI)
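
The aim also lists a scatterplot, which the code above does not draw. A minimal sketch using the base plot() function follows; the height and weight values are simulated, and the formula linking them is an arbitrary assumption used only to give the points a visible trend.

```r
set.seed(1)  # arbitrary seed so the simulated data are repeatable

# Simulated data, for illustration only.
height <- rnorm(100, mean = 170, sd = 8)
weight <- 0.5 * height - 20 + rnorm(100, mean = 0, sd = 4)

# SCATTERPLOT
plot(height, weight,
     main = "Height vs Weight",
     xlab = "Height in cm", ylab = "Weight in Kg",
     pch = 16, col = "blue")
```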
Experiment-12
AIM: Write a program to implement functions in R.

Built-in functions
# Create a sequence of numbers from 39 to 44.
print(seq(39,44))
# Find the mean of numbers from 15 to 90.
print(mean(15:90))
# Find the sum of numbers from 41 to 72.
print(sum(41:72))

User-defined function
# Print the squares of the numbers from 1 to a.
new.function <- function(a) {
  for(i in 1:a) {
    b <- i^2
    print(b)
  }
}
new.function(6)
Experiment-13
AIM: Write a program to calculate mean, median & mode.

MEAN
# Create a vector.
x <- c(12,7,5,4.2,18,2)
# Find Mean.
result.mean <- mean(x)
print(result.mean)

MEDIAN
# Create the vector.
x <- c(12,7,3,4.2,18,2,54,-21,8,-5)
# Find the median.
median.result <- median(x)
print(median.result)
MODE
# Create the function.
getmode <- function(v) {
uniqv <- unique(v)
uniqv[which.max(tabulate(match(v, uniqv)))]
}
# Create the vector with numbers.
v <- c(2,1,2,3,1,2,3,4,1,5,5,3,2,3)
# Calculate the mode using the user function.
result <- getmode(v)
print(result)
# Create the vector with characters.
charv <- c("o","it","the","it","it")
# Calculate the mode using the user function.
result <- getmode(charv)
print(result)
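
To see how getmode arrives at its answer, the steps inside it can be run one at a time on the numeric vector. This sketch also shows that the vector has a tie, and that which.max keeps only the first of the tied values; the variable names positions, counts, and all_modes are introduced here for illustration.

```r
v <- c(2,1,2,3,1,2,3,4,1,5,5,3,2,3)

uniqv     <- unique(v)          # distinct values in order of first appearance: 2 1 3 4 5
positions <- match(v, uniqv)    # index into uniqv for every element of v
counts    <- tabulate(positions)  # occurrences of each distinct value: 4 3 4 1 2

first_mode <- uniqv[which.max(counts)]       # what getmode returns: 2
all_modes  <- uniqv[counts == max(counts)]   # every value with the top count: 2 3

print(counts)
print(first_mode)
print(all_modes)
```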
Experiment-14
AIM: Write a program in R to read data from a CSV file (such as one exported from Excel).

Create a .csv file named input.csv and enter data, for example:

id,name,salary,start_date,dept
1,Rick,623.3,2012-01-01,IT
2,Dan,515.2,2013-09-23,Operations
3,Michelle,611,2014-11-15,IT
4,Ryan,729,2014-05-11,HR
,Gary,843.25,2015-03-27,Finance
6,Nina,578,2013-05-21,IT
7,Simon,632.8,2013-07-30,Operations
8,Guru,722.5,2014-06-17,Finance

Now run the following in RStudio:
data <- read.csv("input.csv")
print(data)
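
Once the file is read, the result is an ordinary data frame. The sketch below writes the same sample rows to a temporary file so that it runs on its own (the temporary path stands in for input.csv), then inspects the data. Note that the row for Gary has no id in the sample, so read.csv stores NA there.

```r
# The sample rows, written to a temporary file standing in for input.csv.
csv_lines <- c(
  "id,name,salary,start_date,dept",
  "1,Rick,623.3,2012-01-01,IT",
  "2,Dan,515.2,2013-09-23,Operations",
  "3,Michelle,611,2014-11-15,IT",
  "4,Ryan,729,2014-05-11,HR",
  ",Gary,843.25,2015-03-27,Finance",
  "6,Nina,578,2013-05-21,IT",
  "7,Simon,632.8,2013-07-30,Operations",
  "8,Guru,722.5,2014-06-17,Finance"
)
path <- tempfile(fileext = ".csv")
writeLines(csv_lines, path)

data <- read.csv(path, stringsAsFactors = FALSE)

print(dim(data))                    # rows and columns
missing_id <- data[is.na(data$id), ]  # rows where the id field was empty
top_salary <- max(data$salary)        # highest salary
it_staff   <- subset(data, dept == "IT")  # employees in the IT department

print(missing_id)
print(top_salary)
print(it_staff)
```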
Experiment-15
AIM: Write a program in R to fit a linear regression model.

# Create the predictor and response variable.
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
relation <- lm(y~x)
print(relation)
# Open a PNG file for the chart (the file name is only an example).
png(file = "linearregression.png")
# Plot the chart.
plot(y,x,col = "blue",main = "Height & Weight Regression",
cex = 1.3,pch = 16,xlab = "Weight in Kg",ylab = "Height in cm")
# Add the fitted line once the plot has been drawn.
abline(lm(x~y))
# Save the file.
dev.off()
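
A fitted model can also be inspected and used for prediction. The sketch below refits the same data, reading x as height in cm and y as weight in kg (an interpretation taken from the axis labels of the plot), and predicts the weight for a height of 170 cm. The value 170 is an arbitrary example input.

```r
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)  # heights (assumed, in cm)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)            # weights (assumed, in kg)

relation <- lm(y ~ x)

coefs <- coef(relation)  # intercept and slope of the fitted line
print(coefs)

# The new data must use the same column name as the predictor in the formula.
new_height <- data.frame(x = 170)
predicted  <- predict(relation, new_height)
print(predicted)         # about 76.23
```

summary(relation) prints further detail, such as residuals and R-squared.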
