
VARDHAMAN COLLEGE OF ENGINEERING

(Autonomous)

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


Shamshabad-501218, Hyderabad

VCE-R19
DATA ANALYTICS USING R

LAB MANUAL
VII Semester
A5519
Prepared by
Mr C Satya Kumar
Mr A Ramesh
Mr GS Prasada Reddy
(Assistant Professors, CSE Dept)

Week-1: Introduction to R Lab, Basic Programming.

a) Write an R program to create a sequence of numbers from 20 to 50, find the mean of the numbers from 20 to 60, and find the sum of the numbers from 51 to 91.

R Programming Code :
print("Sequence of numbers from 20 to 50:")
print(seq(20,50))
print("Mean of numbers from 20 to 60:")
print(mean(20:60))
print("Sum of numbers from 51 to 91:")
print(sum(51:91))

Sample Output:
[1] "Sequence of numbers from 20 to 50:"
[1] 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44
[26] 45 46 47 48 49 50
[1] "Mean of numbers from 20 to 60:"
[1] 40
[1] "Sum of numbers from 51 to 91:"
[1] 2911

b) Write an R program to print the first 10 Fibonacci numbers.

R Programming Code :
Fibonacci <- numeric(10)
Fibonacci[1] <- Fibonacci[2] <- 1
for (i in 3:10) Fibonacci[i] <- Fibonacci[i - 2] + Fibonacci[i - 1]
print("First 10 Fibonacci numbers:")
print(Fibonacci)

Sample Output:
[1] "First 10 Fibonacci numbers:"
[1] 1 1 2 3 5 8 13 21 34 55

Week-1: Introduction to Loops

a) Write an R program to find all prime numbers up to a given number.

R Programming Code :

prime_numbers <- function(n) {
  if (n >= 2) {
    x = seq(2, n)           # candidate numbers
    prime_nums = c()
    for (i in seq(2, n)) {
      if (any(x == i)) {    # i is still in x, so it has no smaller prime factor
        prime_nums = c(prime_nums, i)
        x = c(x[(x %% i) != 0], i)   # drop multiples of i, keep i itself
      }
    }
    return(prime_nums)
  } else {
    stop("Input number should be at least 2.")
  }
}
prime_numbers(12)

Sample Output:
[1] 2 3 5 7 11

Week-2: Programs using Vectors.

a) Write an R program to create a vector of a specified type and length. Create vectors of
numeric, complex, logical and character types, each of length 6.

R Programming Code :

x = vector("numeric", 6)
print("Numeric Type:")
print(x)
c = vector("complex", 6)
print("Complex Type:")
print(c)
l = vector("logical", 6)
print("Logical Type:")
print(l)
chr = vector("character", 6)
print("Character Type:")
print(chr)

Sample Output:
[1] "Numeric Type:"
[1] 0 0 0 0 0 0
[1] "Complex Type:"
[1] 0+0i 0+0i 0+0i 0+0i 0+0i 0+0i
[1] "Logical Type:"
[1] FALSE FALSE FALSE FALSE FALSE FALSE
[1] "Character Type:"
[1] "" "" "" "" "" ""

Week-2: Programs using Matrices.

a) Write an R program to create a matrix from a given vector of numbers, define the row
and column names, and display the matrix.
R Programming Code:
row_names = c("row1", "row2", "row3", "row4")
col_names = c("col1", "col2", "col3", "col4")
M = matrix(c(1:16), nrow = 4, byrow = TRUE, dimnames = list(row_names, col_names))
print("Original Matrix:")
print(M)

Sample Output:
[1] "Original Matrix:"
col1 col2 col3 col4
row1 1 2 3 4
row2 5 6 7 8
row3 9 10 11 12
row4 13 14 15 16

Week-2: Programs using Factors.

a) Write an R program to find the levels of the factor of a given vector.

R Programming Code :

v = c(1, 2, 3, 3, 4, NA, 3, 2, 4, 5, NA, 5)


print("Original vector:")
print(v)
print("Levels of factor of the said vector:")
print(levels(factor(v)))

Sample Output:

[1] "Original vector:"


[1] 1 2 3 3 4 NA 3 2 4 5 NA 5
[1] "Levels of factor of the said vector:"
[1] "1" "2" "3" "4" "5"

Week-2: Programs using List

c) Write an R program to create a list containing strings, numbers, vectors and logical values.

R Programming Code :

list_data = list("Python", "PHP", c(5, 7, 9, 11), TRUE, 125.17, 75.83)


print("Data of the list:")
print(list_data)

Sample Output:
[1] "Data of the list:"
[[1]]
[1] "Python"

[[2]]
[1] "PHP"

[[3]]
[1] 5 7 9 11

[[4]]
[1] TRUE

[[5]]
[1] 125.17

[[6]]
[1] 75.83

Week-3: Programs using Statistics.

a) Apply all statistical concepts using R.

R is one of the most popular statistical programming languages, so it is important to understand
its built-in statistical functions. This section outlines the most common statistical calculations
performed by data scientists.

a.1 Filling Missing Values:

One of the most common tasks in a data science project is filling in missing values. We can use
is.na() to find the elements that are missing (NA or NaN):

vec <- c("test", NA, "another test")
is.na(vec)

This prints FALSE TRUE FALSE, indicating that the second element is NA. is.na() returns TRUE for
every element that is NA (NaN included), whereas is.nan() returns TRUE only for NaN values: a NaN
is an NA, but an NA is not necessarily a NaN.
Note: Many statistical functions such as mean(), median(), etc. accept an na.rm argument that
indicates whether missing values should be removed before the calculation.
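For instance, a minimal sketch (not part of the original manual, using a hypothetical vector nums)
of dropping and then filling missing values:

nums <- c(4, NA, 7, 1, NA, 9)                   # numeric vector with missing values
mean(nums)                                      # NA, because missing values propagate
mean(nums, na.rm = TRUE)                        # 5.25, missing values ignored
nums[is.na(nums)] <- mean(nums, na.rm = TRUE)   # fill the NAs with the mean
nums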

The next few calculations will be based on the following two vectors:

A <- c(1,2,5,6.4,6.7,7,7,7,8,9,3,4,1.5,0,10,5.1,2.4,3.4,4.5,6.7)
B <- c(4,4.1,0,1.4,2,1,6.7,7,5,5,8,9,3,2,2.5,0,10,5.1,4.3,5.7)
print(length(A)) # 20
print(length(B)) # 20
Both vectors, A and B, contain 20 numeric values.

a.2 Mean

Mean is computed by summing the values in a collection and then dividing by the total number
of values:
my_mean <- mean(A)
print(my_mean)
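As a quick sanity check (illustrative only, not in the original manual), the same value follows
from the definition:

sum(A) / length(A) # identical to mean(A)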

a.3 Median

Median is the middle value in a sorted collection. If there are an even number of values then it’s
the average of the two middle values:

my_median <- median(A)


print(my_median)

a.4 Mode
The mode is the most frequently occurring value. R has no standard built-in function to calculate
the mode, but we can write one as shown below.

get_mode <- function(v) {
  distinct_v <- unique(v)            # distinct values in the collection
  matches <- match(v, distinct_v)    # position of each value in the distinct set
  counts <- tabulate(matches)        # frequency of each distinct value
  distinct_v[which.max(counts)]      # value with the highest frequency
}
print(get_mode(A))

The function performs the following steps:

1. Computes the distinct values of the collection.
2. Finds the frequency of each distinct value and tabulates it.
3. Finds the index of the value with the highest frequency and returns that value as the mode.

a.5 Standard deviation

The standard deviation measures how far the values spread around the mean; it is the square root
of the variance.

my_sd <- sd(A)
print(my_sd)

a.6 Variance
The variance is the square of the standard deviation:

my_var <- var(A)
print(my_var)
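As a quick check (illustrative only), the relationship between the two can be verified directly:

sd(A)^2 # equals var(A): the variance is the squared standard deviation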

a.7 Correlation
Correlation tells us whether two collections are related, whether they move together, and how
strong that relationship is:

my_cor <- cor(A, B)
print(my_cor)
Pearson is the default correlation method; Kendall or Spearman can be selected instead by passing
the method argument.
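For example (an illustrative call, not in the original manual), a Spearman rank correlation of the
same two vectors:

cor(A, B, method = "spearman") # rank-based correlation instead of the default Pearson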

a.8 Covariance
Covariance measures how two variables vary together: a positive value means they tend to move in the same direction, a negative value means they move in opposite directions.

covariance <- cov(A, B)


print(covariance)

a.9 Standardise and normalise the data set

We are often required to normalise data, for example with min-max normalisation, or to calculate
z-scores through standardisation. Standardised data have mean 0 and standard deviation 1: the mean
is subtracted from each observation and the result is divided by the standard deviation. We can use
the scale() function: setting center = TRUE subtracts the mean from each observation, and setting
scale = TRUE additionally divides by the standard deviation.
normal_A <- scale(A, center = TRUE, scale = FALSE)
print(normal_A)
standard_A <- scale(A, center = TRUE, scale = TRUE)
print(standard_A)
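The scale() calls above cover centring and standardisation; as a minimal sketch (not from the
original manual), min-max normalisation can be written directly:

minmax_A <- (A - min(A)) / (max(A) - min(A)) # rescale A to the range [0, 1]
print(minmax_A)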

Week-4: Programs using Linear Regression

Analysis: The cars dataset includes 50 observations of 2 variables - dist, the stopping distance
in feet, and speed, the speed of the car before the brakes were applied, in miles per hour. The
data were recorded in the 1920s.
summary(cars)
## speed dist
## Min. : 4.0 Min. : 2.00
## 1st Qu.:12.0 1st Qu.: 26.00
## Median :15.0 Median : 36.00
## Mean :15.4 Mean : 42.98
## 3rd Qu.:19.0 3rd Qu.: 56.00
## Max. :25.0 Max. :120.00
plot(cars$speed, cars$dist, xlab='Speed (mph)', ylab='Stopping Distance (ft)',
main='Stopping Distance vs. Speed')
Let us build a linear model and find the best fitting line.
cars_lm <- lm(cars$dist ~ cars$speed)
cars_lm
## Call:
## lm(formula = cars$dist ~ cars$speed)
## Coefficients:
## (Intercept) cars$speed
## -17.579 3.932
plot(cars$speed, cars$dist, xlab='Speed (mph)', ylab='Stopping Distance (ft)',
main='Stopping Distance vs. Speed')
abline(cars_lm)
summary(cars_lm)
## Call:
## lm(formula = cars$dist ~ cars$speed)
## Residuals:
## Min 1Q Median 3Q Max
## -29.069 -9.525 -2.272 9.215 43.201
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -17.5791 6.7584 -2.601 0.0123 *
## cars$speed 3.9324 0.4155 9.464 1.49e-12 ***
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared: 0.6511, Adjusted R-squared: 0.6438
## F-statistic: 89.57 on 1 and 48 DF, p-value: 1.49e-12
plot(cars_lm$fitted.values, cars_lm$residuals, xlab='Fitted Values', ylab='Residuals')
abline(0,0)
qqnorm(cars_lm$residuals)
qqline(cars_lm$residuals)
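The fitted line is dist = -17.58 + 3.93 * speed, so each additional mph of speed adds roughly
3.9 ft of stopping distance on average, and the model explains about 65% of the variance
(Multiple R-squared 0.6511). As a small hedged sketch (not part of the original script), the
coefficients can be used for a point prediction, here at a speed of 21 mph:

b <- coef(cars_lm)              # intercept and slope of the fitted line
b[1] + b[2] * 21                # predicted stopping distance, roughly 65 ft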
speed <- cars$speed
speed2 <- speed^2
dist <- cars$dist
cars_qm <- lm(dist ~ speed + speed2)
summary(cars_qm)
## Call:
## lm(formula = dist ~ speed + speed2)
## Residuals:

## Min 1Q Median 3Q Max
## -28.720 -9.184 -3.188 4.628 45.152
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.47014 14.81716 0.167 0.868
## speed 0.91329 2.03422 0.449 0.656
## speed2 0.09996 0.06597 1.515 0.136
## Residual standard error: 15.18 on 47 degrees of freedom
## Multiple R-squared: 0.6673, Adjusted R-squared: 0.6532
## F-statistic: 47.14 on 2 and 47 DF, p-value: 5.852e-12
speedvalues <- seq(0, 25, 0.1)
predicted_dist <- predict(cars_qm, list(speed = speedvalues, speed2 = speedvalues^2))
plot(speed, dist, pch=16, xlab='Speed (mph)', ylab='Stopping Distance (ft)')
lines(speedvalues, predicted_dist)
plot(cars_qm$fitted.values, cars_qm$residuals, xlab='Fitted Values', ylab='Residuals')
abline(0,0)
qqnorm(cars_qm$residuals)
qqline(cars_qm$residuals)

Week-5: Programs using Logistic Regression

The Dataset: mtcars (Motor Trend Car Road Tests) comprises fuel consumption and 10 aspects of
automobile design and performance for 32 automobiles. It ships with base R (in the datasets
package), so it is available without installing anything; dplyr is loaded below only as a convenience.

# Installing the package

install.packages("dplyr")

# Loading package
library(dplyr)

# Summary of dataset in package


summary(mtcars)

Performing logistic regression on the dataset

Logistic regression is implemented in R with glm(), which trains the model on the chosen features
(variables) of the dataset using family = "binomial".

# Installing the package


install.packages("caTools") # For Logistic regression
install.packages("ROCR") # For ROC curve to evaluate model

# Loading package
library(caTools)
library(ROCR)

# Splitting dataset (split on the response column vs)
split <- sample.split(mtcars$vs, SplitRatio = 0.8)
split

train_reg <- subset(mtcars, split == "TRUE")


test_reg <- subset(mtcars, split == "FALSE")

# Training model
logistic_model <- glm(vs ~ wt + disp,
data = train_reg,
family = "binomial")
logistic_model

# Summary
summary(logistic_model)

# Predict test data based on model


predict_reg <- predict(logistic_model,
test_reg, type = "response")
predict_reg

# Convert predicted probabilities to class labels (threshold 0.5)
predict_reg <- ifelse(predict_reg >0.5, 1, 0)

# Evaluating model accuracy


# using confusion matrix
table(test_reg$vs, predict_reg)

missing_classerr <- mean(predict_reg != test_reg$vs)


print(paste('Accuracy =', 1 - missing_classerr))

# ROC-AUC Curve
ROCPred <- prediction(predict_reg, test_reg$vs)

ROCPer <- performance(ROCPred, measure = "tpr",
x.measure = "fpr")

auc <- performance(ROCPred, measure = "auc")


auc <- auc@y.values[[1]]
auc

# Plotting curve
plot(ROCPer)
plot(ROCPer, colorize = TRUE,
print.cutoffs.at = seq(0.1, by = 0.1),
main = "ROC CURVE")
abline(a = 0, b = 1)

auc <- round(auc, 4)


legend(.6, .4, auc, title = "AUC", cex = 1)

wt influences the dependent variable positively: a one-unit increase in wt increases the log-odds
of vs = 1 by 1.44. disp influences the dependent variable negatively: a one-unit increase in disp
decreases the log-odds of vs = 1 by 0.0344. The null deviance is 31.755 (the model with only the
intercept) and the residual deviance is 14.457 (the model with all independent variables). The AIC
(Akaike Information Criterion) is 20.457; the lower the AIC, the better the model. Accuracy comes
out to 0.75, i.e. 75%. (The exact values depend on the random train/test split.)
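As a hedged follow-up (not part of the original output), the log-odds coefficients can be converted
into odds ratios with exp():

# Odds ratios: exp() of the log-odds coefficients reported by summary()
exp(coef(logistic_model))
# e.g. exp(1.44) is about 4.2, so each extra unit of wt multiplies the odds
# of vs = 1 by roughly 4.2 (again, values depend on the random split)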

Week 6: Implement Decision trees


#read data file
mydata= read.csv("C:\\Users\\german_credit.csv")
# Check attributes of data
str(mydata)

'data.frame': 1000 obs. of 21 variables:
$ Creditability : Factor w/ 2 levels "0","1": 2 2 2 2
$ Account.Balance : int 1 1 2 1 1 1 1 1 4 2 ...
$ Duration.of.Credit..month. : int 18 9 12 12 12 10 8 6 18 24 ...
$ Payment.Status.of.Previous.Credit: int 4 4 2 4 4 4 4 4 4 2 ...
$ Purpose : int 2 0 9 0 0 0 0 0 3 3 ...
$ Credit.Amount : int 1049 2799 841 2122 2171 2241
$ Value.Savings.Stocks : int 1 1 2 1 1 1 1 1 1 3 ...
$ Length.of.current.employment : int 2 3 4 3 3 2 4 2 1 1 ...
$ Instalment.per.cent : int 4 2 2 3 4 1 1 2 4 1 ...
$ Sex...Marital.Status : int 2 3 2 3 3 3 3 3 2 2 ...
$ Guarantors : int 1 1 1 1 1 1 1 1 1 1 ...
$ Duration.in.Current.address : int 4 2 4 2 4 3 4 4 4 4 ...
$ Most.valuable.available.asset : int 2 1 1 1 2 1 1 1 3 4 ...
$ Age..years. : int 21 36 23 39 38 48 39 40 65 23 ...
$ Concurrent.Credits : int 3 3 3 3 1 3 3 3 3 3 ...
$ Type.of.apartment : int 1 1 1 1 2 1 2 2 2 1 ...
$ No.of.Credits.at.this.Bank : int 1 2 1 2 2 2 2 1 2 1 ...
$ Occupation : int 3 3 2 2 2 2 2 2 1 1 ...
$ No.of.dependents : int 1 2 1 2 1 2 1 2 1 1 ...
$ Telephone : int 1 1 1 1 1 1 1 1 1 1 ...
$ Foreign.Worker : int 1 1 1 2 2 2 2 2 1 1 ...
# Check number of rows and columns
dim(mydata)
# Make dependent variable as a factor (categorical)
mydata$Creditability = as.factor(mydata$Creditability)
# Split data into training (70%) and validation (30%)
dt = sort(sample(nrow(mydata), nrow(mydata)*.7))
train<-mydata[dt,]
val<-mydata[-dt,]
# Check number of rows in training data set
nrow(train)
# To view dataset
edit(train)
# Decision Tree Model
library(rpart)
mtree <- rpart(Creditability~., data = train, method="class", control = rpart.control(minsplit =
20, minbucket = 7, maxdepth = 10, usesurrogate = 2, xval =10 ))
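# In the control settings above, minsplit/minbucket set the minimum observations needed to
# attempt a split and to keep in a leaf, maxdepth caps the tree depth, usesurrogate controls
# how surrogate splits handle missing values, and xval = 10 requests 10-fold cross-validation
# (used later for pruning via the CP table)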

mtree

#Plot tree
plot(mtree)
text(mtree)

#Beautify tree
library(rattle)
library(rpart.plot)
library(RColorBrewer)

#view1
prp(mtree, faclen = 0, cex = 0.8, extra = 1)

#view2 - total count at each node


tot_count <- function(x, labs, digits, varlen) {
  paste(labs, "\n\nn =", x$frame$n)
}

prp(mtree, faclen = 0, cex = 0.8, node.fun=tot_count)

#view3- fancy Plot


rattle()
fancyRpartPlot(mtree)

############################
########Pruning#############
############################

printcp(mtree)
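# printcp() shows the complexity parameter (CP) table from cross-validation;
# the next line picks the CP value with the lowest cross-validated error (xerror)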
bestcp <- mtree$cptable[which.min(mtree$cptable[,"xerror"]),"CP"]

# Prune the tree using the best cp.


pruned <- prune(mtree, cp = bestcp)

# Plot pruned tree


prp(pruned, faclen = 0, cex = 0.8, extra = 1)

# confusion matrix (training data)


conf.matrix <- table(train$Creditability, predict(pruned,type="class"))
rownames(conf.matrix) <- paste("Actual", rownames(conf.matrix), sep = ":")
colnames(conf.matrix) <- paste("Pred", colnames(conf.matrix), sep = ":")
print(conf.matrix)

#Scoring
library(ROCR)
val1 = predict(pruned, val, type = "prob")
#Storing Model Performance Scores
pred_val <-prediction(val1[,2],val$Creditability)

# Calculating Area under Curve


perf_val <- performance(pred_val,"auc")
perf_val

# Plotting Lift curve


plot(performance(pred_val, measure="lift", x.measure="rpp"), colorize=TRUE)

# Calculating True Positive and False Positive Rate


perf_val <- performance(pred_val, "tpr", "fpr")

# Plot the ROC curve


plot(perf_val, col = "green", lwd = 1.5)

#Calculating KS statistics
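# (KS = the maximum gap between the TPR and FPR curves across cutoffs;
# larger values indicate better separation between the two classes)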
ks1.tree <- max(attr(perf_val, "y.values")[[1]] - (attr(perf_val, "x.values")[[1]]))
ks1.tree
# Advanced Plot
prp(pruned, main="Decision Tree",
extra=106,
nn=TRUE,
fallen.leaves=TRUE,

branch=.5,
faclen=0,
trace=1,
shadow.col="gray",
branch.lty=3,
split.cex=1.2,
split.prefix="is ",
split.suffix="?",
split.box.col="lightgray",
split.border.col="darkgray",
split.round=.5)

Decision Tree
