0% found this document useful (0 votes)
8 views2 pages

Ds

The document provides a comprehensive guide on data management operations in R using the 'dplyr' package, including data manipulation techniques such as filtering, selecting, and summarizing data. It also covers practical applications of various statistical methods, including logistic regression, decision trees, hypothesis testing, time-series forecasting, and principal component analysis. Additionally, the document discusses clustering techniques and includes examples with datasets like 'students.csv', 'biopsy', and 'iris'.

Uploaded by

sefami1889
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
8 views2 pages

Ds

The document provides a comprehensive guide on data management operations in R using the 'dplyr' package, including data manipulation techniques such as filtering, selecting, and summarizing data. It also covers practical applications of various statistical methods, including logistic regression, decision trees, hypothesis testing, time-series forecasting, and principal component analysis. Additionally, the document discusses clustering techniques and includes examples with datasets like 'students.csv', 'biopsy', and 'iris'.

Uploaded by

sefami1889
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 2

# ============================================================
# Practical: Data Management Operations in R using 'dplyr'
# NOTE(review): the original scan interleaved two page columns
# on each line; the two scripts are de-interleaved below and
# invalid "//" comments / "..." statement fusions repaired.
# ============================================================

# "Data Manipulation in R"
install.packages("dplyr")
library(dplyr)

# Load the student data set and inspect it
student <- read.csv("students.csv", sep = ",", header = TRUE)
View(student)
str(student)
dim(student)

# filter(): pick rows by condition (comma between conditions means AND)
filter(student, Class == "TYCS")
filter(student, Marks > 500)
filter(student, Class == "TYCS", Marks > 500)
filter(student, Class == "TYCS" & Marks > 500)
filter(student, Class == "TYCS" | Marks > 500)
filter(student, Class == "TYCS" | Marks <= 500)

# select(): pick columns
select(student, Name, Class)
data1 <- filter(student, Class == "TYCS" | Marks <= 500)
select(data1, Name, Class, Marks)

# arrange(): sort rows
arrange(student, Name)
arrange(student, Marks)
arrange(student, desc(Marks))
data2 <- arrange(student, desc(Marks))
filter(data2, Class == "TYCS")

# mutate(): derive new columns; summarise(): aggregate whole table
mutate(student, Perc = Marks / 10)
summarize(student, n())
summarise(student, max(Marks))
summarise(student, IQR(Marks))
summarise(student, mean(Marks))
summarise(student, sum(Marks))
summarise(student, sd(Marks))

# group_by(): per-class summaries
grp <- group_by(student, Class)
summarise(grp, mean(Marks))
summarise(grp, min(Marks))
summarise(grp, n())
count(student, Class)

# Pipe (%>%) versions of the same operations
student %>% filter(Class == "TYCS") %>% select(Name)
student %>% filter(Class == "TYCS") %>% arrange(Marks)
student %>% filter(Class == "TYCS") %>% arrange(desc(Marks))
student %>% filter(Class == "TYCS") %>% summarise(n())
student %>% group_by(Class) %>% summarise(n())

# ---- Plots ----
hist(student$Marks, xlab = "Student Marks",
     main = "Histogram of Student Marks")
barplot(student$Marks, xlab = "Student Marks",
        main = "Barplot of Student Marks")
boxplot(student$Marks)

# Line plots from a user-chosen CSV
# (assumes Year/Rainfall/Population columns -- TODO confirm data file)
data <- read.csv(file.choose(), sep = ",", header = TRUE)
plot(data$Year, data$Rainfall, type = "l", col = "red", lwd = 3)
plot(data$Year, data$Population, type = "l", lty = "dotted",
     col = "blue", lwd = 3)

# Correlation plot on the built-in mtcars data
data("mtcars")
install.packages("corrplot")
library(corrplot)
M <- cor(mtcars)
corrplot(M, method = "ellipse")
corrplot(M, method = "ellipse", col = "red")

# Scatter plot of iris petals, coloured by species
data("iris")
plot(iris$Petal.Length, iris$Petal.Width, col = iris$Species, pch = 15)

# ============================================================
# Practical: Simple / Multiple Linear Regression
# ============================================================

# ---- Simple linear regression ----
data3 <- read.csv("studweight.csv", sep = ",", header = TRUE)
summary(data3)
str(data3)
fit <- lm(Weight ~ Height, data = data3)
summary(fit)
# "Height is very significant in determining the weight"
plot(data3$Height, data3$Weight)
abline(fit, lwd = 3, col = "blue")

# ---- Multiple linear regression ----
data4 <- read.csv("emp_index.csv", sep = ",", header = TRUE)
summary(data4)
str(data4)
head(data4)
dim(data4)
names(data4)
modeldata <- lm(index ~ written + language + tech + gk, data = data4)
summary(modeldata)
data4$pred <- predict(modeldata, data4)   # fitted values added as a column
head(data4)
modeldata$residuals

# ============================================================
# Practical: Logistic Regression Algorithm
# ============================================================
library("MASS")
data("biopsy")
View(biopsy)
str(biopsy)
names(biopsy)
summary(biopsy)
colSums(is.na(biopsy))
biopsy1 <- na.omit(biopsy)     # drop rows with missing values (683 remain)
colSums(is.na(biopsy1))
biopsy$ID <- NULL              # drop identifier column (as in original order)
boxplot(biopsy)

# Full model: all predictors
fit <- glm(class ~ ., family = binomial, data = biopsy1)
summary(fit)
biopsy1$prob <- predict(fit, type = "response")
View(biopsy1)
biopsy1$predict <- rep("benign", 683)
biopsy1$predict[biopsy1$prob > 0.99] <- "malignant"
View(biopsy1)
table(biopsy1$predict, biopsy1$class)    # confusion matrix
mean(biopsy1$predict == biopsy1$class)   # accuracy

# Reduced model: selected predictors only
fit2 <- glm(class ~ V1 + V4 + V6 + V7, family = binomial, data = biopsy1)
summary(fit2)
biopsy1$prob <- predict(fit2, type = "response")
View(biopsy1)
biopsy1$predict <- rep("benign", 683)
biopsy1$predict[biopsy1$prob > 0.5] <- "malignant"
View(biopsy1)
table(biopsy1$predict, biopsy1$class)
mean(biopsy1$predict == biopsy1$class)

# ============================================================
# Practical: Decision Tree
# ============================================================

# ---- Regression tree (Hitters salary data) ----
data <- read.csv("Hitters.csv", sep = ",", header = TRUE)
View(data)
str(data)
summary(data)
names(data)
library(rpart)
regtree <- rpart(Salary ~ Hits + Runs + Years, data = data)
regtree
plot(regtree)
text(regtree)
install.packages("rpart.plot")
library(rpart.plot)
rpart.plot(regtree)
View(regtree)

# Prune using cp (complexity parameter)
regtree$cptable
cp <- min(regtree$cptable[5, ])
pr <- prune(regtree, cp = cp)
rpart.plot(pr)

# ---- Classification tree (biopsy) ----
library("MASS")
data("biopsy")
View(biopsy)
str(biopsy)
names(biopsy)
summary(biopsy)
biopsy$ID <- NULL
classtree <- rpart(class ~ ., data = biopsy)
rpart.plot(classtree)
biopsy$pred <- predict(classtree, biopsy, type = "class")
table(biopsy$pred, biopsy$class)

# ---- Classification tree (titanic) ----
install.packages("titanic")
library("titanic")
data("titanic_train")
str(titanic_train)
View(titanic_train)
titanic_train$Name <- NULL
titanictree <- rpart(Survived ~ Pclass + Age + Parch, data = titanic_train)
rpart.plot(titanictree)

# ---- Classification tree (golf, fully grown) ----
golf <- read.csv("Golf.csv", sep = ",", header = TRUE)
View(golf)
str(golf)
names(golf)
library("rpart")
install.packages("rpart.plot")
library("rpart.plot")
# minsplit/minbucket = 1 and cp = 0 grow the tree without pruning
tree <- rpart(Play ~ ., data = golf,
              control = rpart.control(minsplit = 1, minbucket = 1, cp = 0))
rpart.plot(tree)
# ============================================================
# Practical: Hypothesis Testing
# NOTE(review): the original scan interleaved two page columns
# on each line; the two scripts are de-interleaved below.
# ============================================================

# ---- One-sample t-test ----
data <- read.csv("onesample.csv", sep = ",", header = TRUE)
View(data)
str(data)
summary(data)
boxplot(data)
t.test(data$Time, mu = 80, alternative = "greater")

# ---- Two-sample t-test (variance test first) ----
my_data <- read.csv("twosample.csv", sep = ",", header = TRUE)
View(my_data)
str(my_data)
summary(my_data)
boxplot(my_data)
var.test(my_data$time_g1, my_data$time_g2, alternative = "two.sided")
t.test(my_data$time_g1, my_data$time_g2, alternative = "two.sided")

# ---- Paired t-test ----
time <- read.csv("paired_t_test.csv", sep = ",", header = TRUE)
View(time)
str(time)
summary(time)
boxplot(time)
t.test(time$time_before, time$time_after,
       alternative = "greater", paired = TRUE)

# ---- Correlation ----
cor <- read.csv("correlation.csv", sep = ",", header = TRUE)
View(cor)
str(cor)
summary(cor)
cor.test(cor$aptitude, cor$job_prof,
         alternative = "two.sided", method = "pearson")

# ---- Paired t-test application ----
stud <- read.csv("student.csv", sep = ",", header = TRUE)
View(stud)
str(stud)
summary(stud)
boxplot(stud)
t.test(stud$Test1, stud$Test2, alternative = "less", paired = TRUE)

# ---- Correlation: ice-cream sales vs temperature ----
ice <- read.csv("icecream.csv", sep = ",", header = TRUE)
View(ice)
str(ice)
summary(ice)
boxplot(ice)
cor.test(ice$Total.sales, ice$Temp,
         alternative = "two.sided", method = "pearson")

# ============================================================
# Practical: Analysis of Variance
# ============================================================

# ---- One-way ANOVA ----
data1 <- read.csv("one-way-anova.csv", sep = ",", header = TRUE)
names(data1)
str(data1)
data1$dept <- as.factor(data1$dept)   # treatment must be a factor
str(data1)
summary(data1)
View(data1)
head(data1)
anv1 <- aov(formula = satindex ~ dept, data = data1)
summary(anv1)

# ---- Two-way ANOVA ----
data2 <- read.csv("crop-data.csv", sep = ",", header = TRUE)
names(data2)
str(data2)
data2$density <- as.factor(data2$density)
str(data2)
summary(data2)
head(data2)
View(data2)
anv2 <- aov(formula = yield ~ density + block + fertilizer, data = data2)
summary(anv2)

library(readxl)
mydata <- read.csv("newsadv.csv")
View(mydata)
names(mydata)
anv <- aov(formula = Count ~ Day + Section, data = mydata)
summary(anv)

# ============================================================
# Practical: Clustering
# ============================================================

# ---- K-means clustering on the iris data ----
data("iris")
names(iris)
newdata <- iris[, -5]     # drop the Species label column
head(newdata)
dim(newdata)
fit <- kmeans(newdata, 3)
library(cluster)
clusplot(newdata, fit$cluster, color = TRUE, shade = TRUE,
         labels = 2, lines = 0)
fit
fit$size

# ---- Hierarchical clustering on the iris data ----
# dist() computes the Euclidean distance between every pair of rows
clust <- hclust(dist(iris[, 3:4]))
plot(clust)
clusterCut <- cutree(clust, 3)
table(clusterCut, iris$Species)
clust <- hclust(dist(iris[, 3:4]), method = "average")
plot(clust)
clusterCut <- cutree(clust, 3)
table(clusterCut, iris$Species)

# ============================================================
# Practical: Time-Series Forecasting
# ============================================================

# ---- AirPassengers ----
install.packages("forecast")
library(forecast)
data("AirPassengers")
class(AirPassengers)
head(AirPassengers)
sum(is.na(AirPassengers))
summary(AirPassengers)
plot(AirPassengers)
tsdata <- ts(AirPassengers, frequency = 12)
ddata <- decompose(tsdata)
plot(ddata)
# Simple exponential smoothing: no trend (beta) or seasonal (gamma) terms
holt <- HoltWinters(tsdata, beta = FALSE, gamma = FALSE)
plot(holt)

# ---- Rainfall dataset ----
rainfall <- read.csv("rainfall.csv", sep = ",", header = TRUE)
head(rainfall)
summary(rainfall)
class(rainfall)
tsdata <- ts(rainfall, frequency = 12, start = c(2012, 1))
class(tsdata)
plot(tsdata)

# ============================================================
# Practical: Principal Component Analysis
# ============================================================
# Principal Component Analysis on the iris dataset
data("iris")
str(iris)
summary(iris)
mypr <- prcomp(iris[, -5])
mypr
summary(mypr)
plot(mypr, type = "l")   # scree plot
biplot(mypr)

# ============================================================
# MongoDB shell commands (NOT R -- kept as reference comments;
# run these in the mongo shell). Smart quotes, '=' instead of
# ':' and '_is' were extraction artifacts; corrected below.
# ============================================================
# db.student.insert({_id: 101, RollNo: 4, Name: "Laxmi", Marks: 450,
#                    Hobbies: ["Reading", "Dancing"]});
# db.student.find({Class: "TYCS"}, {Name: 1, Class: 1, _id: 0})
# db.student.find({Class: {$ne: "TYCS"}}, {Name: 1, Class: 1, _id: 0})
#     -> name and class of students whose class is not TYCS
# db.student.find().sort({Marks: 1})                   // ascending
# db.student.find({Class: "TYCS", Marks: {$gt: 400}})
# // or, and, not
# db.student.find({$or: [{Class: "TYCS"}, {Marks: {$gt: 500}}]})
# db.student.update({RollNo: 2}, {$set: {Marks: 531}})
# db.student.remove({Class: "FYCS"})
# db.student.updateMany({Class: "TYCS"}, {$inc: {Marks: 5}})
# db.Employee.aggregate({$group: {"_id": "$Dept", "Count": {$sum: 1}}})
#     -> number of employees in each department
# db.Employee.aggregate({$group: {"_id": "$Dept", "Count": {$avg: "$Salary"}}})
#     -> average salary per department
# db.student.find({}, {Name: 1, Marks: 1, _id: 0}).sort({Marks: 1})
#     -> sort name and marks using the projection argument

You might also like