MBA Sem 1 Unit 3 Fundamentals of R (1)
FUNDAMENTALS OF R
Installing R
🞑 setwd("/web/com")
> getwd()
[1] "C:/Users/sheena/Documents"
> setwd("C:/Users/sheena/Desktop/NOTES")
> getwd()
[1] "C:/Users/sheena/Desktop/NOTES"
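A backslash inside an R string starts an escape sequence, so a Windows path has to be written with forward slashes or doubled backslashes. The following is a minimal sketch of switching directories safely; the temporary folder is only a stand-in for a real project folder, and the names old_dir and notes_path are illustrative.

```r
# Remember the current working directory so it can be restored later.
old_dir <- getwd()

# tempdir() is used here only as a stand-in for a real project folder.
setwd(tempdir())
print(getwd())

# file.path() joins path components with "/", which R accepts on Windows too.
notes_path <- file.path("C:", "Users", "sheena", "Desktop", "NOTES")
print(notes_path)

# Return to the original working directory.
setwd(old_dir)
```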
CSV File
🞑 data <- read.csv("input.csv")
🞑 print(data)
or
🞑 View(data)
🞑 print(ncol(data))
🞑 print(nrow(data))
🞑 sal <- max(data$salary)  # assumes input.csv has a numeric salary column
🞑 print(sal)
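Get the details of the person with max salary
# Create a data frame.
🞑 data <- read.csv("input.csv")
# Keep the row(s) whose salary equals the maximum (assumes a salary column).
🞑 retval <- subset(data, salary == max(salary))
🞑 print(retval)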
Get all the people working in IT department
# Create a data frame.
data <- read.csv("input.csv")
# Keep the rows for the IT department (assumes a dept column).
retval <- subset(data, dept == "IT")
print(retval)
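Get the persons in IT department whose salary is greater than 600
# Create a data frame.
data <- read.csv("input.csv")
# Combine both conditions with & (assumes dept and salary columns).
info <- subset(data, salary > 600 & dept == "IT")
print(info)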
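Get the people who joined on or after 2014
# Create a data frame.
data <- read.csv("input.csv")
# Compare as dates (assumes a start_date column in YYYY-MM-DD format).
newdata <- subset(data, as.Date(start_date) >= as.Date("2014-01-01"))
print(newdata)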
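The slides do not show the contents of input.csv, so the four queries cannot be run as they stand. The sketch below writes a small sample file to a temporary folder and runs each query against it. The column names (id, name, salary, start_date, dept) and all values are assumptions made for illustration only.

```r
# Hypothetical sample data; the real input.csv is not shown in the slides.
emp <- data.frame(
  id = 1:5,
  name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
  salary = c(623.30, 515.20, 611.00, 729.00, 843.25),
  start_date = c("2012-01-01", "2013-09-23", "2014-11-15",
                 "2014-05-11", "2015-03-27"),
  dept = c("IT", "Operations", "IT", "HR", "Finance"),
  stringsAsFactors = FALSE
)
csv_path <- file.path(tempdir(), "input.csv")
write.csv(emp, csv_path, row.names = FALSE)

# Read the file back, as the slides do.
data <- read.csv(csv_path, stringsAsFactors = FALSE)

# Person with the maximum salary.
top_paid <- subset(data, salary == max(salary))

# Everyone in the IT department.
it_staff <- subset(data, dept == "IT")

# IT staff earning more than 600.
it_high <- subset(data, dept == "IT" & salary > 600)

# People who joined on or after 1 January 2014.
recent <- subset(data, as.Date(start_date) >= as.Date("2014-01-01"))

print(top_paid)
print(it_staff)
print(it_high)
print(recent)
```

subset() evaluates its condition inside the data frame, so column names can be written without the data$ prefix.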
Excel File
Install Package
install.packages("xlsx")
library("xlsx")
Or
install.packages("openxlsx")
library(openxlsx)
Reading the Excel File
# read.spss() is provided by the foreign package and reads SPSS (.sav) files.
library(foreign)
db <- file.choose()
dataset <- read.spss(db, to.data.frame = TRUE)
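The read.spss() call above targets SPSS files rather than Excel workbooks. For an Excel workbook, the following is a minimal sketch using the openxlsx package named earlier, assuming that package is installed. The workbook is a hypothetical temporary file, created only so the example can run end to end.

```r
library(openxlsx)

# Hypothetical workbook written to a temporary folder for demonstration.
xlsx_path <- file.path(tempdir(), "input.xlsx")
sample_rows <- data.frame(id = 1:3, name = c("a", "b", "c"))
write.xlsx(sample_rows, xlsx_path)

# Read the first sheet back into a data frame.
excel_data <- read.xlsx(xlsx_path, sheet = 1)
print(excel_data)
```

With a real workbook, xlsx_path could instead come from file.choose(), as in the slides.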
SAS files
install.packages("foreign")
library(foreign)
or
install.packages("haven")
library(haven)
Reading from SAS file
print(data.frame)
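The call that actually reads the SAS file is not shown on the slide. As a hedged sketch, the haven package offers read_sas() for native .sas7bdat data sets and read_xpt() for SAS transport files. The example below uses the transport format because haven can also write it, which keeps the example runnable without an existing SAS file. The file name and values are hypothetical.

```r
library(haven)

# Hypothetical data written as a SAS transport (.xpt) file so the example runs.
xpt_path <- file.path(tempdir(), "staff.xpt")
write_xpt(data.frame(id = 1:3, salary = c(500, 650, 720)), xpt_path)

# read_xpt() reads transport files. For a native SAS data set the counterpart
# would be read_sas("path/to/file.sas7bdat"), where the path is a placeholder.
sas_data <- read_xpt(xpt_path)
print(sas_data)
```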
Query with Filter Clause
print(data)
Updating Rows in the Tables
# The column list must match the 12 values; row_names holds the car name.
dbSendQuery(mysqlconnection,
  "insert into mtcars(row_names, mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb)
   values ('New Mazda RX4 Wag', 21, 6, 168.5, 110, 3.9, 2.875, 17.02, 0, 1, 4, 4)")
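The database statements above need a running MySQL server. To try the same ideas locally, the sketch below swaps in an in-memory SQLite database through the DBI and RSQLite packages (both assumed to be installed). This is a substitute for the MySQL connection in the slides, not the same setup, and the table name cars and its model column are illustrative choices.

```r
library(DBI)

# Assumption: an in-memory SQLite database stands in for the MySQL server.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Store mtcars with the car names in an explicit model column.
cars <- data.frame(model = rownames(mtcars), mtcars, row.names = NULL)
dbWriteTable(con, "cars", cars)

# Query with a filter clause: only four-cylinder cars.
four_cyl <- dbGetQuery(con, "SELECT model, mpg, cyl FROM cars WHERE cyl = 4")
print(four_cyl)

# Insert a row; dbExecute() returns the number of rows affected.
n_inserted <- dbExecute(con, paste(
  "INSERT INTO cars (model, mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb)",
  "VALUES ('New Mazda RX4 Wag', 21, 6, 168.5, 110, 3.9, 2.875, 17.02, 0, 1, 4, 4)"
))

# Update the row that was just inserted.
n_updated <- dbExecute(con,
  "UPDATE cars SET mpg = 22 WHERE model = 'New Mazda RX4 Wag'")

total_rows <- dbGetQuery(con, "SELECT COUNT(*) AS n FROM cars")$n

dbDisconnect(con)
```

dbGetQuery() fits statements that return rows, while dbExecute() fits statements such as INSERT and UPDATE that only change data.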
RDBMS using ODBC
install.packages("RODBC")
ODBC Setting in Windows
🞑 Database = "mydatabase",
🞑 Trusted_Connection = "True"
Create DSN
Reading from SQL Server using ODBC
#Establishing ODBC Connection
library(RODBC)
con <- odbcConnect("DSN name")
sqlTables(con)
Select
🞑 select(d1, country)
🞑 select(d1, -order.date)
🞑 select(d1, order.id:aging, states)
🞑 select(d1, starts_with("c"))
🞑 select(d1, ends_with("c"))
🞑 select(d1, contains("c"))
distinct
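The d1 data set is not shown in the slides, so the sketch below builds a small hypothetical stand-in to show what select() and distinct() return. The column names follow the calls above, except city, which is added for illustration; all values are invented. distinct() keeps one row per unique combination of the columns it is given.

```r
library(dplyr)

# Hypothetical stand-in for d1; values are invented for illustration.
d1 <- data.frame(
  order.id = c(101, 102, 103, 104),
  order.date = c("2014-01-05", "2014-01-05", "2014-02-11", "2014-03-02"),
  country = c("India", "India", "Japan", "India"),
  city = c("Pune", "Delhi", "Osaka", "Pune"),
  aging = c(3, 7, 2, 5),
  states = c("MH", "DL", "OS", "MH")
)

# Drop one column, keeping the rest.
no_date <- select(d1, -order.date)

# Keep columns whose names start with "c".
c_cols <- select(d1, starts_with("c"))

# One row per distinct country.
countries <- distinct(d1, country)

# One row per distinct country and state combination.
country_states <- distinct(d1, country, states)

print(names(no_date))
print(names(c_cols))
print(countries)
print(country_states)
```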
DPLYR Functions
Filter
The filter function returns all the rows that satisfy
a given condition.
🞑 filter(airquality, Temp > 70)
🞑 filter(airquality, Temp > 80 & Month > 5)
Mutate
Mutate is used to add new variables to the data.
🞑 mutate(airquality, TempInC = (Temp - 32) * 5 / 9)
Summarise
The summarise function is used to summarise multiple
values into a single value.
🞑 summarise(airquality, mean(Temp, na.rm = TRUE))
na.rm = TRUE drops NA values before the mean is computed.
Group By
The group_by function is used to group data by one
or more variables.
🞑 summarise(group_by(airquality, Month), mean(Temp, na.rm = TRUE))
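When the summary expression is left unnamed, the result column is labelled with the expression text itself, which is awkward to refer to later. A small sketch that names the summary columns instead; mean_temp and mean_ozone are illustrative names.

```r
library(dplyr)

# Group by Month, then compute one named summary value per group.
monthly <- summarise(
  group_by(airquality, Month),
  mean_temp = mean(Temp, na.rm = TRUE),
  mean_ozone = mean(Ozone, na.rm = TRUE)
)

# One row per month, with columns Month, mean_temp and mean_ozone.
print(monthly)
```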
Count
The count function tallies the observations in each group, much
like the table function in base R. For
example:
🞑 count(airquality, Month)
Arrange
The arrange function is used to arrange rows by variables.
Currently, the airquality dataset is arranged based on
Month, and then Day. We can use the arrange function to
arrange the rows in the descending order of Month, and
then in the ascending order of Day.
🞑 arrange(airquality, desc(Month), Day)
Sample
The sample functions select random rows
from a table. The first line of code randomly selects
ten rows from the dataset, and the second line of
code randomly selects 15 rows (10% of the original
153 rows) from the dataset.
sample_n(airquality, size = 10)
sample_frac(airquality, size = 0.1)
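Random sampling returns different rows on every run unless the random seed is fixed. The sketch below fixes the seed (the value 42 is arbitrary) and also shows slice_sample(), which recent dplyr versions (1.0.0 and later) provide as the successor to sample_n() and sample_frac(). An installed dplyr of that version or newer is assumed.

```r
library(dplyr)

# Fix the seed so the same rows are drawn on every run.
set.seed(42)

ten_rows <- sample_n(airquality, size = 10)
tenth <- sample_frac(airquality, size = 0.1)

# Equivalent calls with the newer slice_sample() interface.
ten_rows_new <- slice_sample(airquality, n = 10)
tenth_new <- slice_sample(airquality, prop = 0.1)

print(nrow(ten_rows))
print(nrow(tenth))
```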
Pipe
The pipe operator in R, represented by %>%, can
be used to chain code together. It is very useful
when you are performing several operations on
data and don't want to save the output at each
intermediate step.
flights_db %>% select(year:day, dep_delay, arr_delay)
flights_db %>% filter(dep_delay > 240)
flights_db %>% group_by(dest) %>% summarise(delay = mean(dep_time))
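The flights_db table is not defined in these slides, so here is a sketch of the same chaining idea on the built-in airquality dataset. Each step receives the result of the previous one as its first argument; the names hot_days, n_days and mean_c are illustrative.

```r
library(dplyr)

hot_days <- airquality %>%
  filter(Temp > 80) %>%                        # keep only hot days
  mutate(TempInC = (Temp - 32) * 5 / 9) %>%    # add a Celsius column
  group_by(Month) %>%                          # one group per month
  summarise(n_days = n(), mean_c = mean(TempInC)) %>%
  arrange(desc(n_days))                        # busiest month first

print(hot_days)
```

Written without the pipe, the same steps would be nested calls read from the inside out, or a series of intermediate variables.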
Data Exploration Functions
str      mean
class    median
head     range
tail     var
dim      sd
ncol     names
nrow     levels
summary  table
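R is case-sensitive, so these functions must be typed in lower case. A short sketch applying them to the built-in airquality dataset; the variable names on the left are illustrative.

```r
# Structure and type of the object.
str(airquality)
print(class(airquality))

# First and last rows.
print(head(airquality, 3))
print(tail(airquality, 3))

# Size and column names.
size <- dim(airquality)        # rows, then columns
print(size)
print(ncol(airquality))
print(nrow(airquality))
print(names(airquality))

# Summary statistics for one numeric column.
print(summary(airquality$Temp))
print(mean(airquality$Temp))
print(median(airquality$Temp))
print(range(airquality$Temp))
print(var(airquality$Temp))
print(sd(airquality$Temp))

# levels() applies to factors, so Month is converted first.
month_levels <- levels(factor(airquality$Month))
print(month_levels)

# Frequency of each month.
month_counts <- table(airquality$Month)
print(month_counts)
```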