MBA Sem 1 Unit 3 Fundamentals of R (1)

Uploaded by

Kunal Deore

UNIT 3

FUNDAMENTALS OF R
Installing R

• Open an internet browser and go to www.r-project.org.
• Click the "download R" link in the middle of the page under "Getting Started."
• Select a CRAN location (a mirror site) and click the corresponding link.
• Click the "Download R for Windows" link at the top of the page.
• Click the "install R for the first time" link at the top of the page.
• Click "Download R for Windows" and save the executable file somewhere on your computer. Run the .exe file and follow the installation instructions.
• Now that R is installed, you need to download and install RStudio.
Install RStudio

• Go to www.rstudio.com and click the "Download RStudio" button.
• Click "Download RStudio Desktop."
• Click the version recommended for your system, or the latest Windows version, and save the executable file. Run the .exe file and follow the installation instructions.
Reading and Writing to a File

• In R, we can read data from files stored outside the R environment.
• We can also write data into files, which are then stored by and accessible through the operating system.
• R can read and write various file formats such as TXT, CSV, Excel, SPSS and SAS.
getwd() & setwd()

• getwd(): check which directory the R workspace is pointing to.
• setwd(): set a new working directory.

    # Get and print the current working directory.
    print(getwd())
    # Set the current working directory.
    setwd("/web/com")
    # Get and print the current working directory again.
    print(getwd())
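The round trip can be tried safely with a directory that is guaranteed to exist. This is a minimal sketch: using tempdir() as the target and the variable names old_dir and in_temp are assumptions made for illustration, not part of the original slides.

```r
# Remember the directory we start in so it can be restored afterwards.
old_dir <- getwd()

# tempdir() always exists for the current session, so setwd() cannot fail here.
setwd(tempdir())

# normalizePath() resolves symbolic links so both paths are comparable.
in_temp <- normalizePath(getwd()) == normalizePath(tempdir())
print(in_temp)

# Go back to where we started.
setwd(old_dir)
```

Restoring the original directory at the end keeps later relative paths such as "input.csv" pointing at the same place as before.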
Example

    > getwd()
    [1] "C:/Users/sheena/Documents"
    > setwd("C:\Users\sheena\Desktop\NOTES")
    Error: '\U' used without hex digits in character string starting ""C:\U"

Solution: inside an R string a single \ starts an escape sequence, so use \\ in place of \ (forward slashes, as in "C:/Users/sheena/Desktop/NOTES", also work on Windows).

    > setwd("C:\\Users\\sheena\\Desktop\\NOTES")
    > getwd()
    [1] "C:/Users/sheena/Desktop/NOTES"
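The different ways of spelling the same Windows path can be compared without touching the file system. A small sketch, assuming the illustrative path from the example above; the variable names are hypothetical.

```r
# Three spellings of the same path.
escaped <- "C:\\Users\\sheena\\Desktop\\NOTES"   # each \\ is ONE backslash character
forward <- "C:/Users/sheena/Desktop/NOTES"        # forward slashes need no escaping
built   <- file.path("C:", "Users", "sheena", "Desktop", "NOTES")

# cat() shows the string as R stores it, with single backslashes.
cat(escaped, "\n")

# file.path() joins its pieces with "/", so it matches the forward-slash form.
print(identical(forward, built))

# Replacing every backslash with "/" converts one form into the other.
converted <- gsub("\\", "/", escaped, fixed = TRUE)
print(identical(converted, forward))
```

Building paths with file.path() avoids typing separators by hand at all.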
CSV File

• A CSV file is a text file in which the values in each row are separated by commas.
• You can create this file in Windows Notepad by copying and pasting the data below. Save the file as input.csv, choosing "All Files (*.*)" as the file type in the Save As dialog.

    id,name,salary,start_date,dept
    1,Rick,623.3,2012-01-01,IT
    2,Dan,515.2,2013-09-23,Operations
    3,Michelle,611,2014-11-15,IT
    4,Ryan,729,2014-05-11,HR
    5,Gary,843.25,2015-03-27,Finance
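The same file can also be created from R itself instead of Notepad. This is a sketch under one assumption: it writes into the session's temporary directory (tempdir()) so that nothing in your working directory is overwritten; use "input.csv" as the path to place it in the working directory instead.

```r
# The sample rows, one string per line of the file.
csv_lines <- c(
  "id,name,salary,start_date,dept",
  "1,Rick,623.3,2012-01-01,IT",
  "2,Dan,515.2,2013-09-23,Operations",
  "3,Michelle,611,2014-11-15,IT",
  "4,Ryan,729,2014-05-11,HR",
  "5,Gary,843.25,2015-03-27,Finance"
)

# Assumed location: a file called input.csv in the temporary directory.
csv_path <- file.path(tempdir(), "input.csv")
writeLines(csv_lines, csv_path)

# Confirm that the file exists and has a header plus five data rows.
print(file.exists(csv_path))
print(length(readLines(csv_path)))
```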
CSV File

• Use the read.csv() function to read a CSV file available in your current working directory.

    data <- read.csv("input.csv", header = TRUE, sep = ",")
    print(data)

  or

    View(data)
CSV File

• By default, the read.csv() function returns its result as a data frame.

    data <- read.csv("input.csv")
    print(is.data.frame(data))
    print(ncol(data))
    print(nrow(data))

• Once the data has been read into a data frame, we can apply all the functions applicable to data frames.
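These checks can be reproduced without a file on disk, because read.csv() also accepts the file contents through its text argument. A minimal sketch using the sample rows from the CSV slide; the name csv_text is an assumption for illustration.

```r
# The sample data as one string; read.csv(text = ...) parses it like a file.
csv_text <- "id,name,salary,start_date,dept
1,Rick,623.3,2012-01-01,IT
2,Dan,515.2,2013-09-23,Operations
3,Michelle,611,2014-11-15,IT
4,Ryan,729,2014-05-11,HR
5,Gary,843.25,2015-03-27,Finance"

data <- read.csv(text = csv_text)

print(is.data.frame(data))  # is the result a data frame?
print(ncol(data))           # number of columns
print(nrow(data))           # number of rows

# str() shows the type that read.csv() chose for every column.
str(data)
```

Note that start_date is not read as a date; it has to be converted with as.Date() before date comparisons.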
Get the maximum salary

    # Create a data frame.
    data <- read.csv("input.csv")
    # Get the max salary from the data frame.
    sal <- max(data$salary)
    print(sal)
Get the details of the person with max salary

    # Create a data frame.
    data <- read.csv("input.csv")
    # Get the max salary from the data frame.
    sal <- max(data$salary)
    # Get the details of the person who has the max salary.
    retval <- subset(data, salary == sal)
    print(retval)
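An alternative to subset() is to look up the row position of the maximum with which.max(). This is a sketch on the sample data, not the method used in the slides; note that which.max() returns only the first maximum, whereas the subset() call returns every row that ties for the highest salary.

```r
data <- read.csv(text = "id,name,salary,start_date,dept
1,Rick,623.3,2012-01-01,IT
2,Dan,515.2,2013-09-23,Operations
3,Michelle,611,2014-11-15,IT
4,Ryan,729,2014-05-11,HR
5,Gary,843.25,2015-03-27,Finance")

# which.max() gives the row number of the largest salary.
top_row <- which.max(data$salary)

# Use that row number to pull out the whole record.
top <- data[top_row, ]
print(top)
```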
Get all the people working in the IT department

    # Create a data frame.
    data <- read.csv("input.csv")
    retval <- subset(data, dept == "IT")
    print(retval)
Get the people in the IT department whose salary is greater than 600

    # Create a data frame.
    data <- read.csv("input.csv")
    info <- subset(data, salary > 600 & dept == "IT")
    print(info)
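It can help to see what the condition inside subset() actually is: a logical vector with one TRUE or FALSE per row. The sketch below builds the two conditions separately and combines them with &, using plain bracket indexing; the names is_it and well_paid are hypothetical.

```r
data <- read.csv(text = "id,name,salary,start_date,dept
1,Rick,623.3,2012-01-01,IT
2,Dan,515.2,2013-09-23,Operations
3,Michelle,611,2014-11-15,IT
4,Ryan,729,2014-05-11,HR
5,Gary,843.25,2015-03-27,Finance")

# One TRUE/FALSE value per row for each condition.
is_it     <- data$dept == "IT"
well_paid <- data$salary > 600
print(is_it)
print(well_paid)

# & keeps only the rows where both conditions are TRUE.
info <- data[is_it & well_paid, ]
print(info)
```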
Get the people who joined in or after 2014

    # Create a data frame.
    data <- read.csv("input.csv")
    # >= keeps anyone who joined on 2014-01-01 itself.
    retval <- subset(data, as.Date(start_date) >= as.Date("2014-01-01"))
    print(retval)
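The date filter relies on as.Date() turning text such as "2014-05-11" into real dates that can be compared. A small sketch on the sample data, also showing how > and >= differ on the cutoff date itself; the names start, cutoff and recent are assumptions for illustration.

```r
data <- read.csv(text = "id,name,salary,start_date,dept
1,Rick,623.3,2012-01-01,IT
2,Dan,515.2,2013-09-23,Operations
3,Michelle,611,2014-11-15,IT
4,Ryan,729,2014-05-11,HR
5,Gary,843.25,2015-03-27,Finance")

# Convert the text column to Date values before comparing.
start  <- as.Date(data$start_date)
cutoff <- as.Date("2014-01-01")
print(class(start))

# Keep everyone who joined on or after the cutoff.
recent <- data[start >= cutoff, ]
print(recent)

# The two operators differ only for a date equal to the cutoff.
print(as.Date("2014-01-01") >  cutoff)
print(as.Date("2014-01-01") >= cutoff)
```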
Writing into a CSV File

    # Create a data frame.
    data <- read.csv("input.csv")
    retval <- subset(data, as.Date(start_date) >= as.Date("2014-01-01"))
    # Write the filtered data into a new file.
    # By default write.csv() also writes the row names as an extra first column.
    write.csv(retval, "output.csv")
    newdata <- read.csv("output.csv")
    print(newdata)
Writing into a CSV File

    # Create a data frame.
    data <- read.csv("input.csv")
    retval <- subset(data, as.Date(start_date) >= as.Date("2014-01-01"))
    # Write the filtered data into a new file.
    # row.names = FALSE leaves out the row-name column.
    write.csv(retval, "output.csv", row.names = FALSE)
    newdata <- read.csv("output.csv")
    print(newdata)
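The effect of row.names = FALSE is easiest to see by writing the same data frame both ways and comparing the column names that come back. A minimal sketch; the small people data frame and the temporary file names are assumptions for illustration.

```r
# A small data frame standing in for the filtered result.
people <- data.frame(id = 3:5, name = c("Michelle", "Ryan", "Gary"))

# Temporary files, so nothing in the working directory is overwritten.
with_names    <- tempfile(fileext = ".csv")
without_names <- tempfile(fileext = ".csv")

write.csv(people, with_names)                        # default: row names written
write.csv(people, without_names, row.names = FALSE)  # row names left out

# Reading the first file back yields an extra column, which read.csv() calls X.
cols_with    <- names(read.csv(with_names))
cols_without <- names(read.csv(without_names))
print(cols_with)
print(cols_without)
```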
Excel File
Install Package

    install.packages("xlsx")
    library("xlsx")

  or

    install.packages("openxlsx")
    library(openxlsx)
Reading the Excel File
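
    # Read the first worksheet in the file input.xlsx with the xlsx package.
    # Both packages export a function named read.xlsx(), so the package name is spelled out.
    data <- xlsx::read.xlsx("input.xlsx", sheetIndex = 1)
    View(data)

  or

    # Read the first worksheet in the file order.xlsx with the openxlsx package.
    d2 <- openxlsx::read.xlsx("order.xlsx")
    View(d2)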
SPSS File

    install.packages("foreign")
    library(foreign)

Read from SPSS file

    # Reading SPSS files with `foreign`
    dataset <- read.spss("C:\\PathToFile\\MyDataFile.sav", to.data.frame = TRUE)

    # Or pick the file interactively
    db <- file.choose()
    dataset <- read.spss(db, to.data.frame = TRUE)
SAS Files

    install.packages("foreign")
    library(foreign)

  or

    install.packages("haven")
    library(haven)
Reading from SAS file

    # Reading SAS files with `foreign`
    # read.ssd() takes the library directory and the dataset name, and needs a working SAS installation.
    read.ssd("path/to/your/library", "datasetname")

    # Reading SAS files with `haven`
    read_sas("path/to/your/data")

    # Writing to Stata, SPSS or SAS files with `foreign`
    write.foreign(dataframe, datafile, codefile,
                  package = c("SPSS", "Stata", "SAS"), ...)
MySQL Database
install.packages("RMySQL")
Connecting R to MySql

    library(RMySQL)
    # Create a connection object to the MySQL database.
    # We will connect to the sample database named "sakila" that comes with the MySQL installation.
    mysqlconnection <- dbConnect(MySQL(), user = 'root', password = 'admin',
                                 dbname = 'sakila', host = 'localhost')
    # List the tables available in this database.
    dbListTables(mysqlconnection)
Querying the Tables

    # Query the "actor" table to get all the rows.
    result <- dbSendQuery(mysqlconnection, "select * from actor")
    # Store the result in an R data frame object. n = 5 fetches the first 5 rows.
    actor_data <- fetch(result, n = 5)
    print(actor_data)
    # Release the result set before sending the next query.
    dbClearResult(result)
Query with Filter Clause

    result <- dbSendQuery(mysqlconnection, "select * from actor where last_name = 'TORN'")
    # Fetch all the records (n = -1) and store them in a data frame.
    actor_data <- fetch(result, n = -1)
    print(actor_data)
    dbClearResult(result)
Updating Rows in the Tables

    dbSendQuery(mysqlconnection, "update mtcars set disp = 168.5 where hp = 110")
Inserting Data into the Tables

    # The column list must name one column per value, including row_names for the car name.
    dbSendQuery(mysqlconnection,
      "insert into mtcars(row_names, mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb)
       values('New Mazda RX4 Wag', 21, 6, 168.5, 110, 3.9, 2.875, 17.02, 0, 1, 4, 4)")
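The query, update and insert pattern can be practised without a MySQL server. The sketch below swaps in an in-memory SQLite database through the DBI and RSQLite packages; this substitution is an assumption made so the example is self-contained, and it is not the RMySQL setup used in the slides. It also uses dbExecute() and dbGetQuery(), the DBI shortcuts for statements that change data and for queries that return rows.

```r
library(DBI)

# Assumption: an in-memory SQLite database stands in for the MySQL server.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Copy a few rows and columns of the built-in mtcars data into a table.
dbWriteTable(con, "mtcars", mtcars[1:5, c("mpg", "hp", "disp")])

# dbExecute() runs a statement and returns the number of rows it affected.
changed <- dbExecute(con, "update mtcars set disp = 168.5 where hp = 110")

# Values are passed separately through params instead of being pasted into the SQL text.
inserted <- dbExecute(con,
  "insert into mtcars (mpg, hp, disp) values (?, ?, ?)",
  params = list(21, 110, 168.5))

# dbGetQuery() sends the query, fetches all rows and clears the result in one step.
rows <- dbGetQuery(con, "select * from mtcars where hp = 110")
print(rows)

dbDisconnect(con)
```

Passing values through params keeps quoting problems out of the SQL string; the exact placeholder syntax depends on the database driver.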
RDBMS Using ODBC

    install.packages("RODBC")
    library(RODBC)
ODBC Setting in Windows

• Install SQL Server Express.
• Set up an ODBC data source in Windows with these settings:
    Driver = "SQL Server"
    Server = "localhost\\SQLEXPRESS"
    Database = "mydatabase"
    Trusted_Connection = "True"
• Create the DSN.
Reading from SQL Server using ODBC

    # Establish the ODBC connection.
    conn <- odbcConnect("DSN name")
    sqlTables(conn)

    # Execute a SQL SELECT query.
    b1 <- sqlQuery(conn, "select * from dbo.EMP")

    # Read a whole SQL table.
    res <- sqlFetch(conn, "dbo.EMP")

  or

    # Read only the first 3 rows of the table.
    res <- sqlFetch(conn, "dbo.EMP", max = 3)

    # Close the connection when finished.
    odbcClose(conn)

DPLYR PACKAGE
dplyr is a package for data manipulation, written and maintained by Hadley Wickham.

    install.packages("dplyr")
    library(dplyr)
DPLYR Functions
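
• select: keeps or drops columns. R is case-sensitive, so the function name is written in lower case.

    select(d1, country)
    select(d1, -order.date)
    select(d1, order.id:aging, states)
    select(d1, starts_with("c"))
    select(d1, ends_with("c"))
    select(d1, contains("c"))

• distinct: keeps only the unique rows, for example distinct(d1, country).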
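Because the d1 data set from the slide is not available, the sketch below tries the same select() helpers and distinct() on the built-in airquality data set (columns Ozone, Solar.R, Wind, Temp, Month, Day). The choice of data set and of the letters passed to the helpers is an assumption made for illustration.

```r
library(dplyr)

# Keep one column, drop one column, or keep a range of columns plus one more.
print(names(select(airquality, Temp)))
print(names(select(airquality, -Day)))
range_cols <- names(select(airquality, Ozone:Wind, Month))
print(range_cols)

# Helper functions pick columns by their names (matching ignores case by default).
m_cols <- names(select(airquality, starts_with("M")))
h_cols <- names(select(airquality, ends_with("h")))
o_cols <- names(select(airquality, contains("o")))
print(m_cols)
print(h_cols)
print(o_cols)

# distinct() keeps each value of Month once.
months <- distinct(airquality, Month)
print(months)
```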
DPLYR Functions

• filter
The filter function returns all the rows that satisfy the given condition.

    filter(airquality, Temp > 70)
    filter(airquality, Temp > 80 & Month > 5)

• mutate
The mutate function is used to add new variables to the data.

    mutate(airquality, TempInC = (Temp - 32) * 5 / 9)
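A short sketch that stores the results of filter() and mutate() so they can be inspected; the names warm and with_celsius are hypothetical. Neither function changes airquality itself, each returns a new data frame.

```r
library(dplyr)

# Rows with a temperature above 80 degrees Fahrenheit after May.
warm <- filter(airquality, Temp > 80 & Month > 5)
print(nrow(warm))

# Add a Celsius column computed from the Fahrenheit column.
with_celsius <- mutate(airquality, TempInC = (Temp - 32) * 5 / 9)
print(head(with_celsius, 3))

# The original data set still has its six columns.
print(ncol(airquality))
```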
DPLYR Functions

• summarise
The summarise function is used to summarise multiple values into a single value.

    summarise(airquality, mean(Temp, na.rm = TRUE))

  na.rm = TRUE removes all NA values before the mean is calculated.

• group_by
The group_by function is used to group data by one or more variables.

    summarise(group_by(airquality, Month), mean(Temp, na.rm = TRUE))
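The sketch below names the summary column and shows why na.rm matters, using the Ozone column of airquality, which contains missing values. The column name mean_temp and the comparison with base R's tapply() are additions for illustration.

```r
library(dplyr)

# One overall value, then one value per month.
overall  <- summarise(airquality, mean_temp = mean(Temp, na.rm = TRUE))
by_month <- summarise(group_by(airquality, Month),
                      mean_temp = mean(Temp, na.rm = TRUE))
print(overall)
print(by_month)

# Ozone has missing values: without na.rm = TRUE the mean is NA.
print(mean(airquality$Ozone))
print(mean(airquality$Ozone, na.rm = TRUE))

# The grouped result matches base R's tapply().
base_means <- tapply(airquality$Temp, airquality$Month, mean)
print(base_means)
```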
DPLYR Functions

• count
The count function tallies observations by group. It is similar to the table function in the base package. For example:

    count(airquality, Month)

• arrange
The arrange function is used to arrange rows by variables. Currently, the airquality dataset is arranged by Month, and then by Day. We can use the arrange function to arrange the rows in descending order of Month, and then in ascending order of Day.

    arrange(airquality, desc(Month), Day)
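A brief sketch comparing count() with base R's table() and checking the order produced by arrange(); the variable names are hypothetical.

```r
library(dplyr)

# count() returns a data frame with one row per month and the tally in column n.
per_month <- count(airquality, Month)
print(per_month)

# table() gives the same tallies as a named vector-like object.
base_counts <- table(airquality$Month)
print(base_counts)

# Latest month first, and days in ascending order within each month.
sorted <- arrange(airquality, desc(Month), Day)
print(head(sorted, 3))
```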
DPLYR Functions

• sample
The sample functions are used to select random rows from a table. The first line of code randomly selects ten rows from the dataset, and the second line randomly selects 15 rows (10% of the original 153 rows) from the dataset.

    sample_n(airquality, size = 10)
    sample_frac(airquality, size = 0.1)
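Random sampling gives different rows on every run unless a seed is set first. The sketch below fixes the seed with set.seed() (the value 42 is arbitrary) and also shows slice_sample(), which replaces sample_n() and sample_frac() in dplyr 1.0.0 and later; that it is available assumes a reasonably recent dplyr installation.

```r
library(dplyr)

# Fix the random number generator so the same rows are drawn on every run.
set.seed(42)

ten   <- sample_n(airquality, size = 10)      # exactly ten rows
tenth <- sample_frac(airquality, size = 0.1)  # ten percent of 153 rows
print(nrow(ten))
print(nrow(tenth))

# Newer spelling of the same idea (dplyr 1.0.0 or later).
ten_new <- slice_sample(airquality, n = 10)
print(nrow(ten_new))
```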
DPLYR Functions

• Pipe
The pipe operator in R, represented by %>%, can be used to chain code together. It is very useful when you are performing several operations on data and don't want to save the output at each intermediate step.

    # flights_db is assumed to be a table of flight records, for example the flights data from the nycflights13 package.
    flights_db %>% select(year:day, dep_delay, arr_delay)
    flights_db %>% filter(dep_delay > 240)
    flights_db %>% group_by(dest) %>% summarise(delay = mean(dep_delay, na.rm = TRUE))
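To see what the pipe saves, the sketch below writes the same calculation twice on the built-in airquality data set: once as nested function calls and once as a pipeline. Using airquality instead of the flights table is an assumption made so the example runs without extra data.

```r
library(dplyr)

# Nested calls have to be read from the inside out.
nested <- summarise(
  group_by(filter(airquality, Month > 5), Month),
  mean_temp = mean(Temp)
)

# The pipeline reads top to bottom: filter, then group, then summarise.
piped <- airquality %>%
  filter(Month > 5) %>%
  group_by(Month) %>%
  summarise(mean_temp = mean(Temp))

print(piped)

# Both versions produce the same result.
same <- isTRUE(all.equal(nested, piped))
print(same)
```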
Data Exploration Functions

R is case-sensitive, so the function names are written in lower case.

• str()        • mean()
• class()      • median()
• head()       • range()
• tail()       • var()
• dim()        • sd()
• ncol()       • names()
• nrow()       • levels()
• summary()    • table()
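A quick tour of these functions on the built-in airquality data set. This is a sketch: the data set and the factor month_f (created only so that levels() has something to report) are assumptions made for illustration.

```r
# Structure and shape of the data.
str(airquality)
print(class(airquality))
print(dim(airquality))     # rows, then columns
print(nrow(airquality))
print(ncol(airquality))
print(names(airquality))

# First and last rows.
print(head(airquality, 3))
print(tail(airquality, 3))

# Summaries of the whole data frame and of a single column.
print(summary(airquality))
print(mean(airquality$Temp))
print(median(airquality$Temp))
print(range(airquality$Temp))
print(var(airquality$Temp))
print(sd(airquality$Temp))

# levels() works on factors; table() counts how often each value occurs.
month_f <- factor(airquality$Month)
print(levels(month_f))
print(table(month_f))
```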