Data Management II
• Sorting data is one of the most common activities in preparing data for analysis.
• Sorting arranges data in a specified order, either ascending or descending.
• We will explore several ways in which sorting can be done in R.
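#Import the salary data set
> salary<-read.csv(file.choose())
#Use the attach function so columns such as ba and Grade can be referred to by name
> attach(salary)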
Data Sorting in R (Ascending)
• Sort salary by ba in ascending order
• order() sorts in ascending order by default
> ba_sorted<-salary[order(ba), ]
> head(ba_sorted)
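The idea behind `salary[order(ba), ]` is that `order()` returns row positions, not sorted values, and indexing the rows with those positions reorders the whole data frame. The sketch below makes this visible on a small hypothetical stand-in for the salary data (the IDs and amounts are invented for illustration). It uses `salary$ba` instead of `attach()` so the block runs on its own.

```r
# Hypothetical stand-in for the salary data set (values invented for illustration)
salary <- data.frame(
  Employee_ID = c("E1", "E2", "E3", "E4"),
  ba          = c(17000, 12500, 15500, 19000),
  stringsAsFactors = FALSE
)

# order() returns the row positions that put ba in ascending order
idx <- order(salary$ba)
print(idx)

# Indexing the rows by those positions (keeping all columns) sorts the data frame
ba_sorted <- salary[idx, ]
print(ba_sorted)
```

Here `idx` is `2 3 1 4`, because row 2 holds the smallest ba and row 4 the largest.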
Data Sorting in R (Descending)
• Sort salary by ba in descending order
> ba_sorted_2<-salary[order(-ba), ]
> head(ba_sorted_2)
• The '-' sign sorts numeric columns in descending order. Alternatively, you can use decreasing=TRUE, e.g. salary[order(ba, decreasing=TRUE), ]
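A quick way to convince yourself that the two descending forms agree is to compute both on the same data. This is a sketch on hypothetical values (not the course data set), using `salary$ba` in place of the attached column. One caveat worth knowing: unary minus is not meaningful for character or factor columns, so for a column like Grade only `decreasing = TRUE` applies.

```r
# Hypothetical salary values (invented for illustration)
salary <- data.frame(
  Grade = c("GR2", "GR1", "GR2", "GR1"),
  ba    = c(17000, 12500, 15500, 19000),
  stringsAsFactors = FALSE
)

# Negating a numeric key reverses its sort order
desc_minus <- salary[order(-salary$ba), ]

# decreasing = TRUE gives the same rows in the same order
desc_flag <- salary[order(salary$ba, decreasing = TRUE), ]

print(desc_minus$ba)
print(identical(desc_minus, desc_flag))

# For a character column use decreasing = TRUE; -salary$Grade would raise an error
grade_desc <- salary[order(salary$Grade, decreasing = TRUE), ]
print(grade_desc$Grade)
```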
Data Sorting in R (Using Factor Variable)
• Sort data by column with characters / factors
#Sort salary by Grade
> gr_sorted<-salary[order(Grade), ]
> head(gr_sorted)
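Whether Grade is stored as character or as factor changes the result: character columns sort alphabetically, while factors sort by the order of their levels. Since R 4.0.0, `read.csv()` reads text as character by default, so Grade is likely character unless converted. The grade codes below (including "GR10") are hypothetical, chosen to show where the two orders differ.

```r
# Hypothetical grades and salaries (invented for illustration)
salary <- data.frame(
  Grade = c("GR2", "GR10", "GR1", "GR2"),
  ba    = c(15500, 30000, 12500, 17000),
  stringsAsFactors = FALSE
)

# As character, Grade sorts alphabetically, so "GR10" comes before "GR2"
chr_sorted <- salary[order(salary$Grade), ]
print(chr_sorted$Grade)

# As a factor, Grade sorts by the order of its levels instead
salary$Grade <- factor(salary$Grade, levels = c("GR1", "GR2", "GR10"))
fct_sorted <- salary[order(salary$Grade), ]
print(as.character(fct_sorted$Grade))
```

Setting the levels explicitly is a simple way to get a meaningful ordering for codes that do not sort correctly as text.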
Sorting Data by Multiple Variables
• Sort data by multiple columns, here one column with characters / factors and one with numbers
#Sort salary by Grade and ba
• Here, the data is first sorted in increasing order of Grade, then in increasing order of ba within each Grade
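Passing several columns to `order()` gives exactly this behaviour: the first argument is the primary key and each later argument breaks ties left by the earlier ones. The sketch below uses a small hypothetical data frame and `salary$...` references instead of attached columns; the result name `gr_ba_sorted` is also just illustrative.

```r
# Hypothetical salary data (values invented for illustration)
salary <- data.frame(
  Employee_ID = c("E1", "E2", "E3", "E4", "E5"),
  Grade       = c("GR2", "GR1", "GR2", "GR1", "GR1"),
  ba          = c(17000, 14000, 15500, 12500, 13000),
  stringsAsFactors = FALSE
)

# Grade is the primary key; ba orders the rows within each Grade
gr_ba_sorted <- salary[order(salary$Grade, salary$ba), ]
print(gr_ba_sorted)

# Negating the numeric key sorts ba in descending order within each Grade
gr_ba_desc <- salary[order(salary$Grade, -salary$ba), ]
print(gr_ba_desc)
```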
Merging by Variables
#Import the following 2 data sets: sal_data and bonus_data
Types of Joins
(Diagram: Left Join, Right Join, Inner Join, Outer Join)
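Of the four join types, the inner join is what `merge()` does when none of the `all`, `all.x` or `all.y` arguments is set: only IDs found in both data sets are kept. The two small data frames below are hypothetical stand-ins for sal_data and bonus_data, and the `bonus` column name is an assumption made for illustration.

```r
# Hypothetical stand-ins for sal_data and bonus_data (values invented)
sal_data   <- data.frame(Employee_ID = c("E1", "E2", "E3"),
                         ba = c(15000, 12000, 18000))
bonus_data <- data.frame(Employee_ID = c("E2", "E3", "E4"),
                         bonus = c(500, 800, 300))

# Default merge() is an inner join: E1 (no bonus) and E4 (no salary) are dropped
innerjoin <- merge(sal_data, bonus_data, by = "Employee_ID")
print(innerjoin)
```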
Outer Joins
• An outer join includes all employee IDs from both data sets
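In `merge()`, an outer join is requested with `all = TRUE`. IDs present in only one data set still appear, with `NA` filling the columns that come from the other one. The data below is a hypothetical stand-in for sal_data and bonus_data; `outerjoin` and `bonus` are illustrative names.

```r
# Hypothetical stand-ins for sal_data and bonus_data (values invented)
sal_data   <- data.frame(Employee_ID = c("E1", "E2", "E3"),
                         ba = c(15000, 12000, 18000))
bonus_data <- data.frame(Employee_ID = c("E2", "E3", "E4"),
                         bonus = c(500, 800, 300))

# all = TRUE keeps every ID from both tables; missing values become NA
outerjoin <- merge(sal_data, bonus_data, by = "Employee_ID", all = TRUE)
print(outerjoin)
```

With the default `sort = TRUE`, the result is ordered by Employee_ID, so E1 (no bonus) comes first and E4 (no ba) comes last.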
Left Join
• A left join includes all employee IDs from the first data set
> leftjoin<-merge(sal_data,bonus_data,
by=c("Employee_ID"), all.x=TRUE)
> leftjoin
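To see what `all.x = TRUE` changes, the sketch below runs the same left join on hypothetical stand-ins for sal_data and bonus_data (values and the `bonus` column name are invented). Every row of the first data set is kept, and an ID with no match in the second one gets `NA` in its columns.

```r
# Hypothetical stand-ins for sal_data and bonus_data (values invented)
sal_data   <- data.frame(Employee_ID = c("E1", "E2", "E3"),
                         ba = c(15000, 12000, 18000))
bonus_data <- data.frame(Employee_ID = c("E2", "E3", "E4"),
                         bonus = c(500, 800, 300))

# all.x = TRUE keeps every ID from sal_data; E4 (only in bonus_data) is dropped
leftjoin <- merge(sal_data, bonus_data, by = c("Employee_ID"), all.x = TRUE)
print(leftjoin)
```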
Right Join
• A right join includes all employee IDs from the second data set
> rightjoin<-merge(sal_data,bonus_data,
by=c("Employee_ID"), all.y=TRUE)
> rightjoin
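The right join is the mirror image: `all.y = TRUE` keeps every row of the second data set. Below is a sketch on the same kind of hypothetical stand-in data (values and the `bonus` column name are invented for illustration).

```r
# Hypothetical stand-ins for sal_data and bonus_data (values invented)
sal_data   <- data.frame(Employee_ID = c("E1", "E2", "E3"),
                         ba = c(15000, 12000, 18000))
bonus_data <- data.frame(Employee_ID = c("E2", "E3", "E4"),
                         bonus = c(500, 800, 300))

# all.y = TRUE keeps every ID from bonus_data; E4 gets NA for ba, E1 is dropped
rightjoin <- merge(sal_data, bonus_data, by = c("Employee_ID"), all.y = TRUE)
print(rightjoin)
```

The same result can be obtained as a left join with the arguments swapped, `merge(bonus_data, sal_data, by = "Employee_ID", all.x = TRUE)`, apart from the column order.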
Location ba
1 DELHI 19430.29
2 MUMBAI 15037.11