Data Management II

The document discusses various data management techniques in R including sorting data, merging datasets, and using aggregate functions. It shows how to sort a dataset by columns in ascending and descending order, including numeric, character, and factor variables. It also demonstrates different types of joins to merge datasets including inner, outer, left, and right joins. The aggregate function is used to calculate the mean of a variable grouped by a factor variable.

Uploaded by

Abdinasir Ahmed Mohamed

Data Management -II

What will we learn

• Sorting of data set

• Merging data sets

• Aggregating data to get group summaries

Data Sorting in R

• Sorting data is one of the most common activities in preparing data for analysis.
• Sorting arranges the rows of a data set in order of one or more columns, either ascending or descending.
• We will explore the main ways in which sorting can be done.

#Import and attach basic_salary2 data from day 2 folder

salary<-read.csv(file.choose())
attach(salary)   #attach() makes columns such as ba and Grade available by name
Data Sorting in R (Ascending)
• Sort salary by ba in ascending order
• order() sorts in ascending order by default

> ba_sorted<-salary[order(ba), ]
> head(ba_sorted)

First_Name Last_Name Grade Location Function ba ms

37 Archa Narvekar GR2 MUMBAI TECHNICAL 10940 11160
32 Anup Save GR2 MUMBAI SALES 11960 7880
33 Yogesh Lonkar GR2 MUMBAI TECHNICAL 12390 6630
38 Shiva Jathar GR2 MUMBAI FINANCE 12860 10940
41 Ketan Kharkar GR2 MUMBAI SALES 13140 9800
34 Sagar Chavan GR2 MUMBAI FINANCE 13390 6700
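The slides rely on attach() so that order(ba) can find the column. As a minimal sketch, the same ascending sort can be written without attach(). The data frame name salary_small is hypothetical, and its values are copied from a few rows printed in the slides.

```r
# Hypothetical mini data set; values copied from rows printed in the slides
salary_small <- data.frame(
  First_Name = c("Mahesh", "Rajesh", "Neha", "Priya"),
  Grade      = c("GR1", "GR1", "GR1", "GR1"),
  ba         = c(17990, 19250, 19235, 23280),
  stringsAsFactors = FALSE
)

# Refer to the column explicitly instead of attaching the data frame
sorted_a <- salary_small[order(salary_small$ba), ]

# with() evaluates order(ba) inside the data frame, avoiding the repetition
sorted_b <- salary_small[with(salary_small, order(ba)), ]

print(sorted_a)
```

Both forms give the same result as the attach() version; they only differ in how the column is looked up.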

Data Sorting in R (Descending)
• Sort salary by ba in descending order
> ba_sorted_2<-salary[order(-ba), ]
> head(ba_sorted_2)

First_Name Last_Name Grade Location Function ba ms

12 Yogita Raje GR1 DELHI SALES 29080 8795
11 Raj Mohite GR1 DELHI FINANCE 26080 16970
10 Hameed Singh GR1 DELHI SALES 23720 15120
4 Priya Jain GR1 DELHI SALES 23280 13490
6 Mahesh Rane GR1 DELHI TECHNICAL 23160 14200
9 Nishi Kulkarni GR1 <NA> SALES 22620 16150

• The '-' sign sorts numeric columns in descending order. Alternatively, you can use decreasing=TRUE.
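A small sketch of the two descending-sort options on a plain numeric vector. The vector holds four ba values copied from the slides; the names idx_minus and idx_decr are hypothetical.

```r
# Four ba values copied from the slides' printed output
ba <- c(17990, 19250, 19235, 23280)

# Option 1: negate the numeric column
idx_minus <- order(-ba)

# Option 2: ask order() for decreasing order
idx_decr <- order(ba, decreasing = TRUE)

print(ba[idx_minus])
print(ba[idx_decr])
```

For a numeric column without ties, both index vectors put the largest value first.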
Data Sorting in R (Using Factor Variable)
• Sort data by column with characters / factors
#Sort salary by Grade

> gr_sorted<-salary[order(Grade), ]
> head(gr_sorted)

First_Name Last_Name Grade Location Function ba ms

1 Mahesh Joshi GR1 DELHI SALES 17990 16070
2 Rajesh Kolte GR1 DELHI FINANCE 19250 14960
3 Neha Rao GR1 DELHI FINANCE 19235 15200
4 Priya Jain GR1 DELHI SALES 23280 13490
5 Sneha Joshi GR1 DELHI FINANCE 20660 15660
6 Mahesh Rane GR1 DELHI TECHNICAL 23160 14200

• Note that by default order() sorts in ascending order

Data Sorting in R (Using Factor Variable)
• Sort data by column with characters / factors
#Sort salary by Grade in descending order

> gr_sorted_2<-salary[order(Grade, decreasing=TRUE), ]

> head(gr_sorted_2)

First_Name Last_Name Grade Location Function ba ms

25 Priya Mittal GR2 DELHI TECHNICAL 15000 10680
26 Naresh Sinha GR2 DELHI TECHNICAL 13810 11540
27 Jivesh Shah GR2 <NA> FINANCE 16000 13730
28 Jigar Shah GR2 DELHI FINANCE 16230 NA
29 Gaurav Singh GR2 DELHI SALES 13760 13220
30 Amit Mehta GR2 DELHI TECHNICAL 13660 6840

• To reverse the sort order for factor variables, include the logical argument decreasing=TRUE. The '-' sign is not meaningful for factors.
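A minimal sketch of descending order on a factor, assuming a hypothetical four-element grade vector. It also shows one possible alternative, xtfrm(), which turns the factor into numeric sort keys so that '-' can be applied; the slides do not use this function.

```r
# Hypothetical factor with the two grade levels seen in the slides
grade <- factor(c("GR1", "GR2", "GR1", "GR2"))

# decreasing = TRUE works directly on factors
idx <- order(grade, decreasing = TRUE)

# Alternative (not in the slides): convert to numeric sort keys, then negate
idx_alt <- order(-xtfrm(grade))

print(as.character(grade[idx]))
print(as.character(grade[idx_alt]))
```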

Sorting Data by Multiple Variables
• Sort data by giving multiple columns: one column with characters / factors and one with numerals
#Sort salary by Grade and ba

> grba_sorted<-salary[order(Grade, ba), ]

> head(grba_sorted)

First_Name Last_Name Grade Location Function ba ms

13 Anjali Sonar GR1 MUMBAI <NA> 14410 10450
15 Rahul Potdar GR1 MUMBAI SALES 15125 NA
14 Bipin Bhide GR1 MUMBAI FINANCE 15230 11010
17 Mangesh Oak GR1 MUMBAI SALES 15800 12420
18 Anand Soman GR1 <NA> FINANCE 16540 12780
19 Malhar Jadhav GR1 MUMBAI TECHNICAL 17240 13220

• Here, data is first sorted in increasing order of Grade then by increasing order of ba within Grade
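A short sketch of multi-column sorting on a hypothetical four-row data frame (df is an assumed name; the Grade and ba values are copied from rows in the slides). The second call mixes directions, Grade ascending and ba descending, which the slides do not show but which follows from combining order() with the '-' sign.

```r
# Hypothetical mini data set; values copied from rows printed in the slides
df <- data.frame(
  Grade = c("GR1", "GR2", "GR1", "GR2"),
  ba    = c(14410, 10940, 15125, 11960),
  stringsAsFactors = FALSE
)

# Grade ascending, then ba ascending within each Grade
same_dir <- df[order(df$Grade, df$ba), ]

# Grade ascending, then ba descending within each Grade
mixed <- df[order(df$Grade, -df$ba), ]

print(same_dir)
print(mixed)
```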
Merging by Variables
#Import the following 2 data sets

sal_data

Employee_ID First_Name Last_Name Basic_Salary
1 E-1001 Mahesh Joshi 16070
2 E-1002 Rajesh Kolte 14960
3 E-1004 Priya Jain 13490
4 E-1005 Sneha Joshi 15660
5 E-1007 Ram Kanade 15850
6 E-1008 Nishi Honrao 15880
7 E-1009 Hameed Singh 15120

bonus_data

Employee_ID Bonus
1 E-1001 12050
2 E-1003 11400
3 E-1004 10110
4 E-1006 10650
5 E-1008 11910
6 E-1010 11340
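The slides import these two data sets from files that are not included here. As a sketch, they can be typed in directly with data.frame(). The values are copied from the slides; the E-1009 row is taken from the join output shown later, and common_ids is a hypothetical helper name.

```r
# Salary data typed in from the slides (E-1009 appears in the join output)
sal_data <- data.frame(
  Employee_ID  = c("E-1001", "E-1002", "E-1004", "E-1005",
                   "E-1007", "E-1008", "E-1009"),
  First_Name   = c("Mahesh", "Rajesh", "Priya", "Sneha",
                   "Ram", "Nishi", "Hameed"),
  Last_Name    = c("Joshi", "Kolte", "Jain", "Joshi",
                   "Kanade", "Honrao", "Singh"),
  Basic_Salary = c(16070, 14960, 13490, 15660, 15850, 15880, 15120),
  stringsAsFactors = FALSE
)

# Bonus data typed in from the slides
bonus_data <- data.frame(
  Employee_ID = c("E-1001", "E-1003", "E-1004", "E-1006", "E-1008", "E-1010"),
  Bonus       = c(12050, 11400, 10110, 10650, 11910, 11340),
  stringsAsFactors = FALSE
)

# IDs present in both data sets: these are the rows an inner join keeps
common_ids <- intersect(sal_data$Employee_ID, bonus_data$Employee_ID)
print(common_ids)
```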

Types of Joins

• Left Join
• Right Join
• Inner Join
• Outer Join
Outer Joins
• Outer Join includes all employee IDs from both data sets

> outerjoin<-merge(sal_data,bonus_data,
by=c("Employee_ID"), all=TRUE)
> outerjoin
Employee_ID First_Name Last_Name Basic_Salary Bonus
1 E-1001 Mahesh Joshi 16070 12050
2 E-1002 Rajesh Kolte 14960 NA
3 E-1004 Priya Jain 13490 10110
4 E-1005 Sneha Joshi 15660 NA
5 E-1007 Ram Kanade 15850 NA
6 E-1008 Nishi Honrao 15880 11910
7 E-1009 Hameed Singh 15120 NA
8 E-1003 <NA> <NA> NA 11400
9 E-1006 <NA> <NA> NA 10650
10 E-1010 <NA> <NA> NA 11340
Inner Join

• Inner Join includes employee ID only if present in both data sets

> innerjoin<-merge(sal_data,bonus_data,
by=c("Employee_ID"))
> innerjoin

Employee_ID First_Name Last_Name Basic_Salary Bonus

1 E-1001 Mahesh Joshi 16070 12050
2 E-1004 Priya Jain 13490 10110
3 E-1008 Nishi Honrao 15880 11910

Left Join

• Left Join includes all employee IDs from the first data set

> leftjoin<-merge(sal_data,bonus_data,
by=c("Employee_ID"), all.x=TRUE)
> leftjoin

Employee_ID First_Name Last_Name Basic_Salary Bonus

1 E-1001 Mahesh Joshi 16070 12050
2 E-1002 Rajesh Kolte 14960 NA
3 E-1004 Priya Jain 13490 10110
4 E-1005 Sneha Joshi 15660 NA
5 E-1007 Ram Kanade 15850 NA
6 E-1008 Nishi Honrao 15880 11910
7 E-1009 Hameed Singh 15120 NA
Right Join

• Right Join includes all employee IDs from the second data set

> rightjoin<-merge(sal_data,bonus_data,
by=c("Employee_ID"), all.y=TRUE)
> rightjoin

Employee_ID First_Name Last_Name Basic_Salary Bonus

1 E-1001 Mahesh Joshi 16070 12050
2 E-1004 Priya Jain 13490 10110
3 E-1008 Nishi Honrao 15880 11910
4 E-1003 <NA> <NA> NA 11400
5 E-1006 <NA> <NA> NA 10650
6 E-1010 <NA> <NA> NA 11340
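The four joins differ only in the all, all.x and all.y arguments of merge(). A compact sketch that compares their row counts, assuming cut-down versions of the two data sets (the names sal and bonus are hypothetical, and the name columns are omitted for brevity):

```r
# Cut-down copies of the slides' data sets (IDs and amounts only)
sal <- data.frame(
  Employee_ID  = c("E-1001", "E-1002", "E-1004", "E-1005",
                   "E-1007", "E-1008", "E-1009"),
  Basic_Salary = c(16070, 14960, 13490, 15660, 15850, 15880, 15120),
  stringsAsFactors = FALSE
)
bonus <- data.frame(
  Employee_ID = c("E-1001", "E-1003", "E-1004", "E-1006", "E-1008", "E-1010"),
  Bonus       = c(12050, 11400, 10110, 10650, 11910, 11340),
  stringsAsFactors = FALSE
)

inner <- merge(sal, bonus, by = "Employee_ID")                # IDs in both
left  <- merge(sal, bonus, by = "Employee_ID", all.x = TRUE)  # all IDs from sal
right <- merge(sal, bonus, by = "Employee_ID", all.y = TRUE)  # all IDs from bonus
outer <- merge(sal, bonus, by = "Employee_ID", all = TRUE)    # all IDs from either

# Row counts for the four joins
print(c(inner = nrow(inner), left = nrow(left),
        right = nrow(right), outer = nrow(outer)))
```

The row order of the results may differ from the printed slides, depending on the R version and on whether Employee_ID is stored as character or factor; the set of rows is the same.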
Aggregate Function

#To calculate mean for variable ‘ba’ by Location variable

A<-aggregate(ba ~ Location, data = salary, FUN = mean )

Location ba
1 DELHI 19430.29
2 MUMBAI 15037.11

#With the formula interface, aggregate() by default drops rows with missing values in ba or Location (na.action = na.omit).

Therefore, na.rm=TRUE is not required in the mean function.
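A small sketch of how missing values are handled, using a hypothetical five-row data frame (df and its values are invented for illustration, not taken from the salary file). It contrasts the formula interface with the non-formula interface of aggregate(), where na.rm=TRUE is still needed, and also computes a group sum.

```r
# Hypothetical data with one missing ba and one missing Location
df <- data.frame(
  Location = c("DELHI", "DELHI", "MUMBAI", "MUMBAI", NA),
  ba       = c(20000, NA, 15000, 16000, 22620),
  stringsAsFactors = FALSE
)

# Formula interface: rows with NA in ba or Location are dropped first
by_loc <- aggregate(ba ~ Location, data = df, FUN = mean)

# Non-formula interface: NA values reach mean(), so DELHI becomes NA
no_formula <- aggregate(df["ba"], by = list(Location = df$Location), FUN = mean)

# Passing na.rm = TRUE through to mean() restores the DELHI value
no_formula_rm <- aggregate(df["ba"], by = list(Location = df$Location),
                           FUN = mean, na.rm = TRUE)

# The same pattern gives group totals with FUN = sum
totals <- aggregate(ba ~ Location, data = df, FUN = sum)

print(by_loc)
print(no_formula)
print(totals)
```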
