(Data Pre-Processing & Visualization) : (Mid-Term Assignment)

The document provides instructions to analyze building permit data from a CSV file using R. It includes questions to: 1) Convert date columns to proper date formats. 2) Extract records by date criteria, such as before/after a given date. 3) Find the oldest and newest permit dates. 4) Convert a block number column to numeric and extract by block. 5) Extract records between two date criteria for permit and completion dates. The responses provide the R code to perform the requested data transformations and extractions to answer the questions.

Uploaded by

Asutosh Mahala

[Data Pre-Processing & Visualization]

[Mid-Term Assignment]

SHIVAM MISHRA – 21PGDM074

Use the mtcars dataset (from the R environment) to answer the following questions:

QUESTION - 1. Why does the command mtcars[1:20] return an error? How does it differ from the similar command mtcars[1:20,]? [2 marks]

ANSWER - 1.

With a single index and no comma, mtcars[1:20] asks for columns 1 to 20 of the dataset. mtcars has only 11 columns, so R returns the error "undefined columns selected".

With a trailing comma, mtcars[1:20,] applies the index to rows instead, so it returns the first 20 rows (all columns) without error.
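The difference can be checked directly. This is a small sketch using only the built-in mtcars data; tryCatch is used here only to capture the error message instead of stopping the script, and the variable names are illustrative.

```r
# mtcars has 32 rows and 11 columns
dim(mtcars)

# Single-index form selects columns; columns 12 to 20 do not exist
col_result <- tryCatch(
  mtcars[1:20],
  error = function(e) conditionMessage(e)
)
print(col_result)  # the captured error message

# With a trailing comma the index applies to rows
first_20 <- mtcars[1:20, ]
dim(first_20)
```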
QUESTION - 2. Fix the following errors (if any) and show the output of the codes below:

i.) mtcars[mtcars$cyl = 4, ]

ANSWER - A single = is an assignment, not a comparison. To fix the error, use the comparison operator ==, either as mtcars[mtcars$cyl==4,] or with the subset function: subset(mtcars,cyl==4).

ii.) mtcars[-1:4]

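ANSWER - R reads -1:4 as (-1):4, that is the sequence -1, 0, 1, 2, 3, 4. This mixes negative and positive positions, which is not allowed inside the brackets. To select the first four columns, drop the minus sign: mtcars[1:4]. To exclude the first four columns instead, wrap the range in parentheses: mtcars[-(1:4)].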
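The operator precedence behind this error is easy to inspect. The sketch below prints the index vector that -1:4 actually produces and shows the two valid alternatives; the variable names are illustrative.

```r
# Unary minus binds tighter than the colon, so this is (-1):4
idx <- -1:4
print(idx)

# Keep columns 1 to 4
first_four <- mtcars[1:4]
names(first_four)

# Drop columns 1 to 4 (negative positions exclude)
without_four <- mtcars[-(1:4)]
names(without_four)
```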
iii.) mtcars[mtcars$cyl <= 5]

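ANSWER - The purpose of this command is to list all the rows with cyl <= 5, but the comma after the condition is missing, so R tries to use the condition to select columns and reports undefined columns. Either add the comma, mtcars[mtcars$cyl <= 5, ], or use the subset function: subset(mtcars, cyl <= 5).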
iv) mtcars[mtcars$cyl == 4 | 6, ]

ANSWER - The code mtcars[mtcars$cyl == 4 | 6, ] does not raise an error, but it does not express the intended condition. The bare 6 is treated as TRUE, so the condition is TRUE for every row and the whole dataset is returned. To correct this, change the 6 to mtcars$cyl == 6, which gives: mtcars[mtcars$cyl == 4 | mtcars$cyl == 6,]. This returns only the rows whose cyl column is 4 or 6.
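A short check of why the original condition matches every row, followed by two equivalent correct forms. The %in% variant is an alternative not used in the answer above; the variable names are illustrative.

```r
# 6 is coerced to TRUE, so "anything | 6" is TRUE for every row
cond_wrong <- mtcars$cyl == 4 | 6
all(cond_wrong)

# Explicit comparison on both sides of the OR
by_or <- mtcars[mtcars$cyl == 4 | mtcars$cyl == 6, ]

# Same selection written with %in%
by_in <- mtcars[mtcars$cyl %in% c(4, 6), ]

table(by_or$cyl)
```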
QUESTION - 3. Rename the cars in the mtcars file that have names as Merc to Mercedes. [4 marks]

ANSWER - We run the following code so that the row names are changed from Merc to Mercedes:

x <- rownames(mtcars)
x <- gsub("Merc", "Mercedes", x)
print(x)
rownames(mtcars) <- x

First we save the row names in a variable x, then we run the gsub function to change Merc to Mercedes in those names, and finally we assign the changed names back to the table.
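One caveat with a plain "Merc" pattern is that running the replacement twice would also match inside "Mercedes". The sketch below is a variation, not the submitted answer: it works on a copy (the name cars is illustrative) and anchors the pattern to "Merc " at the start of the name so the step can be repeated safely.

```r
# Work on a copy so the built-in dataset stays untouched
cars <- mtcars

# Replace only a leading "Merc " (with the trailing space)
rownames(cars) <- sub("^Merc ", "Mercedes ", rownames(cars))

# Running it again changes nothing, because "Mercedes " no longer matches
rownames(cars) <- sub("^Merc ", "Mercedes ", rownames(cars))

grep("^Merc", rownames(cars), value = TRUE)
```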
QUESTION - 4. Extract car records from mtcars with cylinders greater than 4.00 and weight less than the mean weight. (File) [4 marks]

ANSWER - We first inspect the data and the mean weight, then filter on both conditions:

mean(mtcars$wt)

summary(mtcars)

subset(mtcars, cyl > 4)

r <- subset(mtcars, wt < mean(wt) & cyl > 4)

The last line extracts the records with wt less than the mean weight and cylinders greater than 4.
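The same extraction can be written with bracket indexing, which makes the two conditions explicit. This is a sketch for comparison only; the variable names are illustrative.

```r
# Mean weight computed once, outside the filter
mean_wt <- mean(mtcars$wt)

# Bracket form: row condition before the comma, all columns after it
by_bracket <- mtcars[mtcars$cyl > 4 & mtcars$wt < mean_wt, ]

# subset() form, as in the answer above
by_subset <- subset(mtcars, wt < mean(wt) & cyl > 4)

by_bracket[, c("cyl", "wt")]
```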
Use the Building_Permits.csv file (from Google Classroom) to answer the following questions:

1. Convert the Permit Creation Date column in the original dataset from character to proper date format. Use the following code: as.Date(x, format = "%m/%d/%Y"), where "x" is the variable/column to convert.

i. Now extract the building records with permit date before 1 January 2013. [4 marks]

ii. What are the oldest and newest permit date records for the buildings? [2 marks]

ANSWER - Import the data file in RStudio. Then convert the Permit Creation Date column from character to date using the code:

Building_Permits$`Permit Creation Date` <- as.Date(Building_Permits$`Permit Creation Date`, format = "%m/%d/%Y")

ANSWER - i) The building records with permit date before 1 January 2013 can be extracted using the code:

bp_01 <- subset(Building_Permits, `Permit Creation Date` < "2013-01-01")

ANSWER - ii) We can use the min and max functions to find the oldest and newest permit dates for the buildings.

For oldest: min(Building_Permits$`Permit Creation Date`). The date is "2012-03-28".

For newest: max(Building_Permits$`Permit Creation Date`). The date is "2018-02-23".
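The conversion and the two queries can be tried without the CSV. The sketch below uses a small hypothetical data frame named permits as a stand-in for Building_Permits; its values are invented for illustration and are not taken from the real file. na.rm = TRUE is added as a precaution in case some dates fail to parse.

```r
# Hypothetical stand-in for the CSV (invented values)
permits <- data.frame(
  `Permit Creation Date` = c("11/05/2012", "07/15/2014", "01/20/2018"),
  check.names = FALSE,       # keep the space in the column name
  stringsAsFactors = FALSE
)

# Character to Date, using the month/day/year format from the question
permits$`Permit Creation Date` <- as.Date(
  permits$`Permit Creation Date`, format = "%m/%d/%Y"
)

# Records created before 1 January 2013
before_2013 <- subset(permits, `Permit Creation Date` < as.Date("2013-01-01"))

# Oldest and newest permit dates
oldest <- min(permits$`Permit Creation Date`, na.rm = TRUE)
newest <- max(permits$`Permit Creation Date`, na.rm = TRUE)
```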
QUESTION - 6. Convert the Block column in the original dataset from character to proper number format. Use the following code: as.numeric(x), where "x" is the variable/column to convert.

i. Now extract the building records with permit date after 1 January 2015. [4 marks]

ii. What are the oldest and newest permit date records for the buildings in Block 326? [2 marks]

Now, convert the Completed Date column in the original dataset from character to proper date format. Use the following code: as.Date(x, format = "%m/%d/%Y"), where "x" is the variable/column to convert.

iii. Now extract the building records with permit date after 1 January 2015 and completed after 1 January 2018. [4 marks]

ANSWER - Convert the Block column from character to numeric using the code:

Building_Permits$Block <- as.numeric(Building_Permits$Block)

ANSWER - i) The question asks for the building records with permit date after 1 January 2015, so the comparison must use >:

bp_02 <- subset(Building_Permits, `Permit Creation Date` > "2015-01-01")

The same filter with < (permit date before 1 January 2015) returns 71936 records.

ANSWER - ii) We use the min and max functions to find the oldest and newest permit dates, but only for Block 326. First we build a table of all the records of Block 326 with this code:

bp_03 <- subset(Building_Permits, Block == 326)

Then we use this table in the min and max functions to get the dates.

For oldest: min(bp_03$`Permit Creation Date`). The date is "2017-01-13".

For newest: max(bp_03$`Permit Creation Date`). The date is "2018-01-12".
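The Block step can be sketched on a small hypothetical data frame (permits, with invented values, standing in for Building_Permits). Note that as.numeric turns any non-numeric text into NA with a warning, so it is worth counting NAs after the conversion; the toy data here contains only numeric strings.

```r
# Hypothetical stand-in for the CSV (invented values)
permits <- data.frame(
  Block = c("0326", "1234", "0326", "0326"),
  `Permit Creation Date` = as.Date(
    c("2016-04-02", "2015-09-30", "2017-11-11", "2016-12-25")
  ),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Character to numeric; "0326" becomes 326
permits$Block <- as.numeric(permits$Block)
sum(is.na(permits$Block))  # how many values failed to convert

# Restrict to one block, then take oldest and newest in one call
block_326 <- subset(permits, Block == 326)
date_range <- range(block_326$`Permit Creation Date`)
```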

ANSWER - iii) Convert the Completed Date column from character to date using the code:

Building_Permits$`Completed Date` <- as.Date(Building_Permits$`Completed Date`, format = "%m/%d/%Y")

To extract the building records with Permit Creation Date after 1 January 2015 and Completed Date after 1 January 2018, we use the code:

bp_04 <- subset(Building_Permits, `Permit Creation Date` > "2015-01-01" & `Completed Date` > "2018-01-01")

This returns 3130 records.

The same records can also be shown with a limited set of columns for easier viewing: 6 columns with the same 3130 rows.
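The combined filter can be sketched on a hypothetical data frame (permits, with invented values, standing in for Building_Permits). The last row has an empty Completed Date to show that such rows become NA after conversion and are dropped by subset.

```r
# Hypothetical stand-in for the CSV (invented values)
permits <- data.frame(
  `Permit Creation Date` = c("06/01/2014", "03/10/2016", "08/20/2017", "09/05/2017"),
  `Completed Date`       = c("02/01/2018", "12/15/2017", "03/01/2018", ""),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Convert both date columns in one step
date_cols <- c("Permit Creation Date", "Completed Date")
permits[date_cols] <- lapply(permits[date_cols], as.Date, format = "%m/%d/%Y")

# Both conditions must hold; rows where either test is NA are excluded
recent <- subset(
  permits,
  `Permit Creation Date` > as.Date("2015-01-01") &
    `Completed Date` > as.Date("2018-01-01")
)
nrow(recent)
```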
