
Practical 1_Data Frame Manipulation_072502

This document is a practical guide for a course on Spatial Data Modeling and Analysis using R, focusing on data frame manipulation and cleaning. It covers the basics of R functions, data importation, and the use of the dplyr package for data manipulation tasks. The document includes various tasks for students to practice creating vectors, matrices, and importing datasets while emphasizing the importance of data cleaning in the analysis process.

Uploaded by

atomicmdadis

SCHOOL OF EARTH SCIENCES, REAL ESTATE,

BUSINESS AND INFORMATICS


DEPARTMENT OF GEOSPATIAL SCIENCES AND TECHNOLOGY
(GST)
2024/2025

GS 227 Spatial Data Modeling and Analysis

Practical 1: Introduction to R Basics - Data Frame Manipulation

Instructor:

Dr. Msusa

Mr. Wangabo

Working with Functions

1. Introduction

R is a program designed for statistical computing and graphical display of data. There are
hundreds of built-in functions at your disposal. Some functions are available as
standard, and others are packaged into libraries which you need to install separately. For the
purpose of this practical, let us start with a basic exercise to understand what a function is. Some
symbols are commonly used:
• The <- is the assignment operator, which is equivalent to ‘=’ for assignment.
• The # indicates a comment. You can put whatever you like after it, on the
same line as the #, and R will not evaluate it. In RStudio's default theme, comments appear in green.
• In code you will also see (), [], and {}.
• The () indicates a function call (almost always),
• The [] indicates indexing (grabbing values by their location in a vector, matrix, etc.),
• The {} groups code that is meant to be run together and is usually used when
programming functions in R.

Further, in this exercise we are going to work on manipulating and cleaning
up our data frames. We are spending some time on this because, in my experience, most data
analysis and statistics classes seem to assume that 95% of the time spent working with data goes
to the analysis and its interpretation, and that little time is spent getting data ready to
analyze.

However, in reality, most of the time is spent on cleaning up data and less on the analysis.
We will just be scratching the surface of the many ways you can work with data in R. We will
show the basics of subsetting, merging, modifying, and summarizing data, and our examples will
all use the dplyr package. There are many ways to do this type of work in R, many of which are
available in base R, but I have heard from many people that focusing on one way is best, so dplyr it is.

This session is intended to introduce you to some features of the R environment by using
them. In this exercise you will learn some ways to explore and manipulate data frames in R.
After completing this exercise you should be able to:

i) Use various symbols and functions in the R environment
ii) Read data from files/import your data into R
iii) Save data to a file
iv) Clean data, select variables and subset data

2. Setting a Working Directory

Before doing anything in R, make a habit of setting a working directory. To set a working
directory in RStudio, go to Session > Set Working Directory in the menu bar.

Install the packages readr, readxl, dplyr and tidyr.
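The same setup can also be done from the console. The lines below are a minimal sketch: the folder path in the setwd() comment is hypothetical, and the install line is left commented out so that nothing is downloaded until you choose to run it.

```r
# Show the folder R is currently using as its working directory
getwd()

# Point R at the folder that holds your practical files.
# "~/GS227/Practical1" is a hypothetical path: replace it with your own folder.
# setwd("~/GS227/Practical1")

# Packages needed for this practical
pkgs <- c("readr", "readxl", "dplyr", "tidyr")

# Find out which of them are not installed yet
missing_pkgs <- pkgs[!pkgs %in% rownames(installed.packages())]
print(missing_pkgs)

# Install only the missing ones (needs an internet connection)
# install.packages(missing_pkgs)
```

Installing is needed once per computer, while library() must be called in every new session before a package's functions can be used.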

3. Functions

Task 1: Create a vector called v containing a random sample of size 10 from a uniform
distribution. Find the average of this sample using the function mean.

The function runif generates random values from a uniform distribution. The function requires
certain bits of information for it to operate. The bits of information required are specified by the
arguments of the function. Functions can have default settings for some or all of their
arguments.

runif requires n (the sample size) to be specified but has default settings for its other arguments,
min and max (defining the minimum and maximum values to randomly sample between).
Notice that the vector v has 10 elements in it. These 10 elements are the ten
random samples of real values between the default limits of 0 and 1. The average of such a
random sample, which we calculated using the mean function, should tend towards 0.5.
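One possible way to carry out Task 1 is sketched below. The call to set.seed() is an addition that is not part of the task; it only makes the random draw repeatable so that you get the same numbers each time.

```r
# Optional: fix the random seed so the sample can be reproduced
set.seed(1)

# Draw 10 values from a uniform distribution on the default range 0 to 1
v <- runif(10)

# Typing the name of an object displays it
v

# Average of the sample; it varies from draw to draw but is usually near 0.5
mean(v)
```

With only 10 values the mean can be some way from 0.5; larger samples give averages closer to it.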

#you can also use function print() and show() to display results
print(v)
show(v)

Further, evaluate the following functions and check the results in the console:

Task 2: Now use the functions help and args to see how the runif and mean functions
work. Make sure you understand how to use the various arguments of the functions runif
and mean. Try increasing and decreasing the sample size of your random sample and
try sampling from a different range of values (i.e. instead of using the default values of
min and max, set your own values).

Notice that there is only one named argument listed when typing args(mean). R displays x and then
three dots to indicate that there are further arguments (other than x) that can be specified to this
function. Use the help page for mean to find out what these further arguments are.
Notice that the output from the args function ends with NULL. This
is because args returns a function object with an empty body rather than a numerical result, and R prints that empty body as NULL.
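The commands for Task 2 might look like the sketch below. The sample size of 1000 and the range 5 to 10 are arbitrary choices for illustration, and the help() lines are commented out because they open a help page and are best run interactively.

```r
# Open the help pages (run these lines interactively)
# help(runif)
# help(mean)

# List the arguments of each function
args(runif)   # shows n, min = 0, max = 1
args(mean)    # shows x and ...

# A larger sample drawn from a different range (values chosen for illustration)
big <- runif(1000, min = 5, max = 10)
length(big)
mean(big)     # usually close to the midpoint of the range, 7.5
```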

#you can save the data in a file called Test

Note that in the functions above some arguments are in quotes and some are not.
The general rule is that no quotes are used only when referring to an object that currently
exists in the workspace. Quotes are used in all other cases. The objects a and d are not quoted
because they are objects in the workspace. file is an argument of the save function, and argument names
are never quoted; the file name supplied to it is a character string and is therefore quoted.
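The quoting rule can be seen in a short sketch. The values given to a and d here are invented for illustration, and the file is written to R's temporary folder (an assumption made so the example leaves no files behind); in the practical you would normally save into your working directory.

```r
# Two objects in the workspace (values invented for illustration)
a <- 1:5
d <- c(2.5, 3.5)

# Build a file name; the name itself is a quoted character string
out_file <- file.path(tempdir(), "Test.RData")

# a and d are unquoted because they are existing objects;
# file is an argument name, so it is unquoted too
save(a, d, file = out_file)

# Remove the objects, then restore them from the file
rm(a, d)
load(out_file)
a
d
```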

Task 3: Vectors: Create a vector m containing the numbers 2, 4, 3, 1, 7. Confirm the
object is of type vector and find the length of m. Use the subsetting operator in R to
access the 2nd element in this vector. Practice subsetting different elements of the
vector. Reverse the order of the elements in m. Create another vector called n with
elements a, b and c.
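A possible solution to Task 3 is sketched below. It assumes that the elements a, b and c of n are meant as character strings rather than existing objects.

```r
# Create the numeric vector
m <- c(2, 4, 3, 1, 7)

is.vector(m)   # TRUE
length(m)      # 5

# Subsetting with square brackets
m[2]           # 4
m[c(1, 5)]     # 2 7
m[-1]          # everything except the first element: 4 3 1 7

# Reverse the order of the elements
rev(m)         # 7 1 3 4 2

# A character vector (assumes a, b and c are meant as strings)
n <- c("a", "b", "c")
n
```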

Task 4 : Create a matrix with three rows and four columns with elements ranging between 1 and
12. Practice sub-setting elements of the matrix (i.e. get the element in row 2 and column 4; try
getting only the elements from column 1 and then only the elements from row 3).
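Task 4 could be approached as in the sketch below. The object name mat is an arbitrary choice. Note that matrix() fills the values column by column unless told otherwise.

```r
# 3 rows and 4 columns, filled column by column with 1 to 12
mat <- matrix(1:12, nrow = 3, ncol = 4)
mat

# Element in row 2, column 4
mat[2, 4]   # 11

# All elements of column 1
mat[, 1]    # 1 2 3

# All elements of row 3
mat[3, ]    # 3 6 9 12
```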

Try an alternative way to set up matrices indicated below:

This creates a matrix with values between 1 and 25, split into 5 rows and 5 columns.
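The original code for this step is not reproduced here, so the sketch below shows two alternatives that produce a 5 by 5 matrix of the values 1 to 25. They are offered as possibilities only and may differ from the approach used in class; the object names sq and sq_byrow are arbitrary.

```r
# Alternative 1: start from a vector and give it dimensions
sq <- 1:25
dim(sq) <- c(5, 5)   # 5 rows, 5 columns, filled column by column
sq

# Alternative 2: fill the matrix row by row instead
sq_byrow <- matrix(1:25, nrow = 5, byrow = TRUE)
sq_byrow
```

The two results are transposes of each other: the first has 1 to 5 running down its first column, the second has 1 to 5 running along its first row.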

Task 5: List: As in the previous exercises, create a matrix with three rows and four columns
with elements ranging between 1 and 12. Create a vector containing the same elements. Now put
both objects in a list. Try extracting various objects from the list
separately. Then try to get an element from the matrix in this list by subsetting a single row and
column.
Notice the different ways to extract the objects from within a list. Subsetting a list using the single square
bracket operator gives a subset of the list but doesn't extract a given object.

You must use double (not single) square brackets for extracting an object from a list. The objects in the
list are arranged in one dimension (similar to elements in a vector, only now we call it a list) and this is
why is.vector(mylist) returns TRUE.

So if you subset a list in the usual way (using single square brackets), you simply get a new list
(containing the subset of objects). If the objects in the list are named, then you can extract the list objects
using the $ operator.
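The difference between single brackets, double brackets and $ can be seen in the sketch below. The list name mylist comes from the text above; the element names mymatrix and myvector are hypothetical labels chosen for this example.

```r
mat <- matrix(1:12, nrow = 3, ncol = 4)
vec <- 1:12

# A named list holding both objects (element names are hypothetical)
mylist <- list(mymatrix = mat, myvector = vec)

is.vector(mylist)    # TRUE: a list is a one-dimensional arrangement of objects

# Single brackets return a shorter list, not the object itself
class(mylist[1])     # "list"

# Double brackets extract the object
mylist[[1]]          # the matrix

# Named elements can also be extracted with $
mylist$myvector

# Row 2, column 3 of the matrix stored in the list
mylist[[1]][2, 3]    # 8
```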

Task 6: To get a list of all objects in the workspace, use ls().

Task 7: Use the remove function, rm(), to remove all variables and functions from the
workspace.
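Tasks 6 and 7 can be tried as in the sketch below. The objects x and f are created only so that there is something to list and remove.

```r
# Create a variable and a function so the workspace is not empty
x <- 1
f <- function(z) z + 1

# List everything in the workspace
ls()

# Remove a single object by name
rm(x)

# Remove everything that is left, variables and functions alike
rm(list = ls())

# The workspace is now empty
ls()
```

Be careful with rm(list = ls()): there is no undo, so anything not saved to a file is lost.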

4. Importing data

R can import data from almost any source, including text files, Excel spreadsheets, statistical
packages, and database management systems. We’ll illustrate these techniques using the
Manyara_CD dataset containing climate data.
We use the functions read.table or read.csv; read.csv() is a specialized version of read.table() with defaults suited to comma-separated files.

Task 8: Import data into R using the files named Manyara_CDT.txt, Manyara_CD.csv and
Manyara_CD.xlsx. Create a data frame (df) for the climate data (CD): call it CD1
when reading from the .csv file and CD2 when reading from the .txt file. Use the functions print() and
show() to display the data. Use the function summary() to see some
descriptive statistics and the function dim() to get the dimensions of the data frame. Use
the function head() to display the first few rows of the data frame and the str() function to
explore the structure of the data.

Using the read.csv() and read.table() functions
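The sketch below illustrates both functions on a small stand-in file so that it can be run without the course data. The values are made up, and the column names are only assumed from the select() tasks later in this practical; with the real files you would pass "Manyara_CD.csv" and "Manyara_CDT.txt" instead of the temporary file names.

```r
# Toy stand-in for the climate data: values are invented, column names assumed
toy <- data.frame(id = 1:3,
                  date = c("2020-01-01", "2020-01-02", "2020-01-03"),
                  year = 2020, yday = 1:3,
                  t_max = c(29.1, 30.4, 28.7),
                  t_min = c(17.2, 18.0, 16.5),
                  Rain = c(0, 2.3, 0))

# Write the toy data to temporary comma and tab delimited files
csv_file <- file.path(tempdir(), "toy_CD.csv")
txt_file <- file.path(tempdir(), "toy_CDT.txt")
write.csv(toy, csv_file, row.names = FALSE)
write.table(toy, txt_file, sep = "\t", row.names = FALSE)

# read.csv() assumes a header row and comma separators
CD1 <- read.csv(csv_file)

# read.table() needs the header and separator stated explicitly
CD2 <- read.table(txt_file, header = TRUE, sep = "\t")

# Explore the imported data
summary(CD1)   # descriptive statistics for each column
dim(CD1)       # number of rows and columns
head(CD1)      # first few rows
str(CD2)       # structure: column names and types
```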

You can also import data into R using the readr package. The readr library must be loaded first.

# load the readr package
library(readr)

# import data from a comma delimited file
CD1 <- read_csv("/Users/Doroth/Documents/MoW/Year2_2021_2022/Task2_Capacity_Building/oogleClassroom/Manyara_CD.csv") # provide the full path to where your file is located

# import data from a tab delimited file
CD2 <- read_tsv("Manyara_CDT.txt") # provide the full path to where your file is located, as above

These functions assume that the first line of data contains the variable names, that values are
separated by commas or tabs respectively, and that missing data are represented by blanks.
For example, the first few lines of the comma delimited file look like this:

The readxl package can import data from Excel workbooks. Both xls and xlsx formats are
supported.

# import data from an Excel workbook
library(readxl)
CD3 <- read_excel("ClimateData.xlsx", sheet = 1) # provide the full path to where your file is located, as above

Since workbooks can have more than one worksheet, you can specify the one you want with
the sheet option. The default is sheet=1 .

5. Cleaning data and subsetting data

The process of cleaning your data can be the most time-consuming part of any data analysis.
The most important steps are considered below. While there are many approaches, those using
the dplyr and tidyr packages are some of the quickest and easiest to learn.

5.1 Selecting variables

The dplyr::select() function allows you to limit your dataset to specified variables (columns).

Task 9:

Task 9.1: keep the variables id, date, t_max, t_min

Tempdata <- dplyr::select(CD3, id, date, t_max, t_min)
print(Tempdata)

Task 9.2: keep the variables id, date, Rain and display using print() or show()
Raindata <- dplyr::select(CD3, id, date, Rain)

Task 9.3: keep the variable id and all variables
# between year and Rain
newTempData <- dplyr::select(CD3, id, year:Rain)

Task 9.4: keep all variables except year and yday

newCD <- dplyr::select(CD3, -year, -yday)
show(newCD)
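To try these select() patterns without the course workbook, the sketch below applies them to a small stand-in data frame. The data frame CD_toy, its values and the order of its columns are assumptions made for illustration; the real CD3 may be laid out differently, which affects what year:Rain selects.

```r
library(dplyr)

# Toy stand-in for CD3: values are invented and the column order is assumed
CD_toy <- data.frame(id = 1:3,
                     date = c("2020-01-01", "2020-01-02", "2020-01-03"),
                     year = 2020, yday = 1:3,
                     t_max = c(29.1, 30.4, 28.7),
                     t_min = c(17.2, 18.0, 16.5),
                     Rain = c(0, 2.3, 0))

# Keep named columns only
temp_toy <- select(CD_toy, id, date, t_max, t_min)

# Keep id plus every column from year through Rain, in data frame order
range_toy <- select(CD_toy, id, year:Rain)

# Drop columns by putting a minus sign in front of their names
drop_toy <- select(CD_toy, -year, -yday)

names(temp_toy)
names(range_toy)
names(drop_toy)
```

The colon form depends on column position, not on names, so it is worth checking names() on your own data before relying on it.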

5.2 Saving data to a file

We can save data to a file using the functions write.table and write.csv, as indicated below:

Task 10: As an exercise, save the newTempData, Raindata and
newCD data frames created above in different files, both as tab-delimited (.txt) and comma-delimited
(.csv).
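A minimal sketch of both functions follows. It uses a small invented data frame and writes to R's temporary folder, both of which are assumptions made so the example runs anywhere; for Task 10 you would pass your own data frames and file names in your working directory.

```r
# Toy data frame (values invented for illustration)
toy <- data.frame(id = 1:3, Rain = c(0, 2.3, 0))

# Hypothetical output file names in the temporary folder
txt_out <- file.path(tempdir(), "toy_rain.txt")
csv_out <- file.path(tempdir(), "toy_rain.csv")

# Tab delimited text file; row.names = FALSE leaves out the row numbers
write.table(toy, file = txt_out, sep = "\t", row.names = FALSE, quote = FALSE)

# Comma delimited file
write.csv(toy, file = csv_out, row.names = FALSE)

# Read one file back to check that it was written correctly
check <- read.csv(csv_out)
check
```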
