Big Data 2 Introduction To R: Based On Prabhpreet Sidhu's Slides

This document provides an introduction to R, including:
• An overview of the R installation process and using the RStudio interface with its console, workspace, history, files, plots, and packages tabs.
• Descriptions of different data types in R like vectors, matrices, data frames, and lists.
• Explanations of key R functions and programming concepts such as setting the working directory, creating R scripts to record code, and installing additional packages.
• Guidance on plotting graphs in R and exporting them for use in documents or presentations.
Big Data 2

Introduction to R

Based on Prabhpreet Sidhu’s slides


Agenda
Introductions
Big Data Landscape
Intro to Data Modelling
Intro to R
R Installation
General Housekeeping
• Laptop: 12 GB RAM, 500 GB hard drive
• Late assignments lose 10% of the mark per day
• Do not submit a copy of someone else’s assignment as your own: both students receive 0% and are reported to the deanship
• Ask questions at any time, in class and via email
Big Data is a Discipline
Big Data History
Big Data Era 2
Streaming
Big Data Landscape Summary
Structured vs Unstructured Data
Structured vs Unstructured Data
NoSql Guide
Streaming
Data Models
Data Models
Data Normalization
Skill Check
Skill Check
Data Modelling
R and Data
Variables & Objects
Single Entry
Vector
Matrices & Dataframes
List
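The data structures named in the slide titles above can be sketched in a few lines of R. This is a minimal illustration, not the code from the original slides; all object names and values (age, ages, m, people, bundle) are made up for the example.

```r
# A single entry: one value stored under a name
age <- 31

# A vector: several values of the same type
ages <- c(31, 45, 27)

# A matrix: a two-dimensional block of values of one type
m <- matrix(1:6, nrow = 2, ncol = 3)

# A data frame: columns of equal length that may differ in type
people <- data.frame(
  name = c("Ana", "Ben", "Cy"),
  age = ages,
  stringsAsFactors = FALSE
)

# A list: a container whose elements can be of any type or size
bundle <- list(label = "demo", values = ages, table = people)

# str() prints a compact summary of an object's structure
str(bundle)
```

Running str() on any of these objects is a quick way to check what kind of structure you are working with.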
Skill Check
Skill Check
R Advantages
▪ Fast and free.
▪ State of the art: Statistical researchers provide their methods as R packages. SPSS and SAS are years behind R!
▪ 2nd only to MATLAB for graphics.
▪ Mx, WinBugs, and other programs use or will use R.
▪ Active user community
▪ Excellent for simulation, programming, computer intensive analyses, etc.
▪ Forces you to think about your analysis.
▪ Interfaces with database storage software (SQL)

R Disadvantages
▪ Not user friendly @ start - steep learning curve, minimal GUI.
▪ No commercial support; figuring out correct methods or how to use a function on your own can be frustrating.
▪ Easy to make mistakes and not know.
▪ Working with large datasets is limited by RAM
▪ Data prep & cleaning can be messier & more mistake prone in R vs. SPSS or SAS
Installation of R and RStudio
• Let’s install R
https://cloud.r-project.org/

• Let’s install RStudio
www.rstudio.com
RStudio screen

The console is where you can type commands and see output.

The workspace tab shows all the active objects (see next slide). The history tab shows a list of commands used so far.

The files tab shows all the files and folders in your default workspace as if you were on a PC/Mac window. The plots tab will show all your graphs. The packages tab will list a series of packages or add-ons needed to run certain processes. For additional info see the help tab.

DSS/OTR
Workspace tab (1)
The workspace tab stores any object, value, function or anything you create during your R
session. In the example below, if you click on the dotted squares you can see the data on a
screen to the left.

Matrix B is shown here. To see matrix A, click on the respective tab.
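
As a rough sketch of how objects such as matrices A and B end up in the workspace, the commands below create two small matrices and list what the session currently holds. The values are invented for illustration; the slide's actual matrices are only visible in its screenshot.

```r
# Two small matrices with made-up values (the slide's data is not shown in the text)
A <- matrix(1:4, nrow = 2)
B <- matrix(c(2, 0, 1, 3), nrow = 2)

# ls() lists the objects in the current environment, much like the workspace tab
ls()

# Typing an object's name prints it in the console
B

# In an interactive RStudio session, View(B) opens the spreadsheet-style viewer
```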
Workspace tab (2)
Here is another example of how the workspace looks when more objects are added. Notice that the data frame house.pets is formed from different individual values or vectors.

Click on the dotted square to look at the dataset in spreadsheet form.
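
A data frame like house.pets can be assembled from individual vectors as sketched below. Only the name house.pets comes from the slide; the vector names and values (pet.type, pet.count, indoor) are assumptions made for this example.

```r
# Hypothetical vectors; the slide's actual values are only visible in its screenshot
pet.type <- c("dog", "cat", "fish")
pet.count <- c(2, 1, 5)
indoor <- c(TRUE, TRUE, TRUE)

# Combine the vectors column by column into one data frame
house.pets <- data.frame(pet.type, pet.count, indoor, stringsAsFactors = FALSE)

# Each vector becomes a column; str() shows the column names and types
str(house.pets)
```

After running this, the workspace holds the three vectors as well as the data frame built from them.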

History tab
The history tab keeps a record of all previous commands. It helps when testing and running processes. Here you can either save the whole list or select the commands you want and send them to an R script to keep track of your work.
In this example, we select all and click on the “To Source” icon; a window on the left will open with the list of commands. Make sure to save the ‘untitled1’ file as an *.R script.

Changing the working directory

If you have different projects you can change the working directory for that session, see above. Or you can type:

# Shows the working directory (wd)

getwd()

# Changes the wd

setwd("C:/myfolder/data")
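
A slightly more defensive variant is sketched below: it remembers the current directory, switches only if the target folder exists, and switches back afterwards. The folder used here is R's temporary directory, chosen only so the example runs on any machine; substitute your own project folder.

```r
# Remember where we started so we can return later
old_wd <- getwd()

# Stand-in target folder (assumption: replace with your own project path)
target <- tempdir()

# setwd() fails if the folder does not exist, so check first
if (dir.exists(target)) {
  setwd(target)
}
new_wd <- getwd()

# Return to the original working directory
setwd(old_wd)
```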

Setting a default working directory

Every time you open RStudio, it goes to a default directory. You can change the default to a folder where you have your datafiles so you do not have to do it every time. In the menu go to Tools -> Options.

R script (1)
The usual RStudio screen has four windows:
1. Console.
2. Workspace and history.
3. Files, plots, packages and help.
4. The R script(s) and data view.
The R script is where you keep a record of your work. For Stata users this would be like the do-file, for SPSS users it is like the syntax file, and for SAS users the SAS program.

R script (2)
To create a new R script you can either go to File -> New -> R Script, or click on the icon with the “+” sign and select “R Script”, or simply press Ctrl+Shift+N. Make sure to save the script.

Here you can type R commands and run them. Just leave the cursor anywhere on the line where the command is and press Ctrl+R or click on the ‘Run’ icon above. Output will appear in the console below.
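
To make the idea of a script concrete, the sketch below writes a tiny script to disk and runs the whole file with source(), which is the command-line counterpart of running every line from the editor. The file name my_analysis.R and its contents are hypothetical.

```r
# Hypothetical script file, written to R's temporary folder for this example
script_path <- file.path(tempdir(), "my_analysis.R")

# The lines below stand in for what you would normally type in the script editor
writeLines(c(
  "# my_analysis.R: a record of the steps in this analysis",
  "x <- c(4, 8, 15, 16, 23, 42)",
  "x_mean <- mean(x)"
), script_path)

# source() runs every command in the file, in order
source(script_path)

# Objects created by the script are now available in the session
x_mean
```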

Packages tab
The packages tab shows the list of add-ons included in the installation of RStudio. If a package is checked, it is loaded into R; if not, any command related to that package won’t work and you will need to select it first. You can also install other add-ons by clicking on the ‘Install Packages’ icon. Another way to activate a package is by typing, for example, library(foreign). This will automatically check the foreign package (it helps bring in data from proprietary formats like Stata, SAS or SPSS).
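
Loading a package from the console can be sketched as follows. The check with requireNamespace() is an addition for safety, since foreign normally ships with R but may be missing on some installations; the variable names pkg and attached are made up for the example.

```r
pkg <- "foreign"

# Only try to load the package if it is installed on this machine
if (requireNamespace(pkg, quietly = TRUE)) {
  # Equivalent to typing library(foreign) in the console
  library(pkg, character.only = TRUE)
  # search() lists attached packages; a loaded package appears as "package:<name>"
  attached <- paste0("package:", pkg) %in% search()
} else {
  attached <- FALSE
  message("Package '", pkg, "' is not installed")
}

attached
```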

Installing a package
• We are going to install the package rgl (useful to plot 3D images). It does not come with the original R install.
• Click on “Install Packages”.
• Write the name in the pop-up window and click on “Install”.
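
The same steps can also be done from the console with install.packages(). The helper below is a minimal sketch, not part of the original slides: the function name install_if_missing is made up, and the actual install call is left commented out because it needs internet access and a configured CRAN mirror.

```r
# Hypothetical helper: install a package only when it is not already available
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # downloads from the configured CRAN mirror
  }
  # Report whether the package can be used now
  requireNamespace(pkg, quietly = TRUE)
}

# Uncomment to install rgl, then load it as usual:
# install_if_missing("rgl")
# library(rgl)
```

Checking first avoids downloading the package again every time the script runs.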

Plots tab (1)

The plots tab will display the graphs. The one shown here is created by the command on line 7 in the script above.
See the next slide to see what happens when you have more than one graph.

Plots tab (2)

Here there is a second graph (see line 11 above). If you want to see the first one, click on the left-arrow icon.

Plots tab (3) – Graphs export
To extract the graph, click on “Export”, where you can save the file as an image (PNG, JPG, etc.) or as a PDF. These options are useful when you only want to share the graph or use it in a LaTeX document. Probably the easiest way to export a graph is to copy it to the clipboard and then paste it directly into your Word document.

Make sure to select ‘Metafile’.

Paste it into your Word document.
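
Exporting can also be scripted instead of using the “Export” menu. The sketch below saves a scatter plot as a PDF by opening a file device, drawing, and closing the device. It uses the cars dataset that ships with R rather than the data from the slides, and the file name scatter.pdf is hypothetical.

```r
# Hypothetical output file in R's temporary folder
out_file <- file.path(tempdir(), "scatter.pdf")

# Open a PDF device: everything drawn from now on goes into the file
pdf(out_file, width = 6, height = 4)

# Draw the graph (cars is a small built-in dataset)
plot(cars$speed, cars$dist,
     xlab = "Speed", ylab = "Stopping distance", main = "cars data")

# Close the device so the file is written to disk
dev.off()

file.exists(out_file)
```

The png() function works the same way for image output, where the R installation supports it.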
3D graphs

3D graphs will display on a separate screen (see line 15 above). You won’t be able to save the graph, but after moving it around, once you find the angle you want, you can take a screenshot and paste it into your Word document.