01-MSBA-615 - Introduction To R Programming and R Studio
Packages
2
Installing R
• Visit https://cloud.r-project.org/
• Download the appropriate version for your system
• Click the downloaded file and follow prompts to install
• Choosing the default options should be fine
• A new version of R is released every year, and there are 2-3 minor
releases each year. You should update R regularly.
3
Installing RStudio
• RStudio is an integrated development environment, or IDE, for R
programming.
• Visit: http://www.rstudio.com/download
• Download the appropriate version and install
5
Doe-Anderson
• https://www.doeanderson.com/
Not really…
6
About You
• Name
• Where do you work? What do you do there?
7
About the Course
• Syllabus available on Blackboard
8
Student Input
• Take 5 minutes to write down 3 things you are hoping to learn more about in this course
9
History of R
• R is an open source programming language/software environment
• An implementation of the S programming language
• S was created by John Chambers at Bell Labs as an internal environment for
statistical analysis
• S was designed for users to begin interacting with the language in a way that
doesn’t “feel” like programming
• As users’ needs become more sophisticated they will be able to gradually
transition into programming
10
History of R
• R was created in New Zealand by Ross Ihaka and Robert Gentleman
• Both authors’ names begin with R, but R is also a play on S
• R version 1.0.0 was released in 2000
• Runs on almost any platform
11
History of R
http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html
12
History of R
http://www.forbes.com/sites/gilpress/2015/10/21/the-number-of-data-scientists-has-doubled-over-the-last-4-years/2/#3310e65e1f6a
13
History of R
http://blog.revolutionanalytics.com/2015/12/r-is-the-fastest-growing-language-on-stackoverflow.html
14
Why is R becoming so popular?
• Most commercial stats software costs thousands of dollars
• Just about any statistical technique can be implemented in R
• Users can contribute “packages”, so many new and advanced methods are available in R
before they reach other platforms
• State-of-the-art graphics capabilities
• R is highly adaptable: it works with most data and file types
15
R Resources
• https://www.r-project.org
• News
• Download latest version of R
• CRAN: Comprehensive R Archive Network
• R has many useful functions built into the software (referred to as base R)
• CRAN stores user-contributed “packages” that add new functionality
• Currently, the CRAN package repository features 10k+ available packages
for free download
• Packages can be installed directly from the R Console (R will connect to
CRAN and download)
• https://gallery.shinyapps.io/087-crandash/
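A minimal sketch of installing and loading a package from the R Console. The package name is a placeholder picked so the snippet runs without a download (`stats` ships with R); substitute the name of any CRAN package you actually want.

```r
# Placeholder package name (an assumption for illustration);
# replace with a CRAN package such as one recommended in class.
pkg <- "stats"

# Install only if the package is not already available.
# install.packages() connects to CRAN and downloads the package.
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# Load the package so its functions can be used in this session.
library(pkg, character.only = TRUE)
```

Installing is needed once per machine; loading with `library()` is needed in every new R session.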
16
R Community
• Very active community of users
• https://www.r-bloggers.com/
• Twitter: #rstats, @hadleywickham, etc.
• StackOverflow: 313,323 questions tagged “R”
• useR! International conference for users
17
RStudio
• https://www.rstudio.com
• RStudio is an integrated development environment (IDE) for R
• Console
• Syntax-highlighting editor with direct code execution
• Tools for plotting, history, debugging, workspace management, version control (git
integration), publishing, etc.
• Open source (freely available)
18
RStudio Overview
19
Getting Started with RStudio
• Settings
• Key features
• RStudio “Projects”
20
Getting Started with R
• R is a case-sensitive, interpreted language
• Enter commands one at a time from the command prompt or run entire scripts
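A small illustration of case sensitivity, using made-up object names: `total` and `Total` are two unrelated objects as far as R is concerned.

```r
# Two different objects whose names differ only in case
total <- 10
Total <- 20

total == Total     # FALSE: they hold different values
exists("TOTAL")    # FALSE (unless you defined it earlier): no such object
```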
21
RStudio Script Editor
• Create a new script: Ctrl + Shift + N
• Script editor:
• Syntax errors will be highlighted with a red “x” and squiggly red underlining
22
Input
• Things we enter into the console are called expressions
23
Important Operators
• Arithmetic: +, -, *, /
• Exponents: ^
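The operators above can be tried directly in the console. The numbers below are arbitrary examples.

```r
7 + 3          # 10
7 - 3          # 4
7 * 3          # 21
half <- 7 / 2  # 3.5 (division does not truncate)
power <- 2^10  # 1024
(1 + 2) * 3    # 9: parentheses control the order of operations
```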
24
Relational Operators
• Comparing for equality: ==
• Not equal: !=
• Not: !
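A short sketch of these operators with two example values. The ordering comparisons (`<`, `>`, `<=`, `>=`) are not listed on the slide but belong to the same family and are used in the Quick Check later.

```r
a <- 5
b <- 8

a == b      # FALSE: are a and b equal?
a != b      # TRUE:  are a and b different?
!(a == b)   # TRUE:  ! negates a logical value

# Ordering comparisons (added here for completeness)
a < b       # TRUE
a >= b      # FALSE
```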
25
Mathematical Functions
• Almost any mathematical function you can think of is built into R:
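A few base R examples, chosen here for illustration; the original slide's own list is not reproduced.

```r
sqrt(16)           # 4
abs(-3)            # 3
exp(0)             # 1
log10(1000)        # 3
round(3.14159, 2)  # 3.14
factorial(5)       # 120
```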
26
Objects in R
• In R, almost EVERYTHING is an “object”
• Numbers
• Variables
• Functions
27
Basic Assignment in R
Assigning (storing) the value 8 in an object called x
x <- 8
Variable (object): x | Assigner: <- | Value: 8
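Once a value is stored, the object can be used anywhere the value could be. A minimal sketch:

```r
x <- 8       # store the value 8 in x
x            # typing the name prints the stored value: 8
x * 2        # 16; x itself is unchanged
x <- x + 1   # reassigning overwrites the old value; x is now 9
```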
28
Objects in R
• R has five atomic classes of objects*:
• Character
• Numeric
• Integer
• Complex
• Logical
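The `class()` function reports which of these classes a value belongs to. The values below are arbitrary examples.

```r
class("hello")   # "character"
class(3.14)      # "numeric"
class(2L)        # "integer" (the L suffix asks for an integer)
class(1 + 2i)    # "complex"
class(TRUE)      # "logical"
```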
29
Vectors in R
• Vectors are the most basic kind of object
Creating a vector:
x <- 1:8
Variable (object): x | Assigner: <- | Vector: 1:8
31
Creating Vectors in R
Assigning (storing) the vector of the numbers 1, 2, and 3 in an object called x
x <- c(1,2,3)
Variable (object): x | Assigner: <- | Vector: c(1,2,3)
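The two ways of creating a vector shown so far give slightly different results. A short sketch (the object names `a` and `b` are chosen here to avoid overwriting `x`):

```r
a <- 1:8          # the colon operator builds a sequence of integers
b <- c(1, 2, 3)   # c() combines individual values into a vector

length(a)   # 8
class(a)    # "integer"
class(b)    # "numeric": plain numbers like 1 are stored as doubles

# c() also combines existing vectors
ab <- c(a, b)
length(ab)  # 11
```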
32
“Vectorization”
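• Vectorization is a term in R that can mean different things in different contexts
• “Vectorized” can mean that an operator or function will act on each element
of the object without an explicit loop
• 10 * c(1:5) # here * is vectorized: each element of 1:5 is multiplied by 10
• It can also mean that a function takes a whole vector as input and returns a summary value
• median(1:5)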
33
Vectors in R
• The third meaning is vectorization over arguments:
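A hedged sketch of the three senses of "vectorized", using functions picked for illustration. Vectorization over arguments means a function accepts several separate values and treats them as one collection; `sum()` and `max()` behave this way.

```r
# 1. Element-wise: the operator acts on each element, no loop needed
tens <- 10 * 1:5        # 10 20 30 40 50

# 2. Vector in, summary out
median(1:5)             # 3

# 3. Vectorization over arguments: separate values are pooled
sum(1, 2, 3, 4, 5)      # 15, same as sum(c(1, 2, 3, 4, 5))
max(1, 2, 3, 4, 5)      # 5
```

Not every summary function pools its arguments this way, so when in doubt wrap the values in `c()` first.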
34
Quick Check
• Answer the following:
• Is the sum of all integers between 1 and 500 greater than 50 cubed?
35
Quick Check Solution
• Answer the following:
• Is the sum of all integers between 1 and 500 greater than 50 cubed?
1. Create an object called x that holds integers between 1 and 500
x <- 1:500
2. Create an object called y that holds 50 cubed
y <- 50^3
3. Create test logic using relational operators (>) to find the answer
x>y
36
Quick Check Solution
• Answer the following:
• Is the sum of all integers between 1 and 500 greater than 50 cubed?
1. Create an object called x that holds the sum of integers between 1 and 500
x <- sum(1:500)
2. Create an object called y that holds 50 cubed
y <- 50^3
3. Create test logic using relational operators (>) to find the answer
x>y
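The difference between the two solution slides can be checked directly. Comparing the raw vector `1:500` to `y` tests every element separately, while comparing the sum gives the single answer the question asks for.

```r
y <- 50^3                      # 125000

# Element-wise comparison: one result per integer, not one answer
elementwise <- 1:500 > y
length(elementwise)            # 500

# Summing first gives a single number to compare
x <- sum(1:500)                # 125250
x > y                          # TRUE: the sum is greater than 50 cubed
```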
37
Numbers (Numeric Class)
• Numbers are objects of the numeric class
• Special Numbers:
• Inf: represents infinity (-Inf, also)
38
Numbers (Numeric Class)
• There are functions available to check for special numbers:
• is.nan()
• is.na()
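A brief sketch of how special values arise and how the checking functions respond. `NaN` (not a number) and `NA` (missing value) are not listed on the previous slide but are what `is.nan()` and `is.na()` test for; `is.finite()` is a related base R function added here for illustration.

```r
1 / 0            # Inf
-1 / 0           # -Inf
0 / 0            # NaN: the result is undefined

is.nan(0 / 0)    # TRUE
is.na(NA)        # TRUE
is.na(NaN)       # TRUE:  NaN also counts as missing
is.nan(NA)       # FALSE: NA is missing, but it is not NaN
is.finite(Inf)   # FALSE
```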
39
Logical Values (Logical Class)
• TRUE and FALSE are special words in R
• You cannot assign values to them (lower- and mixed-case object names will
work)
• T and F are preassigned short-hand expressions for TRUE and FALSE (they
can be redefined, though, so be careful)
40
Logical Values (Logical Class)
• Other Logical Operators
• Not: !
• And: &
• Or: |
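These operators combine logical values, and like arithmetic operators they work element by element on vectors. The values below are arbitrary examples.

```r
a <- TRUE
b <- FALSE

!a       # FALSE
a & b    # FALSE: both must be TRUE
a | b    # TRUE:  at least one must be TRUE

# Element-wise on vectors
both <- c(TRUE, FALSE, TRUE) & c(TRUE, TRUE, FALSE)   # TRUE FALSE FALSE

# Combining relational tests
n <- 7
in_range <- n > 5 & n < 10    # TRUE
```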
41
Objects in R: Attributes
• Objects can have attributes
• Names: names()
• Dimensions: dim()
• Class: class()
• Length: length()
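A short sketch of these accessor functions. The named vector and the `matrix()` call are examples introduced here, not taken from the slides.

```r
# A vector whose elements have names
v <- c(a = 1, b = 2, c = 3)
names(v)     # "a" "b" "c"
length(v)    # 3
class(v)     # "numeric"
dim(v)       # NULL: a plain vector has no dimensions

# A matrix carries a dim attribute (rows, columns)
m <- matrix(1:6, nrow = 2)
dim(m)       # 2 3
length(m)    # 6: total number of elements
```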
42
Combining Vectors into new Objects
• Base R contains functions that can combine two or more vectors into an object with a
tabular data structure (think table of data)
• rbind() – aka ROW bind – binds two or more vectors into rows of a table
• cbind() – aka COLUMN bind – binds two or more vectors into columns of a table
43
rbind
x <- 1:10
y <- 11:20
rbind(x, y)
44
cbind
x <- 1:10
y <- 11:20
cbind(x, y)
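The two calls above produce tables with opposite shapes, which `dim()` makes easy to confirm. The result names `by_row` and `by_col` are chosen here for illustration.

```r
x <- 1:10
y <- 11:20

by_row <- rbind(x, y)   # each vector becomes a row
by_col <- cbind(x, y)   # each vector becomes a column

dim(by_row)        # 2 10
dim(by_col)        # 10 2
rownames(by_row)   # "x" "y": names are taken from the objects
colnames(by_col)   # "x" "y"
```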
45
Quick Check
• Answer the following:
• Quick Check 2: Create an object called z that stores 2 columns of data.
• The first column is comprised of numbers 100, 200, 300, 400, 500.
• The second column stores the square of each value in the first column divided by half
of the value in the first column
46
Quick Check Solution
• Answer the following:
• Quick Check 2: Create an object called z that stores 2 columns of data.
• The first column is comprised of numbers 100, 200, 300, 400, 500.
• The second column stores the square of each value in the first column divided by half
of the value in the first column
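One possible solution, sketched under the assumption that the second column is (value squared) divided by (value divided by 2). The column names `first` and `second` are chosen here for illustration.

```r
# First column: the five given numbers
first <- c(100, 200, 300, 400, 500)

# Second column: square of each value divided by half of that value.
# Vectorized arithmetic applies the formula to every element at once.
second <- first^2 / (first / 2)

# Bind the two vectors into columns of one object
z <- cbind(first, second)
z
```

Under this reading the formula simplifies to twice the original value, so the second column holds 200, 400, 600, 800, 1000.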