Introduction to R

R is an open-source programming language widely used for statistical analysis and graphics, featuring over 1,800 packages for various applications. It provides effective data handling, graphical facilities, and a well-developed language for programming tasks. R is particularly useful for data visualization, statistical computing, and data manipulation, making it a popular choice among data scientists and statisticians.

Uploaded by

manu

What is R and why do we use it?

• Open source, most widely used for statistical analysis and graphics
• Extensible via dynamically loadable add-on packages
• >1,800 packages on CRAN
> v = rnorm(256)
> A = matrix(v, 16, 16)
> summary(A)
> library(fields)
> image.plot(A)
> …
> dyn.load("foo.so")
> .C("foobar")
> dyn.unload("foo.so")
Why R?

• Statistics & Data Mining
• Commercial
• Technical computing
• Matrix and vector formulations
• Data visualization and analysis platform
• Image processing, vector computing

Statistical computing and graphics (http://www.r-project.org)
• Developed by R. Gentleman & R. Ihaka
• Expanded by community as open source
• Statistically rich
The Programmer’s Dilemma
What programming language to use & why?

High-level languages
• Scripting (R, MATLAB, IDL)
• Object oriented (C++, Java)
Low-level languages
• Procedural languages (C, Fortran)
• Assembly
Features of R

R is an integrated suite of software for data manipulation, calculation, and graphical display

• Effective data handling
• Various operators for calculations on arrays/matrices
• Graphical facilities for data analysis
• Well-developed language including conditionals, loops, recursive functions and I/O capabilities
Basic usage: arithmetic in R

• You can use R as a calculator
• Typed expressions will be evaluated and printed out
• Main operations: +, -, *, /, ^
• Obeys order of operations
• Use parentheses to group expressions
• More complex operations appear as functions
• sqrt(2)
• sin(pi/4), cos(pi/4), tan(pi/4), asin(1), acos(1), atan(1)
• exp(1), log(2), log10(10)
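The calculator-style usage above can be sketched as follows. The specific expressions and the names r1 to r5 are illustrative assumptions, not taken from the slides.

```r
# Order of operations: ^ is applied before *, which is applied before +
r1 <- 2 + 3 * 4^2      # 2 + 3 * 16 = 50
r2 <- (2 + 3) * 4^2    # parentheses change the grouping: 5 * 16 = 80

# More complex operations are functions
r3 <- sqrt(16)         # 4
r4 <- log10(1000)      # base-10 logarithm
r5 <- exp(log(5))      # log() is the natural logarithm, so this undoes exp()

print(c(r1, r2, r3, r4, r5))
```

Typing any of the right-hand sides directly at the prompt, without an assignment, prints the value immediately.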
Getting help
• help(function_name)
– help(prcomp)
• ?function_name
– ?prcomp
• help.search("topic")
– ??topic or ??"topic"
• Search CRAN
– http://www.r-project.org
• From R GUI: Help > Search help…
• CRAN Task Views (packages grouped by topic)
– http://cran.cnr.berkeley.edu/web/views/
Variables and assignment

• Use variables to store values
• Three ways to assign variables
• a = 6
• a <- 6
• 6 -> a
• Update variables by using the current value in an assignment
• x = x + 1
• Naming rules
• Can include letters, numbers, ., and _
• Names are case sensitive
• Must start with . or a letter
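A short sketch of the assignment forms and naming rules above. The variable names here (b, d, total, Total, my.value_2) are made up for illustration.

```r
# Three equivalent ways to assign the value 6
a = 6
b <- 6
6 -> d

# Update a variable using its current value
x <- 1
x <- x + 1                 # x is now 2

# Names are case sensitive: total and Total are different variables
total <- 10
Total <- 20

# Names may contain letters, digits, '.' and '_'
my.value_2 <- a + b + d    # 6 + 6 + 6 = 18
print(my.value_2)
```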
R Commands
• Commands can be expressions or assignments
• Separate by semicolon or new line
• Can split across multiple lines
• R will change prompt to + if command not finished
• Useful commands for variables
• ls(): List all stored variables
• rm(x): Delete one or more variables
• class(x): Describe what type of data a variable stores
• save(x, file="filename"): Store variable(s) to a binary file
• load("filename"): Load all variables from a binary file
• Save/load in current directory or My Documents by default
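As a minimal sketch of these workspace commands, the example below saves a variable, deletes it, and restores it. It assumes a temporary file from tempfile() rather than the current directory, and the names tmp_file and x_exists_after_rm are hypothetical.

```r
x <- c(1, 2, 3)
print(class(x))                       # type of data stored in x

# Hypothetical file location; tempfile() keeps the working directory clean
tmp_file <- tempfile(fileext = ".RData")
save(x, file = tmp_file)              # store x in a binary file

rm(x)                                 # delete x from the workspace
x_exists_after_rm <- exists("x", inherits = FALSE)

load(tmp_file)                        # restore every variable stored in the file
print(ls())                           # x is listed again
```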
Vectors and vector operations

To create a vector:
# c() command to create vector x
x=c(12,32,54,33,21,65)
# c() to add elements to vector x
x=c(x,55,32)
# seq() command to create sequence of numbers
years=seq(1990,2003)
# sequence from 3 to 5 in steps of .5
a=seq(3,5,.5)
# can use : to step by 1
years=1990:2003;
# rep() command to create data that follow a regular pattern
b=rep(1,5)
c=rep(1:2,4)

To access vector elements:
# 2nd element of x
x[2]
# first five elements of x
x[1:5]
# all but the 3rd element of x
x[-3]
# values of x that are < 40
x[x<40]
# values of y such that x is < 40 (y is defined below)
y[x<40]

To perform operations:
# mathematical operations on vectors
y=c(3,2,4,3,7,6,1,1)
x+y; 2*y; x*y; x/y; y^2
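The vector commands above can be combined into one runnable sketch. It reuses the x and y values from the slide; the result names (second, no_third, and so on) are illustrative assumptions.

```r
x <- c(12, 32, 54, 33, 21, 65)
x <- c(x, 55, 32)                # append two values: x now has 8 elements
y <- c(3, 2, 4, 3, 7, 6, 1, 1)   # same length as x

second   <- x[2]                 # 2nd element
no_third <- x[-3]                # everything except the 3rd element
small_x  <- x[x < 40]            # values of x below 40
y_where  <- y[x < 40]            # values of y at the positions where x < 40

halves  <- seq(3, 5, 0.5)        # 3.0 3.5 4.0 4.5 5.0
pattern <- rep(1:2, 4)           # 1 2 repeated four times

prod_xy <- x * y                 # arithmetic is element by element
print(small_x)
print(y_where)
```

Logical indexing such as y[x < 40] only makes sense when both vectors have the same length, which is why x is extended to eight elements first.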
Matrices & matrix operations

To create a matrix:
# matrix() command to create matrix A with rows and cols
A=matrix(c(54,49,49,41,26,43,49,50,58,71),nrow=5,ncol=2)
B=matrix(1,nrow=4,ncol=4)

To access matrix elements:
# matrix_name[row_no, col_no]
A[2,1] # 2nd row, 1st column element
A[3,] # 3rd row
A[,2] # 2nd column of the matrix
A[2:4,c(3,1)] # submatrix of 2nd-4th elements of the 3rd and 1st columns (needs a matrix with at least 3 columns)
A["KC",] # access row by name, "KC" (needs row names)

Statistical operations:
rowSums(A)
colSums(A)
rowMeans(A)
colMeans(A)
# max of each column
apply(A,2,max)
# min of each row
apply(A,1,min)

Element by element ops (A and B must have the same dimensions):
2*A+3; A+B; A*B; A/B;

Matrix/vector multiplication (columns of A must match rows of B):
A %*% B;
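A runnable sketch of the matrix commands above. The row and column names ("KC", "NY", ...) and the 5x2 shape chosen for B are assumptions made so that every operation is conformable; they are not from the slides.

```r
A <- matrix(c(54, 49, 49, 41, 26, 43, 49, 50, 58, 71), nrow = 5, ncol = 2)

# Hypothetical row and column names so that rows can be accessed by name
rownames(A) <- c("KC", "NY", "LA", "SF", "DC")
colnames(A) <- c("first", "second")

elem   <- A[2, 1]               # 2nd row, 1st column
kc_row <- A["KC", ]             # row accessed by name

col_max <- apply(A, 2, max)     # max of each column
row_min <- apply(A, 1, min)     # min of each row

B <- matrix(1, nrow = 5, ncol = 2)   # same shape as A
elementwise <- A * B                 # element by element product
matprod <- t(A) %*% B                # (2x5) %*% (5x2) gives a 2x2 matrix

print(matprod)
```

Note that matrix() fills column by column, so the first five values form the first column of A.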
Useful functions for vectors and
matrices
• Find # of elements or dimensions
• length(v), length(A), dim(A)
• Transpose
• t(v), t(A)
• Matrix inverse
• solve(A)
• Sort vector values
• sort(v)
• Statistics
• min(), max(), mean(), median(), sum(), sd(), quantile()
• These functions treat a matrix as a single vector (as does sort())
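A minimal sketch of the functions listed above, using a small made-up vector v and an invertible 2x2 matrix M (both are illustrative assumptions).

```r
v <- c(5, 3, 8, 1)
M <- matrix(c(2, 0, 0, 4), nrow = 2)   # diagonal matrix, so it is invertible

n_v   <- length(v)     # number of elements in the vector
n_M   <- length(M)     # number of elements in the matrix (rows * cols)
dim_M <- dim(M)        # number of rows and columns

sorted <- sort(v)      # ascending order
M_t    <- t(M)         # transpose
M_inv  <- solve(M)     # matrix inverse
check  <- M %*% M_inv  # should be the identity matrix

# Statistics functions treat the matrix as one long vector
overall_max  <- max(M)
overall_mean <- mean(M)

print(M_inv)
```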
Graphical display and plotting

• Most common plotting function is plot()
• plot(x,y) plots y vs x
• plot(x) plots x vs 1:length(x)
• plot() has many options for labels, colors, symbol, size, etc.
• Check help with ?plot
• Use points(), lines(), or text() to add to an existing plot
• Use x11() to start a new output window
• Save plots with png(), jpeg(), tiff(), or bmp(), then call dev.off() to close the file
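The plotting workflow above might look like the sketch below. The sine/cosine data, the labels, and the temporary output file are assumptions for illustration, and png() is assumed to be available in the R build being used.

```r
x <- seq(0, 2 * pi, length.out = 50)
y <- sin(x)

# Hypothetical output location; any writable path would do
out_file <- tempfile(fileext = ".png")
png(out_file, width = 600, height = 400)    # open a PNG graphics device

plot(x, y, type = "l", col = "blue",
     xlab = "x", ylab = "sin(x)", main = "Sine curve")   # plots y vs x
points(x[c(1, 25, 50)], y[c(1, 25, 50)], pch = 19, col = "red")  # add points
lines(x, cos(x), lty = 2)                                # add a second curve
text(pi, 0.5, "dashed: cos(x)")                          # add a label

dev.off()                                   # close the device and write the file
```

Without the png() and dev.off() calls, the same plotting commands draw to the screen device in an interactive session.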
R Packages
• R functions and datasets are organized into packages
• Packages base and stats include many of the built-in
functions in R
• CRAN provides thousands of packages contributed by R
users
• Package contents are only available when loaded
• Load a package with library(pkgname)
• Packages must be installed before they can be loaded
• Use library() to see installed packages
• Use install.packages("pkgname") to install a package and update.packages() to update installed packages
• Can also run R CMD INSTALL pkgname.tar.gz from the command line if you have downloaded the package source
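A short sketch of loading and inspecting packages. It uses splines, which ships with standard R installations, so nothing has to be downloaded; the install and update calls are wrapped in if (FALSE) because they need network access, and "fields" is simply the package mentioned earlier in the slides.

```r
# Load a package that ships with R
library(splines)
is_attached <- "package:splines" %in% search()

# Names of all installed packages
installed <- rownames(installed.packages())
has_stats <- "stats" %in% installed

# Installing or updating from CRAN needs network access, so it is shown but not run
if (FALSE) {
  install.packages("fields")
  update.packages(oldPkgs = "fields", ask = FALSE)
}

print(is_attached)
```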
Exploring the iris data
• Load iris data into your R session:
– data (iris);
– help (data);
• Check that iris was indeed loaded:
– ls ();
• Check the class that the iris object belongs to:
– class (iris);
• Read Sections 3.4 and 6.3 in "Introduction to R"
• Print the content of iris data:
– iris;
• Check the dimensions of the iris data:
– dim (iris);
• Check the names of the columns:
– names (iris);
Exploring the iris data (cont.)
• Plot Petal.Width vs. Petal.Length (columns 4 and 3 of iris):
– plot (iris[ , 3], iris[ , 4]);
– example(plot)
• Exercise: create a plot similar to this figure:

Src: Figure is from Introduction to Data Mining by Pang-Ning Tan, Michael Steinbach,
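The exploration steps for iris can be collected into one sketch. Coloring the points by species is an illustrative choice and is not necessarily what the exercise figure shows; the result names are hypothetical.

```r
data(iris)                       # load the built-in iris data set

iris_class <- class(iris)        # the kind of object iris is
iris_dim   <- dim(iris)          # rows and columns
iris_names <- names(iris)        # column names

print(head(iris))                # first rows instead of the whole table

# Petal.Width (y axis) against Petal.Length (x axis), one color per species
plot(iris$Petal.Length, iris$Petal.Width,
     col = as.integer(iris$Species),
     xlab = "Petal.Length", ylab = "Petal.Width")
```

In a non-interactive session the plot is written to R's default graphics file rather than shown in a window.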
Reading data from files

• Large data sets are better loaded through the file input interface in R
• Reading a table of data can be done using the read.table() command:
• a <- read.table("a.txt")
• The values are read into R as an object of type data frame (a sort of matrix in which different columns can have different types). Various options can specify reading or discarding of headers and other metadata.
• A more primitive but universal file-reading function exists, called scan()
• b = scan("input.dat");
• scan() returns a vector of the data read
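A minimal sketch of both reading functions. Since a.txt and input.dat are not available here, the example first writes two small temporary files; their names and contents are made up for illustration.

```r
# Write a small whitespace-separated table with a header line
table_file <- tempfile(fileext = ".txt")
writeLines(c("name score", "ann 90", "bob 85"), table_file)

# header = TRUE tells read.table() to use the first line as column names
a <- read.table(table_file, header = TRUE)
a_class <- class(a)              # a is a data frame

# Write a file of plain numbers and read it with scan()
num_file <- tempfile(fileext = ".dat")
writeLines("1 2 3 4.5", num_file)
b <- scan(num_file)              # returns a numeric vector

print(a)
print(b)
```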
Programming in R
• The following slides assume a basic understanding of programming concepts
• For more information, please see chapters 9 and 10 of the R manual:
http://cran.r-project.org/doc/manuals/R-intro.html

Additional resources
• Beginning R: An Introduction to Statistical Programming by Larry Pace
• Introduction to R webpage on APSnet:
http://www.apsnet.org/edcenter/advanced/topics/ecologyandepidemiologyinr/introductiontor/Pages/default.aspx
• The R Inferno:
http://www.burns-stat.com/pages/Tutor/R_inferno.pdf
Conditional statements

• Perform different commands in different situations
• if (condition) command_if_true
• Can add else command_if_false to end
• Group multiple commands together with braces {}
• if (cond1) {cmd1; cmd2;} else if (cond2) {cmd3; cmd4;}
• Conditions use relational operators
• ==, !=, <, >, <=, >=
• Do not confuse = (assignment) with == (equality)
• = is a command, == is a question
• Combine conditions with and (&&) and or (||)
• Use & and | for vectors of length > 1 (element-wise)
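The conditional forms above can be sketched as follows. The thresholds and the names size, in_range, and mask are assumptions for illustration.

```r
x <- 7

# if / else if / else with braces grouping the commands
if (x > 10) {
  size <- "large"
} else if (x > 5) {
  size <- "medium"
} else {
  size <- "small"
}

# && and || combine single TRUE/FALSE conditions
in_range <- (x >= 0) && (x <= 10)

# & and | work element by element on vectors
v <- c(2, 8, 12)
mask <- (v > 5) & (v < 10)

print(size)
print(mask)
```

At the top level of a script, else must follow the closing brace on the same line, as shown, so that R does not treat the if statement as finished.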
Loops
• Most common type of loop is the for loop
• for (x in v) { loop_commands; }
• v is a vector, commands repeat for each value in v
• Variable x becomes each value in v, in order
• Example: adding the numbers 1-10
• total = 0; for (x in 1:10) total = total + x;
• Other type of loop is the while loop
• while (condition) { loop_commands; }
• Condition works the same way as in an if statement
• Commands are repeated until condition is false
• Might execute commands 0 times if already false
• while loops are useful when you don't know the number of iterations in advance
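A short sketch of both loop types. The for loop is the slide's own example; the halving loop and the names value, steps, and count are illustrative assumptions.

```r
# for loop: add the numbers 1 to 10
total <- 0
for (x in 1:10) total <- total + x

# while loop: halve a value until it drops below 1,
# counting how many halvings that takes
value <- 100
steps <- 0
while (value >= 1) {
  value <- value / 2
  steps <- steps + 1
}

# A while loop whose condition starts out false runs zero times
count <- 0
while (count > 0) count <- count + 1

print(total)
print(steps)
```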
Scripting in R

• A script is a sequence of R commands that perform some common task
• E.g., defining a specific function, performing some analysis routine, etc.
• Save R commands in a plain text file
• Usually have extension of .R
• Run scripts with source():
• source("filename.R")
• To save command output to a file, use sink():
• sink("output.Rout")
• sink() restores output to console
• Can be used with or outside of a script
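As a minimal sketch of source() and sink(), the example below writes a tiny script to a temporary file, runs it, and then captures printed output in a second file. The script contents and the names square, result, script_file, and out_file are hypothetical.

```r
# Create a small script file (normally you would write this in an editor)
script_file <- tempfile(fileext = ".R")
writeLines(c("square <- function(n) n^2",
             "result <- square(4)"), script_file)

source(script_file)            # runs the script: defines square and result

# Redirect printed output to a file, then restore the console
out_file <- tempfile(fileext = ".Rout")
sink(out_file)
print(result)
sink()

captured <- readLines(out_file)   # the text that print() wrote to the file
print(captured)
```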
Lists

• Objects containing an ordered collection of objects
• Components do not have to be of same type
• Use list() to create a list:
• a <- list("hello",c(4,2,1),"class");
• Components can be named:
• a <- list(string1="hello",num=c(4,2,1),string2="class")
• Use [[position#]] or $name to access list elements
• E.g., a[[2]] and a$num are equivalent
• Running the length() command on a list gives the number of top-level components
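A sketch of list creation and access using the named list from the slide. The nested list b is an added, hypothetical example to show that length() counts only top-level components.

```r
a <- list(string1 = "hello", num = c(4, 2, 1), string2 = "class")

by_position <- a[[2]]          # access by position
by_name     <- a$num           # access by name
same <- identical(by_position, by_name)

n_components <- length(a)      # number of top-level components

# A list inside a list still counts as one component
b <- list(1, list(2, 3, 4, 5))
n_b <- length(b)

print(same)
print(n_b)
```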
Writing your own functions

• Functions in R are defined by an assignment like:
• a <- function(arg1,arg2) { function_commands; }
• Functions are R objects of class "function"
• Functions can be written in C/FORTRAN and called via .C() or .Fortran()
• Arguments may have default values
• Example: my.pow <- function(base, pow = 2) {return(base^pow);}
• Arguments with default values become optional, should usually appear at end of argument list (though not required)
• Arguments are untyped
• Allows multipurpose functions that depend on argument type
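The my.pow example above, written out as a runnable sketch. The describe() function is a hypothetical addition that illustrates untyped arguments; it is not from the slides.

```r
# Function with a default value for its second argument
my.pow <- function(base, pow = 2) {
  return(base^pow)
}

squared <- my.pow(3)                 # pow defaults to 2
power10 <- my.pow(2, 10)             # both arguments given by position
named   <- my.pow(pow = 3, base = 2) # arguments given by name, in any order
vec     <- my.pow(c(1, 2, 3))        # works on vectors too

# Hypothetical function whose behavior depends on the argument's type
describe <- function(x) {
  if (is.numeric(x)) "number" else if (is.character(x)) "text" else "other"
}

fn_class <- class(my.pow)
print(c(squared, power10, named))
print(describe("abc"))
```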
Useful R links
• R Home: http://www.r-project.org/
• R's CRAN package distribution:
http://cran.cnr.berkeley.edu/
• Introduction to R manual:
http://cran.cnr.berkeley.edu/doc/manuals/R-intro.pdf
• Writing R extensions:
http://cran.cnr.berkeley.edu/doc/manuals/R-exts.pdf
• Other R documentation:
http://cran.cnr.berkeley.edu/manuals.html
