Introduction to R: Shanti S. Chauhan, Ph.D., Business Studies, SHUATS

This document discusses R, an open-source programming language for statistical analysis and graphics. It gives an overview of R's history and its evolution from S, describes R's interface and its principles as an object-oriented programming language, and discusses advantages such as being free, interfacing with other languages, and extensive visualization capabilities. Drawbacks mentioned include a limited graphical user interface. Overall, the document promotes R as a powerful yet accessible tool for statistics and data analysis, widely used in academia and research.

INTRODUCTION TO R

Shanti S. Chauhan, Ph.D.
Business Studies
SHUATS
AGENDA
• History and evolution of R
• Principle and software paradigm
• Description of R interface
• Advantages of R
• Drawbacks of R
• So why use R?
• References for learning R
HISTORY AND EVOLUTION OF R
Origins at Bell Labs in the 1970s
HISTORY AND EVOLUTION OF R
R developed from the S language:
• S Version 1
• S Version 2
• S Version 3
• S Version 4
S was developed about 30 years ago for research applied to the high-tech industry
HISTORY AND EVOLUTION OF R
The regular development of R
• 1990s: R developed concurrently with S
• 1993: R made public
Acceleration of R development
• R-help and R-devel mailing lists
• Creation of the R Core Group

Source: R Journal Vol 1/2


HISTORY AND EVOLUTION OF R
Growing number of packages
• 2000: R version 1.0.1
• 2001: ~100 packages
• 2009: over 2,000 packages
• At the time of writing: R version 2.14

Source: R Journal Vol 1/2


HISTORY AND EVOLUTION OF R
Explosion of R popularity in the last decade
• Object-oriented, growing user base, scripting features
• Free and open source
• Irrational reasons: R is seen as “cool”


HISTORY AND EVOLUTION OF R
Comparison of mailing lists

Evolution of traffic on the main mailing lists of each software package. Source: R.A. Muenchen, r4stats.com
HISTORY AND EVOLUTION OF R
Popularity amongst programming languages

KDnuggets 2012 survey


HISTORY AND EVOLUTION OF R
Number of blogs

Software   Number of blogs
R          365
SAS        40
Stata      8
Others     0-3

Data as of March 2012


PRINCIPLE AND SOFTWARE PARADIGM
R is not really (statistical) software
• R is rather a programming language
• Limited user-friendly interfaces for data analysis
• Object-oriented and almost non-declarative
• Similar to programming languages such as Fortran, C, Java, Python
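The object-oriented side of R can be illustrated with its simplest class system, S3. This is a minimal sketch: the class name `sale`, the constructor `new_sale()` and the values are hypothetical and serve only as an illustration. The point is that a generic function such as `print()` or `summary()` behaves differently depending on the class of its argument.

```r
# Hypothetical constructor: builds a list and tags it with the class "sale"
new_sale <- function(item, amount) {
  structure(list(item = item, amount = amount), class = "sale")
}

# S3 method: print() dispatches to print.sale() for objects of class "sale"
print.sale <- function(x, ...) {
  cat("Sale of", x$item, "for", x$amount, "\n")
  invisible(x)
}

s <- new_sale("book", 12)
class(s)   # "sale"
print(s)   # calls print.sale()

# Built-in generics dispatch in the same way on built-in classes
summary(c(1, 2, 3))                 # numeric vector: min, quartiles, mean, max
summary(factor(c("a", "b", "a")))   # factor: counts per level
```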
PRINCIPLE AND SOFTWARE PARADIGM
R has limited graphical user interface (GUI) options
Recent efforts to make R more user-friendly
Several GUIs in development:
• R Commander
• RKWard
• Rattle
PRINCIPLE AND SOFTWARE PARADIGM
R Commander (Rcmdr)
PRINCIPLE AND SOFTWARE PARADIGM
RKWard
PRINCIPLE AND SOFTWARE PARADIGM
Rattle
PRINCIPLE AND SOFTWARE PARADIGM
Inherent limitations of pervasive Excel-like spreadsheets

PRINCIPLE AND SOFTWARE PARADIGM
Sophisticated but costly SAS

Screenshot of SAS Enterprise Miner 7.1. Source: sas.com
DESCRIPTION OF R INTERFACE
R console
• RGui: the basic R interface
• R desktop shortcut
• R command line (space to write instructions)
DESCRIPTION OF R INTERFACE
Using the command line in the R console
• A first, incorrect sentence, followed by R's error message
• A second, correct sentence
• Declaration and printing of the sentence as an R object
• Simple math computations
• Basic information about the R object containing the sentence
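The console session annotated above can be reproduced as a script. The exact sentence in the original screenshot cannot be recovered, so "Hello world" and the object name `sentence` are hypothetical stand-ins.

```r
# Unquoted text such as  Hello world  is not valid R, so R answers with a
# syntax error. The same check can be made programmatically:
res <- try(parse(text = "Hello world"), silent = TRUE)
inherits(res, "try-error")   # TRUE: R could not parse the sentence

# Quoting the text makes it a valid character value, which R echoes back
"Hello world"

# Declare an R object holding the sentence, then print it
sentence <- "Hello world"
print(sentence)

# Simple math computations
2 + 3
sqrt(16)

# Basic information about the object containing the sentence
class(sentence)    # "character"
nchar(sentence)    # 11 characters
length(sentence)   # 1: a character vector with a single element
```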
DESCRIPTION OF R INTERFACE
RGui menu: File tab

File tab: usual basic and general operations
DESCRIPTION OF R INTERFACE
RGui menu: Edit tab
• Edit tab: basic and general editing
• Data editor: entering the object's name
• Results of the data editor
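The data editor can also be opened from the command line. Below is a minimal sketch that uses a small hypothetical data frame named `scores`. The interactive calls are commented out so the script can also run non-interactively.

```r
# Build a small data frame in code (hypothetical example data)
scores <- data.frame(
  student = c("A", "B", "C"),
  mark    = c(12, 15, 9)
)

# Interactive equivalents of the Edit tab's data editor:
# fix(scores)              # opens the spreadsheet-like editor; edits are saved back into 'scores'
# scores <- edit(scores)   # same editor, but the edited copy must be assigned explicitly

# Print the object to check the result
print(scores)
```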
DESCRIPTION OF R INTERFACE
RGui menu: View tab

View tab: show or hide the toolbar and/or status bar
DESCRIPTION OF R INTERFACE
RGui menu: Misc tab

Misc tab: miscellaneous operations
DESCRIPTION OF R INTERFACE
RGui menu: Packages tab

Packages tab: adding functions to base R
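The operations in the Packages tab have console equivalents. This sketch assumes a standard R installation. The `splines` package is loaded here only because it ships with every R installation. Installing from CRAN requires an internet connection, so that line is commented out.

```r
# Console equivalent of installing a package from CRAN (needs internet)
# install.packages("mgcv")

# Console equivalent of loading an installed package
library(splines)   # 'splines' is distributed with R itself

# The search path now includes package:splines
search()

# Check whether a package is installed, without loading it
"mgcv" %in% rownames(installed.packages())
```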
DESCRIPTION OF R INTERFACE
RGui menu: Windows tab

Windows tab: usual options to arrange (e.g., tile) the windows
DESCRIPTION OF R INTERFACE
RGui menu: Help tab
Help tab: essential links to documentation and help
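Most of the help that the Help tab links to is also available from the command line. In this sketch, the calls that open a help viewer are commented out. Only the non-interactive helpers run.

```r
# Open a function's help page (both forms are equivalent)
# ?mean
# help("mean")

# Full-text search of the installed documentation
# help.search("linear model")

# Run the examples listed at the bottom of a help page
# example(mean)

# Non-interactive helpers that work in any session:
matches <- apropos("^mean")   # objects whose names start with "mean"
print(matches)
args(mean)                    # the arguments accepted by mean()
```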
Arithmetic Operators in R
Operator   Description
+          Addition
-          Subtraction
*          Multiplication
/          Division
^          Exponent
%%         Modulus (remainder of division)
%/%        Integer division
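The operators in the table can be tried directly at the console. The values 7 and 2 are arbitrary. Note that `%%` and `%/%` follow floor division, so with a negative operand the remainder takes the sign of the divisor.

```r
a <- 7
b <- 2

a + b     # 9
a - b     # 5
a * b     # 14
a / b     # 3.5
a ^ b     # 49
a %% b    # 1  (remainder)
a %/% b   # 3  (integer division)

# The two are linked: b * (a %/% b) + (a %% b) equals a
b * (a %/% b) + (a %% b)   # 7

# With a negative operand, the result follows floor division
(-7) %% 2    # 1
(-7) %/% 2   # -4

# Operators are vectorized: they apply element by element
c(1, 2, 3) * 10   # 10 20 30
```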
Relational Operators
Operator   Description
<          Less than
>          Greater than
<=         Less than or equal to
>=         Greater than or equal to
==         Equal to
!=         Not equal to
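Relational operators return logical values (`TRUE`/`FALSE`). Like the arithmetic operators, they work element by element on vectors. The example vector is arbitrary. The last lines show a common pitfall: `==` compares floating-point numbers exactly, so `all.equal()` is usually the safer test for numeric results.

```r
x <- c(1, 5, 10)

x < 5    # TRUE FALSE FALSE
x > 5    # FALSE FALSE TRUE
x <= 5   # TRUE TRUE FALSE
x >= 5   # FALSE TRUE TRUE
x == 5   # FALSE TRUE FALSE
x != 5   # TRUE FALSE TRUE

# A logical vector can be used to filter another vector
x[x >= 5]   # 5 10

# Caution: == compares floating-point numbers exactly
0.1 + 0.2 == 0.3                   # FALSE because of rounding error
isTRUE(all.equal(0.1 + 0.2, 0.3))  # TRUE: comparison with a tolerance
```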
ADVANTAGES OF R
R “philosophy”: open source code
• You can access the code of the software
• In-depth understanding of what R does
• You can modify the code

Example: the “mgcv” package webpage
• Address of the “mgcv” package
• Link to the package sources (.tar.gz file)

Screenshot of the CRAN webpage of the “mgcv” package. Source: CRAN


ADVANTAGES OF R
R access to source code
Example of source code of the “mgcv” package
1. Unzipping the mgcv_1.7-13.tar.gz file (with 7-Zip)
2. List of directories in the “mgcv” package (i.e., code sources)
3. List of functions (i.e., open code) in the “src” directory of the “mgcv” package

Screenshot of unzipping the “mgcv” package and browsing through the package's files.
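You do not have to unpack the package archive to read source code. You can also inspect any function written in R from within R. This minimal sketch uses the base function `sd()` as an example. Functions implemented in C, such as those in the src directory above, usually show only an R wrapper that calls into compiled code.

```r
# Typing a function's name without parentheses prints its R source code
sd

# formals() and body() return the parts of that definition as R objects
formals(sd)   # the arguments: x and na.rm = FALSE
body(sd)      # the code executed when sd() is called

# Functions in a specific package namespace can be reached with ::
# mgcv::gam   # commented out in case mgcv is not installed
stats::sd
```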
ADVANTAGES OF R
R is free

Software            Academics       Demo            Commercial (basic)   Commercial (full)
R                   Free            Free            Free                 Free
SAS                 Free to $100s   Not available   $1,000s              $10,000s
Statistica          $100s           30-day limit    ~$1,000              $10,000
Excel (Microsoft)   Free to $10s    Limited         ~$100                $100s
SPSS (IBM)          $100s           14-day limit    ~$2,000              $1,000s
ADVANTAGES OF R
Interface with other languages and scripting capabilities
Interfaces with virtually any other programming language
• Fortran, C, C++, Python…
• Adapt or rewrite your existing code in R
R as a scripting language
• R scripts can launch, or be launched by, other languages

The “mgcv.c” file in the “mgcv” package, written in typical C

Screenshot of the file “mgcv.c” of the “mgcv” package open in WordPad
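Here is a sketch of R used as a scripting language. The file below, with the hypothetical name `summarise.R`, could be launched with `Rscript` from a shell or from another program. The function name `summarise_args()` and the Python script name in the last comment are illustrative assumptions.

```r
# summarise.R (hypothetical script name)
# Usage from a shell:  Rscript summarise.R 3 5 8

# Turn command-line arguments (character strings) into a numeric summary
summarise_args <- function(cli_args) {
  x <- as.numeric(cli_args)
  c(n = length(x), mean = mean(x), max = max(x))
}

# commandArgs(trailingOnly = TRUE) returns only the user-supplied arguments
cli_args <- commandArgs(trailingOnly = TRUE)
if (length(cli_args) > 0) {
  print(summarise_args(cli_args))
}

# In the other direction, system2() lets an R script launch another program,
# e.g. a hypothetical Python script:
# system2("python3", "analysis.py", stdout = TRUE)
```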


ADVANTAGES OF R
R visualization capabilities
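The plots from the original slides are not reproduced here. As a minimal sketch of base R graphics, the code below draws a histogram and a scatter plot with a fitted regression line. The data are simulated purely for illustration. The output goes to a temporary PDF file so the code also runs without a screen.

```r
set.seed(1)                          # reproducible simulated data
x <- rnorm(200)
y <- 2 * x + rnorm(200, sd = 0.5)

out <- tempfile(fileext = ".pdf")
pdf(out, width = 8, height = 4)      # open a PDF graphics device
par(mfrow = c(1, 2))                 # two panels side by side

hist(x, main = "Histogram of x", col = "grey")
plot(x, y, pch = 19, col = "steelblue", main = "Scatter plot with fitted line")
abline(lm(y ~ x), col = "red", lwd = 2)   # add the least-squares regression line

invisible(dev.off())                 # close the device and write the file
```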
ADVANTAGES OF R
R's role in academia
• R is a tool used by the finest researchers
• Top-notch analytics capabilities

Screenshot of a user's Facebook map. Source: Paul Butler/Facebook, DG Rossiter, spatialanalysis.co.uk


ADVANTAGES OF R
To summarize

Free, open-source philosophy
• R websites with many examples
• Free books
• Free online open courses
• Twitter accounts

Online help and discussion
• Mailing lists
• Very active and diverse forums
• Communities of developers and helpers
DRAWBACKS OF R
Average memory performance
Poor management of large datasets
• Avoid nested loops
• Prefer R's vectorized operations on its advanced data structures

Complicated package structure in R
• Dozens of packages
• Each must be loaded into memory in every session

R packages to better manage memory
• RHadoop (connects R to Hadoop, itself inspired by Google's MapReduce)
• ff
• bigmemory
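The advice to avoid nested loops can be made concrete. This sketch computes the row sums of a random matrix in two ways: with two nested `for` loops, and with the built-in vectorized `rowSums()`, which runs in compiled code. The matrix size is arbitrary, and timings depend on the machine, so no figures are claimed.

```r
set.seed(1)
m <- matrix(runif(500 * 500), nrow = 500)

# Nested loops: visits every cell one at a time in interpreted R code
loop_row_sums <- function(m) {
  out <- numeric(nrow(m))
  for (i in seq_len(nrow(m))) {
    for (j in seq_len(ncol(m))) {
      out[i] <- out[i] + m[i, j]
    }
  }
  out
}

# Vectorized alternative: a single call into compiled code
vec <- rowSums(m)
all.equal(loop_row_sums(m), vec)   # TRUE: same result

# Compare running times (exact figures depend on the machine)
system.time(loop_row_sums(m))
system.time(rowSums(m))

# Memory footprint of the matrix
print(object.size(m), units = "Mb")
```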
DRAWBACKS OF R
Average computing performance
No default parallel execution
• R packages to use several cores
• Top skills needed for high-performance computing

A high-level programming language
• Abstract and modern (like Python…)
• More productive coding
• But further from “machine language”…
• … which can make it up to 100 times slower than C
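R does not run code in parallel by default, but the `parallel` package can spread independent tasks over several cores. It has been distributed with R since version 2.14.0. This is a minimal sketch in which `slow_square()` is a hypothetical stand-in for an expensive computation. `mclapply()` relies on forking, which is unavailable on Windows, so the sketch falls back to one core there. On Windows, `parLapply()` with `makeCluster()` is the usual alternative.

```r
library(parallel)   # distributed with R since version 2.14.0

# Hypothetical stand-in for an expensive computation
slow_square <- function(x) {
  Sys.sleep(0.1)
  x^2
}

# Forking is unavailable on Windows, where mc.cores must stay at 1
n_workers <- if (.Platform$OS.type == "windows") 1L else 2L

res <- mclapply(1:8, slow_square, mc.cores = n_workers)
unlist(res)   # 1 4 9 16 25 36 49 64

detectCores()  # number of cores R can see (may be NA on some platforms)
```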
DRAWBACKS OF R
Difficult data visualization and management
Difficult to inspect data sets

Screenshot of the R data editor and the “ViewTable” tab in SAS 9.3
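A few console functions partly make up for the limited data viewer. This sketch uses `mtcars`, a data set built into R, to show the usual first look at a table: its dimensions, first rows, column structure and a summary. The spreadsheet-style viewer is commented out because it needs a GUI.

```r
data(mtcars)   # built-in example data set: 32 cars, 11 variables

dim(mtcars)           # 32 11
head(mtcars, 3)       # first three rows
str(mtcars)           # type and first values of every column
summary(mtcars$mpg)   # min, quartiles, mean, max of one column

# Interactive spreadsheet-style viewer
# View(mtcars)
```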


DRAWBACKS OF R
Difficult architecture management
Problems for large organizations
• R is made of several thousand independent packages
• No deployment plan for complex organizations
• No installation support

Lack of code accountability
• Thousands of individual, independent R developers
• Nobody is responsible for the quality of the code

Potentially high hidden costs with R
• For complex computations in large corporations, the total cost may favour commercial solutions
DRAWBACKS OF R
Relatively difficult to learn
Steep learning curve
• R code is far from undergraduate computer science courses
• Very complex data structures (useful once mastered)
• Is R's syntax illogical?

Still, not more difficult to learn than SAS
• Both SAS and R are more abstract than basic programming languages (Fortran, C…)
• Difficult to learn = more rewarding professionally!
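To give a sense of the data structures this slide refers to, the sketch below creates the four most common ones: an atomic vector, a list, a factor and a data frame. It also shows how each is inspected and indexed. All values are arbitrary and chosen only for illustration.

```r
v  <- c(1.5, 2, 3)                                 # atomic vector: one type for all elements
l  <- list(name = "R", released = 2000,
           tags = c("stats", "graphics"))          # list: elements of mixed types
f  <- factor(c("low", "high", "low"))              # factor: categories stored as levels
df <- data.frame(id = 1:3, score = v, level = f)   # data frame: columns of equal length

class(v)     # "numeric"
class(l)     # "list"
levels(f)    # "high" "low" (sorted alphabetically by default)
typeof(df)   # "list": a data frame is a list of columns
str(df)

# Different structures, different ways of indexing
l$tags[2]                  # "graphics"
df[df$score > 1.8, "id"]   # ids of rows whose score exceeds 1.8: 2 3
```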
SO WHY LEARN R?
More positive than negative points
No language is perfect!
• Contradictory objectives to meet
• Strengths and weaknesses of each language

Effect of legacy and the culture of the organization
• Use existing solutions (system architecture, BA tools…)
• Habits in business analytics

Different needs imply different tools
• Large corporations + defined procedures → SAS-like
• Fewer financial resources + quick proof of concept → R
SO WHY LEARN R?
Very appealing solution

Popularity of business analytics software (R, SAS, IBM SPSS, STATISTICA, own code), overall and among corporate users, consultants, academics and NGO/government users (green = very popular, red = unpopular). Source: Rexer Analytics
REFERENCES FOR LEARNING R
Books
Many books available: choose the one that fits you!
• Style, pedagogy, theory vs. practice
• Browse several books at a local library or store

Springer's Use R! series (http://www.springer.com/series/6991)
• Recent, concise, good quality, affordable, diverse
• Complete beginners: “A Beginner's Guide to R”, “R by Example”
• One step further: “Business Analytics for Managers”
• Intensive Excel users: “R Through Excel”

O'Reilly R series (for programmers)
• “R Cookbook”, “R in a Nutshell”
REFERENCES FOR LEARNING R
Websites
R official websites
• The R Project for Statistical Computing (www.r-project.org)
• Mailing lists (“R-help”, Special Interest Groups) and The R Journal
• Official (austere) manuals (“An Introduction to R”)

Other websites
• UCLA online R resources (http://www.ats.ucla.edu/stat/r/)
• R blogs aggregator (www.r-bloggers.com)
• Social networks: LinkedIn groups (The R Project for Statistical Computing), Twitter accounts (@RevolutionR, @inside_R), job boards (Analytical Bridge…)
REFERENCES FOR LEARNING R
Conferences
Growing number of conferences about R
Official international useR! conference
• Held annually over a few days, in a new venue each year (Google it!)
• Lots of material on many topics

Other conferences or venues
• Business analytics conferences (data mining, specialized topics…) with sessions involving R
• Find (or even start!) an R user group near you (R Wiki geographical list, map of groups on “meetup.com”)
• Events and news from the R-bloggers blog
