Speeding Up R Code With C++: A Minimal Introduction (Day 2)

This document provides an overview of using C++ to speed up R code. It discusses integrating C++ and R using Rcpp, including writing functions, defining variables, control structures like for loops and if/else statements. Performance benchmarks show Rcpp code running much faster than plain R while maintaining safety. The document concludes with best practices for packaging Rcpp code into reusable R packages.

Uploaded by

Mazin Digna
Copyright
© Attribution Non-Commercial (BY-NC)

Speeding up R code with C++

A Minimal Introduction (Day 2)

Davor Cubranic

Computing Seminar
July 27, 2010
Overview

• More control structures


• Performance benefits
• Creating packages
• Easy operations on matrices and vectors
• Tighter integration with R
• Organizing your code
Review

• C++ history and philosophy


• Compiler vs. interpreter
• Writing extensions in C++ using Rcpp
• Functions
• Variables
• For-loop
Review: C++ and Rcpp

    #include <Rcpp.h>
    #include <iostream>

    using Rcpp::NumericVector;
    using namespace std;

    RcppExport SEXP numeric_id(SEXP x_r) {
        NumericVector x(x_r);
        int x_length = x.size();

        cout << "Returning a vector of length " <<
            x_length << endl;

        return x;
    }
Review: Writing C++ functions

• C++ functions have the following format:

    return_type fn_name(arg_type arg_name, ...) {
        body
        return value;
    }

• Functions callable from R receive and return SEXPs:

    RcppExport SEXP fn_name(SEXP arg_name) {
        ...
    }
Review: Calling your C++ code

• You can’t call your C++ function directly like an R one
• You have to use .Call:

    .Call('numeric_id', x)

• Wrap it in an R function:

    num.id <- function(x) {
        .Call('numeric_id', x)
    }

• Now from elsewhere in R you can just say

    num.id(rnorm(10))
Review: Defining Variables

    NumericVector x;
    int x_length;

• Type, then name


• Starting value depends on the type
• NumericVector: an empty vector
• int: undefined
Review: Initializing Variables

• We can specify the initial value of the variable

    int x_length = x.size();

• Or we can tell C++ to construct it from something

    NumericVector x(x_r);
    NumericVector x_square(x_length);

• There can be different ways to construct a variable, depending on the constructor arguments
• E.g., initialize with the contents of a given SEXP
• Or create a vector of zeros of some specified length
Review: For-loop

    for (initialize; condition; increment)
        body

• First, execute the initialize statement once


• Then evaluate the condition
• If true:
• execute the body and then the increment
• go back to evaluating the condition
• If false, end the loop
Review: For-loop Example

    for (int i = 0; i < x_length; i = i + 1) {
        x_square[i] = x[i] * x[i];
    }

• Sets each element of x_square
• ... to the square of the corresponding element in x
• Note: i takes on values 0..(x_length − 1)
• ... and exists only inside the loop
While-loop

while ( condition ) {
body
}

• First evaluate the condition


• If true:
• execute the body
• go back to evaluating the condition
• If false, end the loop
While-loop vs. For-loop

• While-loop:
• Simpler syntax
• More flexible—you can do whatever you want
• ... but you have to do everything yourself
• For-loop:
• More complicated syntax
• Simpler to use in specific cases
• ... perfect for counting
Converting While-loop to For-loop

A for-loop:

    for (initialize; condition; increment) {
        body
    }

does the same as this while-loop:

    initialize
    while (condition) {
        body
        increment
    }
Example: For-loop to While-loop

    int i = 0;
    while (i < x_length) {
        x_square[i] = x[i] * x[i];
        i = i + 1;
    }

• Note: i now exists outside the loop
• ... and at the end of the loop has the value x_length
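
The two loop forms can be compared side by side in a small sketch that compiles without R. It assumes std::vector<double> as a stand-in for NumericVector, and the function names square_for and square_while are made up for this illustration.

```cpp
#include <cstddef>
#include <vector>

// Square each element using a for-loop; i exists only inside the loop.
std::vector<double> square_for(const std::vector<double>& x) {
    std::vector<double> x_square(x.size());
    for (std::size_t i = 0; i < x.size(); i = i + 1) {
        x_square[i] = x[i] * x[i];
    }
    return x_square;
}

// The same computation with a while-loop; i is declared before the
// loop and is still visible after it, where it equals x.size().
std::vector<double> square_while(const std::vector<double>& x) {
    std::vector<double> x_square(x.size());
    std::size_t i = 0;
    while (i < x.size()) {
        x_square[i] = x[i] * x[i];
        i = i + 1;
    }
    return x_square;
}
```

Both functions should return the same result for any input; only the scope of the counter i differs.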
If-else

• Very similar to R

    if (condition) {
        true body
    }
    else {
        false body
    }

• First evaluate the condition


• If true:
• execute the true body
• If false:
• execute the false body
If-else (cont’d)

• The else clause is optional

    if (condition) {
        true body
    }

• First evaluate the condition


• If true:
• execute the true body
• If false, do nothing
Example: If-else
    // File: abs.cpp
    #include <Rcpp.h>

    RcppExport SEXP abs_vec(SEXP x_r) {
        Rcpp::NumericVector x(x_r);
        int x_length = x.size();
        Rcpp::NumericVector x_abs(x_length);

        for (int i = 0; i < x_length; ++i) {
            if (x[i] >= 0) {
                x_abs[i] = x[i];
            }
            else {
                x_abs[i] = -x[i];
            }
        }
        return x_abs;
    }
Warning!!
• In R, if-else is an expression
• It returns a value
• ... the value of the last executed expression in its body
• In C++, if-else is a statement
• This means that it does not return a value
• In R, we can write: y <- if (x < 0) -1 else 1
• You can’t do that with if-else in C++, because it produces no value to assign
• You have to do:

    if (x < 0) {
        y = -1;
    }
    else {
        y = 1;
    }
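
The slides do not cover it, but C++ also has a conditional operator (condition ? a : b) that is an expression and so comes closer to the R one-liner. A minimal sketch, with sign_of as a hypothetical function name:

```cpp
// Returns -1 for negative x and 1 otherwise, like R's
//   y <- if (x < 0) -1 else 1
int sign_of(double x) {
    // condition ? value_if_true : value_if_false is an expression,
    // so its result can be assigned directly.
    int y = (x < 0) ? -1 : 1;
    return y;
}
```

For anything longer than a simple choice between two values, the if-else statement shown above is usually easier to read.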
Summary: Control Structures

• Similar to those in R
• if-else
• for-loop
• while-loop
• With some important differences in syntax and behaviour
• Keep those in mind!!
Rcpp Performance

• Let’s compare the performance of R, plain C, and Rcpp


• Algorithm: Convolution of two finite sequences
• Repeat 1,000 times with vectors of size 100
Convolution in R

    convolve.r <- function(a, b) {
        ab <- numeric(length(a) + length(b) - 1)
        for (i in seq_along(a)) {
            for (j in seq_along(b)) {
                ab[i + j - 1] <- ab[i + j - 1] + a[i] * b[j]
            }
        }
        return(ab)
    }
Convolution in C

    #include <R.h>
    #include <Rinternals.h>

    SEXP convolve2(SEXP a, SEXP b)
    {
        R_len_t i, j, na, nb, nab;
        double *xa, *xb, *xab;
        SEXP ab;

        PROTECT(a = coerceVector(a, REALSXP));
        PROTECT(b = coerceVector(b, REALSXP));
        na = length(a);
        nb = length(b);
        nab = na + nb - 1;
        PROTECT(ab = allocVector(REALSXP, nab));
        xa = REAL(a);
        xb = REAL(b);
        xab = REAL(ab);
        for (i = 0; i < nab; i++)
            xab[i] = 0.0;
        for (i = 0; i < na; i++)
            for (j = 0; j < nb; j++)
                xab[i + j] += xa[i] * xb[j];
        UNPROTECT(3);
        return(ab);
    }
Convolution in Rcpp
    #include <Rcpp.h>

    RcppExport SEXP convolve(SEXP a, SEXP b)
    {
        Rcpp::NumericVector xa(a);
        Rcpp::NumericVector xb(b);

        int n_xa = xa.size();
        int n_xb = xb.size();
        int nab = n_xa + n_xb - 1;

        Rcpp::NumericVector xab(nab);

        for (int i = 0; i < n_xa; i++)
            for (int j = 0; j < n_xb; j++)
                xab[i + j] += xa[i] * xb[j];

        return xab;
    }
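
To try the algorithm without R installed, the same double loop can be written against std::vector. This is only a sketch: convolve_std is a name chosen for this illustration, and it assumes both inputs are non-empty.

```cpp
#include <cstddef>
#include <vector>

// Convolution of two finite sequences, mirroring the loops above.
// Assumes both inputs are non-empty.
std::vector<double> convolve_std(const std::vector<double>& a,
                                 const std::vector<double>& b) {
    // The result has length(a) + length(b) - 1 elements, all zero at first.
    std::vector<double> ab(a.size() + b.size() - 1, 0.0);
    for (std::size_t i = 0; i < a.size(); i++)
        for (std::size_t j = 0; j < b.size(); j++)
            ab[i + j] += a[i] * b[j];
    return ab;
}
```

Note the 0-based index i + j here, compared with i + j - 1 in the R version.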
Performance Results

Implementation            Time      Relative   Lines
R                         141.47    4878.28    10
C                         0.029     1          26
Rcpp (bounds checking)    0.052     1.79       16
Rcpp (no bounds check)    0.030     1.03       19
C (calling via ".C")      0.059     2.03       10 + 6
Summary: Performance

• Using any language involves trade-offs


• R: simple, safe, potentially slow
• C: fastest, complicated, unsafe
• Rcpp: pretty fast, pretty safe, pretty simple
• Choose carefully
• Getting faster → less safe, more complicated
• Using Rcpp may be good enough
Packaging Your Code

• Remember our workflow for Rcpp code


• Set up compiler parameters (PKG_LIBS and PKG_CXXFLAGS)
• Compile and link (R CMD SHLIB file.cpp)
• Load the library (dyn.load('file.so'))
• Call the code (.Call('function', arguments))
• The recommended way to do this is by creating a package
• Easy to compile, install and distribute
• C++ code and its R wrappers are loaded with one command (library('file'))
Creating an Rcpp Package

• Packages have rules for file structure


• Files DESCRIPTION and NAMESPACE
• Directory src for compiled code
• Directory R for R wrappers
• Directory man for help pages
• Rcpp provides a single command to create this structure:
    Rcpp:::Rcpp.package.skeleton('package_name')
• Creates all package files under the directory package_name
• You need to do:
• Edit DESCRIPTION with your information
• Add your .cpp files to src
• Add your .R files to R
• Complete the documentation in man
The generated DESCRIPTION file:

Package: package_name
Type: Package
Title: What the package does (short line)
Version: 1.0
Date: 2010-07-22
Author: Who wrote it
Maintainer: Who to complain to <[email protected]>
Description: More about what it does
(maybe more than one line)
License: What license is it under?
LazyLoad: yes
Depends: Rcpp (>= 0.8.3)
LinkingTo: Rcpp
SystemRequirements: GNU make
Compiling and Loading Packages

• To compile (and run R’s package checks):
    R CMD check package_name
• Then install it:
    R CMD INSTALL package_name
• Now you can use it like any other R package:
    library(package_name)
Summary: Packages

• Packaging simplifies:
• compiling,
• loading, and
• distributing your code
• Use it!
Easy Matrix Math with RcppArmadillo

• Armadillo is a C++ wrapper around standard libraries for matrix arithmetic and linear algebra
• Provides types and convenient operations
• RcppArmadillo packages Armadillo for use in R
• Takes care of compilation and linking settings
• Integrates R and Armadillo types
Packaging with RcppArmadillo

• Using RcppArmadillo with SHLIB is pretty complicated


• Now our package uses both Rcpp and RcppArmadillo
• Plus a few more libraries to link with
• Using a package now really pays off
• RcppArmadillo provides a similar command to create the package structure:
    RcppArmadillo:::RcppArmadillo.package.skeleton('package_name')
• Fill in the blanks as before
• Edit DESCRIPTION
• Put code into src and R
• Documentation in man
Example: abs() with RcppArmadillo

    // File: abs_arma.cpp
    #include <RcppArmadillo.h>

    // The SEXP parameter is named xs so that it does not clash
    // with the Armadillo vector x declared below.
    RcppExport SEXP abs_vec(SEXP xs) {
        Rcpp::NumericVector x_r(xs);
        // Wrap R's memory in an Armadillo vector without copying
        arma::vec x(x_r.begin(), x_r.size(), false);

        return Rcpp::wrap(arma::abs(x));
    }
Armadillo Vector Type

    Rcpp::NumericVector x_r(xs);
    arma::vec x(x_r.begin(), x_r.size(), false);

• vec is the Armadillo type for vectors
• Create them by wrapping Rcpp’s NumericVectors
• Access elements with the index in parentheses:
    x(3)
• Indexing is 0-based!!
Example: A′A with Armadillo

    // File: ata.cpp
    #include <RcppArmadillo.h>

    using namespace Rcpp;
    using namespace arma;

    // The SEXP parameter is named As so that it does not clash
    // with the Armadillo matrix A declared below.
    RcppExport SEXP ata(SEXP As) {
        NumericMatrix A_r(As);
        mat A(A_r.begin(), A_r.nrow(), A_r.ncol(), false);

        return wrap(trans(A) * A);
    }
Armadillo Matrix Type

    NumericMatrix A_r(As);
    mat A(A_r.begin(), A_r.nrow(),
          A_r.ncol(), false);

• mat is the Armadillo type for matrices
• Create them by wrapping Rcpp’s NumericMatrix objects
• Access elements with the indices in parentheses:
    A(row, column)
• Indexing is 0-based!!
Armadillo Matrix Operators

    return wrap(trans(A) * A);

Operator   Meaning
+, -       Element-wise addition/subtraction
%, /       Element-wise multiplication/division
*          Matrix multiplication
==, !=     Element-wise (in)equality comparison
<, <=      Element-wise less-than (or equal) comparison
>, >=      Element-wise greater-than (or equal) comparison
Armadillo Matrix Functions

    return wrap(trans(A) * A);

Function   Meaning
trans      matrix transpose
accu       sum all elements
log, exp   element-wise logarithm and exponential
sqrt       element-wise square root
diagvec    extract a diagonal
Example: Fast Linear Regression
    // File: fast_lm.cpp
    #include <RcppArmadillo.h>

    using namespace Rcpp;
    using namespace arma;

    RcppExport SEXP fast_lm(SEXP ys, SEXP Xs) {
        NumericVector y_r(ys);
        NumericMatrix Xr(Xs);
        int n = Xr.nrow(), k = Xr.ncol();

        mat X(Xr.begin(), n, k, false);
        vec y(y_r.begin(), y_r.size(), false);

        vec coef = solve(X, y);    // fit model y ~ X
        vec residuals = y - X*coef;

        double sig2 = as_scalar(trans(residuals)*residuals / (n-k));

        // std. error of estimate
        vec stderrest = sqrt(sig2 * diagvec(inv(trans(X)*X)));

        return List::create(
            Named("coefficients") = coef,
            Named("stderr") = stderrest
        );
    }
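
In formulas, the code above appears to compute the usual least-squares quantities. The notation below (β̂ for coef, r for residuals, σ̂² for sig2, se for stderrest) is introduced here for illustration and assumes X is an n × k matrix with full column rank and n > k:

```latex
\begin{align*}
\hat{\beta} &= \arg\min_{\beta} \lVert y - X\beta \rVert^{2}
  && \text{(\texttt{solve(X, y)})} \\
r &= y - X\hat{\beta}
  && \text{(\texttt{residuals})} \\
\hat{\sigma}^{2} &= \frac{r^{\top} r}{n - k}
  && \text{(\texttt{sig2})} \\
\operatorname{se}(\hat{\beta}_j) &=
  \sqrt{\hat{\sigma}^{2}\,\bigl[(X^{\top} X)^{-1}\bigr]_{jj}}
  && \text{(\texttt{stderrest})}
\end{align*}
```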
Armadillo Matrix Decomposition

    vec coef = solve(X, y);

Function   Meaning
inv        Matrix inverse
solve      Solve AX = B for X
chol       Cholesky decomposition
eig_sym    Eigen-decomposition
Summary: RcppArmadillo

• Lots of operations on matrices


• Looks almost like R (but faster)
• But more work to manage types (Rcpp ↔ Armadillo)
Rcpp and R: Beyond SEXPs

• Rcpp allows deep integration with R


• Almost any language element is available as a C++ type
• Lists
• Environments
• Functions
• And each type comes with easy-to-use operations on it
Running Fast Linear Regression

> dyn.load(’fast_lm.so’)
> data(trees)
> .Call(’fast_lm’, log(trees$Volume),
cbind(1, log(trees$Girth)) )
$coefficients
[,1]
[1,] -2.353325
[2,] 2.199970

$stderr
[,1]
[1,] 0.23066284
[2,] 0.08983455
Creating R Lists

    return List::create(
        Named("coefficients") = coef,
        Named("stderr") = stderrest
    );

• Creates an R list
• Puts Armadillo vectors and matrices into named elements
• Returns the list to R
Calling R Functions

    Function rnorm_fn("rnorm");
    NumericVector out_r = rnorm_fn(10,
                                   _["mean"] = 3,
                                   _["sd"] = 2);
    vec out = vec(out_r.begin(), out_r.size());

• Creates a C++ handle to the R function rnorm
• Calls it with a combination of positional and named arguments
• Handy, but also has its disadvantages:
• This will slow down your code!!
• The returned value is an Rcpp type (e.g., NumericVector)
• So you’ll need to get back to Armadillo types yourself
• Use it only if you really have to
Organizing Your Code

• Long programs are difficult to write and debug


• Break algorithms into smaller steps
• E.g., for linear regression:
1. Find coefficients for y ~ X
2. Calculate residuals

• Each step can be tested independently


• Usually, each step → function
Example: Fast Linear Regression

    #include <RcppArmadillo.h>

    using namespace Rcpp;
    using namespace arma;

    RcppExport SEXP fast_lm(SEXP ys, SEXP Xs) {
        NumericVector y_r(ys);
        NumericMatrix Xr(Xs);

        mat X(Xr.begin(), Xr.nrow(), Xr.ncol(), false);
        vec y(y_r.begin(), y_r.size(), false);

        vec coef = fit(X, y);
        // The result is stored in std_err: stderrest is now the name of
        // the helper function, and stderr is already taken by the
        // standard C error stream.
        vec std_err = stderrest(y, X, coef);

        return List::create(
            Named("coefficients") = coef,
            Named("stderr") = std_err
        );
    }
    [...]
Example: Fast Linear Regression (cont’d)
    [...]
    // fit model y ~ X
    vec fit(const mat& X, const vec& y) {
        return solve(X, y);
    }

    // std. error of estimate
    vec stderrest(const vec& y, const mat& X,
                  const vec& coef) {
        vec residuals = y - X*coef;

        int n = X.n_rows, k = X.n_cols;
        double sig2 = as_scalar(trans(residuals) *
                                residuals / (n-k));

        return sqrt(sig2 * diagvec(inv(trans(X)*X)));
    }
Passing Parameters

    vec fit(const mat& X, const vec& y) {

• What does const mat& X mean?
• What is the type of X?
• const means it’s a constant
• Cannot be modified inside the fit function
• mat means it’s a matrix
• And the ampersand (&) tells the compiler that it’s a reference
• Refers directly to the original argument in the caller
• Saves us from creating a copy just for the call
• Use this pattern in your functions!!
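
The same pattern works for any type that is expensive to copy. A small sketch outside R, assuming std::vector<double> in place of mat; sum_by_ref and is_same_object are hypothetical helpers written for this illustration:

```cpp
#include <vector>

// Takes the vector by const reference: no copy is made, and the
// function cannot modify the caller's data.
double sum_by_ref(const std::vector<double>& v) {
    double total = 0.0;
    for (double value : v) {
        total += value;
    }
    return total;
}

// Returns true if the parameter refers to the very same object
// as the one whose address is passed in.
bool is_same_object(const std::vector<double>& v,
                    const std::vector<double>* original) {
    return &v == original;
}
```

Calling is_same_object(x, &x) should return true, which shows that the reference parameter is the caller's object rather than a copy of it.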
Declaring Functions

• There is a problem with this code!!


• In C++, everything has to be declared before it is used
• But we’re using functions fit and stderrest before they are defined
• We can either move them to the top of the file
• Less readable, prefer having the top-level function first
• Or we can just declare their existence at the top
• Specifies their name, return type, and parameters
• ... with no body at that point
• ... and have the actual definition later
    #include <RcppArmadillo.h>

    using namespace Rcpp;
    using namespace arma;

    vec fit(const mat& X, const vec& y);
    vec stderrest(const vec& y, const mat& X,
                  const vec& coef);

    RcppExport SEXP fast_lm(SEXP ys, SEXP Xs) {
        [...]
    }

    vec fit(const mat& X, const vec& y) {
        return solve(X, y);
    }

    // std. error of estimate
    vec stderrest(const vec& y, const mat& X,
                  const vec& coef) {
        vec residuals = y - X*coef;
        [...]
    }
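
The declare-first, define-later layout can be tried in isolation with a sketch that needs neither R nor Armadillo. The functions square and sum_of_squares are made up for this illustration:

```cpp
// Declarations: name, return type, and parameters, but no body.
double square(double x);
double sum_of_squares(double a, double b);

// The top-level function comes first and uses square() before
// its definition appears; the declaration above makes this legal.
double sum_of_squares(double a, double b) {
    return square(a) + square(b);
}

// The actual definition can come later in the file.
double square(double x) {
    return x * x;
}
```

Removing the first declaration should make the compiler reject the call to square() inside sum_of_squares.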
Multi-file Development

• Break long programs into multiple files


• Each file contains logically related functions
• All files are compiled together into a single library
• Automatic if using packages
Multi-file Development (cont’d)

• What about function declarations?


• Still need to have them in each file for those functions that are used in it
• Maintaining declarations gets tedious and error-prone
• Each time you change a function’s name or parameters, you have to change its declaration everywhere
• Solution: make a header file for your code
• Keep all declarations and includes in one place
• Then your other files just include the package header
• Header files use extension “.h”
Example: Header for “fast_lm”
    #ifndef FAST_LM_H
    #define FAST_LM_H
    // File: fast_lm.h

    #include <RcppArmadillo.h>

    RcppExport SEXP fast_lm(SEXP ys, SEXP Xs);

    arma::vec fit(const arma::mat& X,
                  const arma::vec& y);

    arma::vec stderrest(const arma::vec& y,
                        const arma::mat& X,
                        const arma::vec& coef);

    #endif
Header Files Rules

• Always start with an #ifndef-#define pair and end with #endif

    #ifndef FAST_LM_H
    #define FAST_LM_H
    [...]
    #endif

• Replace FAST_LM_H with a unique name for your package
• Convention: all-uppercase, separate words with underscores, end with “_H”
• Add the necessary #includes
• ... and function declarations
• Do not put any using declarations here
• Keep them in the implementation files
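
To see what the guard does, the sketch below pastes the text of a hypothetical header (guard name SQUARE_H, function square) twice into one file, which is roughly what the compiler sees when a header is included twice. It uses a function definition rather than a plain declaration, because a repeated definition is what would otherwise cause an error.

```cpp
// First inclusion: SQUARE_H is not yet defined, so the body is compiled.
#ifndef SQUARE_H
#define SQUARE_H
inline double square(double x) { return x * x; }
#endif

// Second inclusion of the same text: SQUARE_H is now defined, so the
// preprocessor skips everything up to #endif and no duplicate
// definition of square() is seen.
#ifndef SQUARE_H
#define SQUARE_H
inline double square(double x) { return x * x; }
#endif
```

Without the #ifndef/#define/#endif lines, the second copy would be a redefinition and the file would not compile.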
Example: Implementation for “fast_lm”
    // File: fast_lm.cpp
    #include "fast_lm.h"

    using namespace Rcpp;
    using namespace arma;

    SEXP fast_lm(SEXP ys, SEXP Xs) {
        [...]
    }

    vec fit(const mat& X, const vec& y) {
        [...]
    }

    vec stderrest(const vec& y, const mat& X,
                  const vec& coef) {
        [...]
    }
Implementation Files Rules

• Include your package’s header file at the top
• Enclose its name in quotes

    #include "fast_lm.h"

• Angle brackets are for system headers
• Then #include any other headers needed by code in this implementation file
• Finally, write the function definitions for this file
Summary: Organizing Your Code

• Long programs are difficult to write and debug


• Divide your code into smaller functions
• If any file gets too long, divide it into multiple files
• Still keep logically-related functions in a single file
• Create a header file with all function declarations
• This header is included by each implementation file
Summary

• Sometimes R is just not fast enough


• R extensions let you use compiled code → fast
• Traditionally, extensions are written in C or Fortran
• Now you can also write them in C++
• (almost) as fast as C
• safer (type and bounds checking)
• simpler integration with R
• easier transition
• Each option has its trade-offs
• It’s your choice to make
