Speeding Up R Code With C++: A Minimal Introduction (Day 2)
Davor Cubranic
Computing Seminar
July 27, 2010
Overview
cout << "Returning a vector of length " <<
    x_length << endl;
return x;
}
Review: Writing C++ functions
.Call('numeric_id', x)
• Wrap with R:
num.id(rnorm(10))
Review: Defining Variables
NumericVector x;
int x_length;
NumericVector x(x_r);
NumericVector x_square(x_length);
while (condition) {
    body
}
• While-loop:
• Simpler syntax
• More flexible—you can do whatever you want
• . . . but you have to do everything yourself
• For-loop:
• More complicated syntax
• Simpler to use in specific cases
• . . . perfect for counting
Converting While-loop to For-loop
for (initialize; condition; increment) {
    body
}

is equivalent to:

initialize
while (condition) {
    body
    increment
}
Example: For-loop to While-loop
int i = 0;
while (i < x_length) {
    x_square[i] = x[i] * x[i];
    i = i + 1;
}
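The equivalence between the two loop forms can be checked outside R. The following is a minimal sketch in plain C++, assuming std::vector<double> as a stand-in for NumericVector; the function names square_with_for and square_with_while are made up for this illustration.

```cpp
#include <cstddef>
#include <vector>

// For-loop version: counter setup, test, and increment sit on one line.
// std::vector<double> stands in for Rcpp's NumericVector (an assumption).
std::vector<double> square_with_for(const std::vector<double>& x) {
    std::vector<double> x_square(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        x_square[i] = x[i] * x[i];
    }
    return x_square;
}

// While-loop version: the same three pieces, written out by hand.
std::vector<double> square_with_while(const std::vector<double>& x) {
    std::vector<double> x_square(x.size());
    std::size_t i = 0;                 // initialize
    while (i < x.size()) {             // condition
        x_square[i] = x[i] * x[i];     // body
        i = i + 1;                     // increment
    }
    return x_square;
}
```

Both functions return the same vector for any input; the for-loop simply keeps the counting logic in one place.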
• Very similar to R
if (condition) {
    true body
}
else {
    false body
}
if (condition) {
    true body
}
for (int i = 0; i < x_length; ++i) {
    if (x[i] >= 0) {
        x_abs[i] = x[i];
    }
    else {
        x_abs[i] = -x[i];
    }
}
return x_abs;
}
Warning!!
• In R, if-else is an expression
• It returns a value
• . . . the value of the last executed expression in its body
• In C++, if-else is a statement
• This means that it does not return a value
• In R, we can write: y <- if (x < 0) -1 else 1
• You can’t do that in C++, because there is nothing to
assign
• You have to do:
if (x < 0) {
    y = -1;
}
else {
    y = 1;
}
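As a self-contained illustration of the statement form, here is a minimal sketch in plain C++. The function names are hypothetical. The second function uses C++'s conditional operator (?:), which is not covered in these slides but is the closest C++ counterpart to R's value-returning if-else for simple cases.

```cpp
// if-else as a statement: y is declared first, then assigned in each branch.
int sign_with_if_else(double x) {
    int y;
    if (x < 0) {
        y = -1;
    }
    else {
        y = 1;
    }
    return y;
}

// Conditional operator: an expression, so its value can be assigned directly.
// (Not part of the original slides; shown only for comparison.)
int sign_with_conditional(double x) {
    int y = (x < 0) ? -1 : 1;
    return y;
}
```

Both functions return -1 for negative input and 1 otherwise, matching the R expression if (x < 0) -1 else 1.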
Summary: Control Structures
• Similar to those in R
• if-else
• for-loop
• while-loop
• With some important differences in syntax and behaviour
• Keep those in mind!!
Rcpp Performance
PROTECT(a = coerceVector(a, REALSXP));
PROTECT(b = coerceVector(b, REALSXP));
na = length(a);
nb = length(b);
nab = na + nb - 1;
PROTECT(ab = allocVector(REALSXP, nab));
xa = REAL(a);
xb = REAL(b);
xab = REAL(ab);
for (i = 0; i < nab; i++)
    xab[i] = 0.0;
for (i = 0; i < na; i++)
    for (j = 0; j < nb; j++)
        xab[i + j] += xa[i] * xb[j];
UNPROTECT(3);
return (ab);
}
Convolution in Rcpp
#include <Rcpp.h>
RcppExport SEXP convolve(SEXP a, SEXP b)
{
    Rcpp::NumericVector xa(a);
    Rcpp::NumericVector xb(b);
    int n_xa = xa.size();
    int n_xb = xb.size();
    int nab = n_xa + n_xb - 1;
    Rcpp::NumericVector xab(nab);  // result vector, initialized to zeros
    for (int i = 0; i < n_xa; i++)
        for (int j = 0; j < n_xb; j++)
            xab[i + j] += xa[i] * xb[j];
    return xab;
}
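To try the convolution loop without R or Rcpp installed, the same double loop can be written against the standard library. This is only a sketch: std::vector<double> replaces NumericVector, and the guard for empty inputs is an addition that the R versions above do not have.

```cpp
#include <cstddef>
#include <vector>

// Convolution of two sequences: element i of a times element j of b
// contributes to position i + j of the result.
std::vector<double> convolve(const std::vector<double>& a,
                             const std::vector<double>& b) {
    // Added guard (not in the slides): avoid a negative result length.
    if (a.empty() || b.empty()) {
        return std::vector<double>();
    }
    std::vector<double> ab(a.size() + b.size() - 1, 0.0);
    for (std::size_t i = 0; i < a.size(); ++i) {
        for (std::size_t j = 0; j < b.size(); ++j) {
            ab[i + j] += a[i] * b[j];
        }
    }
    return ab;
}
```

For example, convolving (1, 2, 3) with (0, 1, 0.5) gives (0, 1, 2.5, 4, 1.5), the same values R's convolve(a, b, type = "open") produces when b is reversed first.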
Performance Results
• To compile:
R CMD check package_name
• Then install it:
R CMD INSTALL package_name
• Now you can use it like any other R package:
library(package_name)
Summary: Packages
• Packaging simplifies:
• compiling,
• loading, and
• distributing your code
• Use it!
Easy Matrix Math with RcppArmadillo
Rcpp::NumericVector x_r(x);
arma::vec x(x_r.begin(), x_r.size(), false);
// File: ata.cpp
#include <RcppArmadillo.h>
return wrap(trans(A) * A);
}
Armadillo Matrix Type
NumericMatrix A_r(A);
mat A(A_r.begin(), A_r.nrow(),
      A_r.ncol(), false);
return wrap(trans(A) * A);
+, - Element-wise addition/subtraction
%, / Element-wise multiplication/division
* Matrix multiplication
==, != Element-wise (in)equality comparison
<, <= Element-wise less-than (or equal) comparison
>, >= Element-wise greater-than (or equal) comparison
Armadillo Matrix Functions
return wrap(trans(A) * A);
mat X(Xr.begin(), n, k, false);
vec y(yr.begin(), yr.size(), false);
vec coef = solve(X, y); // fit model y ~ X
vec residuals = y - X*coef;
double sig2 = as_scalar(trans(residuals) * residuals / (n-k));
// std. error of estimate
vec stderrest = sqrt(sig2 * diagvec(inv(trans(X)*X)));
return List::create(
    Named("coefficients") = coef,
    Named("stderr") = stderrest
);
}
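The quantities computed in the code above correspond to the usual least-squares formulas. As a reading aid, and assuming X is an n-by-k matrix with full column rank and n > k, they can be written as:

```latex
\[
\begin{aligned}
\hat{\beta} &= \arg\min_{b} \lVert y - Xb \rVert^{2} = (X^{\top}X)^{-1}X^{\top}y
  && \text{(coef)} \\
r &= y - X\hat{\beta}
  && \text{(residuals)} \\
\hat{\sigma}^{2} &= \frac{r^{\top}r}{n-k}
  && \text{(sig2)} \\
\operatorname{se}(\hat{\beta}_{j}) &= \sqrt{\hat{\sigma}^{2}\,\bigl[(X^{\top}X)^{-1}\bigr]_{jj}},
  \quad j = 1,\dots,k
  && \text{(stderrest)}
\end{aligned}
\]
```

The labels on the right name the matching variables in the code; diagvec picks out the diagonal entries of the inverse, one per coefficient.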
Armadillo Matrix Decomposition
vec coef = solve(X, y);
> dyn.load('fast_lm.so')
> data(trees)
> .Call('fast_lm', log(trees$Volume),
        cbind(1, log(trees$Girth)))
$coefficients
[,1]
[1,] -2.353325
[2,] 2.199970
$stderr
[,1]
[1,] 0.23066284
[2,] 0.08983455
Creating R Lists
return List::create(
    Named("coefficients") = coef,
    Named("stderr") = stderrest
);
• Creates an R list
• Puts Armadillo vectors and matrices into named elements
• Returns the list to R
Calling R Functions
Function rnorm_fn("rnorm");
NumericVector out_r = rnorm_fn(10,
    _["mean"] = 3,
    _["sd"] = 2);
vec out = vec(out_r.begin(), out_r.size());
• Creates an R function
• Calls it with a combination of positional and named
arguments
• Handy, but also has its disadvantages:
• This will slow down your code!!
• The returned value is an Rcpp type (e.g., NumericVector)
• So you’ll need to convert back to Armadillo types yourself
• Use it only if you really have to
Organizing Your Code
vec coef = fit(X, y);
// named std_err because stderr is already a standard C stream name
vec std_err = stderrest(y, X, coef);
return List::create(
    Named("coefficients") = coef,
    Named("stderr") = std_err
);
}
[...]
Example: Fast Linear Regression (cont’d)
[...]
// fit model y ~ X
vec fit(const mat& X, const vec& y) {
    return solve(X, y);
}
// std. error of estimate
vec stderrest(const vec& y, const mat& X,
              const vec& coef) {
    vec residuals = y - X*coef;
    int n = X.n_rows, k = X.n_cols;
    double sig2 = as_scalar(trans(residuals) *
                            residuals / (n-k));
    return sqrt(sig2 * diagvec(inv(trans(X)*X)));
}
Passing Parameters
// std. error of estimate
vec stderrest(const vec& y, const mat& X,
              const vec& coef) {
    vec residuals = y - X*coef;
[...]
}
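The const vec& and const mat& parameters above are passed by constant reference, which lets the function read the caller's object without copying it. A minimal sketch of the difference, using std::vector<double> in place of Armadillo types and two hypothetical helper functions:

```cpp
#include <vector>

// Pass by value: the function gets its own copy of the vector,
// so the copy's storage is at a different address than the caller's.
bool shares_storage_by_value(std::vector<double> v,
                             const double* caller_data) {
    return v.data() == caller_data;
}

// Pass by const reference: no copy is made, so the function sees the
// caller's storage directly (and the compiler rejects attempts to modify it).
bool shares_storage_by_const_ref(const std::vector<double>& v,
                                 const double* caller_data) {
    return v.data() == caller_data;
}
```

Called with a non-empty vector y and y.data(), the first function returns false and the second returns true. For large vectors and matrices, skipping that copy is the main reason to prefer the reference form.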
Multi-file Development
#endif
Header Files Rules
#include "fast_lm.h"
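The #endif shown earlier typically closes an include guard, which keeps a header's contents from being processed twice in one source file. The sketch below shows the general pattern in a single file so it can be compiled on its own; the guard name EXAMPLE_STATS_H and the function mean_of are hypothetical and are not taken from the seminar's code.

```cpp
// ---- what would go in a header file (hypothetical example_stats.h) ----
#ifndef EXAMPLE_STATS_H
#define EXAMPLE_STATS_H

#include <vector>

// Declaration only: tells other files the function exists.
double mean_of(const std::vector<double>& x);

#endif  // EXAMPLE_STATS_H

// ---- what would go in a source file that includes the header ----
// Definition: the actual body, compiled once.
double mean_of(const std::vector<double>& x) {
    if (x.empty()) {
        return 0.0;  // arbitrary choice for this sketch
    }
    double total = 0.0;
    for (double value : x) {
        total += value;
    }
    return total / static_cast<double>(x.size());
}
```

In a real multi-file project the two parts live in separate files, and every source file that calls mean_of includes the header.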