RcppArrayFire 0.0.2: Rcpp integration for ArrayFire

The RcppArrayFire package uses Rcpp to provide an interface from R to and from the ArrayFire library, an open source library that can make use of GPUs and other hardware accelerators via CUDA or OpenCL.

The official R bindings expose ArrayFire data structures as objects in R, which would require a large amount of code to support all the methods defined in ArrayFire’s C/C++ API. RcppArrayFire instead, which is derived from RcppFire by Kazuki Fukui, follows the lead of packages like RcppArmadillo or RcppEigen to provide seamless communication between R and ArrayFire at the C++ level.

Installation

Please note that currently RcppArrayFire has only been tested on Linux systems.

Prerequisites

In order to use RcppArrayFire you will need development tools for R, Rcpp and the ArrayFire library and header files. On a sufficiently recent Debian based or derived system, this can be achieved with:

sudo apt-get install r-base-dev r-cran-rcpp libarrayfire-dev libarrayfire-unified3 

This will install the unified and CPU backends. The CUDA backend has not been packaged for Debian, and usage of the packaged OpenCL backend is hindered by a bug in the clBLAS package. For serious usage it is currently better to build from source or use the binary installer from ArrayFire:

sudo apt-get install r-base-dev r-cran-rcpp libglfw3
# download installer from http://arrayfire.com/download/
sudo sh arrayfire-installer.sh --exclude-subdir --prefix=/usr/local

In the last command you have to adjust the name of the installer script and (optionally) the installation prefix. For GPU support, you have to install additional drivers. For many build-in Intel GPUs, you can use

sudo apt-get install beignet-opencl-icd ocl-icd-libopencl1

Installing CUDA or other OpenCL drivers is beyond the scope of this post, but see the ArrayFire documentation for details.

Package installation

RcppArrayFire is not on CRAN, but you can install the current version via drat:

#install.packages("drat")
drat::addRepo("RInstitute")
install.packages("RcppArrayFire")

If you have installed ArrayFire in a non-standard directory, you have to use the configure argument --with-arrayfire:

install.packages("RcppArrayFire", configure.args = "--with-arrayfire=/opt/arrayfire-3")

Usage

Calculating pi by simulation

Let’s look at the classical example of calculating pi via simulation. The basic idea is to generate a large number of random points within the unit square. An approximation for pi can then be calculated from the ratio of points within the unit circle to the total number of points. A vectorized implementation in R might look like this:

piR <- function(N) {
    x <- runif(N)
    y <- runif(N)
    4 * sum(sqrt(x^2 + y^2) < 1.0) / N
}

set.seed(42)
system.time(cat("pi ~= ", piR(10^7), "\n"))
## pi ~=  3.140899
##    user  system elapsed 
##   0.688   0.076   0.766

A simple way to use C++ code in R is to use the inline package or cppFunction() from Rcpp, which are both possible with RcppArrayFire. An implementation in C++ using ArrayFire might look like this:

src <- '
double piAF (const int N) {
    array x = randu(N, f32);
    array y = randu(N, f32);
    return 4.0 * sum<float>(sqrt(x*x + y*y) < 1.0) / N;
}'
Rcpp::cppFunction(code = src, depends = "RcppArrayFire", includes = "using namespace af;")

RcppArrayFire::arrayfire_set_seed(42)
system.time(cat("pi ~= ", piAF(10^7), "\n"))
## pi ~=  3.141066
##    user  system elapsed 
##   0.036   0.024  0.019

Several things are worth noting:

(1) The syntax is almost identical. Besides the need for using types and a different function name when generating random numbers, the argument f32 to randu as well as the float type catches the eye. These instruct ArrayFire to use single precision floats, since not all devices support double precision floating point numbers. If you want to use double precision, you have to specify f64 and double.

(2) The results are not the same, since ArrayFire uses a different random number generator.

(3) The speed-up is quite impressive. However, sometimes the first invocation of a function is not as fast as expected due to the just-in-time compilation used by ArrayFire.

Arrays as parameters

Up to now we have only considered simple types like double or int as function parameters and return values. However, we can also use arrays. Consider the matrix product X’ X for a random matrix X in R:

set.seed(42)
N <- 40
X <- matrix(rnorm(N * N * 2), ncol = N)
tXXR <- t(X) %*% X

The matrix multiplication can be implemented with RcppArrayFire using the appropriate matmul function:

src <- '
af::array squareMatrix(const RcppArrayFire::typed_array<f32>& x) {
    return af::matmulTN(x ,x);
}'
Rcpp::cppFunction(code = src, depends = "RcppArrayFire")
tXXGPU <- squareMatrix(X)

all.equal(tXXR, tXXGPU)
## [1] "Mean relative difference: 1.372856e-07"

Since an object of type af::array can contain different data types, the templated wrapper class RcppArrayFire::typed_array<> is used to indicate the desired data type when converting from R to C++. Again single precision floats are used with ArrayFire, which explains the difference between the two results. We can be sure that double precision is supported by switching the computation backend to “CPU”, which produces identical results:

src <- '
af::array squareMatrixF64(const RcppArrayFire::typed_array<f64>& x) {
    return af::matmulTN(x ,x);
}'
Rcpp::cppFunction(code = src, depends = "RcppArrayFire")

RcppArrayFire::arrayfire_set_backend("CPU")
tXXCPU <- squareMatrixF64(X)
RcppArrayFire::arrayfire_set_backend("DEFAULT")

all.equal(tXXR, tXXCPU)
## [1] TRUE

Usage in a package

More serious functions should be defined in a permanent fashion. To facilitate this, RcppArrayFire contains the function RcppArraFire.package.skeleton(). This functions initialises a package with suitable configure script for linking with ArrayFire and RcppArrayFire. In order to implement new functionality you can then write C++ functions and save them in the src folder. Functions that should be callable from R should be marked with the [[Rcpp::export]] attribute. See the Rcpp vignettes on attributes and package writing for further details.

Outlook

Besides testing the package on other platforms than Linux, future versions might include

The R Sell Side: Interpreter

by Karl-Kuno Kunze

The novice in the R ecosystem may face quite a challenge to find her way through the jungle of R providers as well as licensing and pricing models. In addition, it helps to understand the different components that make a productive R environment.

In the following I give my view of the present situation. However, the R ecosystem is rather dynamic. So, should I have left something out, don’t hesitate to write a comment.

What is R?

Continue reading

Preparing Data with Package stringr

by Karl-Kuno Kunze

In the beginning of most data science projetcs, data must be prepared for the task at hand. Very often only parts of entries in columns are necessary or entries must be re-formatted, especially for date and time entries. For this task you need good tools to manipulate text.

Even in the base version R comes packed with many functions for that purpose. However, these are partly inconsistent as syntax is concerned or run somewhat behind the abilities of languages like Python – or require quite some complex code. Package stingr by Hadley Wickham comes in quite handy to fill the gap.

Continue reading

First Aid for Closures

by Karl-Kuno Kunze

My Article about Closures is by far the most frequently viewed article in the blog. The reason for this is probably the somewhat cryptic error message  object of type 'closure' is not subsettable, which R sends to you sometimes. Here is a three step first-aid plan.

Immediate help

If you want to continue your work with the least detour possible:

Replace the word closure with the word function in the error message.

Good luck with your debugging efforts and see you soon!

Continue reading