Nima Hejazi


R is a great, highly flexible language for statistical computing, but it can suffer from performance issues. As I’ve steadily increased my use of R, I quickly became aware that I would one day have to learn to integrate R with a faster programming language, the main choice here being C++. The Rcpp framework (and R package) was created for exactly this purpose: it allows parts of the R code in a given package or project to be rewritten in C++ and easily integrated with R. Using Rcpp comes with great advantages in terms of R code performance; however, it obviously requires that one learn C++.

I was about to devote a great deal of time to doing so when, fortuitously, I came across the rather new Renjin project. Renjin is a new (in-development) interpreter for the R language that runs on the Java Virtual Machine (JVM), with the aim of improving on GNU R’s performance. The idea seems to be that it can eventually serve as a drop-in replacement for GNU R. It seems that the renjin R package can be used to provide performance gains via interfacing with the JVM, just by wrapping standard R code.

Minimal example

For now, I just thought I would try the example from the renjin R package documentation; more involved examples might be added to this post later or come in separate blog posts of their own. Here we go:

Let’s make sure the renjin R package is installed (here, pinned to version 0.8.2404):

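if (!require(renjin)) {
  # Install the source tarball straight from its URL instead of from a CRAN mirror
  install.packages(
    "https://nexus.bedatadriven.com/content/groups/public/org/renjin/renjin-gnur-package/0.8.2404/renjin-gnur-package-0.8.2404.tar.gz",
    repos = NULL, type = "source"
  )
}
## Loading required package: renjin
library(renjin)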

Let’s define a function to simply add by iteration:

bigsum <- function(n) {
  total <- 0  # running total (named to avoid shadowing base::sum)
  for (i in seq(from = 1, to = n)) {
    total <- total + i
  }
  total
}
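Before timing anything, it may be worth checking that the loop computes what we think it does. The snippet below is only a minimal sketch: it repeats the function so that it runs on its own, and `closed_form` is a helper name made up here for the well-known formula n(n + 1)/2.

```r
# Same iterative sum as in the post, repeated so this snippet is self-contained.
bigsum <- function(n) {
  total <- 0
  for (i in seq(from = 1, to = n)) {
    total <- total + i
  }
  total
}

# Hypothetical helper (not part of the original example):
# closed-form value of 1 + 2 + ... + n.
closed_form <- function(n) n * (n + 1) / 2

# The loop and the formula should agree; stopifnot() errors if they do not.
n <- 1000
stopifnot(bigsum(n) == closed_form(n))
```

If the check passes silently, any timing differences we see later are about speed only, not about different answers.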

We can improve the speed of this function by pre-compiling it to bytecode using R’s native bytecode compiler. We’d expect this to save us some time relative to the naive implementation.

bigsumc <- compiler::cmpfun(bigsum) # GNU R's byte code compiler
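One caveat that may matter here: GNU R 3.4.0 and later turn on just-in-time (JIT) byte compilation by default, so a closure like this one may already get compiled when it is first used, and an explicit `cmpfun` call may then add little. The sketch below shows one way to check the JIT level of a session; the toy function `f` is invented purely for illustration.

```r
# A negative argument only reports the current JIT level; it does not change it.
# 0 means JIT is off; 1 to 3 mean increasingly aggressive compilation.
jit_level <- compiler::enableJIT(-1)
print(jit_level)

# Toy function (hypothetical, for illustration only).
f <- function(x) x + 1
fc <- compiler::cmpfun(f)

# Byte compilation should change speed at most, never the result.
stopifnot(identical(f(1), fc(1)))
```

If the reported level is above 0, the "naive" and "compiled" versions are probably closer to each other than their names suggest.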

Alright, now we’re ready to compare the performance of the naive and bytecode-compiled implementations:

time_norm <- system.time(bigsum(1e7))
time_comp <- system.time(bigsumc(1e7))

Notice that directly using R’s native bytecode compiler changes the timings of our bigsum function only slightly: the time spent on the system side drops by about 0.01 seconds (roughly a factor of 2, from 0.023 to 0.011 seconds), while the user and wall-clock times are essentially unchanged (see the table below). Maybe renjin can help us out even more?

time_renjin <- system.time(renjin(bigsum(1e7)))
# Keep the first three entries of each timing: user, system, and elapsed time
timings <- as.data.frame(rbind(as.numeric(time_norm), as.numeric(time_comp),
                               as.numeric(time_renjin)))[, c(1, 2, 3)]
colnames(timings) <- c("user", "system", "elapsed")
rownames(timings) <- c("naive", "cmpfun", "renjin")
print(timings)
##         user system elapsed
## naive  0.459  0.023   0.504
## cmpfun 0.485  0.011   0.541
## renjin 0.607  0.036   0.469
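To read the third column of that table (the wall-clock time) as speedup factors, one option is to divide the naive time by each of the others. This is just a small sketch using the three numbers printed above, typed in by hand; the names `elapsed` and `speedup` are chosen here for illustration.

```r
# Wall-clock times copied from the third column of the table above (seconds).
elapsed <- c(naive = 0.504, cmpfun = 0.541, renjin = 0.469)

# Speedup relative to the naive version: values above 1 mean faster than naive,
# values below 1 mean slower.
speedup <- elapsed[["naive"]] / elapsed
print(round(speedup, 2))
```

This gives about 0.93 for cmpfun and about 1.07 for renjin. Rounding such ratios to whole numbers would hide nearly all of the difference, so it helps to keep a couple of decimal places.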

So, what do we see? On wall-clock time (the third column), renjin comes out fastest: 0.469 seconds, versus 0.504 for the naive implementation and 0.541 for the bytecode-compiled version of our function, a reduction of roughly 7% and 13%, respectively. On user and system time, though, renjin is actually the most expensive of the three (0.607 and 0.036 seconds), so the picture is mixed. This was just a simple example, timed once, so differences this small could easily be noise. Still, it’s encouraging that we got comparable (or slightly better) wall-clock performance just by naively calling renjin, and it took just a few extra characters to call it as a wrapper.
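Since each number above comes from a single run, a somewhat more robust comparison would repeat every measurement and summarize it. Below is a minimal base-R sketch of that idea; `time_median` is a helper invented here, and the problem size and repetition count are arbitrary choices.

```r
# Hypothetical helper: call f() `reps` times and return the median
# wall-clock ("elapsed") time in seconds.
time_median <- function(f, reps = 5) {
  elapsed <- vapply(
    seq_len(reps),
    function(i) system.time(f())[["elapsed"]],
    numeric(1)
  )
  median(elapsed)
}

# Same iterative sum as in the post, repeated so this snippet is self-contained.
bigsum <- function(n) {
  total <- 0
  for (i in seq(from = 1, to = n)) {
    total <- total + i
  }
  total
}
bigsumc <- compiler::cmpfun(bigsum)

n <- 1e6
medians <- c(
  naive  = time_median(function() bigsum(n)),
  cmpfun = time_median(function() bigsumc(n))
)
print(medians)
```

With the renjin package installed, a third entry such as `time_median(function() renjin(bigsum(n)))` could presumably be added in the same way, though that is untested here.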

Although Renjin is still in its infancy, I can’t help but be excited for the future of R, and statistical computing in general, given how well it’s already performing. We’re going to be able to (try to) do great things with these new tools 🌟
