How do Rpy2, pyrserve and PypeR compare?

Question:

I would like to access R from within a Python program. I am aware of Rpy2, pyrserve and PypeR.

What are the advantages or disadvantages of these three options?

Asked By: DanB

Answers:

From the paper in the Journal of Statistical Software on PypeR:

RPy presents a simple and efficient way of accessing R from Python. It is robust and very convenient for frequent interaction operations between Python and R. This package allows Python programs to pass Python objects of basic data types to R functions and return the results in Python objects. Such features make it an attractive solution for the cases in which Python and R interact frequently. However, there are still limitations of this package as listed below.
Performance:
RPy may not behave very well for large-size data sets or for computation-intensive duties. A lot of time and memory are inevitably consumed in producing the Python copy of the R data because in every round of a conversation RPy converts the returned value of an R expression into a Python object of basic types or NumPy array. RPy2, a recently developed branch of RPy, uses Python objects to refer to R objects instead of copying them back into Python objects. This strategy avoids frequent data conversions and improves speed. However, memory consumption remains a problem. […]
When we were implementing WebArray (Xia et al. 2005), an online platform for microarray data analysis, a job consumed roughly one quarter more computational time if running R through RPy instead of through R’s command-line user interface. Therefore, we decided to run R in Python through pipes in subsequent developments, e.g., WebArrayDB (Xia et al. 2009), which retained the same performance as achieved when running R independently. We do not know the exact reason for such a difference in performance, but we noticed that RPy directly uses the shared library of R to run R scripts. In contrast, running R through pipes means running the R interpreter directly.
Memory:
R has been denounced for its uneconomical use of memory. The memory used by large-size R objects is rarely released after these objects are deleted. Sometimes the only way to release memory from R is to quit R. RPy module wraps R in a Python object. However, the R library will stay in memory even if the Python object is deleted. In other words, memory used by R cannot be released until the host Python script is terminated.
Portability:
As a module with extensions written in C, the RPy source package has to be compiled with a specific R version on POSIX (Portable Operating System Interface for Unix) systems, and the R must be compiled with the shared library enabled. Also, the binary distributions for Windows are bound to specific combinations of different versions of Python/R, so it is quite frequent that a user has difficulty in finding a distribution that fits the user’s software environment.

Answered By: Henrik

I know one of the 3 better than the others, but in the order given in the question:

rpy2:

C-level interface between Python and R (R running as an embedded process)
R objects exposed to Python without the need to copy the data over
Conversely, Python’s numpy arrays can be exposed to R without making a copy
Low-level interface (close to the R C-API) and high-level interface (for convenience)
In-place modification for vectors and arrays possible
R callback functions can be implemented in Python
Possible to have anonymous R objects with a Python label
Python pickling possible
Full customization of R’s behavior with its console (so possible to implement a full R GUI)
Limited support on MS Windows

pyrserve:

pure Python code (will/should/may work with CPython, Jython, IronPython)
uses R’s Rserve
advantages and inconveniences linked to remote computation and to Rserve

pyper:

pure Python code (will/should/may work with CPython, Jython, IronPython)
uses pipes to have Python communicate with R (with the advantages and inconveniences linked to it)
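
To make the pipe approach concrete, here is a minimal sketch using only Python's standard library. It is not PypeR's code: the helper name run_through_pipe is hypothetical, the default command assumes an R executable on the PATH that reads a script from standard input, and it starts a fresh process per call rather than keeping a session open.

```python
import subprocess

# Assumption: "R" is on the PATH and reads a script from standard input
# when started with these flags. Adjust for your installation.
R_COMMAND = ["R", "--vanilla", "--slave"]


def run_through_pipe(script, command=R_COMMAND):
    """Send script text to an interpreter over a pipe and return its stdout.

    Hypothetical helper for illustration only. Everything crosses the pipe
    as text, so results come back as a string that the caller must parse.
    """
    completed = subprocess.run(
        command,
        input=script,         # written to the child's stdin
        capture_output=True,  # stdout and stderr are read back through pipes
        text=True,            # exchange str rather than bytes
    )
    if completed.returncode != 0:
        raise RuntimeError(completed.stderr)
    return completed.stdout
```

With R installed, a call such as run_through_pipe("cat(1 + 1)") would be expected to return R's printed output as a string. The interpreter runs as a separate process, so no compiled extension has to match a particular R version, but every value has to be serialized to text and parsed back.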

edit: Windows support for rpy2

Answered By: lgautier

In PypeR, I can’t pass a large matrix from Python to an R instance with assign(). However, I don’t have this issue with rpy2.
This is just my experience.
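
One possible factor, offered as an illustration rather than a diagnosis of the problem above: over a pipe, data has to travel as text. The sketch below uses a hypothetical helper, to_r_matrix (not part of PypeR), to show what that serialization could look like and how quickly the text grows with the size of the matrix.

```python
def to_r_matrix(rows):
    """Render equal-length rows of finite numbers as an R matrix() expression.

    Hypothetical helper for illustration; not how PypeR itself does it.
    Non-finite values (nan, inf) are not handled.
    """
    if not rows or any(len(row) != len(rows[0]) for row in rows):
        raise ValueError("rows must be non-empty and all the same length")
    # Flatten row by row; byrow=TRUE tells R to fill the matrix the same way.
    values = ", ".join(repr(float(x)) for row in rows for x in row)
    return f"matrix(c({values}), nrow={len(rows)}, byrow=TRUE)"


small = to_r_matrix([[1, 2], [3, 4]])
large = to_r_matrix([[0.5] * 100] * 100)
print(small)
print(len(large))  # size of the R source text for a 100 x 100 matrix
```

A 100 x 100 matrix already produces tens of thousands of characters of R source, all of which R then has to parse. An embedded interface such as rpy2 can hand the data over in memory instead.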

Answered By: statcompute

From a developer’s perspective, we used to use rpy/rpy2 to provide statistical and drawing functions to our Python-based application. It caused huge problems in delivering our application because rpy/rpy2 needs to be compiled for specific combinations of Python and R, which makes it infeasible for us to provide binary distributions that work out of the box unless we bundle R as well. Because rpy/rpy2 are not particularly easy to install, we ended up replacing the relevant parts with native Python modules such as matplotlib. We would have switched to pyrserve if we had to use R, because we could start an R server locally and connect to it without worrying about the version of R.

Answered By: user2283347