NA_character_ not identified as NaN after importing it into Python with rpy2

Question:

I am using the following code inside an R magic cell:

%%R -o df

library(tibble)

df <- tibble(x = c("a", "b", NA))

However, when I run the following in another (Python) cell:

df.isna()

I get

       x
1  False
2  False
3  False

In fact, the imported dataframe is

               x
1              a
2              b
3  NA_character_

How can I convert NA_character_ to a Python NaN?

I have tried

df.replace('NA_character_', np.nan)

but with no success.

Asked By: PaulS


Answers:

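As you set out in the comments, the R NA_character_ value is not converted to np.nan. Instead it arrives as an object of a different type, rpy2.rinterface_lib.sexp.NACharacterType, which is presumably why replacing the string 'NA_character_' has no effect: the cell holds that object, not a string. In this case, the solution is simply to iterate over the column and convert values of this type to np.nan: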

import numpy as np
from rpy2.rinterface_lib.sexp import NACharacterType

# Replace rpy2's NA_character_ object with NaN, leaving other values unchanged
df['x'] = df['x'].apply(
    lambda val: np.nan if isinstance(val, NACharacterType) else val
)
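
If the data frame has several string columns, the same check can be applied to every column at once. The following is only a minimal sketch of that idea, not something rpy2 provides: replace_r_na is a helper name made up for illustration, and StandInNA is a hypothetical placeholder class used so that the example runs without R installed. In a real session you would pass rpy2's NACharacterType instead.

```python
import numpy as np
import pandas as pd


def replace_r_na(frame, na_types):
    """Return a copy of `frame` with every instance of `na_types` replaced by NaN."""
    def to_nan(value):
        return np.nan if isinstance(value, na_types) else value
    # Series.map applies to_nan element by element within each column
    return frame.apply(lambda column: column.map(to_nan))


class StandInNA:
    """Hypothetical placeholder for rpy2's NACharacterType (assumption for this demo)."""


# Mimics the imported data frame from the question
demo = pd.DataFrame({"x": ["a", "b", StandInNA()]}, index=[1, 2, 3])
cleaned = replace_r_na(demo, (StandInNA,))
print(cleaned.isna())
```

With rpy2 available, the call would presumably look like replace_r_na(df, (NACharacterType,)), using the import shown earlier. Passing the types as a tuple keeps the helper open to other NA types should they turn up in non-string columns.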

As for whether this is a bug, the changelog for release 3.3.0 states:

The value nan in pandas Series with strings is now converted to R NA (issue #668).

However, the converse does not appear to happen. I don’t know whether that means it’s a bug, a design decision or simply that this has not yet been implemented.

Answered By: SamR