Exception Handling in Pandas .apply() function

Question:

If I have a DataFrame:

myDF = DataFrame(data=[[11,11],[22,'2A'],[33,33]], columns = ['A','B'])

Gives the following dataframe (Starting out on stackoverflow and don’t have enough reputation for an image of the DataFrame)

   | A  | B  |

0  | 11 | 11 |

1  | 22 | 2A |

2  | 33 | 33 |

If i want to convert column B to int values and drop values that can’t be converted I have to do:

def convertToInt(cell):
    try:
        return int(cell)
    except:
        return None
myDF['B'] = myDF['B'].apply(convertToInt)

If I only do:

myDF[‘B’].apply(int)

the error obviously is:

C:WinPython-32bit-2.7.5.3python-2.7.5libsite-packagespandaslib.pyd
in pandas.lib.map_infer (pandaslib.c:42840)()

ValueError: invalid literal for int() with base 10: ‘2A’

Is there a way to add exception handling to myDF[‘B’].apply()

Thank you in advance!

Asked By: RukTech

||

Answers:

A way to achieve that with lambda:

myDF['B'].apply(lambda x: int(x) if str(x).isdigit() else None)

For your input:

>>> myDF
    A   B
0  11  11
1  22  2A
2  33  33

[3 rows x 2 columns]

>>> myDF['B'].apply(lambda x: int(x) if str(x).isdigit() else None)
0    11
1   NaN
2    33
Name: B, dtype: float64
Answered By: Amit

much better/faster to do:

In [1]: myDF = DataFrame(data=[[11,11],[22,'2A'],[33,33]], columns = ['A','B'])

In [2]: myDF.convert_objects(convert_numeric=True)
Out[2]: 
    A   B
0  11  11
1  22 NaN
2  33  33

[3 rows x 2 columns]

In [3]: myDF.convert_objects(convert_numeric=True).dtypes
Out[3]: 
A      int64
B    float64
dtype: object

This is a vectorized method of doing just this. The coerce flag say to mark as nan anything that cannot be converted to numeric.

You can of course do this to a single column if you’d like.

Answered By: Jeff

I had the same question, but for a more general case where it was hard to tell if the function would generate an exception (i.e. you couldn’t explicitly check this condition with something as straightforward as isdigit).

After thinking about it for a while, I came up with the solution of embedding the try/except syntax in a separate function. I’m posting a toy example in case it helps anyone.

import pandas as pd
import numpy as np

x=pd.DataFrame(np.array([['a','a'], [1,2]]))

def augment(x):
    try:
        return int(x)+1
    except:
        return 'error:' + str(x)

x[0].apply(lambda x: augment(x))
Answered By: atkat12