pandas: convert column with multiple datatypes to int, ignore errors

Question:

I have a column with data that needs some massaging. the column may contain strings or floats. some strings are in exponential form. Id like to best try to format all data in this column as a whole number where possible, expanding any exponential notation to integer. So here is an example

df = pd.DataFrame({'code': ['1170E1', '1.17E+04', 11700.0, '24477G', '124601', 247602.0]})
df['code'] = df['code'].astype(int, errors  = 'ignore')

The above code does not seem to do a thing. i know i can convert the exponential notation and decimals with simply using the int function, and i would think the above astype would do the same, but it does not. for example, the following code work in python:

int(1170E1), int(1.17E+04), int(11700.0)
> (11700, 11700, 11700)

Any help in solving this would be appreciated. What i’m expecting the output to look like is:

0    '11700'
1    '11700'
2    '11700
3    '24477G'
4    '124601'
5    '247602'
Asked By: mike01010

||

Answers:

You may check with pd.to_numeric

df.code = pd.to_numeric(df.code,errors='coerce').fillna(df.code)
Out[800]: 
0     11700.0
1     11700.0
2     11700.0
3      24477G
4    124601.0
5    247602.0
Name: code, dtype: object

Update

df['code'] = df['code'].astype(object)
s = pd.to_numeric(df['code'],errors='coerce')
df.loc[s.notna(),'code'] = s.dropna().astype(int)
df
Out[829]: 
     code
0   11700
1   11700
2   11700
3  24477G
4  124601
5  247602
Answered By: BENY

BENY’s answer should work, although you potentially leave yourself open to catching exceptions and filling that you don’t want to. This will also do the integer conversion you are looking for.

def convert(x):
    try:
        return str(int(float(x)))
    except ValueError:
        return x

df = pd.DataFrame({'code': ['1170E1', '1.17E+04', 11700.0, '24477G', '124601', 247602.0]})
df['code'] = df['code'].apply(convert)

outputs

0     11700
1     11700
2     11700
3    24477G
4    124601
5    247602

where each element is a string.

I will be the first to say, I’m not proud of that triple cast.

Answered By: pseudoabdul
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.