Convert Float to String in Pandas

Question:

I have a dataframe with the following dtypes.

> df.dtypes
    Col1         float64
    Col2          object
    dtype: object

When I do the following:

df['Col3']  = df['Col2'].apply(lambda s: len(s) >= 2  and s[0].isalpha())

I get:

TypeError: object of type 'float' has no len()

I believe if I convert "object" to "String", I will get to do what I want. However, when I do the following:

df['Col2'] = df['Col2'].astype(str)

the dtype of Col2 doesn’t change. I am a little confused with datatype "object" in Pandas. What exactly is "object"?

More info: This is what Col2 looks like:

               Col2
1                F5
2               K3V
3                B9
4               F0V
5             G8III
6              M0V:
7                G0
8      M6e-M8.5e Tc
Asked By: Rohit

||

Answers:

If a column contains strings, or is treated as holding strings, it will have a dtype of object (the converse is not necessarily true; more below). Here is a simple example:

import pandas as pd
df = pd.DataFrame({'SpT': ['string1', 'string2', 'string3'],
                   'num': ['0.1', '0.2', '0.3'],
                   'strange': ['0.1', '0.2', 0.3]})
print(df.dtypes)
#SpT        object
#num        object
#strange    object
#dtype: object

If a column contains only strings, applying len to it, as you did, works fine:

print(df['num'].apply(lambda x: len(x)))
#0    3
#1    3
#2    3

However, a dtype of object does not mean the column contains only strings. For example, the column strange holds objects of mixed types: two str values and a float. Applying len to it raises an error similar to the one you saw:

print(df['strange'].apply(lambda x: len(x)))
# TypeError: object of type 'float' has no len()
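
One way to confirm that a column mixes types is to count the Python type of each element. This is a minimal sketch that rebuilds the strange column from the example above; type_counts and not_str are names introduced here for illustration.

```python
import pandas as pd

# Same mixed column as in the example above: two str values and one float.
df = pd.DataFrame({'strange': ['0.1', '0.2', 0.3]})

# Map every element to its Python type name, then count occurrences.
type_counts = df['strange'].map(lambda x: type(x).__name__).value_counts()
print(type_counts)

# Rows whose value is not a str are the ones that break len().
not_str = df[~df['strange'].map(lambda x: isinstance(x, str))]
print(not_str)
```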

Thus, the problem could be that you have not properly converted the column to string, and the column still contains mixed object types.

Continuing the above example, let us convert strange to strings and check if apply works:

df['strange'] = df['strange'].astype(str)
print(df['strange'].apply(lambda x: len(x)))
#0    3
#1    3
#2    3

(There is a suspicious discrepancy between df_cleaned and df_clean in your question. Is it a typo, or a mistake in the code that causes the problem?)

Answered By: YS-L
"Hidden" nulls

If the column dtype is object, TypeError: object of type 'float' has no len() often occurs because the column contains NaN, which is itself a float. Check whether that’s the case by calling

df['Col2'].isna().any()

If it returns True, then there’s NaN and you probably need to handle that.
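
How to handle the NaN depends on what the missing rows should become. As a hedged sketch (the sample values and the helper name starts_with_letter are made up for illustration), one option is to guard the original check with isinstance so that non-string values simply yield False:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: an object column with one missing value.
df = pd.DataFrame({'Col2': ['F5', 'K3V', np.nan, '9A', 'G']})

def starts_with_letter(s):
    # NaN is a float, so it fails the isinstance check and yields False.
    return isinstance(s, str) and len(s) >= 2 and s[0].isalpha()

df['Col3'] = df['Col2'].apply(starts_with_letter)
print(df)
```

Other options are to drop the rows first with df.dropna(subset=['Col2']) or to replace the gaps with df['Col2'].fillna(''), depending on whether the missing rows should survive.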

Vectorized str. methods

If null handling is not important, you can also call the vectorized str methods such as str.len() and str.isdigit(). For example, the code in the OP can be written as:

df['Col3'] = df['Col2'].str.len().ge(2) & df['Col2'].str[0].str.isalpha()

to get the desired output without errors.
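
To see how this expression treats a missing value, here is a small sketch on made-up data. It assumes an object column with one NaN; the intermediate names long_enough and first_is_alpha are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with a missing value in the middle.
df = pd.DataFrame({'Col2': ['F5', 'K3V', np.nan, 'M0V:', '8']})

# True where the string has at least two characters; a NaN length compares as False.
long_enough = df['Col2'].str.len().ge(2)

# True where the first character is a letter.
first_is_alpha = df['Col2'].str[0].str.isalpha()

df['Col3'] = long_enough & first_is_alpha
print(df)
```

The NaN row ends up False instead of raising a TypeError.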

‘string’ dtype

Since pandas 1.0, there’s a new 'string' dtype that keeps missing values as <NA> after a column is cast to it. For example, if you want to convert floats to strings without decimals, yet the column contains NaN values that you want to keep as null, you can use the 'string' dtype.

df = pd.DataFrame({
    'Col1': [1.2, 3.4, 5.5, float('nan')]
})

df['Col1'] = df['Col1'].astype('string').str.split('.').str[0]

which gives

0       1
1       3
2       5
3    <NA>
Name: Col1, dtype: object

where <NA> is pandas’ missing-value marker (pd.NA), which you can drop with dropna(), whereas df['Col1'].astype(str) casts NaNs into the string 'nan'.
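
The difference is easy to check on a small Series. This sketch assumes a pandas version in which floats can be cast directly to 'string'; the variable names are illustrative.

```python
import pandas as pd

s = pd.Series([1.2, 3.4, 5.5, float('nan')])

# Casting to the 'string' dtype keeps the missing value missing (<NA>).
as_string = s.astype('string')
print(as_string.isna().tolist())

# Because it is still null, dropna() removes it.
print(as_string.dropna().tolist())

# On pandas 1.x and 2.x, astype(str) instead turns NaN into the text 'nan',
# which isna() and dropna() no longer recognize as missing.
as_str = s.astype(str)
print(as_str.tolist())
```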

Answered By: cottontail