Convert number strings with commas in pandas DataFrame to float
Question:
I have a DataFrame that contains numbers as strings with commas for the thousands marker. I need to convert them to floats.
import pandas
a = [['1,200', '4,200'], ['7,000', '-0.03'], ['5', '0']]
df = pandas.DataFrame(a)
I am guessing I need to use locale.atof. Indeed
df[0].apply(locale.atof)
works as expected. I get a Series of floats.
But when I apply it to the DataFrame, I get an error.
df.apply(locale.atof)
TypeError: ("cannot convert the series to ", u'occurred at index 0')
and
df[0:1].apply(locale.atof)
gives another error:
ValueError: ('invalid literal for float(): 1,200', u'occurred at index 0')
So, how do I convert this DataFrame of strings to a DataFrame of floats?
Answers:
If you’re reading in from csv then you can use the thousands arg:
df = pandas.read_csv('foo.tsv', sep='\t', thousands=',')
This method is likely to be more efficient than performing the operation as a separate step.
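As a minimal sketch of the thousands argument in action, the snippet below reads tab-separated text from an in-memory buffer instead of a file. The sample data and the column names a and b are made up for illustration; they are not from the question.

```python
import io

import pandas as pd

# Hypothetical tab-separated input; column names "a" and "b" are illustrative only.
tsv = "a\tb\n1,200\t4,200\n7,000\t-0.03\n5\t0\n"

# thousands=',' makes the parser drop the comma separators while reading.
df = pd.read_csv(io.StringIO(tsv), sep="\t", thousands=",")

# Column "a" is parsed as integers, so cast everything to float if floats are required.
df = df.astype(float)
print(df)
```

Because the parsing happens while the file is read, no separate cleanup pass over the strings is needed.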
You need to set the locale first:
In [ 9]: import locale
In [10]: from locale import atof
In [11]: locale.setlocale(locale.LC_NUMERIC, '')
Out[11]: 'en_GB.UTF-8'
In [12]: df.applymap(atof)
Out[12]:
      0        1
0  1200  4200.00
1  7000    -0.03
2     5     0.00
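The locale-based call depends on which locales are installed on the machine, and DataFrame.applymap is deprecated since pandas 2.1 in favor of DataFrame.map. Below is a locale-independent sketch of the same element-wise idea. The helpers parse_number and apply_elementwise are hypothetical names introduced here, and the parser assumes the comma is always a thousands separator.

```python
import pandas as pd


def parse_number(text):
    """Parse a string such as '1,200' as a float, ignoring comma separators."""
    return float(text.replace(",", ""))


def apply_elementwise(frame, func):
    """Apply func to every cell, preferring DataFrame.map where it exists."""
    # DataFrame.map is available from pandas 2.1; older versions only have applymap.
    if hasattr(frame, "map"):
        return frame.map(func)
    return frame.applymap(func)


df = pd.DataFrame([["1,200", "4,200"], ["7,000", "-0.03"], ["5", "0"]])
result = apply_elementwise(df, parse_number)
print(result)
```

Swapping parse_number for locale.atof gives the locale-aware behaviour from the answer above, provided the locale has been set first.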
You may use the pandas.Series.str.replace method. The .str accessor exists only on a Series, so apply it to each column:
df.apply(lambda col: col.str.replace(',', '', regex=False).astype(float))
This method can remove or replace the comma in the string.
You can convert one column at a time like this:
df['colname'] = df['colname'].str.replace(',', '').astype(float)
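If some cells may not be numeric at all, astype(float) raises an error. One possible variation, sketched below, cleans several columns in a loop and uses pandas.to_numeric with errors='coerce' so that unparseable values become NaN. The column names price and qty and the 'n/a' entry are invented for the example.

```python
import pandas as pd

# Hypothetical data: "price" contains one value that is not a number.
df = pd.DataFrame({
    "price": ["1,200", "7,000", "n/a"],
    "qty": ["4,200", "-0.03", "0"],
})

for col in ["price", "qty"]:
    # Remove the thousands separators as literal commas (no regex needed).
    cleaned = df[col].str.replace(",", "", regex=False)
    # errors="coerce" turns anything that still fails to parse into NaN.
    df[col] = pd.to_numeric(cleaned, errors="coerce")

print(df)
```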
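This will work for strings such as '-55,00' or '5.500,00' and convert them to the floats -55.00 and 5500.00, respectively. Pass regex=False so that '.' is matched as a literal dot rather than as a regex wildcard that matches every character.
df['colname'] = df['colname'].str.replace('.', '', regex=False).str.replace(',', '.', regex=False).astype(float)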
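A small self-contained sketch of the decimal-comma conversion follows. Both replacements are literal (regex=False), because with regex=True the pattern '.' would match every character and empty the strings. The sample values, including the third one, are illustrative.

```python
import pandas as pd

# Numbers written with '.' as the thousands separator and ',' as the decimal mark.
s = pd.Series(["-55,00", "5.500,00", "1.234.567,89"])

converted = (
    s.str.replace(".", "", regex=False)   # drop the thousands separators
     .str.replace(",", ".", regex=False)  # turn the decimal comma into a dot
     .astype(float)
)
print(converted)
```

The order matters: the dots have to be removed before the comma is turned into a dot, otherwise the new decimal point would be stripped as well.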