How to convert strings of negative numbers to floats in pandas?
Question:
I have a column of strings representing negative numbers in my dataset. I’d like to convert them to negative floats, but I get ValueError: could not convert string to float: '-'. I suspected a problem with the encoding, so I tried to replace - with the Unicode hyphen, but got the same error anyway.
I’ve also tried replacing every Unicode dash character I could find with a normal hyphen, but it didn’t work.
I use Python 3.8.1 and pandas 1.0.2.
Are there any workarounds?
P.S. There is a similar question here, but it didn’t help.
Here is what I’ve done:
The dataset is here. It’s called ‘1240K+HO’, extension .anno.
Then:
import pandas as pd

# open file
df = pd.read_table('v42.4.1240K_HO.anno', index_col=0, usecols=[
    'Index',
    'Instance ID',
    'Master ID',
    'Average of 95.4% date range in calBP (defined as 1950 CE)',
    'Country',
    'Lat.',
    'Long.',
], na_values='..')  # treat '..' as a missing value
Then I try to convert the strings in the ‘Lat.’ column to floats.
# convert strings to floats
df['Lat.'] = df['Lat.'].astype(float)
Answers:
The issue is that the column contains at least one value that is exactly '-'. That’s it, just a hyphen with no digits after it.
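To confirm which values are the culprits before changing anything, a quick check along these lines may help. It is only a sketch: the `lat` Series is a made-up stand-in for the real 'Lat.' column, and `bad_values` is a name introduced here for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the 'Lat.' column as read from the file (strings)
lat = pd.Series(["52.5", "-33.9", "-", "..", "41.0"])

# Anything that cannot be parsed as a number becomes NaN under errors="coerce"
parsed = pd.to_numeric(lat, errors="coerce")

# Keep the original strings that failed to parse (and were not already missing)
bad_values = lat[parsed.isna() & lat.notna()]

print(bad_values.unique())
```

If the only entries printed are placeholders such as '-', the negative numbers themselves are fine and only those placeholders need handling.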
You can replace those values with NaN first:
import numpy as np
# replace cells that are exactly '-' with NaN
df['Lat.'] = df['Lat.'].replace('-', np.nan)
Then this will work:
df['Lat.'] = df['Lat.'].astype(float)
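Here is a small self-contained sketch of those two steps on made-up data (the values below are illustrative, not taken from the .anno file). It shows that `replace('-', np.nan)` only touches cells that are exactly '-', so strings such as '-33.9' keep their minus sign.

```python
import numpy as np
import pandas as pd

# Illustrative data: two valid numbers, one lone hyphen, one more valid number
lat = pd.Series(["52.5", "-33.9", "-", "41.0"], dtype=object)

# Without regex=True, replace matches whole values, not substrings
cleaned = lat.replace("-", np.nan)

# With the lone hyphen gone, the conversion no longer raises ValueError
as_float = cleaned.astype(float)

print(as_float)
```

Alternatively, the placeholder could probably be handled at read time by passing a list such as `na_values=['..', '-']` to `pd.read_table`, since `na_values` accepts a list and compares against whole cell contents.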
If you still get an error, you can use pd.to_numeric with errors='coerce' to convert every non-numeric element to NaN. You can then replace the NaN values with 0, or handle them however you wish.
import pandas as pd
df['Lat.'] = pd.to_numeric(df['Lat.'], errors='coerce')
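As a rough illustration of what can follow the coercion, the sketch below counts the values that became NaN and then shows two ways of dealing with them. The DataFrame contents are invented for the example, and `n_missing`, `filled` and `dropped` are names introduced here.

```python
import pandas as pd

# Invented sample with one unparseable entry
df = pd.DataFrame({"Lat.": ["52.5", "-33.9", "-", "41.0"]})

# Non-numeric strings become NaN instead of raising an error
df["Lat."] = pd.to_numeric(df["Lat."], errors="coerce")

# How many values were lost to coercion
n_missing = int(df["Lat."].isna().sum())

# Option 1: fill the gaps with a constant
filled = df["Lat."].fillna(0.0)

# Option 2: drop the rows that have no latitude
dropped = df.dropna(subset=["Lat."])

print(n_missing, filled.tolist(), len(dropped))
```

For coordinates, dropping the rows or leaving them as NaN may be safer than filling with 0, because 0 is itself a valid latitude.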