Error converting format from object to float in pandas
Question:
I am trying to convert a column of dtype object to float in Pandas, but I cannot fix this error.
To simplify the explanation and ensure clear understanding, I will show the whole picture using a toy example:
import pandas as pd
data = {
'id': ['EC','4E','E1','F8','4E'],
'item_date': [20210401, 20210401, 20210401, 20210401, 20210401],
'quantity tons': [54.15113862, 768.0248392, 386.1279489, 202.4110654, 'e'],
'customer': [30156308, 30202938, 30153963, 30349574, 30211560],
'country': [28, 25, 30, 32, 28],
'status': ['Won', 'Won', 'Won', 'Won', 'Won']
}
df = pd.DataFrame(data)
I tried to convert the column to float like this:
df["quantity tons"] = df["quantity tons"].astype(float)
But it shows an error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
d:Overdose AIpart_a.ipynb Cell 4 in 1
----> 1 df["quantity tons"] = df["quantity tons"].astype(float)
File c:UsersAdminAppDataLocalProgramsPythonPython310libsite-packagespandascoregeneric.py:6245, in NDFrame.astype(self, dtype, copy, errors)
6238 results = [
6239 self.iloc[:, i].astype(dtype, copy=copy)
6240 for i in range(len(self.columns))
6241 ]
6243 else:
6244 # else, only a single dtype is given
-> 6245 new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
6246 return self._constructor(new_data).__finalize__(self, method="astype")
6248 # GH 33113: handle empty frame or series
File c:UsersAdminAppDataLocalProgramsPythonPython310libsite-packagespandascoreinternalsmanagers.py:446, in BaseBlockManager.astype(self, dtype, copy, errors)
445 def astype(self: T, dtype, copy: bool = False, errors: str = "raise") -> T:
--> 446 return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
File c:UsersAdminAppDataLocalProgramsPythonPython310libsite-packagespandascoreinternalsmanagers.py:348, in BaseBlockManager.apply(self, f, align_keys, ignore_failures, **kwargs)
346 applied = b.apply(f, **kwargs)
347 else:
--> 348 applied = getattr(b, f)(**kwargs)
349 except (TypeError, NotImplementedError):
...
169 # Explicit copy, or required since NumPy can't view from / to object.
--> 170 return arr.astype(dtype, copy=True)
172 return arr.astype(dtype, copy=copy)
ValueError: could not convert string to float: 'e'
Answers:
The problem is that the 'quantity tons' column contains the string "e" (line 173088 in the file you provided), and astype(float) cannot parse it.
To avoid this issue, I suggest checking whether the column contains any strings before changing its dtype. You can use the following code:
df[df['quantity tons'].apply(lambda x: isinstance(x, str))]
The output shows only the rows where the 'quantity tons' column holds a string.
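Once the offending rows are found, one possible way to finish the conversion is pd.to_numeric with errors="coerce", which turns unparseable values into NaN instead of raising. This is only a sketch on a cut-down version of the toy data from the question; the names converted and bad_rows are made up for illustration, and whether NaN is an acceptable replacement for "e" depends on your data.

```python
import pandas as pd

# Cut-down version of the toy frame from the question
df = pd.DataFrame({
    "id": ["EC", "4E", "E1", "F8", "4E"],
    "quantity tons": [54.15113862, 768.0248392, 386.1279489, 202.4110654, "e"],
})

# Values that cannot be parsed as numbers become NaN instead of raising
converted = pd.to_numeric(df["quantity tons"], errors="coerce")

# Rows that were not missing before but failed to parse (here, the "e" row)
bad_rows = df[converted.isna() & df["quantity tons"].notna()]

# Replace the object column with the float64 result
df["quantity tons"] = converted
```

Inspect bad_rows before overwriting the column if you need to decide between dropping those rows and correcting them by hand.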