Error converting format from object to float in pandas

Question:

I am trying to convert a column of dtype object to float in pandas, but I keep getting an error that I cannot fix.

To keep things simple, here is a toy example that reproduces the problem:

import pandas as pd

data = {
'id': ['EC','4E','E1','F8','4E'],
'item_date': [20210401, 20210401, 20210401, 20210401, 20210401],
'quantity tons': [54.15113862, 768.0248392, 386.1279489, 202.4110654, 'e'],
'customer': [30156308, 30202938, 30153963, 30349574, 30211560],
'country': [28, 25, 30, 32, 28],
'status': ['Won', 'Won', 'Won', 'Won', 'Won']
}

df = pd.DataFrame(data)

I tried to convert the column to float like this:

df["quantity tons"] = df["quantity tons"].astype(float)

But it raises an error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
d:\Overdose AI\part_a.ipynb Cell 4 in 1
----> 1 df["quantity tons"] = df["quantity tons"].astype(float)

File c:\Users\Admin\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\generic.py:6245, in NDFrame.astype(self, dtype, copy, errors)
   6238     results = [
   6239         self.iloc[:, i].astype(dtype, copy=copy)
   6240         for i in range(len(self.columns))
   6241     ]
   6243 else:
   6244     # else, only a single dtype is given
-> 6245     new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
   6246     return self._constructor(new_data).__finalize__(self, method="astype")
   6248 # GH 33113: handle empty frame or series

File c:\Users\Admin\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\internals\managers.py:446, in BaseBlockManager.astype(self, dtype, copy, errors)
    445 def astype(self: T, dtype, copy: bool = False, errors: str = "raise") -> T:
--> 446     return self.apply("astype", dtype=dtype, copy=copy, errors=errors)

File c:\Users\Admin\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\internals\managers.py:348, in BaseBlockManager.apply(self, f, align_keys, ignore_failures, **kwargs)
    346         applied = b.apply(f, **kwargs)
    347     else:
--> 348         applied = getattr(b, f)(**kwargs)
    349 except (TypeError, NotImplementedError):
...
    169     # Explicit copy, or required since NumPy can't view from / to object.
--> 170     return arr.astype(dtype, copy=True)
    172 return arr.astype(dtype, copy=copy)

ValueError: could not convert string to float: 'e'
Asked By: rashid


Answers:

The problem is that there is a string "e" in the 'quantity tons' column (line 173088) in the file you provided, and astype(float) cannot parse it.

To avoid this issue, I suggest checking whether a column contains any strings before changing its dtype. You can use the following code:

df[df['quantity tons'].apply(lambda x: isinstance(x, str))]

The output shows only the rows where the 'quantity tons' column contains a string.
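
Once the offending rows are known, one possible way to finish the conversion is pd.to_numeric with errors="coerce", which turns unparseable values into NaN instead of raising. This is only a sketch on a cut-down version of the toy frame from the question. The names converted and bad_rows are illustrative, and whether NaN is an acceptable replacement for the bad value is an assumption that depends on your data.

```python
import pandas as pd

# Cut-down version of the toy frame from the question.
df = pd.DataFrame({
    "id": ["EC", "4E", "E1", "F8", "4E"],
    "quantity tons": [54.15113862, 768.0248392, 386.1279489, 202.4110654, "e"],
})

# Values that cannot be parsed as numbers become NaN instead of raising.
converted = pd.to_numeric(df["quantity tons"], errors="coerce")

# Rows that held a value originally but failed to parse.
bad_rows = df[converted.isna() & df["quantity tons"].notna()]
print(bad_rows)

# Overwrite the column once the failures have been reviewed.
df["quantity tons"] = converted
print(df["quantity tons"].dtype)
```

Comparing against the original column with notna() separates values that failed to parse from values that were already missing, so genuine gaps in the data are not reported as conversion errors.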

Answered By: dramarama