How to deal with strings in a numeric column in pandas?

Question:

I have a big dataset and I cannot convert a column's dtype from object to int because of the error "invalid literal for int() with base 10:". After some research, I found that the cause is some strings within the column.

How can I find those strings and replace them with numeric values?

Asked By: Leonardo Urbiola


Answers:

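"Base 10" in the error message refers to the decimal numeral system, not to floats; the error means int() received a string that is not a valid integer literal. If the offending strings are decimals such as "3.7", in Python you can convert through float first: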

int(float(____))

Since you used int(), I’m guessing you needed an integer value.
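
A minimal sketch of what this does and does not fix, using a made-up sample value ("3.7" and "x" are assumptions, not values from the question). Going through float() helps with decimal strings, but it still fails on text that is not a number at all:

```python
# int() only accepts integer literals; float() also accepts decimals.
value = "3.7"  # hypothetical sample value

try:
    int(value)
    direct_ok = True
except ValueError:
    direct_ok = False  # int("3.7") raises ValueError

# Going through float() works and truncates toward zero.
via_float = int(float(value))

try:
    int(float("x"))
    text_ok = True
except ValueError:
    text_ok = False  # float("x") raises ValueError as well

print(direct_ok, via_float, text_ok)
```

So this approach only covers the case where the "strings" are decimal numbers; genuinely non-numeric entries need one of the approaches in the other answers.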

Answered By: BBB_TOAD

You might be looking for .str.isnumeric(), which lets you filter the rows whose strings are (or are not) numeric and act on each group independently. You’ll still need to decide what should happen to the non-numeric values:

  • converted (maybe they’re money and you want to truncate, or a date format that’s not a UNIX epoch, or any number of other possibilities)
  • dropped (just throw them away)
  • something else
>>> df = pd.DataFrame({"a":["1", "2", "x"]})
>>> df
   a
0  1
1  2
2  x
>>> df[df["a"].str.isnumeric()]
   a
0  1
1  2
>>> df[~df["a"].str.isnumeric()]
   a
2  x
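
One caveat worth checking before relying on this filter: like Python's str.isnumeric(), it tests whether every character is a numeric character, so signs and decimal points make it return False. A small sketch with made-up values (the sample data is an assumption), comparing it against a parse-based check:

```python
import pandas as pd

# Hypothetical sample: a plain integer, a negative, a decimal, and real text.
df = pd.DataFrame({"a": ["1", "-2", "3.5", "x"]})

# True only when every character of the string is numeric.
mask = df["a"].str.isnumeric()

# Alternative check: does the value parse as a number at all?
parsed_ok = pd.to_numeric(df["a"], errors="coerce").notna()

print(pd.DataFrame({"a": df["a"], "isnumeric": mask, "parses": parsed_ok}))
```

If the column can contain negatives or decimals, the parse-based mask is probably the safer way to find the truly non-numeric rows.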
Answered By: ti7

Assuming ‘col’ is the column name, just force the conversion to numeric, with NaN for any value that cannot be parsed:

df['col_num'] = pd.to_numeric(df['col'], errors='coerce')

If needed, you can check which original values produced NaN using:

df.loc[df['col'].notna() & df['col_num'].isna(), 'col']
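
Put together, the whole workflow might look like the sketch below. The sample data, the fill value 0, and the names bad, filled and clean are assumptions for illustration; which replacement value makes sense depends on your data.

```python
import pandas as pd

# Hypothetical sample column: numbers stored as strings, one bad entry,
# and one genuinely missing value.
df = pd.DataFrame({"col": ["1", "2", "x", None, "4"]})

# Unparseable entries become NaN instead of raising.
df["col_num"] = pd.to_numeric(df["col"], errors="coerce")

# Values that were present originally but failed to parse.
bad = df.loc[df["col"].notna() & df["col_num"].isna(), "col"]
print(bad.tolist())

# Option 1: replace failures (and missing values) with a chosen number.
filled = df["col_num"].fillna(0).astype(int)

# Option 2: drop the rows that could not be converted.
clean = df.dropna(subset=["col_num"]).copy()
clean["col_num"] = clean["col_num"].astype(int)
```

The cast to int only works once no NaN is left, which is why both options deal with the NaN values first.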
Answered By: mozway

I am in a similar situation, but in my case one column has strings and ints together, for example:

Col A
32
45
thirty seven
fifty two
98

How should I convert this to integers with the correct numbers, i.e. thirty seven should become 37?
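
One possible approach, sketched under the assumption that the spelled-out values are plain English number words between 0 and 99. The helper to_int and the WORD_VALUES table are hypothetical names written for this example, not part of pandas:

```python
import pandas as pd

# Hypothetical lookup table covering the words needed for 0 to 99.
WORD_VALUES = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19, "twenty": 20, "thirty": 30, "forty": 40,
    "fifty": 50, "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}

def to_int(value):
    """Return value as an int, parsing simple number words if needed."""
    try:
        return int(value)  # handles 32 and "32"
    except (TypeError, ValueError):
        pass
    words = str(value).lower().replace("-", " ").split()
    if not words or any(w not in WORD_VALUES for w in words):
        return None  # unknown text: leave it for manual review
    # Naive sum: "thirty seven" -> 30 + 7. Does not validate word order.
    return sum(WORD_VALUES[w] for w in words)

df = pd.DataFrame({"Col A": [32, 45, "thirty seven", "fifty two", 98]})
df["Col A num"] = df["Col A"].map(to_int)
print(df)
```

This is deliberately small: it does not handle "hundred", "thousand" or misspellings, and anything it cannot parse comes back as None so you can inspect those rows separately.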

Answered By: Karishma Manohar