How can I convert an object dtype to integer by removing letter characters?

Question:

My question is about converting an object-dtype column to integer. Specifically, I want to remove the non-numeric characters from the values so that I can work with the column as integers.

I have the following dataframe:

import pandas as pd

data = {'id': ['9011001', '9011001-83V', '9011001-78G', '9011001-56V'],
        'av': [0, 1, 0, 1]}

df = pd.DataFrame(data)

The df.dtypes are:

id     object
av     int64
dtype: object

I would like to convert the id column from object to integer. I understand that I cannot do this directly because the column contains letters as well. My idea is to delete the letter characters from this column so that I can then convert it with df['id'].astype(int).

Do you have any idea on how I can delete the letter characters from the id column?

Thank you in advance.

Asked By: Anas.S

||

Answers:

Here is one way to do it using regex:

# \d    : digit character
# [^\d] : any non-digit character (equivalent to \D)

df['id'].str.replace(r'[^\d]', '', regex=True)
0      9011001
1    901100183
2    901100178
3    901100156
Name: id, dtype: object
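
Note that the result shown above is still of dtype object, because str.replace returns strings and the call is not assigned back to the column. A minimal sketch of finishing the conversion, assuming every id contains at least one digit (as in the question's sample data):

```python
import pandas as pd

data = {'id': ['9011001', '9011001-83V', '9011001-78G', '9011001-56V'],
        'av': [0, 1, 0, 1]}
df = pd.DataFrame(data)

# \D matches any non-digit character (equivalent to [^\d]).
# Strip those characters, then cast the remaining digit strings to int64
# and assign the result back to the column.
df['id'] = df['id'].str.replace(r'\D', '', regex=True).astype('int64')

print(df.dtypes)
```

After this, df['id'] can be used in integer arithmetic and comparisons. Keep in mind that the digits after the hyphen are merged into the number, so '9011001-83V' becomes 901100183.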
Answered By: Naveed

You can do it like this:

import re
import pandas as pd

data = {'id':['9011001', '9011001-83V', '9011001-78G', '9011001-56V'],
        'av':[0, 1, 0, 1]}
 
df = pd.DataFrame(data)
df["id"] = pd.to_numeric(df["id"].apply(lambda x: re.sub(r"[^\d]", "", x)))

Here is the result:

          id  av
0    9011001   0
1  901100183   1
2  901100178   0
3  901100156   1
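
If some ids might contain no digits at all, or be missing, stripping the non-digits leaves an empty string or a null, and a plain integer cast would fail. The following is a sketch of one way to handle that, not part of the original answer: the 'ABC' and None rows are hypothetical inputs added for illustration, and using the nullable Int64 dtype is an assumption about what is wanted for such rows.

```python
import pandas as pd

# Hypothetical input: the last two rows do not appear in the question's data.
ids = pd.Series(['9011001', '9011001-83V', 'ABC', None], name='id')

# Remove every non-digit character; 'ABC' becomes '' and None stays missing.
digits = ids.str.replace(r'\D', '', regex=True)

# errors='coerce' turns unparseable values (such as '') into NaN.
# The nullable Int64 dtype then keeps the column integer-typed with <NA> gaps.
clean = pd.to_numeric(digits, errors='coerce').astype('Int64')

print(clean)
```

Because the values pass through float64 before the Int64 cast, this sketch is only safe for ids below 2**53; longer ids would need a different route.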
Answered By: JP Marcel