How can I convert an object dtype to integer by removing letter characters?
Question:
My question is about converting an object-dtype column to integer. Specifically, I want to remove the non-numeric characters from the values so that I can convert the column and work with it as an integer.
I have the following dataframe:
import pandas as pd
data = {'id': ['9011001', '9011001-83V', '9011001-78G', '9011001-56V'],
        'av': [0, 1, 0, 1]}
df = pd.DataFrame(data)
The df.dtypes are:
id object
av int64
dtype: object
I would like to convert the id object column to an integer one. I understand that I cannot do this directly because the column contains letters as well. I was thinking of deleting the letter characters from this column so that I can then convert it with df['id'].astype(int).
Do you have any idea how I can delete the letter characters from the id column?
Thank you in advance.
Answers:
Here is one way to do it using a regex:
# \d    : digit character
# [^\d] : matches any non-digit character
df['id'].str.replace(r'[^\d]', '', regex=True)
0 9011001
1 901100183
2 901100178
3 901100156
Name: id, dtype: object
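Note that the result above is still dtype object, so a cast is needed to finish the conversion. A minimal sketch, assuming every value contains at least one digit and no value is missing (as in the question's sample data); `digits_only` is just an illustrative name:

```python
import pandas as pd

data = {'id': ['9011001', '9011001-83V', '9011001-78G', '9011001-56V'],
        'av': [0, 1, 0, 1]}
df = pd.DataFrame(data)

# \D matches any non-digit character, so letters and the hyphen are both removed
digits_only = df['id'].str.replace(r'\D', '', regex=True)

# Cast the cleaned strings to a 64-bit integer column
df['id'] = digits_only.astype('int64')

print(df.dtypes)
```

Be aware that removing the hyphen concatenates the two digit groups, so '9011001-83V' becomes 901100183 rather than 9011001.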
You can do it like this:
import re
import pandas as pd
data = {'id': ['9011001', '9011001-83V', '9011001-78G', '9011001-56V'],
        'av': [0, 1, 0, 1]}
df = pd.DataFrame(data)
# strip every non-digit character, then convert the cleaned strings to numbers
df["id"] = pd.to_numeric(df["id"].apply(lambda x: re.sub(r"[^\d]", "", x)))
Here is the result:
id av
0 9011001 0
1 901100183 1
2 901100178 0
3 901100156 1
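If the real column might contain missing values or entries with no digits at all, a plain integer cast would fail. The sketch below is one possible way to handle that, using `errors='coerce'` and pandas' nullable `Int64` dtype. The `None` and `'unknown'` entries are hypothetical and were not part of the original question:

```python
import pandas as pd

# Hypothetical data: None and 'unknown' are assumptions added for illustration
s = pd.Series(['9011001', '9011001-83V', None, 'unknown'])

# Remove every non-digit character; missing values stay missing,
# and 'unknown' becomes an empty string
cleaned = s.str.replace(r'\D', '', regex=True)

# Values that cannot be parsed become NaN, and the nullable Int64 dtype
# lets the column hold integers alongside missing values
ids = pd.to_numeric(cleaned, errors='coerce').astype('Int64')

print(ids)
```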