Python Dataframe Get Substring
Question:
I have some dataframes whose id column looks like:
A12-B-56
E1234B115
It is always some letters followed by several digits, then either -B- or B. I want to keep the substring before '-B-' or 'B'. One way I came up with is a for loop with re.split('(\d+)', some_text). Is there a faster way to do this?
Answers:
Use a lookahead assertion to capture all the alphanumeric characters from the start of the string that are followed by B. It is wise to remove the - first, so that both id formats look the same. Code below:
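import pandas as pd
df = pd.DataFrame({'column': ['A12-B-56', 'A123B567']})
# Drop the hyphens, then capture the leading alphanumerics that are followed by B
df = df.assign(column=df['column'].str.replace('-', '', regex=False).str.extract(r'(^\w+(?=B))', expand=False))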
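To see what the lookahead does on a single string, here is a minimal sketch using only the standard re module. The helper name prefix_before_b is hypothetical and used only for illustration; the pattern is the one from the answer above.

```python
import re

# Same pattern as in the answer: word characters from the start,
# followed by (but not including) a literal B.
PREFIX_BEFORE_B = re.compile(r'^\w+(?=B)')

def prefix_before_b(text):
    """Return the part of an id before 'B', ignoring hyphens, or None if there is no match."""
    cleaned = text.replace('-', '')  # 'A12-B-56' -> 'A12B56'
    match = PREFIX_BEFORE_B.match(cleaned)
    return match.group(0) if match else None

results = [prefix_before_b(s) for s in ['A12-B-56', 'E1234B115']]
print(results)  # ['A12', 'E1234']
```

Because \w+ is greedy, the match ends just before the last B in the cleaned string. That is fine for the ids shown here, but it is worth knowing if the leading letters can themselves contain a B.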
As proposed by @mozway, this can be made a short, concise one-liner:
df['column'].str.extract(r'(^\w+)-?B')
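A small self-contained sketch of the one-liner applied to the two ids from the question. It assumes pandas is installed, and the column name prefix is hypothetical, chosen here so the original column stays untouched.

```python
import pandas as pd

df = pd.DataFrame({'column': ['A12-B-56', 'E1234B115']})

# str.extract returns a DataFrame by default; expand=False returns a Series
# when the pattern has a single capture group.
prefix = df['column'].str.extract(r'(^\w+)-?B', expand=False)

# 'prefix' is a hypothetical column name used only for this illustration.
df = df.assign(prefix=prefix)
print(df)
```

Here the optional hyphen is handled inside the pattern, so no separate str.replace step is needed. Rows that do not match the pattern come back as NaN.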