Python Dataframe Get Substring

Question

I have some dataframes whose id column looks like this:

A12-B-56
E1234B115

Each id is always some letters followed by several numbers, then -B- or B. I want to keep the substring before '-B-' or 'B'. One way I came up with is a for loop with re.split('(\d+)', some_text). Is there a faster way to do this?

Asked By: user398843


Answer 1

Use a lookahead assertion to capture the alphanumeric characters from the start of the string that are followed by B. Remove the - first, so that B comes directly after the part you want to keep. Code below:

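import pandas as pd

df = pd.DataFrame({'column': ['A12-B-56', 'A123B567']})

# Drop the hyphens, then capture the leading word characters that are followed by B
df = df.assign(column=df['column'].str.replace('-', '', regex=True).str.extract(r'(^\w+(?=B))', expand=False))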

As proposed by @mozway, this can be shortened to a concise one-liner:

df['column'].str.extract(r'(^\w+)-?B')
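For reference, here is a minimal sketch that runs both patterns on the two sample ids from the question. The column names id, prefix and prefix_lookahead are only illustrative assumptions, and expand=False is used so that str.extract returns a Series instead of a one-column DataFrame.

```python
import pandas as pd

# Sample ids taken from the question; the column name "id" is an assumption
df = pd.DataFrame({"id": ["A12-B-56", "E1234B115"]})

# One-liner: capture the leading word characters, then allow an optional "-" before "B"
df["prefix"] = df["id"].str.extract(r"(^\w+)-?B", expand=False)

# Lookahead variant: remove the hyphens first, then capture what precedes "B"
df["prefix_lookahead"] = (
    df["id"]
    .str.replace("-", "", regex=False)
    .str.extract(r"(^\w+(?=B))", expand=False)
)

print(df)
```

Both are vectorized string methods, so no explicit for loop is needed. Rows that do not match the pattern come back as NaN, and because \w+ is greedy, an id containing more than one B would be cut at the last one that fits the pattern.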

Answered By: wwnde
