Python Dataframe Get Substring

Question:

I have some dataframes whose id column looks like this:

A12-B-56
E1234B115

It is always some letters followed by several numbers, then -B- or B. I want to keep the substring before '-B-' or 'B'. One way I came up with is a for loop with re.split('(\d+)', some_text). Is there a faster way to do this?

Asked By: user398843


Answers:

Use a lookahead assertion to get all the alphanumerics from the start that are followed by B. It is wise to remove the - first, so that both ID formats look the same before extracting. Code below:

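import pandas as pd

df = pd.DataFrame({'column': ['A12-B-56', 'A123B567']})

# Remove hyphens, then extract the leading word characters that are followed by B
df = df.assign(column=df['column'].str.replace('-', '', regex=True).str.extract(r'(^\w+(?=B))', expand=False))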
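
To see why the lookahead keeps the B out of the result, here is a minimal sketch using the standard-library re module on plain strings. The helper name prefix_before_b is made up for illustration and is not part of the answer above.

```python
import re

# Hypothetical helper, named here only for illustration.
def prefix_before_b(text):
    # Drop hyphens first, then match word characters from the start that
    # are followed by "B". The lookahead (?=B) checks for the B without
    # including it in the match.
    match = re.match(r'\w+(?=B)', text.replace('-', ''))
    return match.group(0) if match else None

print(prefix_before_b('A12-B-56'))   # A12
print(prefix_before_b('E1234B115'))  # E1234
```

Because \w+ is greedy, the match backs off from the end of the string until a B follows it, so the prefix runs up to the last B in the ID.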

As proposed by @mozway, this can be shortened to a concise one-liner:

df['column'].str.extract(r'(^\w+)-?B')
Answered By: wwnde
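
For reference, a self-contained sketch of the one-liner approach applied to the two sample IDs from the question. The column names 'id' and 'prefix' are assumptions chosen for this example, and expand=False is added so that the result is a Series rather than a one-column DataFrame.

```python
import pandas as pd

# Sample IDs taken from the question; the column name 'id' is an assumption.
df = pd.DataFrame({'id': ['A12-B-56', 'E1234B115']})

# Vectorized extraction, no Python-level loop: capture the leading word
# characters that sit before an optional hyphen and a "B".
# expand=False returns a Series instead of a one-column DataFrame.
df['prefix'] = df['id'].str.extract(r'^(\w+)-?B', expand=False)

print(df['prefix'].tolist())  # ['A12', 'E1234']
```

Rows that do not match the pattern come back as NaN, so it may be worth checking for missing values afterwards.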