Pandas: how can I extract regex matches from a column into multiple rows?

Question:

I have the following data:

ID  content                                                     date
1   2429(sach:MySpezialItem :16.59)                             2022-04-12
2   2429(sach:item 13 :18.59)(sach:this and that costs:16.59)   2022-06-12

And I want to achieve the following:

ID  number  price  date
1   2429           2022-04-12
1           16.59  2022-04-12
2   2429           2022-06-12
2           18.59  2022-06-12
2           16.59  2022-06-12

What I tried:

df['sach'] = df['content'].str.split(r'(sach:.*)').explode('content')
df['content'] = df['content'].str.replace(r'(sach:.*)','', regex=True)
Asked By: Schwenk


Answers:

You can use a single regex with str.extractall:

regex = r'(?P<number>\d+)\(|:(?P<price>\d+(?:\.\d+)?)\)'

df = df.join(df.pop('content').str.extractall(regex).droplevel(1))

NB. if you want a new DataFrame, don’t pop:

df2 = (df.drop(columns='content')
         .join(df['content'].str.extractall(regex).droplevel(1))
       )

output:

   ID        date number  price
0   1  2022-04-12   2429    NaN
0   1  2022-04-12    NaN  16.59
1   2  2022-06-12   2429    NaN
1   2  2022-06-12    NaN  18.59
1   2  2022-06-12    NaN  16.59
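
For reference, here is a minimal end-to-end sketch of the approach above. The sample DataFrame is rebuilt by hand from the data in the question, and the names `matches`, `out` and `numeric` are only illustrative. Since `extractall` returns strings, the last step converts both columns with `pd.to_numeric`; that conversion is an optional extra and was not part of the original answer.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed to match the asker's frame).
df = pd.DataFrame({
    "ID": [1, 2],
    "content": [
        "2429(sach:MySpezialItem :16.59)",
        "2429(sach:item 13 :18.59)(sach:this and that costs:16.59)",
    ],
    "date": ["2022-04-12", "2022-06-12"],
})

# Two alternatives: digits followed by "(" -> number,
# or ":" + decimal number followed by ")" -> price.
regex = r"(?P<number>\d+)\(|:(?P<price>\d+(?:\.\d+)?)\)"

# extractall yields one row per match, indexed by (original row, match number).
# Dropping the match level leaves the original index, so join can align on it.
matches = df["content"].str.extractall(regex).droplevel(1)
out = df.drop(columns="content").join(matches)

# Optional: the extracted values are strings; convert them to numbers.
numeric = out.assign(
    number=pd.to_numeric(out["number"]),
    price=pd.to_numeric(out["price"]),
)
print(numeric)
```

Each match fills only one of the two named groups, which is why every row has a value in either number or price and NaN in the other.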

Answered By: mozway