Extract a match in a row and take everything up to the next comma, or to the end of the string, in pandas

Question:

I have a dataset. From each row of the column ‘Tags’ I want to extract every entry that contains the word player. The word can repeat or appear only once in the same cell. Something like this:

‘view_snapshot_hi:hab,like_hi:hab,view_snapshot_foinbra,completed_profile,view_page_investors_landing,view_foinbra_inv_step1,view_foinbra_inv_step2,view_foinbra_inv_step3,view_snapshot_acium,player,view_acium_inv_step1,view_acium_inv_step2,view_acium_inv_step3,player_acium-ronda-2_r1,view_foinbra_rinv_step1,view_page_makers_landing’

Expected output:
‘player,player_acium-ronda-2_r1’

I need both matches.

df["Tags"] = df["Tags"].str.ectract(r'*player'*,?s*')

I tried this, but it's not working.

Answers:

You need to use Series.str.extract, keeping in mind that the pattern must contain a capturing group around the part you want to extract.

The pattern you need is player[^,]*:

df["Tags"] = df["Tags"].str.extract(r'(player[^,]*)', expand=False)

With expand=False and a single capturing group, extract returns a Series (or an Index) rather than a DataFrame.
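A small sketch of this single-match behavior, assuming a hypothetical two-row frame whose first row is a shortened version of the question's sample and whose second row has no player tag:

```python
import pandas as pd

# Hypothetical data: row 0 is a shortened version of the question's sample,
# row 1 contains no "player" tag at all.
df = pd.DataFrame({"Tags": [
    "view_snapshot_acium,player,view_acium_inv_step1,player_acium-ronda-2_r1,view_page_makers_landing",
    "completed_profile,view_page_investors_landing",
]})

# One capturing group + expand=False -> a Series with the FIRST match per row (NaN when nothing matches)
first = df["Tags"].str.extract(r"(player[^,]*)", expand=False)
print(first.tolist())  # ['player', nan]
```

Only `player` comes back for row 0; the second tag, `player_acium-ronda-2_r1`, is ignored because extract stops at the first match.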

Note that Series.str.extract finds and returns only the first match. To get all matches, use Series.str.findall in either of the two ways below (a list of matches per row, or the matches joined into one string):

df["Tags"] = df["Tags"].str.findall(r'player[^,]*')                # list of all matches per row
df["Tags"] = df["Tags"].str.findall(r'player[^,]*').str.join(",")  # matches joined into one string
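A minimal, self-contained sketch of the findall approach on a hypothetical three-row frame. Series.str.findall takes only pat and flags, so no expand argument is passed. The third row shows the end-of-string case from the title: `[^,]*` simply stops when the string ends.

```python
import pandas as pd

# Hypothetical data: row 1 has no "player" tag, row 2 ends with a player tag.
df = pd.DataFrame({"Tags": [
    "view_snapshot_acium,player,view_acium_inv_step1,player_acium-ronda-2_r1,view_page_makers_landing",
    "completed_profile,view_page_investors_landing",
    "like_hi:hab,player_last",
]})

# findall -> list of every match per row; [^,]* runs up to the next comma or the end of the string
matches = df["Tags"].str.findall(r"player[^,]*")

# Join each list into one comma-separated string (an empty list becomes "")
joined = matches.str.join(",")
print(joined.tolist())  # ['player,player_acium-ronda-2_r1', '', 'player_last']
```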
Answered By: Wiktor Stribiżew

This simple list comprehension also gives what you want:

# your_str holds one Tags value (a single comma-separated string)
words_with_players = [item for item in your_str.split(',') if 'player' in item]
players = ','.join(words_with_players)
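To apply the same idea to the whole column, one option is to wrap it in a helper and use Series.apply. The helper name keep_player_tags and the sample rows are hypothetical:

```python
import pandas as pd


def keep_player_tags(tags: str) -> str:
    """Hypothetical helper: keep every comma-separated tag that contains 'player'."""
    return ",".join(item for item in tags.split(",") if "player" in item)


# Hypothetical data; row 2 has "player" in the middle of a tag
df = pd.DataFrame({"Tags": [
    "view_snapshot_acium,player,view_acium_inv_step1,player_acium-ronda-2_r1,view_page_makers_landing",
    "completed_profile,view_page_investors_landing",
    "like_hi:hab,view_player_page",
]})

# apply calls the helper once per cell
df["Players"] = df["Tags"].apply(keep_player_tags)
print(df["Players"].tolist())
```

Unlike the regex, the substring test keeps the whole tag: for `view_player_page` it keeps `view_player_page`, whereas `player[^,]*` would return only `player_page`. The helper also assumes every cell is a string; a NaN cell would raise an AttributeError.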
Answered By: Nuri Taş