Extract a match in a row and take everything up to a comma, or to the end of the string, in pandas
Question:
I have a dataset. In the column ‘Tags’ I want to extract from each row all the content that contains the word player. It could be repeated or appear alone in the same cell. Something like this:
‘view_snapshot_hi:hab,like_hi:hab,view_snapshot_foinbra,completed_profile,view_page_investors_landing,view_foinbra_inv_step1,view_foinbra_inv_step2,view_foinbra_inv_step3,view_snapshot_acium,player,view_acium_inv_step1,view_acium_inv_step2,view_acium_inv_step3,player_acium-ronda-2_r1,view_foinbra_rinv_step1,view_page_makers_landing’
expected output:
‘player,player_acium-ronda-2_r1’
And I need both.
df["Tags"] = df["Tags"].str.ectract(r'*player'*,?s*')
I tried this, but it's not working.
Answers:
You need to use Series.str.extract, keeping in mind that the pattern should contain a capturing group around the part you need to extract.
The pattern you need is player[^,]* (the word player followed by any characters other than a comma):
df["Tags"] = df["Tags"].str.extract(r'(player[^,]*)', expand=False)
With a single capturing group, expand=False returns a Series/Index rather than a DataFrame.
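As a minimal sketch of what that call returns, the snippet below runs it on a two-row sample. The sample frame and the variable name first are made up for illustration and are not from the question's data.

```python
import pandas as pd

# Hypothetical sample: one row with two "player" tags, one row with none.
df = pd.DataFrame({
    "Tags": [
        "view_snapshot_acium,player,view_acium_inv_step1,player_acium-ronda-2_r1",
        "completed_profile,view_page_makers_landing",
    ]
})

# One capturing group + expand=False gives back a Series.
first = df["Tags"].str.extract(r"(player[^,]*)", expand=False)

print(first[0])           # only the first match in the row is kept
print(pd.isna(first[1]))  # rows without a match become NaN
```

Rows with no match come back as NaN, so a later step may need fillna("") if empty strings are preferred.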
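Note that Series.str.extract finds and fetches the first match only. To get all matches, use either of the two solutions below with Series.str.findall. Unlike extract, findall takes no expand argument and needs no capturing group. The first line stores a list of matches per row; the second joins that list into one comma-separated string, as in the expected output:
df["Tags"] = df["Tags"].str.findall(r'player[^,]*')
df["Tags"] = df["Tags"].str.findall(r'player[^,]*').str.join(",")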
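A short, self-contained sketch of the findall approach on a sample row may help. The sample frame and the names matches and joined are assumptions for illustration; the pattern is the one from the answer.

```python
import pandas as pd

# Hypothetical sample: one row with two "player" tags, one row with none.
df = pd.DataFrame({
    "Tags": [
        "view_snapshot_acium,player,view_acium_inv_step1,player_acium-ronda-2_r1",
        "completed_profile,view_page_makers_landing",
    ]
})

# findall returns a list of every match per row (an empty list if none).
matches = df["Tags"].str.findall(r"player[^,]*")

# Join each list back into a single comma-separated string.
joined = matches.str.join(",")

print(matches[0])  # ['player', 'player_acium-ronda-2_r1']
print(joined[0])   # player,player_acium-ronda-2_r1
```

Rows without any match end up as an empty list in matches and as an empty string in joined, rather than NaN.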
A simple list comprehension also gives what you want for a single string (here your_str):
words_with_players = [item for item in your_str.split(',') if 'player' in item]
players = ','.join(words_with_players)
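To use the same idea on the whole column rather than one string, the comprehension can be wrapped in a function and passed to Series.apply. This is only a sketch: the helper name keep_player_tags, the new column name PlayerTags, and the sample rows are all made up for illustration.

```python
import pandas as pd

def keep_player_tags(tags: str) -> str:
    """Keep the comma-separated items that contain 'player' (hypothetical helper)."""
    return ",".join(item for item in tags.split(",") if "player" in item)

# Hypothetical sample rows.
df = pd.DataFrame({
    "Tags": [
        "view_snapshot_acium,player,view_acium_inv_step1,player_acium-ronda-2_r1",
        "completed_profile,view_page_makers_landing",
        "multiplayer_mode,view_page_makers_landing",
    ]
})

# Assumes every cell is a string; NaN cells would need handling first.
df["PlayerTags"] = df["Tags"].apply(keep_player_tags)

print(df["PlayerTags"].tolist())
```

One difference from a regex such as player[^,]* is that this keeps the whole item, so multiplayer_mode is returned in full, whereas the regex would return only player_mode.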