Python Pandas Extract text between a word and a symbol

Question:

I am trying to extract text between a word and a symbol.

Here is the input table.

[image: input table]

And my expected output is like this.

[image: expected output table]

I do not want to have the word ‘Team:’ and ‘<>’ in the output.

I tried something like this, but it keeps ‘Team:’ and ‘<>’ in the output: data['new col'] = data['Team'].str.extract(r'(Team:\s[a-zA-Z\s]+<>)')

Thank you.

Asked By: user13617491

Answers:

Use a regex capture group with the str.extract method:

df['Team'].str.extract(r'^Team: ([^<>]+)')

  • [^<>]+ – matches one or more characters other than < and > (this includes the space before <>, so the result keeps a trailing space)
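
As a minimal sketch of how this might look on a whole column: the DataFrame below is hypothetical sample data (the original input table was an image), and the "Country" column name is an assumption. The trailing .str.strip() is an addition to remove the space left before "<>".

```python
import pandas as pd

# Hypothetical sample data standing in for the original input table.
df = pd.DataFrame({"Team": ["Team: United States <>", "Team: Brazil <>"]})

# expand=False returns a Series when the pattern has a single capture group.
df["Country"] = (
    df["Team"]
    .str.extract(r"^Team: ([^<>]+)", expand=False)
    .str.strip()  # drop the space captured before "<>"
)

print(df["Country"].tolist())  # ['United States', 'Brazil']
```

Rows that do not match the pattern come back as NaN rather than raising an error.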
Answered By: RomanPerekhrest

You can do this with a regular expression, which handles country names of any length, including names that contain spaces.

import re

row_string = "Team: United States <>"
country_name = re.search(r'Team: (.*) <>', row_string).group(1)
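
Note that re.search returns None when nothing matches, so calling .group(1) directly would raise an AttributeError on such rows. A possible guarded version is sketched below; the helper name extract_team and the non-greedy (.*?) are assumptions made for illustration, not part of the answer above.

```python
import re
from typing import Optional

# Non-greedy variant of the pattern above, compiled once for reuse.
TEAM_PATTERN = re.compile(r"Team: (.*?) <>")


def extract_team(row_string: str) -> Optional[str]:
    """Return the text between 'Team: ' and ' <>', or None if absent."""
    match = TEAM_PATTERN.search(row_string)
    return match.group(1) if match else None


print(extract_team("Team: United States <>"))  # United States
print(extract_team("no team here"))            # None
```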
Answered By: iohans

This happens because the capture group wraps the whole match, and str.extract returns whatever the capture group matched.

You could write it using the group only around the part that you want to keep:

df['Team'].str.extract(r'Team:\s([a-zA-Z\s]+)<>')

See the capture group values at this regex101 demo.
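
To make the difference concrete, here is a small sketch comparing the two group placements. The one-row Series is hypothetical sample data, and the names whole and inner are chosen only for illustration.

```python
import pandas as pd

# Hypothetical one-row sample.
s = pd.Series(["Team: United States <>"])

# Group around the whole pattern: the prefix and "<>" are returned too.
whole = s.str.extract(r"(Team:\s[a-zA-Z\s]+<>)", expand=False)

# Group around the name only: just that part is returned.
inner = s.str.extract(r"Team:\s([a-zA-Z\s]+)<>", expand=False)

print(whole[0])          # Team: United States <>
print(inner[0].strip())  # United States
```

Because \s inside the character class also matches the space before "<>", the captured value keeps a trailing space, hence the .strip() in the last line.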

Answered By: The fourth bird