Python Pandas Extract text between a word and a symbol
Question:
I am trying to extract text between a word and a symbol.
Here is the input table.
And my expected output is like this.
I do not want to have the word ‘Team:’ and ‘<>’ in the output.
I tried something like this, but it keeps the ‘Team:’ and ‘<>’ in the output:
data['new col'] = data['Team'].str.extract(r'(Team:\s[a-zA-Z\s]+<>)')
Thank you.
Answers:
Use a regex capture group with the str.extract method:
df['Team'].str.extract(r'^Team: ([^<>]+)')
[^<>]+ matches one or more characters other than < and >.
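As a rough end-to-end sketch of this approach: the original input table is not shown, so the sample rows and the `Country` column name below are assumptions for illustration only. The pattern is a slight variation on the one above, with a lazy group and `\s*` on both sides so that the space before `<>` is not captured.

```python
import pandas as pd

# Hypothetical sample data; the real input table is not shown in the question.
df = pd.DataFrame(
    {"Team": ["Team: United States <>", "Team: Brazil <>", "Team: South Korea <>"]}
)

# Only the parenthesised group is returned by str.extract.
# expand=False gives a Series (instead of a one-column DataFrame)
# when the pattern has a single capture group.
df["Country"] = df["Team"].str.extract(r"^Team:\s*([^<>]+?)\s*<>", expand=False)

print(df["Country"].tolist())
```

If the rows do not always end in `<>`, the simpler pattern from the answer followed by `.str.strip()` may be a better fit.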
You can do this with a regular expression, which handles country names that contain spaces and names of any length.
import re
row_string = "Team: United States <>"
country_name = re.search(r'Team: (.*) <>', row_string).group(1)
The reason is that your capture group wraps the whole match, and str.extract returns whatever the group captures.
Instead, put the group only around the part that you want to keep:
df['Team'].str.extract(r'Team:\s([a-zA-Z\s]+)<>')
See the capture group values at this regex101 demo.
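To make the difference concrete, here is a small sketch using the standard `re` module on a single string (the sample string is the one used in the other answer; the variable names are just for illustration). It compares the group wrapped around the whole pattern with the group wrapped around the name only.

```python
import re

row = "Team: United States <>"

# Group around the entire pattern: the prefix and the symbol are captured too.
whole = re.search(r"(Team:\s[a-zA-Z\s]+<>)", row).group(1)

# Group around the name only: 'Team:' and '<>' are matched but not captured.
inner = re.search(r"Team:\s([a-zA-Z\s]+)<>", row).group(1)

print(repr(whole))          # includes 'Team:' and '<>'
print(repr(inner))          # name plus the trailing space before '<>'
print(repr(inner.strip()))  # name only
```

Because `\s` is inside the character class, the space before `<>` ends up in the captured text, so a final `.strip()` (or `.str.strip()` on a pandas column) is likely needed.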