pandas remove all words before a specific word and get the first n words after that specific word
Question:
I have a dataframe like this:
import pandas as pd
df = pd.DataFrame({'caption': ['hello this pack is for you: Jake Peralta. Thanks']})
df
caption
hello this pack is for you: Jake Peralta. Thanks
...
...
...
I’m trying to get the recipient’s first and last name here. The format of the caption column is always the same, so I want to delete everything before "for you:" and get the first 2 words (this number may change) after "for you:".
Answers:
Here is one way:
df.caption.apply(lambda st: st[st.find(":")+2:st.find(".")])
Output:
0 Jake Peralta
Name: caption, dtype: object
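The slice above relies on the first ":" and the first "." in the string, and it returns whatever lies between them rather than a fixed number of words. Below is a minimal sketch of a variant that keys on the literal marker "for you:" and keeps the first n words, assuming words are whitespace-separated and that surrounding punctuation should be stripped. first_n_words_after is a hypothetical helper name, not a pandas function:

```python
import pandas as pd


def first_n_words_after(text, marker="for you:", n=2):
    """Return the first n words that follow `marker` in `text`.

    Hypothetical helper: splits on whitespace and strips punctuation such as
    the trailing '.' in 'Peralta.'. Returns None if the marker is missing.
    """
    if not isinstance(text, str):
        return None
    # partition splits at the first occurrence of the marker only
    _, found, rest = text.partition(marker)
    if not found:
        return None
    words = [w.strip(".,;:!?") for w in rest.split()]
    words = [w for w in words if w]  # drop tokens that were pure punctuation
    return " ".join(words[:n])


df = pd.DataFrame({"caption": ["hello this pack is for you: Jake Peralta. Thanks"]})
# extra keyword arguments to Series.apply are passed through to the function
df["recipient"] = df["caption"].apply(first_n_words_after, n=2)
print(df["recipient"])
```

With n=3 the same call returns "Jake Peralta Thanks". Apostrophes and hyphens inside a name (O'Brien, Smith-Jones) are kept because only leading and trailing punctuation is stripped.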
Maybe you can try it like this:
df['caption'].str.split("for you: ").str[1].str.split('.').str[0]
Output:
0 Jake Peralta
1 first last
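The split chain above stops at the first ".", which works for this caption but does not take a word count. Below is a minimal sketch of the same split-based idea that keeps the first n words instead. It assumes a "word" is a run of letters, digits, underscores, apostrophes or hyphens. The second caption row is made up, only to show extra spaces after the marker:

```python
import pandas as pd

df = pd.DataFrame({"caption": [
    "hello this pack is for you: Jake Peralta. Thanks",
    "hello this pack is for you:   Amy Santiago. Thanks",  # made-up row, extra spaces
]})

n = 2  # number of words to keep after the marker

recipient = (
    df["caption"]
    .str.split("for you:", n=1)   # [before, after], split at the first occurrence only
    .str[1]                       # text after the marker (NaN if the marker is missing)
    .str.findall(r"[\w'-]+")      # list of words; drops '.', ',' and whitespace
    .str[:n]                      # keep the first n words of each list
    .str.join(" ")                # join each list back into a single string
)
print(recipient)
```

Because findall only returns the word tokens, leading spaces and the trailing "." never reach the result.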
This one also takes care of leading spaces in the name:
>>> df.caption.str.split(".").str[0].str.split(":").str[1].str.strip()
1 Jake Peralta
Name: caption, dtype: object
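A regular expression with Series.str.extract can do the same in one step and also absorbs any leading spaces. Below is a minimal sketch, assuming the marker is the literal text "for you:" and n is at least 1:

```python
import pandas as pd

df = pd.DataFrame({"caption": ["hello this pack is for you: Jake Peralta. Thanks"]})

n = 2  # how many words to capture after the marker

# "for you:", optional whitespace, then one word followed by (n - 1) more
# whitespace-separated words. {{ and }} are literal braces in the f-string,
# so for n=2 the quantifier becomes {1}.
pattern = rf"for you:\s*(\w+(?:\s+\w+){{{n - 1}}})"

# expand=False returns a Series (not a DataFrame) for a single capture group
recipient = df["caption"].str.extract(pattern, expand=False)
print(recipient)
```

If a caption has fewer than n words after the marker, the pattern does not match and the result is NaN. Also, \w stops at apostrophes and hyphens, so a name like O'Brien would be cut short; replacing \w with [\w'-] in the pattern handles that.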