Replace all variations of a string regardless of comma position in Python

Question:

I have a DataFrame in which many rows contain the same word at different positions within a comma-separated string. Rather than writing a separate call for every variation, such as df.replace('Word,', ''), I am looking for a simpler way to remove all of them in Python. I have heard about regex but am having a hard time understanding it.

One example I have come across is df.column.str.replace('Word,?', ''), which is supposed to replace "Word" regardless of where the comma is. However, I am unsure how it works. Any help understanding replacement with regex would be greatly appreciated. Thank you in advance.
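To see what 'Word,?' actually does, here is a small sketch using Python's built-in re module directly on the four example strings (pandas is left out to isolate the regex). The ,? makes a comma right after "Word" optional, but nothing in the pattern consumes the space after that comma or a comma that comes before "Word", so leftovers remain.

```python
import re

samples = ['Word, foo, bar', 'Word', 'foo, bar, Word', 'foo, Word, bar']

# 'Word,?' matches "Word" plus an optional comma immediately after it.
# The space after the comma, and any ", " before "Word", are not matched.
results = [re.sub(r'Word,?', '', s) for s in samples]
for before, after in zip(samples, results):
    print(repr(before), '->', repr(after))
```

Only the lone 'Word' row comes out as desired; the others keep a stray space or comma. Also note that since pandas 2.0, Series.str.replace treats the pattern as a literal string unless regex=True is passed.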

Example:

'Word, foo, bar'         
'Word'   
'foo, bar, Word'  
'foo, Word, bar'

Desired Output:

'foo, bar'   
''        
'foo, bar'           
'foo, bar'
Asked By: Panic_Picnic


Answers:

You can do it as follows.

Input

import pandas as pd

df = pd.DataFrame([[1, 'Word, foo, bar'],
                   [2, 'Word'],
                   [3, 'foo, bar, Word'],
                   [4, 'foo, Word, bar']], columns=['id', 'text'])

id  text
1   Word, foo, bar
2   Word
3   foo, bar, Word
4   foo, Word, bar

Code to remove the text ‘Word’ together with the adjacent comma and space, if any:

df['text'] = df['text'].replace(r'Word(,\s)|(,\s)?Word', '', regex=True)

What is happening in the code

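Word : matches the literal text ‘Word’.

(,\s) : matches a comma followed by a single whitespace character (\s). In the first alternative, Word(,\s), this group is required, so it only matches ‘Word’ followed by ", ". In the second alternative, (,\s)?Word, the ? makes the group optional: if a comma and space precede ‘Word’ they are matched too, and if not, just the text ‘Word’ is matched. So the ? is important here, because it lets a lone ‘Word’ (row 2) match.

| : separates the two alternatives; the pattern matches wherever either one matches. The second alternative is needed for row 3, where ‘Word’ is at the end and is preceded by a comma and space.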
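The two alternatives can be checked outside pandas with re.finditer, which reports exactly which slice of each string the pattern consumes. This is a sketch using only the standard library; the sample list simply repeats the question's examples.

```python
import re

# Alternative 1: "Word" followed by ", " (leading or middle position).
# Alternative 2: an optional ", " followed by "Word" (end position or lone "Word").
pattern = re.compile(r'Word(,\s)|(,\s)?Word')

samples = ['Word, foo, bar', 'Word', 'foo, bar, Word', 'foo, Word, bar']

matched = {}
for s in samples:
    # group(0) is the full text the pattern matched at each location
    matched[s] = [m.group(0) for m in pattern.finditer(s)]
    print(repr(s), 'matches', matched[s], '->', repr(pattern.sub('', s)))
```

In the fourth string the regex engine reaches the comma before "Word" first, so the second alternative wins and ', Word' is removed rather than 'Word, '; either way the result is 'foo, bar'.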

You can see a detailed explanation of the pattern here: Regex Demo

Output

id  text
1   foo, bar
2   
3   foo, bar
4   foo, bar
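One caveat: the pattern matches the letters "Word" anywhere, including inside a longer value such as "Wordy" (a hypothetical sample, not from the question). A variant with \b word boundaries, which is an addition not in the answer above, only removes "Word" as a whole word. The sketch uses Python's re module; pandas uses the same module when regex=True, so the same pattern string can be passed to df['text'].replace(..., regex=True).

```python
import re

# Hypothetical sample: "Wordy" contains "Word" but is a different value.
samples = ['foo, Wordy', 'Word, foo, bar', 'foo, Word, bar']

plain = re.compile(r'Word(,\s)|(,\s)?Word')
# \b asserts a word boundary, so "Word" must not be part of a longer word.
bounded = re.compile(r'\bWord\b(,\s)|(,\s)?\bWord\b')

plain_results = [plain.sub('', s) for s in samples]
bounded_results = [bounded.sub('', s) for s in samples]
print(plain_results)
print(bounded_results)
```

Without the boundaries, 'foo, Wordy' loses ', Word' and collapses to 'fooy'; with them it is left unchanged, while the rows that contain a standalone "Word" are cleaned as before.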
Answered By: moys
df.replace(to_replace='Word, |(, )?Word', value='', regex=True)

This way the .replace() method does the required work.

to_replace is the regular expression to search for, passed as a string.
'Word, ' matches "Word" followed by a comma and a space, which covers every position except the end of a string, where the text looks like ", Word". Including the space keeps the first row from ending up as ' foo, bar' with a leading space.

To match that ending, "|" (or) adds a second alternative, "(, )?Word". Here ? matches 0 or 1 occurrence of ", " (a comma and one space), so this alternative covers both a trailing ", Word" and a value that is only "Word".

value='' : what each match is replaced with.

regex=True : tells pandas to treat the to_replace parameter as a regular expression.
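The regex flag changes how replace compares values. Without it (regex=False is the default), to_replace must equal an entire cell value, which is why a call like df.replace('Word,', '') from the question only affects cells that are exactly 'Word,'. A short sketch contrasting the two modes on the example strings:

```python
import pandas as pd

s = pd.Series(['Word, foo, bar', 'Word', 'foo, bar, Word', 'foo, Word, bar'])

# regex=False (the default): only a cell whose whole value is 'Word' is replaced.
literal = s.replace('Word', '')
# regex=True: every occurrence of the pattern inside each cell is replaced,
# but the surrounding commas stay because the pattern is just 'Word'.
substring = s.replace('Word', '', regex=True)

print(literal.tolist())
print(substring.tolist())
```

The leftover commas in the second result are exactly what the ", " parts of the patterns in these answers are there to remove.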

Answered By: Gaurang patel