Python Pandas – What is the Pandas version of replace or append when working with multiple values?

Question:

I have a dataframe from which I create a new dataframe using .str.contains. That works, but I then want to find the matching values and add ‘NEW’ to the front of them. The way I am doing it produces ‘NEWX|Y|Z’, when I want ‘NEWX’ where it finds X, ‘NEWY’ where it finds Y, and so on.

substr = ['X', 'Y', 'Z']
df1 = df[df['short_description'].str.contains('|'.join(substr), na=False)]

This finds the correct rows.

But when I try to add NEW to the front:

df1['short_description'] = df1['short_description'].str.replace('|'.join(substr), 'NEW' + '|'.join(substr), regex=True)

it inserts ‘NEWX|Y|Z’ in front of every match. I understand why it does that, but I just want NEWX to replace X, NEWY to replace Y, etc.
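A minimal reproduction of this behaviour, assuming a small sample Series in place of the real dataframe: the pattern X|Y|Z matches each letter individually, but the replacement argument is just a string, so the whole text ‘NEWX|Y|Z’ is inserted at every match. regex=True is passed explicitly because recent pandas versions (2.0 and later) treat the pattern as a literal string by default.

```python
import pandas as pd

substr = ["X", "Y", "Z"]
s = pd.Series(["Xabc", "abcY"])

# The pattern "X|Y|Z" matches any one letter, but the replacement is the
# literal text "NEWX|Y|Z", so that whole string is inserted at each match.
out = s.str.replace("|".join(substr), "NEW" + "|".join(substr), regex=True)
print(out.tolist())  # ['NEWX|Y|Zabc', 'abcNEWX|Y|Z']
```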

How can I make it only replace like for like?

X, Y and Z aren’t always at the start of the value, which is why I am finding and replacing rather than just adding ‘NEW’ to the whole column.

Thanks for any help.

Asked By: Demo


Answers:

IIUC, use a capture group:

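import pandas as pd

s = pd.Series(["Xsomething", "Ythatthing", "WhatZ", "Nothing"])
# "(X|Y|Z)" captures the matched letter; r"NEW\1" writes it back after NEW.
# The raw string matters: in a plain string "\1" becomes the character chr(1).
s.str.replace("(%s)" % "|".join("XYZ"), r"NEW\1", regex=True)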

Output:

0    NEWXsomething
1    NEWYthatthing
2         WhatNEWZ
3          Nothing
dtype: object
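Applied to the question's short_description column, the same idea might look like the sketch below. The sample rows are hypothetical, and re.escape is an addition in case the real substrings contain regex metacharacters such as "." or "+" (it leaves plain letters unchanged). The filter uses the alternation without a group, because pandas warns when a str.contains pattern has capture groups.

```python
import re

import pandas as pd

# Hypothetical sample data standing in for the asker's dataframe.
df = pd.DataFrame({"short_description": ["X fault", "printer Y", "call re Z", "other", None]})
substr = ["X", "Y", "Z"]

# Escape each substring, then join them into one alternation: "X|Y|Z".
alternation = "|".join(map(re.escape, substr))

# Keep rows that mention any substring; na=False drops missing values.
# .copy() avoids SettingWithCopyWarning when the column is modified below.
df1 = df[df["short_description"].str.contains(alternation, na=False)].copy()

# Wrap the alternation in a group so r"NEW\1" can re-insert what matched.
df1["short_description"] = df1["short_description"].str.replace(
    f"({alternation})", r"NEW\1", regex=True
)
print(df1["short_description"].tolist())  # ['NEWX fault', 'printer NEWY', 'call re NEWZ']
```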

A capture group lets you reuse whatever the pattern matched in the replacement.

  1. "(X|Y|Z)" looks for any one of X, Y or Z and keeps the match as the first captured group (the effect of wrapping it in ()).

    • If you use several pairs of (), refer to the groups as \1, \2, …, \n.
  2. NEW\1 then uses this capture group, so each match is replaced with NEW{capture_group_1}.
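A small sketch of the group references described in the list, using sample strings. It also shows \g<0>, which Python's re replacement syntax uses for the whole match; that is an alternative not mentioned in the answer, and it avoids the explicit parentheses when only the full match is needed.

```python
import pandas as pd

s = pd.Series(["Xsomething", "WhatZ", "Nothing"])

# \g<0> refers to the entire match, so no explicit group is needed here.
whole = s.str.replace("X|Y|Z", r"NEW\g<0>", regex=True)

# With two groups, \1 and \2 refer to them in order; here they are swapped.
swapped = pd.Series(["a1", "b2"]).str.replace(r"([a-z])(\d)", r"\2\1", regex=True)

print(whole.tolist())    # ['NEWXsomething', 'WhatNEWZ', 'Nothing']
print(swapped.tolist())  # ['1a', '2b']
```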

Answered By: Chris