Pandas Dataframe: Get and Edit all values in a column containing substring

Question:

Let's say I have a dataframe called stores, like this one:

country  store_name
FR       my new tmp
ES       this Tmp is new
FR       walmart
ES       Target
FR       TMP

and another dataframe, called replacements, like this one:

country  original  replacement
ES       TMP       STORE
FR       TMP       STORE
FR       WALMART   IGNORE

How would you go about finding and updating all values in the store_name column of the first dataframe according to the "rules" in the second one, i.e. replacing the substring from the original column (matched case-insensitively, for the same country) wherever it is found?

For this example, I'd like to get a new dataframe like this:

country  store_name
FR       my new STORE
ES       this STORE is new
FR       IGNORE
ES       Target
FR       STORE

I was thinking of iterating over the second dataframe and applying each change to the first one, like this:

for index, row in replacements.iterrows():
    stores['store_name'] = stores['store_name'].str.upper().replace(row["original"].upper(), row["replacement"])

It kind of works, but it does some weird things, such as not changing some strings. Also, I'm not sure this is the optimal way to do it. Any suggestions?
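A likely cause of the "not changing some strings" behaviour: Series.replace (with its default regex=False) only replaces cells whose entire value equals the pattern, whereas the .str.replace accessor works on substrings. The loop above also upper-cases every name (so Target would become TARGET) and ignores the country column. A minimal comparison on a few of the question's names:

```python
import pandas as pd

names = pd.Series(['my new tmp', 'this Tmp is new', 'TMP'])

# Series.replace (regex=False by default) swaps only cells whose whole value is 'TMP'
whole_value = names.str.upper().replace('TMP', 'STORE')

# .str.replace matches substrings; case=False makes the match case-insensitive
substring = names.str.replace('tmp', 'STORE', case=False, regex=True)

print(whole_value.tolist())  # ['MY NEW TMP', 'THIS TMP IS NEW', 'STORE']
print(substring.tolist())    # ['my new STORE', 'this STORE is new', 'STORE']
```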

Reproducible inputs:

import pandas as pd

# df1 is the "stores" dataframe from above
data = [['FR', 'my new tmp'], ['ES', 'this Tmp is new'], ['FR', 'walmart'], ['ES', 'Target'], ['FR', 'TMP']]
df1 = pd.DataFrame(data, columns=['country', 'store_name'])

# df2 is the "replacements" dataframe; its "original" column is named store_name here
data = [['ES', 'TMP', 'STORE'], ['FR', 'TMP', 'STORE'], ['FR', 'WALMART', 'IGNORE']]
df2 = pd.DataFrame(data, columns=['country', 'store_name', 'replacement'])
Asked By: Alain
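The loop idea from the question can also be made to work. As a minimal sketch, assuming df2's column naming from the reproducible inputs: restrict each rule to the rows of its own country with a boolean mask, and use .str.replace with case=False for a case-insensitive substring match (re.escape keeps the rule text literal). The variable name result is illustrative only.

```python
import re

import pandas as pd

df1 = pd.DataFrame([['FR', 'my new tmp'], ['ES', 'this Tmp is new'], ['FR', 'walmart'],
                    ['ES', 'Target'], ['FR', 'TMP']], columns=['country', 'store_name'])
df2 = pd.DataFrame([['ES', 'TMP', 'STORE'], ['FR', 'TMP', 'STORE'], ['FR', 'WALMART', 'IGNORE']],
                   columns=['country', 'store_name', 'replacement'])

result = df1.copy()  # keep df1 untouched
for row in df2.itertuples(index=False):
    # apply each rule only to rows of the same country
    mask = result['country'] == row.country
    # case-insensitive, literal substring replacement
    result.loc[mask, 'store_name'] = result.loc[mask, 'store_name'].str.replace(
        re.escape(row.store_name), row.replacement, case=False, regex=True)

print(result)
```

Because the rules are applied one after another, a replacement value could itself be matched by a later rule; the single-regex approaches below avoid that by doing one pass per name.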


Answers:

Assuming df1 and df2 as defined above, you can build a per-country regex and use it within groupby.apply:

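import re

# Series indexed by (country, original substring) -> replacement
s = df2.set_index(['country', 'store_name'])['replacement']

df1['store_name'] = (
    df1.groupby('country', group_keys=False)
       .apply(lambda g: g['store_name'].str.replace(
           # alternation of this country's substrings, e.g. "(TMP|WALMART)" for FR
           f"({'|'.join(map(re.escape, s[g.name].index))})",
           # look up the replacement; assumes the originals in df2 are upper case
           lambda m: s[(g.name, m.group().upper())],
           regex=True, flags=re.I))
)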

print(df1)

Output:

  country         store_name
0      FR       my new STORE
1      ES  this STORE is new
2      FR             IGNORE
3      ES             Target
4      FR              STORE
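If df1 can contain a country that has no rules in df2, s[g.name] raises a KeyError. A hedged variant under that assumption precompiles one case-insensitive pattern per country and passes names from unknown countries through unchanged. The names rules and apply_rules, and the extra DE row, are hypothetical and only for illustration. Longer substrings are listed first in the alternation so that overlapping rules prefer the longest match.

```python
import re

import pandas as pd

df1 = pd.DataFrame(
    [['FR', 'my new tmp'], ['ES', 'this Tmp is new'], ['FR', 'walmart'],
     ['ES', 'Target'], ['FR', 'TMP'], ['DE', 'tmp outlet']],  # DE row is hypothetical
    columns=['country', 'store_name'])
df2 = pd.DataFrame(
    [['ES', 'TMP', 'STORE'], ['FR', 'TMP', 'STORE'], ['FR', 'WALMART', 'IGNORE']],
    columns=['country', 'store_name', 'replacement'])

# Hypothetical lookup: country -> (compiled pattern, {UPPER original: replacement})
rules = {}
for country, grp in df2.groupby('country'):
    originals = sorted(grp['store_name'], key=len, reverse=True)  # longest first
    pattern = re.compile('|'.join(map(re.escape, originals)), flags=re.I)
    rules[country] = (pattern, dict(zip(grp['store_name'].str.upper(), grp['replacement'])))


def apply_rules(country, name):
    """Replace every matching substring; names from countries without rules pass through."""
    if country not in rules:
        return name
    pattern, lookup = rules[country]
    return pattern.sub(lambda m: lookup[m.group().upper()], name)


df1['store_name'] = [apply_rules(c, n) for c, n in zip(df1['country'], df1['store_name'])]
print(df1)
```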
Answered By: mozway

If getting a new dataframe as the result is acceptable, consider the following approach: an outer merge of the two dataframes on country, grouping by (country, store_name), and a case-insensitive regex replacement that returns the first successful substitution within each group:

import re

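def f(x):
    # x.name is the group key (country, store_name_x); x has one row per rule of that country
    original = x.name[1]
    for r in x.itertuples(index=False):
        # treat the rule as a literal substring, matched case-insensitively
        new_name, subs = re.subn(re.escape(r.store_name_y), r.replacement, original, flags=re.I)
        if subs:  # at least one successful replacement (the substring may occur more than once)
            return new_name  # return the result immediately
    return original  # no rule matched: keep the name unchanged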

res_df = (df1.merge(df2, on='country', how='outer')
             .groupby(['country', 'store_name_x'], sort=False)
             .apply(f).droplevel(1).reset_index(name='store_name'))
print(res_df)

Output:

  country         store_name
0      FR       my new STORE
1      FR             IGNORE
2      FR              STORE
3      ES  this STORE is new
4      ES             Target
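To see what f receives, it can help to inspect the intermediate merged frame and the return value of re.subn. A small sketch reusing the question's data: because both frames have a store_name column, merge suffixes them as store_name_x (from df1) and store_name_y (from df2), giving one row per (store, rule) pair within a country. re.subn returns the new string together with the number of substitutions, which can exceed 1 when the substring occurs more than once.

```python
import re

import pandas as pd

df1 = pd.DataFrame([['FR', 'my new tmp'], ['ES', 'this Tmp is new'], ['FR', 'walmart'],
                    ['ES', 'Target'], ['FR', 'TMP']], columns=['country', 'store_name'])
df2 = pd.DataFrame([['ES', 'TMP', 'STORE'], ['FR', 'TMP', 'STORE'], ['FR', 'WALMART', 'IGNORE']],
                   columns=['country', 'store_name', 'replacement'])

# overlapping store_name columns get the default suffixes _x (left) and _y (right)
merged = df1.merge(df2, on='country', how='outer')
print(merged.columns.tolist())  # ['country', 'store_name_x', 'store_name_y', 'replacement']

# re.subn returns (new_string, number_of_substitutions)
new_name, count = re.subn('tmp', 'STORE', 'tmp and TMP', flags=re.I)
print(new_name, count)  # STORE and STORE 2
```

With 3 FR stores times 2 FR rules plus 2 ES stores times 1 ES rule, the merged frame has 8 rows.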
Answered By: RomanPerekhrest