Pandas Dataframe: Get and Edit all values in a column containing substring

Question

Let's say I have a dataframe, called stores, like this one:

country	store_name
FR	my new tmp
ES	this Tmp is new
FR	walmart
ES	Target
FR	TMP

and another dataframe, called replacements, like this one:

country	original	replacement
ES	TMP	STORE
FR	TMP	STORE
FR	WALMART	IGNORE

How would you go about finding and updating all values in the store_name column of the first dataframe according to the "rules" of the second one, wherever the substring in the original column is found (ignoring upper/lower case)?

For this example, I'd like to get a new dataframe like this:

country	store_name
FR	my new STORE
ES	this STORE is new
FR	IGNORE
ES	Target
FR	STORE

I was thinking something like iterating the second dataframe and apply the change to the first one, like this:

for index, row in replacements.iterrows():
    stores['store_name'] = stores['store_name'].str.upper().replace(row["original"].upper(), row["replacement"])

It kind of works, but it does some weird things, like leaving some strings unchanged. Also, I'm not sure this is the optimal way of doing it. Any suggestions?
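
One likely cause of the unchanged strings, sketched here on a hypothetical two-row Series rather than the full data: Series.replace with a plain string only replaces cells whose entire value equals that string, whereas Series.str.replace works on substrings. The loop above also upper-cases every name and never checks the country.

```python
import pandas as pd

names = pd.Series(['my new tmp', 'TMP'])  # hypothetical sample, not the full data

# Series.replace compares whole cell values, so only the exact 'TMP' cell changes
whole_cell = names.str.upper().replace('TMP', 'STORE')

# Series.str.replace works on substrings; case=False ignores upper/lower case
substring = names.str.replace('tmp', 'STORE', case=False, regex=True)

print(whole_cell.tolist())
print(substring.tolist())
```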

Reproducible inputs:

import pandas as pd

# stores
data = [['FR', 'my new tmp'], ['ES', 'this Tmp is new'], ['FR', 'walmart'], ['ES', 'Target'], ['FR', 'TMP']]
df1 = pd.DataFrame(data, columns=['country', 'store_name'])

# replacements (the "original" column is named store_name here)
data = [['ES', 'TMP', 'STORE'], ['FR', 'TMP', 'STORE'], ['FR', 'WALMART', 'IGNORE']]
df2 = pd.DataFrame(data, columns=['country', 'store_name', 'replacement'])

Asked By: Alain


Answer 1

Assuming df1 and df2, you can use a crafted regex within groupby.apply:

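import re

# Replacement lookup indexed by (country, original substring)
s = df2.set_index(['country', 'store_name'])['replacement']

df1['store_name'] = (
    df1.groupby('country', group_keys=False)
       .apply(lambda g: g['store_name'].str.replace(
           # alternation of this country's escaped substrings, e.g. (TMP|WALMART)
           f"({'|'.join(map(re.escape, s[g.name].index))})",
           # look up the replacement; assumes the substrings in df2 are upper case
           lambda m: s[(g.name, m.group().upper())],
           regex=True, flags=re.I))
)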

print(df1)

Output:

  country         store_name
0      FR       my new STORE
1      ES  this STORE is new
2      FR             IGNORE
3      ES             Target
4      FR              STORE
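
To make the one-liner easier to follow, here is a minimal sketch of what happens inside a single group, using FR as the example. The names fr_rules, fr_names and result are introduced only for illustration, and fr_names is typed in by hand instead of coming from the groupby.

```python
import re
import pandas as pd

df2 = pd.DataFrame(
    [['ES', 'TMP', 'STORE'], ['FR', 'TMP', 'STORE'], ['FR', 'WALMART', 'IGNORE']],
    columns=['country', 'store_name', 'replacement'],
)
s = df2.set_index(['country', 'store_name'])['replacement']

# Selecting one country leaves a Series indexed by the substrings to look for
fr_rules = s['FR']

# Escape each substring and join them into a single alternation
pattern = f"({'|'.join(map(re.escape, fr_rules.index))})"

# Hand-written stand-in for the FR group's store_name column
fr_names = pd.Series(['my new tmp', 'walmart', 'TMP'])

# The callable receives each match and looks up its replacement (upper-cased key)
result = fr_names.str.replace(
    pattern, lambda m: fr_rules[m.group().upper()], regex=True, flags=re.I
)

print(pattern)
print(result.tolist())
```

Because the lookup key is m.group().upper(), this relies on the substrings in df2 being stored in upper case, as they are in the example data.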

Answered By: mozway

Answer 2

If getting a new dataframe as the result is acceptable, consider the following approach: outer-join the two initial dataframes on country, group by country and original store name, and within each group return the first regex replacement that succeeds (or the unchanged name if none does):

import re

def f(x):
    for r in x.itertuples(index=False):
        store_name, subs = re.subn(r.store_name_y, r.replacement, r.store_name_x, flags=re.I)
        if subs:  # at least one replacement was made
            return store_name  # return the result immediately
    return r.store_name_x  # no rule matched: keep the original name

res_df = (
    df1.merge(df2, on='country', how='outer')
       .groupby(['country', 'store_name_x'], sort=False)
       .apply(f).droplevel(1).reset_index(name='store_name')
)

  country         store_name
0      FR       my new STORE
1      FR             IGNORE
2      FR              STORE
3      ES  this STORE is new
4      ES             Target
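
As a rough illustration of where store_name_x and store_name_y come from, the sketch below builds only the intermediate merge. Both frames have a store_name column, so pandas adds its default _x and _y suffixes, and each store row is paired with every rule for its country. The variable name merged is used here only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame(
    [['FR', 'my new tmp'], ['ES', 'this Tmp is new'], ['FR', 'walmart'],
     ['ES', 'Target'], ['FR', 'TMP']],
    columns=['country', 'store_name'],
)
df2 = pd.DataFrame(
    [['ES', 'TMP', 'STORE'], ['FR', 'TMP', 'STORE'], ['FR', 'WALMART', 'IGNORE']],
    columns=['country', 'store_name', 'replacement'],
)

# Each store row is paired with every rule that shares its country
merged = df1.merge(df2, on='country', how='outer')

print(merged.columns.tolist())
print(len(merged))  # 3 FR stores x 2 FR rules + 2 ES stores x 1 ES rule
```

Note that a country in df1 with no rules in df2 would get NaN in store_name_y, which re.subn cannot use as a pattern; the example data does not hit that case.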

Answered By: RomanPerekhrest
