Replace values in multiple columns based on a dictionary map

Question:

I have a dataframe that looks similar to this:

import pandas as pd

df = pd.DataFrame(data={'ID': ['a','b','c','d'], 'col1':[1,2,3,4], 'col2':[5,6,7,8], 'col3':[9,10,11,12]})

I have a dictionary like this

mapper = {'a':100,'d':3}

Where a key in the dictionary matches the ID in the dataframe, I want to replace the values in, say, col1 and col3 with the corresponding value from the dictionary.
Currently I do it like this:

for key, val in mapper.items():
    df.loc[df['ID'] == key, 'col1'] = val
    df.loc[df['ID'] == key, 'col3'] = val

But I’m wondering if there is a vectorised way to do this without a for loop, as my dataframe is large.

Asked By: thefrollickingnerd

||

Answers:

You can use np.where to do this.

import numpy as np

df["col1"] = np.where(df["ID"].isin(mapper.keys()), df["ID"].map(mapper), df["col1"])
df["col3"] = np.where(df["ID"].isin(mapper.keys()), df["ID"].map(mapper), df["col3"])

np.where takes a condition as its first argument. The second argument supplies the values to use where the condition is True, and the third supplies the values to use where it is False. Looking at each argument separately shows how it works.

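df["ID"].isin(mapper.keys())  # argument 1

# returns
0     True
1    False
2    False
3     True
Name: ID, dtype: bool

df["ID"].map(mapper)  # argument 2

# returns
0    100.0
1      NaN
2      NaN
3      3.0
Name: ID, dtype: float64

df["col1"]  # argument 3 (the original column, before assignment)

# returns
0    1
1    2
2    3
3    4
Name: col1, dtype: int64

Because argument 2 is float64 (it contains NaN), the result of np.where is float64 as well. Append .astype(int) if the columns should stay integers.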
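
Putting the three arguments together, a self-contained sketch of the same np.where idea applied to both columns might look like the following. The names `mapped` and `matched`, and the cast back to each column's original dtype, are additions for illustration and assume the mapper values fit that dtype. The loop runs once per column, not once per row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": ["a", "b", "c", "d"],
    "col1": [1, 2, 3, 4],
    "col2": [5, 6, 7, 8],
    "col3": [9, 10, 11, 12],
})
mapper = {"a": 100, "d": 3}

mapped = df["ID"].map(mapper)          # float64, NaN where the ID has no entry
matched = df["ID"].isin(list(mapper))  # True where the ID is a key of mapper

for col in ["col1", "col3"]:
    # Use the mapped value where matched, otherwise keep the existing value.
    # np.where yields float64 here, so cast back to the column's dtype
    # (assumption: every mapper value is representable in that dtype).
    df[col] = np.where(matched, mapped, df[col]).astype(df[col].dtype)

print(df)
```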
Answered By: Ashyam

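Map the values from the dict, then update the corresponding columns in place. Series.update skips NaN, so rows whose ID is not in the dict are left unchanged. Note that this relies on df['col1'] giving access to the frame's own data; with pandas Copy-on-Write enabled, the update may not propagate back to df.

s = df['ID'].map(mapper)
df['col1'].update(s)
df['col3'].update(s)

Result

  ID  col1  col2  col3
0  a   100     5   100
1  b     2     6    10
2  c     3     7    11
3  d     3     8     3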
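
As a hedged variant of the same update idea, the sketch below updates an explicit copy of each column and assigns it back, so it does not depend on whether selecting a column returns a view. The `updated` name and the choice of col1 and col3 (taken from the question) are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["a", "b", "c", "d"],
    "col1": [1, 2, 3, 4],
    "col2": [5, 6, 7, 8],
    "col3": [9, 10, 11, 12],
})
mapper = {"a": 100, "d": 3}

s = df["ID"].map(mapper)  # NaN where the ID has no entry in mapper

for col in ["col1", "col3"]:
    updated = df[col].copy()  # work on an explicit copy of the column
    updated.update(s)         # in place on the copy; NaN entries of s are skipped
    df[col] = updated         # write the result back to the frame

print(df)
```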
Answered By: Shubham Sharma

With map and assign:

df = df.assign(**{col: df["ID"].map(mapper)
                              .fillna(df[col])
                              .astype(int)
                  for col in ["col1", "col3"]})

Output:

print(df)

  ID  col1  col2  col3
0  a   100     5   100
1  b     2     6    10
2  c     3     7    11
3  d     3     8     3
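
If this pattern is needed in several places, the map/fillna/assign approach could be wrapped in a small function. The sketch below is one possible way to do that; `replace_from_mapping` and its parameters are hypothetical names, not part of pandas. It casts back to each column's original dtype instead of a fixed int, which assumes the mapper values fit that dtype.

```python
import pandas as pd

def replace_from_mapping(df, key_col, cols, mapper):
    """Hypothetical helper: return a new frame in which, for rows whose
    key_col value is a key of mapper, every column in cols is replaced
    by the mapped value. Other rows and columns are unchanged."""
    mapped = df[key_col].map(mapper)  # NaN where there is no match
    new_cols = {
        # Fall back to the existing value where there was no match, then
        # restore the column's dtype (assumes mapper values fit it).
        col: mapped.fillna(df[col]).astype(df[col].dtype)
        for col in cols
    }
    return df.assign(**new_cols)

df = pd.DataFrame({
    "ID": ["a", "b", "c", "d"],
    "col1": [1, 2, 3, 4],
    "col2": [5, 6, 7, 8],
    "col3": [9, 10, 11, 12],
})
out = replace_from_mapping(df, "ID", ["col1", "col3"], {"a": 100, "d": 3})
print(out)
```

Because assign returns a new frame, the original df is left untouched; reassign the result to df if the replacement should overwrite it.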
Answered By: Timeless