How do you map a dictionary to an existing pandas dataframe column?

Question:

I would like to use a dictionary to fill in missing values in a dataframe column.
The dictionary keys correspond to the index of the dataframe (or to a different column in it), and the dictionary values are the values I would like to write into the dataframe. Here’s a more visual example.

  key_col target_col
0       w          a
1       c        NaN
2       z        NaN

Dictionary I’d like to map into the dataframe

dict = {'c':'B','z':'4'}

I’d like the dataframe to look like

  key_col target_col
0       w          a
1       c          B
2       z          4

I’ve tried a few different things, such as setting the index to key_col and then trying

df[target_col].map(dict)

df.loc[target_col] = df['key_col'].map(dict)

I know replace doesn’t work because it requires that I set a condition on the values that need to be replaced. I would just like to update the value if the key_col/index has a data value.

Asked By: backtoback4444


Answers:

I’m not sure it’s the best way to do it, but since you only have a few samples, doing this should not be a problem:

x = x.set_index('key_col')
for k in dict.keys():
    x.loc[k, 'target_col'] = dict[k]
x = x.reset_index()  # back to the original layout
Answered By: Adelson Araújo

You can use apply with a lambda function.

The example dataframe.

import pandas as pd
import numpy as np

df = pd.DataFrame(
    {"key_col": {0: "w", 1: "c", 2: "z"}, "target_col": {0: "a", 1: np.nan, 2: np.nan}}
)

I renamed the dictionary as you should not use the name dict because it is a built-in object in Python.

map_dict = {"c": "B", "z": "4"}

The use of apply and the lambda function.

df.loc[:, "target_col"] = df.apply(
    lambda x: map_dict.get(x["key_col"], x["target_col"]), axis=1
)

map_dict.get() allows you to define a default value so we can use it to return the default target_col value for those rows which are not in the map.
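To illustrate the fallback behaviour described above, here is a minimal sketch on a plain dictionary. The names hit and miss are introduced only for this example.

```python
map_dict = {"c": "B", "z": "4"}

# Key present: get() returns the mapped value and ignores the default.
hit = map_dict.get("c", "a")

# Key absent: get() returns the default passed as the second argument.
miss = map_dict.get("w", "a")

print(hit, miss)
```

In the apply call, the default is the row's current target_col value, so rows whose key is not in the dictionary are left unchanged.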

Answered By: nocibambi

An alternative (I changed the name from dict to dicts to avoid confusion with the built-in type):

df.set_index('key_col').T.fillna(dicts).T

           target_col
key_col 
   w         a
   c         B
   z         4
Answered By: sammywemmy

dict = {'c':'B','z':'4'}

# select the rows where `target_col` is NaN
m = df.target_col.isna()
df.loc[m, 'target_col'] = df.key_col.map(dict)

Answered By: wwnde

Approach #1 (key_col as an additional column):

import numpy as np
import pandas as pd

# initial dataframe
df = pd.DataFrame(data={'key_col': ['w', 'c', 'z'], 'target_col': ['a', np.nan, np.nan]})
# values to update: each key corresponds to key_col, each value to target_col
update_dict = {'c':'B','z':'4'}

for key in update_dict.keys():
    # df[df['key_col'] == key]['target_col'] = update_dict[key]  <-- Do NOT do this (chained indexing)
    df.loc[df['key_col'] == key, 'target_col'] = update_dict[key]

This approach iterates over each key in update_dict and uses a boolean mask to find the rows of the dataframe (df) whose key_col equals that key. Where a match exists, the value in target_col is set to the corresponding value from the dictionary.
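The same update can also be sketched without an explicit loop, assuming (as in the loop) that every dictionary value should overwrite target_col in the matching rows. The name mask is introduced here only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"key_col": ["w", "c", "z"], "target_col": ["a", np.nan, np.nan]})
update_dict = {"c": "B", "z": "4"}

# Rows whose key_col appears among the dictionary keys.
mask = df["key_col"].isin(list(update_dict))

# Map only those rows; the assignment aligns on the row index.
df.loc[mask, "target_col"] = df.loc[mask, "key_col"].map(update_dict)

print(df)
```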


Approach #2 (key_col as Index)

df = pd.DataFrame(data=['a', np.nan, np.nan], columns=['target_col'], index=['w', 'c', 'z'])
update_dict = {'c':'B','z':'4'}
for key in update_dict.keys():
    df.loc[key, 'target_col'] = update_dict[key]

This approach is fairly self-explanatory. Be aware of what happens when update_dict contains a key that does not exist in the DataFrame index:
assigning to df.loc[key, 'target_col'] does not raise an exception but silently appends a new row with that label, so check for such keys first if that is not what you want.
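One way to handle dictionary keys that are missing from the index is a membership check before assigning. This is a minimal sketch; the key "q" is a hypothetical entry added only to exercise the check.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"target_col": ["a", np.nan, np.nan]}, index=["w", "c", "z"])
update_dict = {"c": "B", "z": "4", "q": "9"}  # "q" is not a row label in df

for key, value in update_dict.items():
    # Skip keys that are not row labels, so only existing rows are touched.
    if key in df.index:
        df.loc[key, "target_col"] = value

print(df)
```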


Note: DataFrame.loc selects by label (row index labels and column names), whereas .iloc selects by integer position.
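As a small illustration of the two indexers, the sketch below reads the same cell both ways. The names by_label and by_position are introduced only for this example.

```python
import pandas as pd

df = pd.DataFrame({"target_col": ["a", "B", "4"]}, index=["w", "c", "z"])

# .loc: row label "c", column label "target_col".
by_label = df.loc["c", "target_col"]

# .iloc: second row, first column, both as zero-based positions.
by_position = df.iloc[1, 0]

print(by_label, by_position)
```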

Answered By: RamanSB

You can use update, which modifies the DataFrame in place, so there is no need to assign the changes back. Since pandas aligns on both index and column labels, we’ll need to rename the mapped Series so it updates 'target_col'. (Rename your dict to something else, like d.)

df.update(df['key_col'].map(d).rename('target_col'))

print(df)
#  key_col target_col
#0       w          a
#1       c          B
#2       z          4
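For a self-contained version of the above, the sketch below also shows why the row with key 'w' keeps its value: update only writes the non-NaN entries of the Series it is given. The name mapped is introduced here for illustration.

```python
import numpy as np
import pandas as pd

d = {"c": "B", "z": "4"}
df = pd.DataFrame({"key_col": ["w", "c", "z"], "target_col": ["a", np.nan, np.nan]})

# "w" is not in d, so its mapped value is NaN.
mapped = df["key_col"].map(d).rename("target_col")

# update() skips the NaN entries of mapped and overwrites the rest in place.
df.update(mapped)

print(df)
```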
Answered By: ALollz

As the column that has the NaN is target_col, and the dictionary dict keys correspond to the column key_col, one can use pandas.Series.map and pandas.Series.fillna as follows

df['target_col'] = df['key_col'].map(dict).fillna(df['target_col'])

[Out]:

  key_col target_col
0       w          a
1       c          B
2       z          4
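The order of map and fillna decides which side wins when a row already has a value and its key is also in the dictionary. The sketch below assumes a hypothetical fourth row to show the difference. The dictionary is named d here, and dict_first and existing_first are illustrative names.

```python
import numpy as np
import pandas as pd

d = {"c": "B", "z": "4"}
# The last row is hypothetical: its key is in d, but target_col is already filled.
df = pd.DataFrame({
    "key_col": ["w", "c", "z", "c"],
    "target_col": ["a", np.nan, np.nan, "keep"],
})
mapped = df["key_col"].map(d)

# Dictionary values take precedence; existing values only fill unmapped rows.
dict_first = mapped.fillna(df["target_col"])

# Existing values take precedence; the dictionary only fills NaN rows.
existing_first = df["target_col"].fillna(mapped)

print(dict_first.tolist(), existing_first.tolist())
```

On the three-row example from the question, both orderings give the same result.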
Answered By: Gonçalo Peres