How do you map a dictionary to an existing pandas dataframe column?
Question:
I would like to take the dictionary and use that to fill in missing values in a dataframe column.
So the dictionary keys correspond to the index in the dataframe or a different column in the data frame and the values in the dictionary correspond to the value I would like to update into the dataframe. Here’s a more visual example.
  key_col target_col
0       w          a
1       c        NaN
2       z        NaN
Dictionary I’d like to map into the dataframe
dict = {'c':'B','z':'4'}
I’d like the dataframe to look like
  key_col target_col
0       w          a
1       c          B
2       z          4
I’ve tried a few different things, such as setting the index to key_col and then trying:
df[target_col].map(dict)
df.loc[target_col] = df['key_col'].map(dict)
I know replace doesn’t work here because it requires a criterion for which values should be replaced. I would just like to update the value wherever the key_col/index value appears in the dictionary.
Answers:
I’m not sure it’s the best way to do it, but since you only have a few rows, this should not be a problem:
x = x.set_index('key_col')
for k in dict.keys():
    # write only to target_col so any other columns in the row are left alone
    x.loc[k, 'target_col'] = dict[k]
x = x.reset_index()  # back to the original layout
You can use apply with a lambda function.
The example dataframe.
import pandas as pd
import numpy as np
df = pd.DataFrame(
    {"key_col": {0: "w", 1: "c", 2: "z"}, "target_col": {0: "a", 1: np.nan, 2: np.nan}}
)
I renamed the dictionary because you should not use the name dict: it shadows a built-in type in Python.
map_dict = {"c": "B", "z": "4"}
Using apply with the lambda function:
df.loc[:, "target_col"] = df.apply(
    lambda x: map_dict.get(x["key_col"], x["target_col"]), axis=1
)
map_dict.get() accepts a default value, so rows whose key_col is not in the map keep their current target_col value.
An alternative (the dictionary is renamed from dict to dicts to avoid confusion with the built-in type):
df.set_index('key_col').T.fillna(dicts).T
        target_col
key_col
w                a
c                B
z                4
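The double transpose above may not be necessary. As a minimal sketch, assuming the same example data, Series.fillna also accepts a dict keyed by index label, so the column can be filled directly once key_col is the index (the names filled and result are only illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"key_col": ["w", "c", "z"], "target_col": ["a", np.nan, np.nan]})
dicts = {"c": "B", "z": "4"}

# With key_col as the index, the dict keys line up with the index labels,
# so only the missing entries for "c" and "z" are filled.
filled = df.set_index("key_col")["target_col"].fillna(dicts)

# Turn key_col back into an ordinary column.
result = filled.reset_index()
```

This only fills missing values; an existing non-NaN target_col entry is left as it is even if its key appears in the dictionary.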
Approach #1 (key_col as an additional column):
import numpy as np
import pandas as pd
# initial dataframe
df = pd.DataFrame(data={'key_col': ['w', 'c', 'z'], 'target_col': ['a', np.nan, np.nan]})
# values to update: each key corresponds to key_col, each value to target_col
update_dict = {'c':'B','z':'4'}
for key in update_dict.keys():
    # df[df['key_col'] == key]['target_col'] = update_dict[key]  <-- Do NOT do this (chained assignment)
    df.loc[df['key_col'] == key, 'target_col'] = update_dict[key]
This approach iterates over each key in update_dict and selects every row of df whose key_col equals that key. If any rows match, their target_col is set to the corresponding dictionary value; if none match, nothing changes.
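For larger dictionaries, the same update can probably be done without a Python loop. The following is a sketch under the same example data, not a benchmarked recommendation; the name mask is illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"key_col": ["w", "c", "z"], "target_col": ["a", np.nan, np.nan]})
update_dict = {"c": "B", "z": "4"}

# Boolean mask of the rows whose key appears in the dictionary.
mask = df["key_col"].isin(list(update_dict))

# Look up the new value for just those rows and write it back in one step.
df.loc[mask, "target_col"] = df.loc[mask, "key_col"].map(update_dict)
```

Like the loop, this overwrites target_col for every matching key, whether or not the existing value was NaN.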
Approach #2 (key_col as Index)
df = pd.DataFrame(data=['a', np.nan, np.nan], columns=['target_col'], index=['w', 'c', 'z'])
update_dict = {'c':'B','z':'4'}
for key in update_dict.keys():
    df.loc[key, 'target_col'] = update_dict[key]
This approach is fairly self-explanatory. Take care when update_dict contains a key that does not exist in the DataFrame's index: reading df.loc[key, 'target_col'] raises a KeyError, while assigning to it does not raise and instead appends a new row for that key.
Note: .loc selects rows and columns by label, whereas .iloc selects them by integer position.
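One way to handle keys that are missing from the index is to check membership before assigning. This is a minimal sketch; the extra key "q" is a hypothetical entry added only to show the behaviour:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"target_col": ["a", np.nan, np.nan]}, index=["w", "c", "z"])
update_dict = {"c": "B", "z": "4", "q": "X"}  # "q" is hypothetical and not in the index

# Guarded update: keys that are not in the index are skipped.
for key, value in update_dict.items():
    if key in df.index:
        df.loc[key, "target_col"] = value

# Reading a missing label raises KeyError.
try:
    df.loc["q", "target_col"]
    missing_key_raised = False
except KeyError:
    missing_key_raised = True

# Assigning to a missing label adds a row instead (done on a copy here).
enlarged = df.copy()
enlarged.loc["q", "target_col"] = "X"
```

After this runs, df still has its three original rows, while enlarged has a fourth row labelled "q".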
You can use update, which modifies the DataFrame in place, so there is no need to assign the result back. Since pandas aligns on both index and column labels, we need to rename the mapped Series so that it updates 'target_col'. (Rename your dict to something else, like d.)
df.update(df['key_col'].map(d).rename('target_col'))
print(df)
# key_col target_col
#0 w a
#1 c B
#2 z 4
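The reason row 0 keeps its value is that update skips NaN entries in the incoming data by default. A small sketch with the same example data, where mapped is just an illustrative name for the intermediate Series:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"key_col": ["w", "c", "z"], "target_col": ["a", np.nan, np.nan]})
d = {"c": "B", "z": "4"}

# "w" is not in d, so the mapped Series holds NaN for row 0.
mapped = df["key_col"].map(d).rename("target_col")
mapped_is_missing = mapped.isna().tolist()

# update only copies the non-NaN entries of mapped into df,
# so the existing "a" in row 0 is left untouched.
df.update(mapped)
```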
Since the column with the NaN values is target_col, and the keys of the dictionary dict correspond to the column key_col, one can use pandas.Series.map and pandas.Series.fillna as follows:
df['target_col'] = df['key_col'].map(dict).fillna(df['target_col'])
[Out]:
  key_col target_col
0       w          a
1       c          B
2       z          4
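The order of map and fillna matters when target_col already holds a value for a key that is also in the dictionary. The sketch below uses a modified example in which row 1 already contains the made-up value "keep", and a dictionary named d instead of dict:

```python
import numpy as np
import pandas as pd

# Row 1 has an existing value AND its key "c" is in the dictionary.
df = pd.DataFrame({"key_col": ["w", "c", "z"], "target_col": ["a", "keep", np.nan]})
d = {"c": "B", "z": "4"}

mapped = df["key_col"].map(d)

# Dictionary wins: existing values are overwritten wherever the key is mapped.
overwrite = mapped.fillna(df["target_col"])

# Existing values win: the dictionary is used only where target_col is NaN.
fill_only = df["target_col"].fillna(mapped)
```

Here overwrite gives a, B, 4 while fill_only gives a, keep, 4, so the second form is the closer match if the goal is strictly to fill in missing values.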