How do I search a nested Python dictionary for a matching dataframe value and then update the dataframe?
Question:
I have a Python dictionary with company information that is structured like this:
co_dict = {'0': {'co_name': 'A',
'company_type': 'Public',
'global_name': 'A PARENT',
'sales': '1000'},
'1': {'co_name': 'B',
'company_type': 'Public',
'global_name': 'B PARENT',
'sales': '1000'}}
And a Pandas dataframe that looks like this (the real df is much longer):
df = pd.DataFrame({'co-name': ['M','A','B','F'], 'co-number': [1,2,3,4]})
co-name co-number
0 M 1
1 A 2
2 B 3
3 F 4
I would like to find matches for each df["co-name"] in the dictionary and append the corresponding dictionary values for "global_name" and "sales" to a new column in the df. When there is no match, the columns should both read "n/a". So the end result would look like this:
co-name co-number global_name sales
0 M 1 n/a n/a
1 A 2 A PARENT 1000
2 B 3 B PARENT 2000
3 F 4 n/a n/a
I tried to do this as follows:
def find_global_name(x):
for key1 in co_dict.keys():
if (x['co-name'] == co_dict[key1]['co_name']):
return co_dict[key1]['global_name']
else:
return "n/a"
df['global_name'] = df.apply(find_global_name, axis=1)
But the for loop seems to stop after the first company "A" match. I’m stuck trying to figuring out why this. Why is this method not working? How do I accomplish this task? Many, many thanks for your help.
Answers:
Create a dataframe from the dictionary and merge with df
:
(df.merge(pd.DataFrame(co_dict).T,
left_on = 'co-name',
right_on = 'co_name',
how = 'left')
.drop(columns=['co_name', 'company_type'])
)
co-name co-number global_name sales
0 M 1 NaN NaN
1 A 2 A PARENT 1000
2 B 3 B PARENT 1000
3 F 4 NaN NaN
Your code can be much clearer and faster if you trim the dictionary to only the values you need, and combine with map
which is quite fast … this also allows you to avoid writing code that is probably not necessary.
Create dictionaries for the values you need:
global_name = {val['co_name']: val['global_name']
for _, val in co_dict.items()}
sales = {val['co_name']: val['sales']
for _, val in co_dict.items()}
global_name
{'A': 'A PARENT', 'B': 'B PARENT'}
sales
{'A': '1000', 'B': '1000'}
Map the dictionaries to the dataframe to create the new columns (the assumption here is that values in the column are unique; if not, this won’t work and merge is a better option, as it can handle duplicates):
df.assign(global_name = df['co-name'].map(global_name),
sales = df['co-name'].map(sales))
Out[722]:
co-name co-number global_name sales
0 M 1 NaN NaN
1 A 2 A PARENT 1000
2 B 3 B PARENT 1000
3 F 4 NaN NaN
I have a Python dictionary with company information that is structured like this:
co_dict = {'0': {'co_name': 'A',
'company_type': 'Public',
'global_name': 'A PARENT',
'sales': '1000'},
'1': {'co_name': 'B',
'company_type': 'Public',
'global_name': 'B PARENT',
'sales': '1000'}}
And a Pandas dataframe that looks like this (the real df is much longer):
df = pd.DataFrame({'co-name': ['M','A','B','F'], 'co-number': [1,2,3,4]})
co-name co-number
0 M 1
1 A 2
2 B 3
3 F 4
I would like to find matches for each df["co-name"] in the dictionary and append the corresponding dictionary values for "global_name" and "sales" to a new column in the df. When there is no match, the columns should both read "n/a". So the end result would look like this:
co-name co-number global_name sales
0 M 1 n/a n/a
1 A 2 A PARENT 1000
2 B 3 B PARENT 2000
3 F 4 n/a n/a
I tried to do this as follows:
def find_global_name(x):
for key1 in co_dict.keys():
if (x['co-name'] == co_dict[key1]['co_name']):
return co_dict[key1]['global_name']
else:
return "n/a"
df['global_name'] = df.apply(find_global_name, axis=1)
But the for loop seems to stop after the first company "A" match. I’m stuck trying to figuring out why this. Why is this method not working? How do I accomplish this task? Many, many thanks for your help.
Create a dataframe from the dictionary and merge with df
:
(df.merge(pd.DataFrame(co_dict).T,
left_on = 'co-name',
right_on = 'co_name',
how = 'left')
.drop(columns=['co_name', 'company_type'])
)
co-name co-number global_name sales
0 M 1 NaN NaN
1 A 2 A PARENT 1000
2 B 3 B PARENT 1000
3 F 4 NaN NaN
Your code can be much clearer and faster if you trim the dictionary to only the values you need, and combine with map
which is quite fast … this also allows you to avoid writing code that is probably not necessary.
Create dictionaries for the values you need:
global_name = {val['co_name']: val['global_name']
for _, val in co_dict.items()}
sales = {val['co_name']: val['sales']
for _, val in co_dict.items()}
global_name
{'A': 'A PARENT', 'B': 'B PARENT'}
sales
{'A': '1000', 'B': '1000'}
Map the dictionaries to the dataframe to create the new columns (the assumption here is that values in the column are unique; if not, this won’t work and merge is a better option, as it can handle duplicates):
df.assign(global_name = df['co-name'].map(global_name),
sales = df['co-name'].map(sales))
Out[722]:
co-name co-number global_name sales
0 M 1 NaN NaN
1 A 2 A PARENT 1000
2 B 3 B PARENT 1000
3 F 4 NaN NaN