How do I search a nested Python dictionary for a matching dataframe value and then update the dataframe?

Question:

I have a Python dictionary with company information that is structured like this:

co_dict = {'0': {'co_name': 'A',
  'company_type': 'Public',
  'global_name': 'A PARENT',
  'sales': '1000'},
 '1': {'co_name': 'B',
  'company_type': 'Public',
  'global_name': 'B PARENT',
  'sales': '1000'}}

And a Pandas dataframe that looks like this (the real df is much longer):

df = pd.DataFrame({'co-name': ['M','A','B','F'], 'co-number': [1,2,3,4]})

  co-name co-number
0    M     1
1    A     2
2    B     3
3    F     4

I would like to find matches for each df["co-name"] in the dictionary and append the corresponding dictionary values for "global_name" and "sales" to a new column in the df. When there is no match, the columns should both read "n/a". So the end result would look like this:

   co-name  co-number  global_name  sales
0   M          1        n/a         n/a
1   A          2        A PARENT    1000
2   B          3        B PARENT    2000
3   F          4        n/a         n/a

I tried to do this as follows:

def find_global_name(x):
    for key1 in co_dict.keys():
        if (x['co-name'] == co_dict[key1]['co_name']):
            return co_dict[key1]['global_name']
        else:
            return "n/a"
        
df['global_name'] = df.apply(find_global_name, axis=1)

But the for loop seems to stop after the first company "A" match. I’m stuck trying to figuring out why this. Why is this method not working? How do I accomplish this task? Many, many thanks for your help.

Asked By: vcodingadventures

||

Answers:

Create a dataframe from the dictionary and merge with df:

(df.merge(pd.DataFrame(co_dict).T, 
         left_on = 'co-name',
         right_on = 'co_name', 
         how = 'left')
  .drop(columns=['co_name', 'company_type'])
 )
 
  co-name  co-number global_name sales
0       M          1         NaN   NaN
1       A          2    A PARENT  1000
2       B          3    B PARENT  1000
3       F          4         NaN   NaN

Your code can be much clearer and faster if you trim the dictionary to only the values you need, and combine with map which is quite fast … this also allows you to avoid writing code that is probably not necessary.

Create dictionaries for the values you need:

global_name = {val['co_name']: val['global_name'] 
                         for _, val in co_dict.items()}

 sales = {val['co_name']: val['sales']
         for _, val in co_dict.items()}

 global_name
 {'A': 'A PARENT', 'B': 'B PARENT'}

 sales
 {'A': '1000', 'B': '1000'}

Map the dictionaries to the dataframe to create the new columns (the assumption here is that values in the column are unique; if not, this won’t work and merge is a better option, as it can handle duplicates):

df.assign(global_name = df['co-name'].map(global_name), 
          sales = df['co-name'].map(sales))
Out[722]: 
  co-name  co-number global_name sales
0       M          1         NaN   NaN
1       A          2    A PARENT  1000
2       B          3    B PARENT  1000
3       F          4         NaN   NaN
Answered By: sammywemmy