How to add a dictionary's values as a new column when its keys match the DataFrame index

Question:

I have the following data:

A dictionary, dict, whose keys are tuple[str, str] and whose values are list[float]:

{
    ('A', 'B'): [0, 1, 2, 3],
    ('A', 'C'): [4, 5, 6, 7],
    ('A', 'D'): [8, 9, 10, 11],
    ('B', 'A'): [12, 13, 14, 15]
}

And a pandas dataframe df with a two-level index whose values correspond to the keys in the dictionary:

df = df.set_index(["first", "second"]).sort_index()
print(df.head(4))
==============================================
                                        tokens
first           second  
 A              B                          166  
                C                          128  
                D                          160  
 B              A                          475

I want to create a new column, numbers, in df, filled with the values from dict whose keys match each row's index in df. The expected result would be:

print(df.head(4))
========================================================================
                                        tokens          numbers
first           second  
 A              B                          166          [0, 1, 2, 3]
                C                          128          [4, 5, 6, 7]  
                D                          160          [8, 9, 10, 11]  
 B              A                          475          [12, 13, 14, 15]

What is the best way to go about this? Keep performance in mind, as this dataframe may be 10-100k rows long.

Asked By: Sean Sailer


Answers:

You can create a Series from the dictionary (called d here) and assign it; the assignment aligns on the index:

df['numbers'] = pd.Series(d)

Or map the index:

df['numbers'] = df.index.map(d)

Output:

              tokens           numbers
first second                          
A     B          166      [0, 1, 2, 3]
      C          128      [4, 5, 6, 7]
      D          160    [8, 9, 10, 11]
B     A          475  [12, 13, 14, 15]
Answered By: Quang Hoang
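
As a self-contained sketch of the two approaches in the answer above, the snippet below rebuilds the sample data from the question and applies both. The variable names by_series and by_map are illustrative only, and the dataframe construction is an assumption about how df was originally built.

```python
import pandas as pd

# Sample data copied from the question. The dictionary is named `d`
# to avoid shadowing the built-in `dict`.
d = {
    ("A", "B"): [0, 1, 2, 3],
    ("A", "C"): [4, 5, 6, 7],
    ("A", "D"): [8, 9, 10, 11],
    ("B", "A"): [12, 13, 14, 15],
}

# Assumed construction of df: two key columns turned into a MultiIndex.
df = pd.DataFrame(
    {
        "first": ["A", "A", "A", "B"],
        "second": ["B", "C", "D", "A"],
        "tokens": [166, 128, 160, 475],
    }
).set_index(["first", "second"]).sort_index()

# Option 1: the tuple keys become a MultiIndex on the Series,
# and column assignment aligns that index with df's index.
by_series = df.copy()
by_series["numbers"] = pd.Series(d)

# Option 2: look up each index tuple in the dictionary.
by_map = df.copy()
by_map["numbers"] = df.index.map(d)

print(by_series)
print(by_map)
```

With the Series assignment, rows whose index tuple is not a key in the dictionary end up with NaN in the new column, so it may be worth checking for missing values afterwards.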

Create a Series, then concatenate it with the dataframe:

sr = pd.Series(d, name='numbers')
out = pd.concat([df, sr], axis=1)
print(out)

# Output
     tokens           numbers
A B     166      [0, 1, 2, 3]
  C     128      [4, 5, 6, 7]
  D     160    [8, 9, 10, 11]
B A     475  [12, 13, 14, 15]
Answered By: Corralien
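
The output above has lost the index level names (first, second), because the Series built from the dictionary has unnamed levels. One possible variation, sketched here under the assumption that df is built as in the question, is to copy the level names onto the Series with rename_axis before concatenating. The dataframe construction below is illustrative.

```python
import pandas as pd

# Sample data copied from the question (dictionary named `d`).
d = {
    ("A", "B"): [0, 1, 2, 3],
    ("A", "C"): [4, 5, 6, 7],
    ("A", "D"): [8, 9, 10, 11],
    ("B", "A"): [12, 13, 14, 15],
}

# Assumed construction of df: two key columns turned into a MultiIndex.
df = pd.DataFrame(
    {
        "first": ["A", "A", "A", "B"],
        "second": ["B", "C", "D", "A"],
        "tokens": [166, 128, 160, 475],
    }
).set_index(["first", "second"]).sort_index()

# name="numbers" becomes the label of the new column.
# rename_axis gives the Series the same level names as df's index,
# so the names are kept when the two indexes are combined.
sr = pd.Series(d, name="numbers").rename_axis(list(df.index.names))
out = pd.concat([df, sr], axis=1)

print(out)
```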