How to join a dictionary whose keys match the df index as a new column of values from the dictionary
Question:
I have the following data:
A dictionary dict that maps tuple[str, str] keys to list[float] values:
{
('A', 'B'): [0, 1, 2, 3],
('A', 'C'): [4, 5, 6, 7],
('A', 'D'): [8, 9, 10, 11],
('B', 'A'): [12, 13, 14, 15]
}
And a pandas DataFrame df with a two-level index whose entries correspond to the keys in the dictionary:
df = df.set_index(["first", "second"]).sort_index()
print(df.head(4))
==============================================
              tokens
first second
A     B          166
      C          128
      D          160
B     A          475
I want to create a new column, numbers, in df with the values from dict, where each dictionary key matches an index row of df. The expected result would be:
print(df.head(4))
========================================================================
              tokens           numbers
first second
A     B          166      [0, 1, 2, 3]
      C          128      [4, 5, 6, 7]
      D          160    [8, 9, 10, 11]
B     A          475  [12, 13, 14, 15]
What is the best way to go about this? Performance matters, since this DataFrame may have 10k to 100k rows.
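For anyone reproducing this, here is a minimal sketch of the setup. It assumes the raw frame has plain first, second and tokens columns before set_index (the question only shows the indexed result), and it names the dictionary d, as the answers below do, to avoid shadowing Python's built-in dict.

```python
import pandas as pd

# The dictionary from the question, keyed by (first, second) tuples.
# Named d (not dict) so the built-in dict type is not shadowed.
d = {
    ('A', 'B'): [0, 1, 2, 3],
    ('A', 'C'): [4, 5, 6, 7],
    ('A', 'D'): [8, 9, 10, 11],
    ('B', 'A'): [12, 13, 14, 15],
}

# Assumed raw layout: plain columns that are then promoted to a two-level index.
df = pd.DataFrame({
    'first': ['A', 'A', 'A', 'B'],
    'second': ['B', 'C', 'D', 'A'],
    'tokens': [166, 128, 160, 475],
})

# set_index takes a list of column names; the result must be assigned back.
df = df.set_index(['first', 'second']).sort_index()
print(df.head(4))
```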
Answers:
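You can create a Series from the dict and then assign it. The tuple keys become the Series' MultiIndex, and pandas aligns it with the df index on assignment: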
df['numbers'] = pd.Series(d)
Or map the index:
df['numbers'] = df.index.map(d)
Output:
              tokens           numbers
first second
A     B          166      [0, 1, 2, 3]
      C          128      [4, 5, 6, 7]
      D          160    [8, 9, 10, 11]
B     A          475  [12, 13, 14, 15]
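Since the question mentions 10k to 100k rows, here is a rough benchmarking sketch for the two approaches above. The data is synthetic and hypothetical (integer index levels, four-float lists per key), and no particular timings are claimed; run it against your own data and pandas version before picking one.

```python
import timeit

import pandas as pd

# Hypothetical synthetic data at the upper end of the size in the question:
# 100,000 unique (first, second) pairs, each mapped to a 4-element list.
n = 100_000
first_vals = [i // 1000 for i in range(n)]
second_vals = [i % 1000 for i in range(n)]
d = {(f, s): [float(i + k) for k in range(4)]
     for i, (f, s) in enumerate(zip(first_vals, second_vals))}
df = pd.DataFrame({'first': first_vals, 'second': second_vals, 'tokens': range(n)})
df = df.set_index(['first', 'second']).sort_index()


def assign_series():
    # Tuple keys become a MultiIndex; assignment aligns it to df.index.
    out = df.copy()
    out['numbers'] = pd.Series(d)
    return out


def assign_map():
    # Look up each index tuple in the dict.
    out = df.copy()
    out['numbers'] = df.index.map(d)
    return out


# Absolute timings depend on hardware and pandas version.
for fn in (assign_series, assign_map):
    per_call = timeit.timeit(fn, number=3) / 3
    print(f'{fn.__name__}: {per_call:.3f} s per call')
```

Both approaches match rows through pandas index lookups rather than a Python-level loop over the DataFrame's rows.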
Alternatively, create a named Series and concatenate it with the DataFrame along the columns:
sr = pd.Series(d, name='numbers')
out = pd.concat([df, sr], axis=1)
print(out)
# Output
     tokens           numbers
A B     166      [0, 1, 2, 3]
  C     128      [4, 5, 6, 7]
  D     160    [8, 9, 10, 11]
B A     475  [12, 13, 14, 15]
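The two answers behave differently when the keys do not match exactly. The sketch below uses hypothetical data in which one df row, ('A', 'D'), has no dictionary entry and one dictionary key, ('Z', 'Z'), has no df row. Column assignment keeps exactly df's rows and fills NaN where a key is missing, while pd.concat defaults to an outer join and adds the extra key as a new row; passing join='inner' keeps only the keys present in both.

```python
import pandas as pd

# Hypothetical data: ('A', 'D') is missing from d, and ('Z', 'Z') is a key
# with no matching row in df.
d = {
    ('A', 'B'): [0, 1, 2, 3],
    ('A', 'C'): [4, 5, 6, 7],
    ('B', 'A'): [12, 13, 14, 15],
    ('Z', 'Z'): [99, 99, 99, 99],
}
df = pd.DataFrame({
    'first': ['A', 'A', 'A', 'B'],
    'second': ['B', 'C', 'D', 'A'],
    'tokens': [166, 128, 160, 475],
}).set_index(['first', 'second'])

sr = pd.Series(d, name='numbers')

# Column assignment aligns to df.index: ('Z', 'Z') is dropped, ('A', 'D') gets NaN.
assigned = df.copy()
assigned['numbers'] = sr

# concat defaults to an outer join: ('Z', 'Z') becomes an extra row with NaN tokens.
outer = pd.concat([df, sr], axis=1)

# join='inner' keeps only keys present in both, so ('A', 'D') is dropped as well.
inner = pd.concat([df, sr], axis=1, join='inner')

print(outer)
```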