Pandas concat dictionary to dataframe
Question:
I have an existing dataframe, and I'm trying to concatenate a dictionary whose length is different from the dataframe's:
A B C
0 0.46324 0.32425 0.42194
1 0.10596 0.35910 0.21004
2 0.69209 0.12951 0.50186
3 0.04901 0.31203 0.11035
4 0.43104 0.62413 0.20567
5 0.43412 0.13720 0.11052
6 0.14512 0.10532 0.05310
and
test = {"One": [0.23413, 0.19235, 0.51221], "Two": [0.01293, 0.12235, 0.63291]}
I'm trying to add test to df while changing the keys to "D" and "E". I've had a look at https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html and https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html, which suggest that I should be able to concatenate the dictionary to the dataframe.
I’ve tried:
pd.concat([df, test], axis=1, ignore_index=True, keys=["D", "E"])
pd.concat([df, test], axis=1, ignore_index=True)
But I'm not having any luck. The result I'm trying to achieve is:
A B C D E
0 0.46324 0.32425 0.42194 0.23413 0.01293
1 0.10596 0.35910 0.21004 0.19235 0.12235
2 0.69209 0.12951 0.50186 0.51221 0.63291
3 0.04901 0.31203 0.11035 NaN NaN
4 0.43104 0.62413 0.20567 NaN NaN
5 0.43412 0.13720 0.11052 NaN NaN
6 0.14512 0.10532 0.05310 NaN NaN
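Both attempts fail because pd.concat only accepts Series and DataFrame objects, so a plain dict raises a TypeError. A minimal sketch, assuming the df and test shown above, that wraps the dict in a DataFrame first; concat then aligns the two frames on the row index and pads the shorter one with NaN:

```python
import pandas as pd

# Data from the question
df = pd.DataFrame({
    "A": [0.46324, 0.10596, 0.69209, 0.04901, 0.43104, 0.43412, 0.14512],
    "B": [0.32425, 0.35910, 0.12951, 0.31203, 0.62413, 0.13720, 0.10532],
    "C": [0.42194, 0.21004, 0.50186, 0.11035, 0.20567, 0.11052, 0.05310],
})
test = {"One": [0.23413, 0.19235, 0.51221], "Two": [0.01293, 0.12235, 0.63291]}

# pd.concat only accepts Series and DataFrame objects, so a plain dict is rejected
try:
    pd.concat([df, test], axis=1)
except TypeError as exc:
    print("TypeError:", exc)

# Wrap the dict in a DataFrame first: keys become columns, rows get index 0..2
extra = pd.DataFrame(test).rename(columns={"One": "D", "Two": "E"})

# axis=1 places the frames side by side, aligned on the row index;
# df has rows 0..6, so rows 3..6 of D and E are filled with NaN
result = pd.concat([df, extra], axis=1)
print(result)
```

Note that keys= does not rename columns: it labels each input object and adds an outer column level (a MultiIndex), so renaming has to happen on the DataFrame built from the dict.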
Answers:
Assuming you want to add them as rows:
>>> pd.concat([df, pd.DataFrame(test.values(), columns=df.columns)], ignore_index=True)
A B C
0 0.46324 0.32425 0.42194
1 0.10596 0.35910 0.21004
2 0.69209 0.12951 0.50186
3 0.04901 0.31203 0.11035
4 0.43104 0.62413 0.20567
5 0.43412 0.13720 0.11052
6 0.14512 0.10532 0.05310
7 0.01293 0.12235 0.63291
8 0.23413 0.19235 0.51221
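Since Python 3.7, dicts preserve insertion order, so on a current interpreter the appended rows follow the key order of test (One, then Two); the output above, with Two's values first, was presumably produced on an older Python. A quick check, assuming the same df and test:

```python
import pandas as pd

df = pd.DataFrame({
    "A": [0.46324, 0.10596, 0.69209, 0.04901, 0.43104, 0.43412, 0.14512],
    "B": [0.32425, 0.35910, 0.12951, 0.31203, 0.62413, 0.13720, 0.10532],
    "C": [0.42194, 0.21004, 0.50186, 0.11035, 0.20567, 0.11052, 0.05310],
})
test = {"One": [0.23413, 0.19235, 0.51221], "Two": [0.01293, 0.12235, 0.63291]}

# Each dict value becomes one row; list() makes the iteration order explicit
new_rows = pd.DataFrame(list(test.values()), columns=df.columns)

# ignore_index=True renumbers the combined rows 0..8
result = pd.concat([df, new_rows], ignore_index=True)

# On Python 3.7+ row 7 holds the "One" values and row 8 the "Two" values
print(result.tail(2))
```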
If you want to add them as new columns:
df_new = pd.concat([df, pd.DataFrame(test.values()).T], ignore_index=True, axis=1)
df_new.columns = df.columns.tolist() + [{'One': 'D', 'Two': 'E'}.get(k) for k in test.keys()]
>>> df_new
A B C E D
0 0.46324 0.32425 0.42194 0.01293 0.23413
1 0.10596 0.35910 0.21004 0.12235 0.19235
2 0.69209 0.12951 0.50186 0.63291 0.51221
3 0.04901 0.31203 0.11035 NaN NaN
4 0.43104 0.62413 0.20567 NaN NaN
5 0.43412 0.13720 0.11052 NaN NaN
6 0.14512 0.10532 0.05310 NaN NaN
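With axis=1, ignore_index=True discards the column labels (the labels along the concatenation axis), not the row labels, which is why the answer has to reassign df_new.columns afterwards. A small sketch, assuming the same df and test, that records the intermediate labels:

```python
import pandas as pd

df = pd.DataFrame({
    "A": [0.46324, 0.10596, 0.69209, 0.04901, 0.43104, 0.43412, 0.14512],
    "B": [0.32425, 0.35910, 0.12951, 0.31203, 0.62413, 0.13720, 0.10532],
    "C": [0.42194, 0.21004, 0.50186, 0.11035, 0.20567, 0.11052, 0.05310],
})
test = {"One": [0.23413, 0.19235, 0.51221], "Two": [0.01293, 0.12235, 0.63291]}

# Transpose so each dict key becomes a column: 3 rows x 2 columns
extra = pd.DataFrame(list(test.values())).T

# On axis=1, ignore_index=True relabels the columns as 0..n-1
df_new = pd.concat([df, extra], ignore_index=True, axis=1)
intermediate_labels = df_new.columns.tolist()
print(intermediate_labels)  # [0, 1, 2, 3, 4]

# Restore the names; keys() iterates in the same order as values()
mapping = {"One": "D", "Two": "E"}
df_new.columns = df.columns.tolist() + [mapping[k] for k in test.keys()]
print(df_new)
```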
Before Python 3.7, dictionary order (e.g. of test) was not guaranteed, which is why the new columns above came out as E, D. The new column names therefore need to be mapped to the keys, not assigned by position.
A more direct way to get the desired result is to build a DataFrame from test, rename its columns, and join it to df:
df.join(pd.DataFrame(test).rename(columns={'One':'D','Two':'E'}))
A B C D E
0 0.46324 0.32425 0.42194 0.23413 0.01293
1 0.10596 0.35910 0.21004 0.19235 0.12235
2 0.69209 0.12951 0.50186 0.51221 0.63291
3 0.04901 0.31203 0.11035 NaN NaN
4 0.43104 0.62413 0.20567 NaN NaN
5 0.43412 0.13720 0.11052 NaN NaN
6 0.14512 0.10532 0.05310 NaN NaN
As @Alexander correctly mentioned, the number of rows being concatenated should match; otherwise, as in your case, the missing rows are filled with NaN.
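Because rename maps old names to new ones by key, the join approach keeps each new name with its own values regardless of dict order. To illustrate, this sketch builds the dict in the opposite order (reversed_test is a hypothetical variant of test, not from the question); only the column order changes, and DataFrame.join, which is a left join on the index by default, keeps all of df's rows:

```python
import pandas as pd

df = pd.DataFrame({
    "A": [0.46324, 0.10596, 0.69209, 0.04901, 0.43104, 0.43412, 0.14512],
    "B": [0.32425, 0.35910, 0.12951, 0.31203, 0.62413, 0.13720, 0.10532],
    "C": [0.42194, 0.21004, 0.50186, 0.11035, 0.20567, 0.11052, 0.05310],
})

# Hypothetical variant of `test` with the keys inserted in the opposite order
reversed_test = {"Two": [0.01293, 0.12235, 0.63291], "One": [0.23413, 0.19235, 0.51221]}

# rename maps by key, so "D" always gets the "One" values
extra = pd.DataFrame(reversed_test).rename(columns={"One": "D", "Two": "E"})

# Left join on the index: all 7 rows of df are kept, rows 3..6 of D/E are NaN
result = df.join(extra)

# The column order follows the dict (A, B, C, E, D); reorder explicitly if needed
result = result[["A", "B", "C", "D", "E"]]
print(result)
```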
To add a dictionary as new columns, another method is to convert it into a dataframe and simply assign.
df[['D', 'E']] = pd.DataFrame(test)
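As far as I can tell, in recent pandas versions this assignment pairs the right-hand columns with "D" and "E" by position and aligns rows on the index, so rows 3..6 become NaN; older pandas releases may behave differently. A sketch under that assumption, assuming the same df and test, next to a variant using DataFrame.assign that pairs names with values by key instead:

```python
import pandas as pd

df = pd.DataFrame({
    "A": [0.46324, 0.10596, 0.69209, 0.04901, 0.43104, 0.43412, 0.14512],
    "B": [0.32425, 0.35910, 0.12951, 0.31203, 0.62413, 0.13720, 0.10532],
    "C": [0.42194, 0.21004, 0.50186, 0.11035, 0.20567, 0.11052, 0.05310],
})
test = {"One": [0.23413, 0.19235, 0.51221], "Two": [0.01293, 0.12235, 0.63291]}

# Variant 1 (from the answer): columns One, Two are paired with D, E by
# position; rows are aligned on the index, so rows 3..6 are NaN
by_position = df.copy()
by_position[["D", "E"]] = pd.DataFrame(test)

# Variant 2: pair names with values by key; each pd.Series has index 0..2
# and is aligned (NaN-padded) to df's 7 rows by assign
mapping = {"One": "D", "Two": "E"}
by_key = df.assign(**{new: pd.Series(test[old]) for old, new in mapping.items()})

print(by_key)
```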
To add a dictionary as new rows, another method is to convert the dict into a dataframe using the from_dict method and concatenate it:
df = pd.concat([df, pd.DataFrame.from_dict(test, orient='index', columns=df.columns)], ignore_index=True)
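With orient='index', each dict key becomes a row label and its list becomes that row's values under the given column names; ignore_index=True then renumbers the rows and drops the "One"/"Two" labels. A sketch, assuming the same df and test, showing the intermediate frame and the variant that keeps the keys as row labels:

```python
import pandas as pd

df = pd.DataFrame({
    "A": [0.46324, 0.10596, 0.69209, 0.04901, 0.43104, 0.43412, 0.14512],
    "B": [0.32425, 0.35910, 0.12951, 0.31203, 0.62413, 0.13720, 0.10532],
    "C": [0.42194, 0.21004, 0.50186, 0.11035, 0.20567, 0.11052, 0.05310],
})
test = {"One": [0.23413, 0.19235, 0.51221], "Two": [0.01293, 0.12235, 0.63291]}

# Keys become the index (One, Two); each list fills one row under A, B, C
new_rows = pd.DataFrame.from_dict(test, orient="index", columns=df.columns)
print(new_rows)

# ignore_index=True renumbers all rows 0..8, discarding the key labels
appended = pd.concat([df, new_rows], ignore_index=True)

# Without ignore_index the dict keys survive as the last two row labels
labelled = pd.concat([df, new_rows])
print(labelled.tail(2))
```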