Add/subtract dataframes with different column labels
Question:
I’m trying to add/subtract two dataframes with different column labels. Is it possible to do this without renaming the columns to align them? I would like to keep the original labels.
Answers:
Consider dataframes A and B:
A = pd.DataFrame([[1, 2], [3, 4]], ['a', 'b'], ['A', 'B'])
B = pd.DataFrame([[1, 2], [3, 4]], ['c', 'd'], ['C', 'D'])
A
B
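Add them together and we have a mess: pandas aligns on both index and column labels, and since none of them match, every cell of the result is NaN.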
A + B
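To make the "mess" concrete, here is a minimal sketch that reuses the A and B defined above (the variable name result is only illustrative) and checks the shape and contents of the aligned sum:

```python
import pandas as pd

A = pd.DataFrame([[1, 2], [3, 4]], index=['a', 'b'], columns=['A', 'B'])
B = pd.DataFrame([[1, 2], [3, 4]], index=['c', 'd'], columns=['C', 'D'])

# Arithmetic between DataFrames aligns on row and column labels first.
# No label is shared here, so the result covers the union of the labels
# and contains no actual sums.
result = A + B

print(result.shape)               # (4, 4)
print(result.isna().all().all())  # True: every cell is NaN
```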
Add their underlying arrays instead:
A.values + B.values
array([[2, 4],
       [6, 8]])
That’s closer to what we want.
To get what you asked for, you need to decide which dataframe has the columns and index you want, then add the values of the other dataframe to it. Let’s say I choose to keep A’s index and columns.
A + B.values
That ought to do it!
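As a self-contained sketch of the approach above: the shape check and the names kept_A and kept_B are my own additions, and to_numpy() is used here as the currently recommended equivalent of .values. Because the operation is purely positional, both frames must have the same shape.

```python
import pandas as pd

A = pd.DataFrame([[1, 2], [3, 4]], index=['a', 'b'], columns=['A', 'B'])
B = pd.DataFrame([[1, 2], [3, 4]], index=['c', 'd'], columns=['C', 'D'])

# Positional arithmetic only makes sense when the shapes match.
if A.shape != B.shape:
    raise ValueError("A and B must have the same shape")

# Converting one operand to a NumPy array drops its labels, so no
# alignment happens and the result keeps the labels of the other operand.
kept_A = A + B.to_numpy()   # index ['a', 'b'], columns ['A', 'B']
kept_B = B + A.to_numpy()   # index ['c', 'd'], columns ['C', 'D']

print(kept_A)
print(kept_B)
```

Whichever dataframe stays on the left as a DataFrame decides the labels of the result.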
Following up on @piRSquared’s answer, which leverages array operations (numpy): ideally we would want the result to stay in a pandas framework. What about:
pd.DataFrame(
    df_A.values - df_B.values,
    columns=df_A.columns
)
The reason for taking the values is explained in @piRSquared’s answer. Here, I additionally build a new dataframe to hold the result, using the column names from df_A. That seemed to me the most important metadata, but one could also carry over the index (row names) by passing index=df_A.index. Finally, it is also possible to rename the columns so that they reflect both inputs; try:
[name_A + '-' + name_B for name_A, name_B in zip(list(df_A.columns),list(df_B.columns))]
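Putting those pieces together, a possible sketch might look like the following. The sample data mirrors the first answer, and the names combined_names and diff are hypothetical, introduced only for illustration.

```python
import pandas as pd

df_A = pd.DataFrame([[1, 2], [3, 4]], index=['a', 'b'], columns=['A', 'B'])
df_B = pd.DataFrame([[1, 2], [3, 4]], index=['c', 'd'], columns=['C', 'D'])

# Pair the columns by position and build labels such as 'A-C'.
# This assumes the column labels are strings.
combined_names = [
    f"{name_A}-{name_B}"
    for name_A, name_B in zip(df_A.columns, df_B.columns)
]

# Subtract positionally, then wrap the result in a new DataFrame that
# keeps df_A's row labels and uses the combined column labels.
diff = pd.DataFrame(
    df_A.to_numpy() - df_B.to_numpy(),
    index=df_A.index,
    columns=combined_names,
)

print(diff)
```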
Hope it helps!