Join dataframes – one with multiindex columns and the other without

Question:

I’m trying to join two dataframes – one with multiindex columns and the other with a single column name. They have similar index.

I get the following warning:

"UserWarning: merging between different levels can give an unintended result (3 levels on the left, 1 on the right)"

For example:

import pandas as pd
import numpy as np

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])

np.random.seed(2022)  # so the data is the same each time
df = pd.DataFrame(np.random.randn(3, 8), index=['A', 'B', 'C'], columns=index)
df2 = pd.DataFrame(np.random.randn(3), index=['A', 'B', 'C'],columns=['w'])

df3 = df.join(df2)

DataFrame Views

df

first        bar                 baz                 foo                 qux          
second       one       two       one       two       one       two       one       two
A      -0.000528 -0.274901 -0.139286  1.984686  0.282109  0.760809  0.300982  0.540297
B       0.373497  0.377813 -0.090213 -2.305943  1.142760 -1.535654 -0.863752  1.016545
C       1.033964 -0.824492  0.018905 -0.383344 -0.304185  0.997292 -0.127274 -1.475886

df2

          w
A -1.940906
B  0.833649
C -0.567218

df3 – Result

   (bar, one)  (bar, two)  (baz, one)  (baz, two)  (foo, one)  (foo, two)  (qux, one)  (qux, two)         w
A   -0.000528   -0.274901   -0.139286    1.984686    0.282109    0.760809    0.300982    0.540297 -1.940906
B    0.373497    0.377813   -0.090213   -2.305943    1.142760   -1.535654   -0.863752    1.016545  0.833649
C    1.033964   -0.824492    0.018905   -0.383344   -0.304185    0.997292   -0.127274   -1.475886 -0.567218

  • df.join(df2) from pandas v1.3.0 results in a FutureWarning
    • FutureWarning: merging between different levels is deprecated and will be removed in a future version. (2 levels on the left, 1 on the right) df3 = df.join(df2).

What is the best way to join these two dataframes?

Asked By: Eyal S.

||

Answers:

It depends on what you want! Do you want the column from df2 to be aligned with the 1st or second level of columns from df?

You have to add a level to the columns of df2

Super cheezy with pd.concat

df.join(pd.concat([df2], axis=1, keys=['a']))

Better way

df2.columns = pd.MultiIndex.from_product([['a'], df2.columns])

df.join(df2)

enter image description here

Answered By: piRSquared

I think the simplest way is to convert df2 to MultiIndex, and then use concat or join:

df2.columns = pd.MultiIndex.from_tuples([('a','w')])
print (df2)
          a
          w
A -1.940906
B  0.833649
C -0.567218

Or:

df2.columns = [['a'], df2.columns]
print (df2)
          a
          w
A -1.940906
B  0.833649
C -0.567218

df3 = pd.concat([df, df2], axis=1)

Or:

df3 = df.join(df2)

Result:

print (df3)
first        bar                 baz                 foo                 qux                   a
second       one       two       one       two       one       two       one       two         w
A      -0.000528 -0.274901 -0.139286  1.984686  0.282109  0.760809  0.300982  0.540297 -1.940906
B       0.373497  0.377813 -0.090213 -2.305943  1.142760 -1.535654 -0.863752  1.016545  0.833649
C       1.033964 -0.824492  0.018905 -0.383344 -0.304185  0.997292 -0.127274 -1.475886 -0.567218

Additional Resources

Answered By: jezrael
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.