Pandas Merge and create a multi-index for duplicate columns
Question:
I have two dataframes:

    import pandas as pd

    sessions = pd.DataFrame(
        {"ID": [1, 2, 3, 4, 5],
         "2018-06-30": [23, 34, 45, 67, 75],
         "2018-07-31": [32, 43, 45, 76, 57]})
    leads = pd.DataFrame(
        {"ID": [1, 2, 3, 4, 5],
         "2018-06-30": [7, 10, 28, 15, 30],
         "2018-07-31": [7, 10, 28, 15, 30]})
I want to merge the two dataframes on ID and then create a multi-index in the columns, so that the result looks like this:

          6/30/2018           7/31/2018
    ID    sessions  leads     sessions  leads
    1     23        7         32        7
    2     34        10        43        12
    3     45        28        45        30
    4     67        15        76        18
    5     75        30        57        30
How can I do it? A direct pd.merge will create the suffixes _x and _y, which I do not want.
Answers:
Use concat with set_index by ID on both DataFrames, then swaplevel with sort_index to get the expected MultiIndex in the columns:
    df = (pd.concat([sessions.set_index('ID'),
                     leads.set_index('ID')],
                    axis=1,
                    keys=['sessions', 'leads'])
            .swaplevel(0, 1, axis=1)
            .sort_index(axis=1, ascending=[True, False])
         )
    print(df)

       2018-06-30       2018-07-31
         sessions leads   sessions leads
    ID
    1          23     7         32     7
    2          34    10         43    10
    3          45    28         45    28
    4          67    15         76    15
    5          75    30         57    30
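To see what each step of this chain contributes, here is a minimal sketch that runs the same calls one at a time on the data from the question. The intermediate names stacked, swapped, june and all_leads are only illustrative and are not part of the original answer.

```python
import pandas as pd

sessions = pd.DataFrame(
    {"ID": [1, 2, 3, 4, 5],
     "2018-06-30": [23, 34, 45, 67, 75],
     "2018-07-31": [32, 43, 45, 76, 57]})
leads = pd.DataFrame(
    {"ID": [1, 2, 3, 4, 5],
     "2018-06-30": [7, 10, 28, 15, 30],
     "2018-07-31": [7, 10, 28, 15, 30]})

# Step 1: concat along the columns; `keys` becomes the OUTER column level,
# so the columns are (frame name, date) pairs.
stacked = pd.concat(
    [sessions.set_index("ID"), leads.set_index("ID")],
    axis=1,
    keys=["sessions", "leads"],
)

# Step 2: swap the two column levels so the date is the outer level.
swapped = stacked.swaplevel(0, 1, axis=1)

# Step 3: sort dates ascending and frame names descending,
# which places "sessions" before "leads" under each date.
df = swapped.sort_index(axis=1, ascending=[True, False])

# Selecting from the result (illustrative):
june = df["2018-06-30"]                       # both metrics for one date
all_leads = df.xs("leads", axis=1, level=1)   # one metric across all dates
```

The descending sort on the inner level is what puts sessions ahead of leads; with other frame names the ascending flags would need to be chosen to match the desired order.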
Here is a solution with pd.DataFrame.merge, pd.DataFrame.set_axis, pd.DataFrame.pipe and pd.DataFrame.reindex that could be applied in this case:
    (sessions.merge(leads, on='ID', suffixes=('_sessions', '_leads'))
        .set_index('ID')
        .pipe(lambda d: d.set_axis(d.columns.str.split('_', expand=True), axis=1))
        .pipe(lambda d: d.reindex(columns=pd.MultiIndex.from_product(
            [d.columns.levels[0], d.columns.levels[1]])))
        .sort_index(axis=1, ascending=[True, False]))

       2018-06-30       2018-07-31
         sessions leads   sessions leads
    ID
    1          23     7         32     7
    2          34    10         43    10
    3          45    28         45    28
    4          67    15         76    15
    5          75    30         57    30
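Splitting on every '_' works here because the date labels themselves contain no underscores. As a hedged variation, the sketch below assumes hypothetical column names such as "2018_06_30" (not from the question) and splits only on the last underscore with plain Python str.rsplit, building the MultiIndex with pd.MultiIndex.from_tuples. It also assumes every non-ID column exists in both frames, so each one receives a suffix.

```python
import pandas as pd

# Hypothetical data: same shape as the question, but the date labels
# contain underscores, so splitting on every "_" would give too many levels.
sessions = pd.DataFrame(
    {"ID": [1, 2],
     "2018_06_30": [23, 34],
     "2018_07_31": [32, 43]})
leads = pd.DataFrame(
    {"ID": [1, 2],
     "2018_06_30": [7, 10],
     "2018_07_31": [7, 10]})

merged = (sessions
          .merge(leads, on="ID", suffixes=("_sessions", "_leads"))
          .set_index("ID"))

# Split each label once, from the right: "2018_06_30_leads" -> ("2018_06_30", "leads").
merged.columns = pd.MultiIndex.from_tuples(
    [tuple(col.rsplit("_", 1)) for col in merged.columns]
)

# Dates ascending, frame names descending ("sessions" before "leads").
result = merged.sort_index(axis=1, ascending=[True, False])
```

A column that appears in only one frame would get no suffix from merge, so it would need separate handling before the split.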