Unable to print multiindex dataframe to excel with merged cells
Question:
I have a dataframe df, which looks like this:
Date        ConstraintType  Col1  Col2
2020-07-15  N-S             w1    521133
2020-07-15  N-S             w2    550260
2020-07-15  CSD             d1    522417
2020-07-15  CSD             d2    534542
2020-07-15  A               d4    534905
2020-07-15  B               d5    534904
The index of the dataframe is:
df.index
Out[6]:
MultiIndex([('2020-07-15', 'N-S'),
            ('2020-07-15', 'N-S'),
            ('2020-07-15', 'CSD'),
            ('2020-07-15', 'CSD'),
            ('2020-07-15',   'A'),
            ('2020-07-15',   'B')],
           names=['Date', 'ConstraintType'])
But when I write it to Excel, it appears as follows:
[screenshot: actual Excel output]
I was expecting the following:
[screenshot: expected Excel output with merged index cells]
I am using the following code:
df.to_excel(r'C:\Users\ram\Desktop\z1.xlsx', merge_cells=True)
Answers:
From the provided DataFrame:
- Use .reset_index() to move the index levels back into ordinary columns.
- Use .where() and .shift() to blank out every value in the Date and ConstraintType columns except the first occurrence in each run of repeats.
- Finally, use .set_index() to put them back on the index, this time with each value appearing only once, and write with to_excel. Now merge_cells=True should work.
Code:
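# Move Date and ConstraintType out of the index into ordinary columns
df = df.reset_index()
# Keep a value only where it differs from the previous row; blank it otherwise
df['Date'] = df['Date'].where(df['Date'] != df['Date'].shift(), '')
df['ConstraintType'] = df['ConstraintType'].where(df['ConstraintType'] != df['ConstraintType'].shift(), '')
# Put the cleaned columns back on the index and write the file
df = df.set_index(['Date', 'ConstraintType'])
df.to_excel(r'C:\Users\ram\Desktop\z1.xlsx', merge_cells=True)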
Excel output: [screenshot of the resulting sheet]
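One caveat with comparing each column only against its own previous row: if Date changes while ConstraintType happens to stay the same, the ConstraintType label is blanked at the start of the new date. The following is a minimal sketch of a variant that blanks a level only when it and every level to its left are unchanged. The helper name blank_repeats and the second date 2020-07-16 are hypothetical, introduced purely for illustration; they are not part of the original question.

```python
import pandas as pd


def blank_repeats(frame, levels):
    """Return a flat copy of `frame` with repeated index labels blanked.

    A label is blanked only when it and every level to its left are
    unchanged from the previous row (hypothetical helper, not a pandas API).
    """
    out = frame.reset_index()
    same_so_far = pd.Series(True, index=out.index)
    masks = {}
    # Compute all masks first, against the unmodified columns.
    for level in levels:
        same_so_far &= out[level].eq(out[level].shift())
        masks[level] = same_so_far.copy()
    # Then blank the repeated labels.
    for level, mask in masks.items():
        out[level] = out[level].mask(mask, "")
    return out


# Sample data: the question's rows, with a made-up second date (2020-07-16)
# so that "CSD" straddles a date boundary.
index = pd.MultiIndex.from_tuples(
    [
        ("2020-07-15", "N-S"),
        ("2020-07-15", "N-S"),
        ("2020-07-15", "CSD"),
        ("2020-07-16", "CSD"),
        ("2020-07-16", "A"),
        ("2020-07-16", "B"),
    ],
    names=["Date", "ConstraintType"],
)
df = pd.DataFrame(
    {
        "Col1": ["w1", "w2", "d1", "d2", "d4", "d5"],
        "Col2": [521133, 550260, 522417, 534542, 534905, 534904],
    },
    index=index,
)

result = blank_repeats(df, ["Date", "ConstraintType"])
print(result)
```

Here "CSD" stays visible on the first 2020-07-16 row, whereas a plain column-wise comparison with .shift() would have blanked it. The result is a flat frame, so it can be written with to_excel(..., index=False) or passed to set_index first, as in the answer above.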
In pandas, the innermost index level must label every row, so repeats in that level have to be handled manually, as shown in David Erickson's answer. Pandas automatically hides repeated labels in the outer levels; see the example below:
import pandas as pd

tuples = [["2020-07-15", "N-S"],
          ["2020-07-15", "N-S"],
          ["2020-07-15", "CSD"],
          ["2020-07-15", "CSD"],
          ["2020-07-15", "A"],
          ["2020-07-15", "B"]]
index = pd.MultiIndex.from_tuples(tuples, names=['Date', 'ConstraintType'])
df = pd.DataFrame([
    ["w1", 521133],
    ["w2", 550260],
    ["d1", 522417],
    ["d2", 534542],
    ["d4", 534905],
    ["d5", 534904],
], columns=["Col1", "Col2"], index=index)

# Date is the outer level: its repeats are hidden in the printed output
print(df, '\n'*2)
# After swapping, ConstraintType is the outer level and Date repeats on every row
print(df.swaplevel(0, 1))
Returns:
                           Col1    Col2
Date       ConstraintType
2020-07-15 N-S               w1  521133
           N-S               w2  550260
           CSD               d1  522417
           CSD               d2  534542
           A                 d4  534905
           B                 d5  534904

                           Col1    Col2
ConstraintType Date
N-S            2020-07-15    w1  521133
               2020-07-15    w2  550260
CSD            2020-07-15    d1  522417
               2020-07-15    d2  534542
A              2020-07-15    d4  534905
B              2020-07-15    d5  534904
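The hiding shown above affects only how the frame is printed; the index itself still holds a label on every row. As a small sketch of that point, the sparsify argument of DataFrame.to_string can switch the hiding off. The variable names sparse_text, full_text and dates are illustrative only.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [
        ("2020-07-15", "N-S"),
        ("2020-07-15", "N-S"),
        ("2020-07-15", "CSD"),
        ("2020-07-15", "CSD"),
        ("2020-07-15", "A"),
        ("2020-07-15", "B"),
    ],
    names=["Date", "ConstraintType"],
)
df = pd.DataFrame(
    {
        "Col1": ["w1", "w2", "d1", "d2", "d4", "d5"],
        "Col2": [521133, 550260, 522417, 534542, 534905, 534904],
    },
    index=index,
)

# sparsify=True hides repeated outer labels; sparsify=False prints them all.
sparse_text = df.to_string(sparsify=True)
full_text = df.to_string(sparsify=False)
print(full_text)

# The underlying index is unchanged either way: one Date label per row.
dates = df.index.get_level_values("Date").tolist()
print(dates)
```

This is why the innermost level cannot be cleaned up through display settings alone: the repeated labels are real index values and have to be replaced explicitly before writing.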
Reset the index, blank the repeated values in the former MultiIndex columns, then save to Excel; there is no need to set the merge_cells option:
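# Turn the index levels into ordinary columns
df = df.reset_index(drop=False)
# Blank every ConstraintType that repeats the previous row
row_filt = df['ConstraintType'].eq(df['ConstraintType'].shift())
df.loc[row_filt, 'ConstraintType'] = ''
# Blank every Date that repeats the previous row
row_filt = df['Date'].eq(df['Date'].shift())
df.loc[row_filt, 'Date'] = ''
df.to_excel(r'C:\Users\ram\Desktop\z1.xlsx')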
This produces the following Excel output: [screenshot of the resulting sheet]