Hierarchical Index from pd dataframe to Excel, need to forward fill and unmerge
Question:
I have a pandas dataframe with a three-level hierarchical index, created by the following:
df_grouped = df.groupby(['Country','Description', pd.Grouper(freq = 'M')]).sum()
Basically, it is a table where Country is the highest level, Description is the second level, and the date grouped by month is the third.
PICTURE A
I’d like to do two unrelated things:
Unmerge all the hierarchical indices in this structure within python, then forward fill to create PICTURE B.
PICTURE B
Be able to transform the datetimes while in the hierarchical structure of PICTURE A into YYYY-MM in python so when I export it I get PICTURE C. (I understand that I can do that from the structure in PICTURE B, I just want to be able to do it while it’s still in the hierarchical structure in a pandas dataframe).
PICTURE C
Any tips?
Answers:
After groupby you get a DataFrame with a MultiIndex, so the values in the first and second levels are repeated on every row; they are just not displayed.
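To check that the outer levels really are stored on every row, here is a minimal sketch. The toy frame and its values are made up for illustration and are not the data from the question.

```python
import pandas as pd

# Hypothetical toy data, only for illustration.
rng = pd.date_range('2017-04-03', periods=6, freq='20D')
df = pd.DataFrame({'Country': ['X'] * 6,
                   'Description': ['A'] * 3 + ['B'] * 3,
                   'a': range(6)}, index=rng)

# Group by two columns plus the month of the DatetimeIndex.
df_grouped = df.groupby(['Country', 'Description',
                         df.index.to_period('M')]).sum()

# Every row carries its full key, even where the printout leaves blanks.
countries = df_grouped.index.get_level_values('Country').tolist()
descriptions = df_grouped.index.get_level_values('Description').tolist()
print(countries)
print(descriptions)
```

So nothing needs to be forward filled inside pandas; the blanks only exist in the printed representation.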
If the second DataFrame is not necessary, you can convert the DatetimeIndex to YYYY-MM format with strftime, or to a monthly period with to_period:
df_grouped = df.groupby(['Country','Description', df.index.strftime('%Y-%m')]).sum()
Or:
df_grouped = df.groupby(['Country','Description', df.index.to_period('M')]).sum()
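The two variants differ in what ends up in the index: strftime gives plain strings, while to_period gives a PeriodIndex that still behaves like dates. A small sketch with hypothetical dates:

```python
import pandas as pd

rng = pd.date_range('2017-04-03', periods=4, freq='20D')  # hypothetical dates

as_text = rng.strftime('%Y-%m')   # Index of plain strings
as_period = rng.to_period('M')    # PeriodIndex with monthly frequency

print(list(as_text))
print(isinstance(as_period, pd.PeriodIndex))
print(list(as_period.astype(str)))
```

Both print as YYYY-MM, so which one to choose mostly depends on whether you still need date arithmetic or date-aware sorting after grouping.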
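If you need the second DataFrame, add reset_index to convert the levels to columns. To convert the date level (level 2) of the MultiIndex, pass the level's unique values from index.levels to MultiIndex.set_levels. (get_level_values returns one value per row, which only matches the levels when every date occurs exactly once and in sorted order.)
df_grouped = df.groupby(['Country','Description', pd.Grouper(freq = 'M')]).sum()
df = df_grouped.reset_index()
idx = df_grouped.index.levels[2].strftime('%Y-%m')
df_grouped.index = df_grouped.index.set_levels(idx, level=2)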
Sample:
import pandas as pd

rng = pd.date_range('2017-04-03', periods=10, freq='10D')
df = pd.DataFrame({'Country': ['Country'] * 10,
                   'Description': ['A'] * 3 + ['B'] * 3 + ['C'] * 4,
                   'a': range(10)}, index=rng)
print (df)
            Country Description  a
2017-04-03  Country           A  0
2017-04-13  Country           A  1
2017-04-23  Country           A  2
2017-05-03  Country           B  3
2017-05-13  Country           B  4
2017-05-23  Country           B  5
2017-06-02  Country           C  6
2017-06-12  Country           C  7
2017-06-22  Country           C  8
2017-07-02  Country           C  9
df_grouped = df.groupby(['Country','Description', pd.Grouper(freq = 'M')]).sum()
print (df_grouped)
                                 a
Country Description
Country A           2017-04-30   3
        B           2017-05-31  12
        C           2017-06-30  21
                    2017-07-31   9
df = df_grouped.reset_index().rename(columns={'level_2':'Date'})
print (df)
   Country Description       Date   a
0  Country           A 2017-04-30   3
1  Country           B 2017-05-31  12
2  Country           C 2017-06-30  21
3  Country           C 2017-07-31   9
idx = df_grouped.index.levels[2].strftime('%Y-%m')
df_grouped.index = df_grouped.index.set_levels(idx, level=2)
print (df_grouped)
                              a
Country Description
Country A           2017-04   3
        B           2017-05  12
        C           2017-06  21
                    2017-07   9
I realize this is an older post, but if you only want the display to stop looking sparse (the export to Excel will still end up merged), check that you have pandas version 1.5.2 and then use the following:
pd.set_option("display.multi_sparse", False) # for output display
I don’t know how to get the export to Excel to fill every grouped-by row with its index values; that is my question here.
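One possible approach, offered as a sketch rather than a confirmed answer from this thread: either move the index levels into ordinary columns with reset_index before exporting, or pass merge_cells=False to DataFrame.to_excel so repeated index values are written into every row. The toy frame, the 'Date' level name, the helper export_filled, and the file name out.xlsx are all hypothetical, and writing .xlsx assumes an Excel engine such as openpyxl is installed.

```python
import pandas as pd

# Hypothetical toy data, only for illustration.
rng = pd.date_range('2017-04-03', periods=6, freq='20D')
df = pd.DataFrame({'Country': ['X'] * 6,
                   'Description': ['A'] * 3 + ['B'] * 3,
                   'a': range(6)}, index=rng)

# Name the month key so the third index level is called 'Date'.
month = df.index.strftime('%Y-%m').rename('Date')
df_grouped = df.groupby(['Country', 'Description', month]).sum()

# Option 1: turn index levels into columns, so each row holds its own keys.
flat = df_grouped.reset_index()
print(flat)


def export_filled(frame, path='out.xlsx'):
    """Write both variants to one workbook (needs an Excel engine installed)."""
    with pd.ExcelWriter(path) as writer:
        # Flat table: the keys are regular columns, nothing to merge.
        frame.reset_index().to_excel(writer, sheet_name='flat', index=False)
        # Option 2: keep the MultiIndex but do not merge repeated index cells.
        frame.to_excel(writer, sheet_name='unmerged', merge_cells=False)
```

Calling export_filled(df_grouped) would then write one sheet per variant, so you can compare them in Excel.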