Hierarchical Index from pd dataframe to Excel, need to forward fill and unmerge

Question:

I have a pandas dataframe with a three-level hierarchical index, created by the following:
df_grouped = df.groupby(['Country','Description', pd.Grouper(freq = 'M')]).sum()

Basically, it is a table where Country is the highest level, Description is the second level, and the date, grouped by month, is the third.

Here is a sample after exporting to Excel:

PICTURE A

I’d like to do two unrelated things:

Unmerge all the hierarchical indices in this structure within python, then forward fill to create PICTURE B.

PICTURE B

Be able to transform the datetimes while in the hierarchical structure of PICTURE A into YYYY-MM in python so when I export it I get PICTURE C. (I understand that I can do that from the structure in PICTURE B, I just want to be able to do it while it’s still in the hierarchical structure in a pandas dataframe).

PICTURE C

Any tips?

Asked By: Qonl

Answers:

After groupby you get a DataFrame with a MultiIndex, so the values in the first and second levels do repeat on every row; they are just not displayed.

If the second DataFrame is not necessary, you can convert the DatetimeIndex to YYYY-MM format with strftime, or to a monthly period with to_period:

df_grouped = df.groupby(['Country','Description', df.index.strftime('%Y-%m')]).sum()

Or:

df_grouped = df.groupby(['Country','Description', df.index.to_period('M')]).sum()
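
As a rough, self-contained sketch of both variants (the sample frame here is made up for illustration and only assumes a DatetimeIndex plus the two grouping columns):

```python
import pandas as pd

# Hypothetical sample data: a DatetimeIndex plus two grouping columns.
rng = pd.date_range("2017-04-03", periods=10, freq="10D")
df = pd.DataFrame(
    {
        "Country": ["Country"] * 10,
        "Description": ["A"] * 3 + ["B"] * 3 + ["C"] * 4,
        "a": range(10),
    },
    index=rng,
)

# Group by the two columns and by the month formatted as a YYYY-MM string.
month_labels = df.index.strftime("%Y-%m")
by_string = df.groupby(["Country", "Description", month_labels]).sum()

# Same grouping, but the third level holds monthly Period values.
month_periods = df.index.to_period("M")
by_period = df.groupby(["Country", "Description", month_periods]).sum()

print(by_string)
print(by_period)
```

The string version is convenient for export; the Period version keeps the level date-aware, which may matter if you still want to sort or slice by month afterwards.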

If you need the second DataFrame, add reset_index to convert the levels to columns. To reformat the dates in the third level (level=2) while keeping the MultiIndex, use MultiIndex.set_levels with the reformatted labels of that level:

df_grouped = df.groupby(['Country','Description', pd.Grouper(freq = 'M')]).sum()

df = df_grouped.reset_index()

# levels[2] holds each distinct date once, which is what set_levels expects
idx = df_grouped.index.levels[2].strftime('%Y-%m')
df_grouped.index = df_grouped.index.set_levels(idx, level=2)

Sample:

rng = pd.date_range('2017-04-03', periods=10, freq='10D')
df = pd.DataFrame({'Country': ['Country'] * 10,
                   'Description':['A'] * 3 + ['B'] * 3 + ['C'] * 4, 
                   'a': range(10)}, index=rng)  
print (df)
            Country Description  a
2017-04-03  Country           A  0
2017-04-13  Country           A  1
2017-04-23  Country           A  2
2017-05-03  Country           B  3
2017-05-13  Country           B  4
2017-05-23  Country           B  5
2017-06-02  Country           C  6
2017-06-12  Country           C  7
2017-06-22  Country           C  8
2017-07-02  Country           C  9

df_grouped = df.groupby(['Country','Description', pd.Grouper(freq = 'M')]).sum()
print (df_grouped)
                                 a
Country Description               
Country A           2017-04-30   3
        B           2017-05-31  12
        C           2017-06-30  21
                    2017-07-31   9

df = df_grouped.reset_index().rename(columns={'level_2':'Date'})
print (df)
   Country Description       Date   a
0  Country           A 2017-04-30   3
1  Country           B 2017-05-31  12
2  Country           C 2017-06-30  21
3  Country           C 2017-07-31   9

# reformat each distinct date label once; set_levels expects unique labels
idx = df_grouped.index.levels[2].strftime('%Y-%m')
df_grouped.index = df_grouped.index.set_levels(idx, level=2)
print (df_grouped)
                              a
Country Description            
Country A           2017-04   3
        B           2017-05  12
        C           2017-06  21
                    2017-07   9
Answered By: jezrael

I realize this is an older post, but if you just want the display to not look sparse (the export to Excel still ends up merged), check that you have pandas version 1.5.2, then use the following:

pd.set_option("display.multi_sparse", False) # for output display
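
A small sketch of the effect, using pd.option_context so the setting only applies inside the with block (the two-row frame is a made-up stand-in for the grouped result):

```python
import pandas as pd

# Hypothetical two-row frame with a three-level index.
index = pd.MultiIndex.from_tuples(
    [("Country", "C", "2017-06"), ("Country", "C", "2017-07")],
    names=["Country", "Description", "Date"],
)
df_grouped = pd.DataFrame({"a": [21, 9]}, index=index)

# Default display: repeated outer labels are left blank.
sparse_text = df_grouped.to_string()

# Inside the context manager every row shows all of its index labels.
with pd.option_context("display.multi_sparse", False):
    dense_text = df_grouped.to_string()

print(sparse_text)
print(dense_text)
```

This only changes how the frame is printed; the underlying index is the same in both cases.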

I don’t know how to get the export to Excel to fill every grouped-by row with its index values; that is my question here.
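
One possible approach, offered as a sketch rather than a tested recipe: reset_index turns the index levels into ordinary columns, so every row carries its own values and nothing is left to merge. to_excel also has a merge_cells parameter that should write the index levels unmerged. The helper name export_flat and the file names below are hypothetical, and the actual write is left commented out because it needs an Excel engine such as openpyxl.

```python
import pandas as pd

# Hypothetical grouped result with a three-level index.
index = pd.MultiIndex.from_tuples(
    [
        ("Country", "A", "2017-04"),
        ("Country", "B", "2017-05"),
        ("Country", "C", "2017-06"),
        ("Country", "C", "2017-07"),
    ],
    names=["Country", "Description", "Date"],
)
df_grouped = pd.DataFrame({"a": [3, 12, 21, 9]}, index=index)

# Option 1: move the index levels into regular columns; each row is now
# fully populated, which is the "forward filled" layout.
flat = df_grouped.reset_index()
print(flat)


def export_flat(frame, path):
    """Hypothetical helper: write a MultiIndex frame without merged cells."""
    # Option 2: keep the MultiIndex and ask to_excel not to merge cells.
    frame.to_excel(path, merge_cells=False)


# Usage, assuming an Excel engine such as openpyxl is installed:
# flat.to_excel("flat.xlsx", index=False)
# export_flat(df_grouped, "grouped_unmerged.xlsx")
```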
