Hierarchical Index from pd dataframe to Excel, need to forward fill and unmerge

Question

I have a pandas dataframe with a three-level hierarchical index, created by the following:
df_grouped = df.groupby(['Country','Description', pd.Grouper(freq = 'M')]).sum()

Basically, it is a table where Country is the top level, Description is the second level, and the date, grouped by month, is the third.

PICTURE A

I’d like to do two unrelated things:

Unmerge all the hierarchical indices in this structure within python, then forward fill to create PICTURE B.

PICTURE B

Be able to transform the datetimes while in the hierarchical structure of PICTURE A into YYYY-MM in python so when I export it I get PICTURE C. (I understand that I can do that from the structure in PICTURE B, I just want to be able to do it while it’s still in the hierarchical structure in a pandas dataframe).

PICTURE C

Any tips?

Asked By: Qonl


Answer 1

After groupby you get a DataFrame with a MultiIndex, so the values in the first and second levels are repeated on every row; they are just not displayed.
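To see that the labels really are stored on every row, you can read a level back with get_level_values. This is only a small sketch: the index below is built by hand, and the name "Canada" and the numbers are made up for illustration, not taken from the question's data.

```python
import pandas as pd

# Hand-built stand-in for a grouped result (values are illustrative only).
index = pd.MultiIndex.from_tuples(
    [("Canada", "A", "2017-04"), ("Canada", "A", "2017-05"), ("Canada", "B", "2017-05")],
    names=["Country", "Description", "Date"],
)
df_grouped = pd.DataFrame({"a": [3, 12, 21]}, index=index)

# The printed frame hides repeated labels...
print(df_grouped)

# ...but each row still carries its full set of labels.
countries = df_grouped.index.get_level_values("Country").tolist()
descriptions = df_grouped.index.get_level_values("Description").tolist()
print(countries)
print(descriptions)
```

So nothing needs to be "forward filled" inside pandas; the blanks exist only in the printed (and merged Excel) representation.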

If the second DataFrame is not necessary, you can convert the DatetimeIndex to YYYY-MM format with strftime, or to monthly periods with to_period, directly in the groupby:

df_grouped = df.groupby(['Country','Description', df.index.strftime('%Y-%m')]).sum()

Or:

df_grouped = df.groupby(['Country','Description', df.index.to_period('M')]).sum()
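A quick sketch of how the two variants differ, using the same kind of sample frame as in this answer (the names by_text and by_period are mine): strftime gives a level of plain strings, while to_period gives a level of Period objects that print as YYYY-MM.

```python
import pandas as pd

rng = pd.date_range("2017-04-03", periods=10, freq="10D")
df = pd.DataFrame(
    {
        "Country": ["Country"] * 10,
        "Description": ["A"] * 3 + ["B"] * 3 + ["C"] * 4,
        "a": range(10),
    },
    index=rng,
)

# Third level holds strings such as '2017-04'.
by_text = df.groupby(["Country", "Description", df.index.strftime("%Y-%m")]).sum()

# Third level holds monthly Period objects, which display as '2017-04'.
by_period = df.groupby(["Country", "Description", df.index.to_period("M")]).sum()

text_labels = list(by_text.index.get_level_values(2))
period_labels = [str(p) for p in by_period.index.get_level_values(2)]
print(by_text)
print(by_period)
```

Strings are the simplest thing to export; periods keep date semantics if you still want to sort or do arithmetic on the months.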

If you need the second DataFrame, call reset_index to convert the index levels to columns. To reformat the date level while keeping the MultiIndex, use MultiIndex.set_levels with the formatted level values:

df_grouped = df.groupby(['Country','Description', pd.Grouper(freq = 'M')]).sum()

df = df_grouped.reset_index()

# levels[2] holds each distinct date once, which is what set_levels expects
idx = df_grouped.index.levels[2].strftime('%Y-%m')
df_grouped.index = df_grouped.index.set_levels(idx, level=2)

Sample:

rng = pd.date_range('2017-04-03', periods=10, freq='10D')
df = pd.DataFrame({'Country': ['Country'] * 10,
                   'Description':['A'] * 3 + ['B'] * 3 + ['C'] * 4, 
                   'a': range(10)}, index=rng)  
print (df)
            Country Description  a
2017-04-03  Country           A  0
2017-04-13  Country           A  1
2017-04-23  Country           A  2
2017-05-03  Country           B  3
2017-05-13  Country           B  4
2017-05-23  Country           B  5
2017-06-02  Country           C  6
2017-06-12  Country           C  7
2017-06-22  Country           C  8
2017-07-02  Country           C  9

df_grouped = df.groupby(['Country','Description', pd.Grouper(freq = 'M')]).sum()
print (df_grouped)
                                 a
Country Description               
Country A           2017-04-30   3
        B           2017-05-31  12
        C           2017-06-30  21
                    2017-07-31   9

df = df_grouped.reset_index().rename(columns={'level_2':'Date'})
print (df)
   Country Description       Date   a
0  Country           A 2017-04-30   3
1  Country           B 2017-05-31  12
2  Country           C 2017-06-30  21
3  Country           C 2017-07-31   9

idx = df_grouped.index.levels[2].strftime('%Y-%m')
df_grouped.index = df_grouped.index.set_levels(idx, level=2)
print (df_grouped)
                              a
Country Description            
Country A           2017-04   3
        B           2017-05  12
        C           2017-06  21
                    2017-07   9
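One thing to watch when the same month occurs under several countries or descriptions: the date level then has fewer distinct values than the frame has rows, and set_levels wants one label per distinct value. Here is a minimal sketch with hand-built data (the country names "X" and "Y" and the values are invented for illustration) that relabels through index.levels:

```python
import pandas as pd

dates = pd.to_datetime(["2017-04-30", "2017-05-31"])
index = pd.MultiIndex.from_product(
    [["X", "Y"], ["A", "B"], dates],
    names=["Country", "Description", "Date"],
)
df_grouped = pd.DataFrame({"a": range(8)}, index=index)

# levels[2] has one entry per distinct date (2 here), whereas
# get_level_values(2) has one entry per row (8 here).
labels = df_grouped.index.levels[2].strftime("%Y-%m")
df_grouped.index = df_grouped.index.set_levels(labels, level=2)
print(df_grouped)
```

The frame keeps its hierarchical structure; only the labels of the third level change.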

Answered By: jezrael

Answer 2

I realize this is an older post, but if you only want the display not to look sparse (the export to Excel still ends up merged), check that you have pandas version 1.5.2, then use the following:

pd.set_option("display.multi_sparse", False) # for output display

I don’t know how to get the export to Excel to fill every grouped-by row with its index values; that’s my question here.
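One way that may work for the export, sketched under assumptions: turn the index levels into ordinary columns with reset_index and write the result without the index, so every row carries its own labels. The helper name flatten_for_excel and the file names are hypothetical, and the frame below is hand-built to mirror the sample output in Answer 1.

```python
import pandas as pd


def flatten_for_excel(grouped: pd.DataFrame) -> pd.DataFrame:
    """Return a copy where every index level is a regular column."""
    return grouped.reset_index()


index = pd.MultiIndex.from_tuples(
    [
        ("Country", "A", "2017-04"),
        ("Country", "B", "2017-05"),
        ("Country", "C", "2017-06"),
        ("Country", "C", "2017-07"),
    ],
    names=["Country", "Description", "Date"],
)
df_grouped = pd.DataFrame({"a": [3, 12, 21, 9]}, index=index)

flat = flatten_for_excel(df_grouped)
print(flat)

# Writing needs an Excel engine such as openpyxl; the file names are placeholders.
# flat.to_excel("grouped_flat.xlsx", index=False)
# Alternative that keeps the MultiIndex and asks pandas not to merge cells:
# df_grouped.to_excel("grouped.xlsx", merge_cells=False)
```

The to_excel calls are left commented out so the sketch runs without an Excel engine installed. merge_cells is a documented to_excel parameter, but I have not checked how its output looks across pandas versions, so treat that line as something to try rather than a guarantee.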

Answered By: MechanicalEngineerMama
