How to iterate over pandas multiindex dataframe using index

Question

I have a data frame df which looks like this. Date and Time are 2 multilevel index

                           observation1   observation2
date          Time                             
2012-11-02    9:15:00      79.373668      224
              9:16:00      130.841316     477
2012-11-03    9:15:00      45.312814      835
              9:16:00      123.776946     623
              9:17:00      153.76646      624
              9:18:00      463.276946     626
              9:19:00      663.176934     622
              9:20:00      763.77333      621
2012-11-04    9:15:00      115.449437     122
              9:16:00      123.776946     555
              9:17:00      153.76646      344
              9:18:00      463.276946     212

I want to run some complex process over daily data block.

Pseudo code would look like

 for count in df(level 0 index) :
     new_df = get only chunk for count
     complex_process(new_df)

So, first of all, I could not find a way to access only blocks for a date

2012-11-03    9:15:00      45.312814      835
              9:16:00      123.776946     623
              9:17:00      153.76646      624
              9:18:00      463.276946     626
              9:19:00      663.176934     622
              9:20:00      763.77333      621

and then send it for processing. I am doing this in for loop as I am not sure if there is any way to do it without mentioning exact value of level 0 column. I did some basic search and found df.index.get_level_values(0), but it returns all the values and that causes loop to run multiple times for a given day. I want to create a Dataframe per day and send it for processing.

Asked By: Yantraguru

||

Source

Answer 1

One easy way would be to groupby the first level of the index – iterating over the groupby object will return the group keys and a subframe containing each group.

In [136]: for date, new_df in df.groupby(level=0):
     ...:     print(new_df)
     ...:     
                    observation1  observation2
date       Time                               
2012-11-02 9:15:00     79.373668           224
           9:16:00    130.841316           477

                    observation1  observation2
date       Time                               
2012-11-03 9:15:00     45.312814           835
           9:16:00    123.776946           623
           9:17:00    153.766460           624
           9:18:00    463.276946           626
           9:19:00    663.176934           622
           9:20:00    763.773330           621

                    observation1  observation2
date       Time                               
2012-11-04 9:15:00    115.449437           122
           9:16:00    123.776946           555
           9:17:00    153.766460           344
           9:18:00    463.276946           212

You can also use droplevel to remove the first index (the useless date index):

In [136]: for date, new_df in df.groupby(level=0):
     ...:     print(new_df.droplevel(0))
     ...:
         observation1  observation2
Time
9:15:00     79.373668           224
9:16:00    130.841316           477
...

Answered By: chrisb

Answer 2

What about this?

for idate in df.index.get_level_values('date'):
    complex_process(df.ix[idate], idate)

Answered By: psorenson

Answer 3

Tagging off of @psorenson answer, we can get unique level indices and its related data frame slices without numpy as follows:

for date in df.index.get_level_values('date').unique():
    print(df.loc[date])

Answered By: melbay

Answer 4

Late to the party, I found that the following works, too:

for date in df.index.unique("date"):
    print(df.loc[date])

It uses the level optional parameter of the Index.unique method introduced in version 0.23.0.

You can specify either the level number or label.

Answered By: sanzoghenzo

Answer 5

Another alternative:

for date in df.index.levels[0]:
    print(df.loc[date])

The difference with the df.index.unique("date") proposed by @sanzoghenzo is that it refers to the index level by its number rather than name.

Answered By: Roger Vadim

How to iterate over pandas multiindex dataframe using index

Question:

Answers: