How to iterate over a pandas MultiIndex DataFrame using the index
Question:
I have a DataFrame df that looks like this. date and Time are the two levels of a MultiIndex:
                    observation1  observation2
date       Time
2012-11-02 9:15:00     79.373668           224
           9:16:00    130.841316           477
2012-11-03 9:15:00     45.312814           835
           9:16:00    123.776946           623
           9:17:00     153.76646           624
           9:18:00    463.276946           626
           9:19:00    663.176934           622
           9:20:00     763.77333           621
2012-11-04 9:15:00    115.449437           122
           9:16:00    123.776946           555
           9:17:00     153.76646           344
           9:18:00    463.276946           212
I want to run a complex process over each daily block of data.
Pseudocode would look like this:
    for count in df(level 0 index):
        new_df = get only chunk for count
        complex_process(new_df)
So, first of all, I could not find a way to access only the block for a single date, such as
2012-11-03 9:15:00     45.312814           835
           9:16:00    123.776946           623
           9:17:00     153.76646           624
           9:18:00    463.276946           626
           9:19:00    663.176934           622
           9:20:00     763.77333           621
and then send it for processing. I am doing this in a for loop because I am not sure whether there is a way to do it without specifying the exact value of the level 0 index. I did some basic searching and found df.index.get_level_values(0), but it returns all the values, which causes the loop to run multiple times for a given day. I want to create one DataFrame per day and send it for processing.
Answers:
One easy way would be to groupby the first level of the index – iterating over the groupby object will return the group keys and a subframe containing each group.
In [136]: for date, new_df in df.groupby(level=0):
     ...:     print(new_df)
     ...:
                    observation1  observation2
date       Time
2012-11-02 9:15:00     79.373668           224
           9:16:00    130.841316           477
                    observation1  observation2
date       Time
2012-11-03 9:15:00     45.312814           835
           9:16:00    123.776946           623
           9:17:00    153.766460           624
           9:18:00    463.276946           626
           9:19:00    663.176934           622
           9:20:00    763.773330           621
                    observation1  observation2
date       Time
2012-11-04 9:15:00    115.449437           122
           9:16:00    123.776946           555
           9:17:00    153.766460           344
           9:18:00    463.276946           212
You can also use droplevel to remove the first index level (date, which is redundant within each group):
In [136]: for date, new_df in df.groupby(level=0):
     ...:     print(new_df.droplevel(0))
     ...:
         observation1  observation2
Time
9:15:00     79.373668           224
9:16:00    130.841316           477
...
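For anyone who wants to try the groupby approach end to end, here is a minimal self-contained sketch. It rebuilds only the first five rows of the question's frame, and complex_process is a hypothetical stand-in (it just sums observation2), since the real function is not shown in the question.

```python
import pandas as pd

# Rebuild a small part of the question's frame with a (date, Time) MultiIndex.
index = pd.MultiIndex.from_tuples(
    [
        ("2012-11-02", "9:15:00"),
        ("2012-11-02", "9:16:00"),
        ("2012-11-03", "9:15:00"),
        ("2012-11-03", "9:16:00"),
        ("2012-11-03", "9:17:00"),
    ],
    names=["date", "Time"],
)
df = pd.DataFrame(
    {
        "observation1": [79.373668, 130.841316, 45.312814, 123.776946, 153.76646],
        "observation2": [224, 477, 835, 623, 624],
    },
    index=index,
)


def complex_process(day_df):
    """Hypothetical placeholder for the real per-day computation."""
    return int(day_df["observation2"].sum())


# One iteration per distinct date; droplevel(0) leaves only the Time level.
results = {}
for date, new_df in df.groupby(level=0):
    results[date] = complex_process(new_df.droplevel(0))

print(results)
```

Collecting the per-day results in a dict keyed by date is only one option; if complex_process returns a scalar or a Series, df.groupby(level=0).apply(complex_process) may be a more compact alternative.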
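What about this? (get_level_values returns one entry per row, so add .unique() to visit each date only once.)
    for idate in df.index.get_level_values('date'):
        # use .loc here; the older .ix indexer was removed in pandas 1.0
        complex_process(df.loc[idate], idate)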
Building on @psorenson's answer, we can get the unique level values and their corresponding DataFrame slices without numpy as follows:
    for date in df.index.get_level_values('date').unique():
        print(df.loc[date])
Late to the party, but I found that the following works, too:
    for date in df.index.unique("date"):
        print(df.loc[date])
It uses the optional level parameter of the Index.unique method, introduced in version 0.23.0.
You can specify either the level number or the level label.
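As a quick check of the number-or-label point, here is a small sketch on a made-up three-row frame (the data and variable names are illustrative only). It also shows that selecting a single date with .loc returns a frame indexed by Time alone.

```python
import pandas as pd

# Illustrative frame with the same index level names as in the question.
index = pd.MultiIndex.from_tuples(
    [("2012-11-02", "9:15:00"), ("2012-11-02", "9:16:00"), ("2012-11-03", "9:15:00")],
    names=["date", "Time"],
)
df = pd.DataFrame({"observation2": [224, 477, 835]}, index=index)

by_label = df.index.unique("date")    # level given by name
by_number = df.index.unique(level=0)  # same level given by position

# Both calls return the distinct dates in order of first appearance.
print(list(by_label))
print(by_label.equals(by_number))

# Selecting one date with .loc drops the date level, leaving Time as the index.
day = df.loc["2012-11-02"]
print(day.index.name)
```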
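Another alternative:
    for date in df.index.levels[0]:
        print(df.loc[date])
Unlike the df.index.unique("date") proposed by @sanzoghenzo, this reads the values stored for the level directly from the MultiIndex, addressing the level by position. Note that levels can include values that are no longer used in the index (for example after filtering), so the two are not always equivalent.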
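A minimal sketch of how levels and unique can disagree, using a made-up three-row frame and an arbitrary filter (both are assumptions for illustration, not part of the question's data):

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("2012-11-02", "9:15:00"), ("2012-11-03", "9:15:00"), ("2012-11-04", "9:15:00")],
    names=["date", "Time"],
)
df = pd.DataFrame({"observation2": [224, 835, 122]}, index=index)

# Keep only rows with observation2 above 200; this drops 2012-11-04.
subset = df[df["observation2"] > 200]

# levels still lists the dropped date, unique() only lists dates that are present.
print(list(subset.index.levels[0]))
print(list(subset.index.unique("date")))

# remove_unused_levels() rebuilds the levels from the values actually in use.
cleaned = subset.index.remove_unused_levels()
print(list(cleaned.levels[0]))
```

On a filtered frame like subset, looping over levels[0] and calling df.loc[date] would hit a date with no rows, so unique (or remove_unused_levels first) is the safer choice there.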