How to combine time range and boolean indexing?

Question:

I have a DataFrame with a datetime index:

tbl.iloc[:,:2].head(5)

date_time               var1    var2    
2011-01-01 00:05:00     97.97   1009.28
2011-01-01 00:10:00     97.53   1009.53
2011-01-01 00:15:00     97.38   1009.15
2011-01-01 00:20:00     97.23   1009.03
2011-01-01 00:25:00     97.01   1009.03

Now I want to select Monday to Friday from 6am to 7pm, Saturday from 6am to 5pm, and Sunday from 8am to 5pm.

I can do that for a time range with:

import datetime
selection = tbl.ix[datetime.time(6):datetime.time(19)]
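
The .ix indexer was deprecated in pandas 0.20 and removed in 1.0. Here is a minimal sketch of the same time-of-day selection in current pandas. It runs on a synthetic frame shaped like tbl (an assumption; the real data is not available) and uses DataFrame.between_time. A .loc slice with two datetime.time objects should select the same rows on a DatetimeIndex:

```python
import datetime

import numpy as np
import pandas as pd

# Synthetic stand-in for tbl (assumption): one week of 5-minute readings.
index = pd.date_range("2011-01-01 00:05", periods=7 * 24 * 12, freq="5min", name="date_time")
tbl = pd.DataFrame(
    {"var1": np.linspace(97.0, 98.0, len(index)),
     "var2": np.linspace(1009.0, 1010.0, len(index))},
    index=index,
)

# Rows whose time of day lies between 06:00 and 19:00 (both ends inclusive).
selection = tbl.between_time("06:00", "19:00")

# Equivalent label slice with datetime.time objects on the DatetimeIndex.
selection_loc = tbl.loc[datetime.time(6):datetime.time(19)]
```

Both forms include the end points by default. between_time also takes an inclusive argument if one end should be open.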

Adding the weekday condition, i.e., combining a time range with boolean indexing, apparently doesn't work the way I tried it:

tbl['weekday'] = tbl.index.weekday
test = tbl[(tbl.ix[datetime.time(6):datetime.time(19)]) & (tbl['weekday'] == 4)]

=> TypeError: Cannot compare type 'Timestamp' with type 'str'

test = tbl[(tbl.index>datetime.time(6)) (tbl.index>datetime.time(19)) & (tbl['weekday'] == 4)]

=> TypeError: type object 08:00:00

tbl['date'] = tbl.index
test = tbl[(tbl['date']>datetime.time(8)) & (tbl['weekday'] == 4)]

=> ValueError: Could not construct Timestamp from argument

What is wrong with my code?

Asked By: tobip


Answers:

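The first part of your expression filters the DataFrame, while the second part returns a boolean Series, so & does not combine them the way you intend. Apply them one after the other instead: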

in_hours = tbl.loc[datetime.time(6):datetime.time(19)]  # 1st filter: 06:00-19:00
test = in_hours[in_hours['weekday'] == 4]                 # 2nd filter: Fridays only

This applies the first filter and then the second one on top of it, which is equivalent to a boolean AND of the two conditions.
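
The two-step filter can also be written as a single chain. This is a sketch on a synthetic frame in place of tbl (an assumption). between_time replaces the removed .ix time slice, and .loc with a callable evaluates the weekday mask on the already time-filtered frame, so the mask and the rows always line up:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for tbl (assumption): two weeks of 5-minute readings.
index = pd.date_range("2011-01-01 00:05", periods=14 * 24 * 12, freq="5min")
tbl = pd.DataFrame({"var1": np.arange(len(index), dtype=float)}, index=index)

# Step 1 returns a DataFrame; step 2 builds its boolean mask from *that*
# DataFrame, so the two conditions are applied one after the other.
test = (
    tbl.between_time("06:00", "19:00")
       .loc[lambda df: df.index.weekday == 4]  # 4 == Friday (Monday is 0)
)
```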

I suggest using IPython (or its notebook) to check the intermediate results of your expressions and make sure they are still what you expect. It is very difficult to write these expressions straight out of your head if you are not yet experienced with pandas syntax.

Answered By: Joop

I found a solution now:

# vectorized boolean masks built from the DatetimeIndex
criterion1 = tbl.index.hour >= 8      # from 08:00
criterion2 = tbl.index.hour < 19      # before 19:00
criterion3 = (tbl['weekday'] == 4)    # Fridays (Monday == 0)

tbl[criterion1 & criterion2 & criterion3]
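
The same hour masks extend to the full schedule from the question: Monday to Friday 6am to 7pm, Saturday 6am to 5pm, Sunday 8am to 5pm. This is a sketch that assumes whole-hour boundaries and windows with an inclusive start and exclusive end, as in criterion2. WINDOWS is a hypothetical lookup table, and the data is synthetic:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for tbl (assumption): one week of 5-minute readings.
index = pd.date_range("2011-01-01 00:05", periods=7 * 24 * 12, freq="5min")
tbl = pd.DataFrame({"var1": np.arange(len(index), dtype=float)}, index=index)

# Hypothetical schedule, keyed by weekday (Monday == 0): (start_hour, end_hour).
# Start inclusive, end exclusive (an assumption, matching criterion2 above).
WINDOWS = {0: (6, 19), 1: (6, 19), 2: (6, 19), 3: (6, 19), 4: (6, 19),
           5: (6, 17),   # Saturday
           6: (8, 17)}   # Sunday

starts = np.array([WINDOWS[d][0] for d in range(7)])
ends = np.array([WINDOWS[d][1] for d in range(7)])

weekday = np.asarray(tbl.index.weekday)
hour = np.asarray(tbl.index.hour)

# Look up each row's window by its weekday, then compare hours element-wise.
mask = (hour >= starts[weekday]) & (hour < ends[weekday])
selection = tbl[mask]
```

A lookup array indexed by weekday keeps everything vectorized and avoids one mask per day.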

Is there something more elegant?

Answered By: tobip

@tobip, here is a more elegant solution using boolean indexing:

import numpy as np

# array of row positions whose time of day is in [08:00, 19:00)
idx = tbl.index.indexer_between_time("8:00", "19:00", include_end=False)
# convert the position array to a boolean mask
criterion1 = np.zeros(tbl.shape[0], dtype=bool)
criterion1[idx] = True

# second boolean mask: Fridays (Monday == 0)
criterion2 = (tbl['weekday'] == 4)

# combine the masks with a logical AND
tbl[criterion1 & criterion2]
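
Here are two variations on this answer, sketched on synthetic data (an assumption). First, np.isin converts the positions to a mask in one line. Second, the time-of-day mask can be built by comparing tbl.index.time, an array of datetime.time objects, with datetime.time values. The second form is presumably what the question's attempt tbl['date'] > datetime.time(8) was aiming for. That comparison fails because it pits full Timestamps against a bare time of day:

```python
import datetime

import numpy as np
import pandas as pd

# Synthetic stand-in for tbl (assumption): one week of 5-minute readings.
index = pd.date_range("2011-01-01 00:05", periods=7 * 24 * 12, freq="5min")
tbl = pd.DataFrame({"var1": np.arange(len(index), dtype=float)}, index=index)

# Positions from indexer_between_time, turned into a mask in one step.
idx = tbl.index.indexer_between_time("8:00", "19:00", include_end=False)
criterion1 = np.isin(np.arange(len(tbl)), idx)

# Alternative: compare each row's time of day directly with datetime.time.
tod = tbl.index.time  # object array of datetime.time values
criterion1_alt = (tod >= datetime.time(8)) & (tod < datetime.time(19))

criterion2 = tbl.index.weekday == 4  # Fridays
result = tbl[criterion1_alt & criterion2]
```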

Answered By: olegkhr