How to combine time range and boolean indexing?
Question:
I have a DataFrame with a datetime index:
tbl.iloc[:,:2].head(5)
date_time var1 var2
2011-01-01 00:05:00 97.97 1009.28
2011-01-01 00:10:00 97.53 1009.53
2011-01-01 00:15:00 97.38 1009.15
2011-01-01 00:20:00 97.23 1009.03
2011-01-01 00:25:00 97.01 1009.03
Now I want to select Mondays-Fridays from 6am-7pm, Saturdays from 6am-5pm and Sundays 8am-5pm.
I can do that for a time range with:
import datetime
selection = tbl.ix[datetime.time(6):datetime.time(19)]
Adding the weekday condition, i.e., combining time range and boolean indexing apparently doesn’t work the way I tried it:
tbl['weekday'] = tbl.index.weekday
test = tbl[(tbl.ix[datetime.time(6):datetime.time(19)]) & (tbl['weekday'] == 4)]
=> TypeError: Cannot compare type ‘Timestamp’ with type ‘str’
test = tbl[(tbl.index>datetime.time(6)) (tbl.index>datetime.time(19)) & (tbl['weekday'] == 4)]
=> TypeError: type object 08:00:00
tbl['date'] = tbl.index
test = tbl[(tbl['date']>datetime.time(8)) & (tbl['weekday'] == 4)]
=> ValueError: Could not construct Timestamp from argument
What is wrong with my code?
Answers:
The first expression filters the DataFrame, while the second returns a boolean Series, so the two cannot be combined with &.
Try
test = tbl.ix[datetime.time(6):datetime.time(19)].ix[tbl.weekday == 4]
This applies the first filter and then the second on top of it, which is equivalent to a boolean AND.
I suggest you use something like IPython or its notebook to check the intermediate results of your expressions and make sure they are still what you expect. It is very difficult to write these expressions off the top of your head if you are not yet experienced with pandas syntax.
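For readers on current pandas, where .ix has been removed, the same "filter, then filter again" idea can be sketched with between_time. This is only an illustration: the sample frame and its var1 column are made-up stand-ins for the asker's tbl, and both ends of the time range are included here (the between_time default).

```python
import pandas as pd

# Hypothetical sample data: one week of 5-minute readings (2011-01-03 is a Monday).
index = pd.date_range("2011-01-03", "2011-01-09 23:55", freq="5min")
tbl = pd.DataFrame({"var1": range(len(index))}, index=index)

# Step 1: keep rows whose time of day lies between 06:00 and 19:00.
daytime = tbl.between_time("06:00", "19:00")

# Step 2: build the weekday mask from the filtered frame so the lengths match,
# then keep Fridays only (Monday=0, ..., Friday=4).
test = daytime[daytime.index.weekday == 4]
```

Inspecting daytime before applying the second step is the kind of intermediate check suggested above.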
I found a solution now:
criterion1 = tbl.index.map(lambda i: i.hour >= 8)
criterion2 = tbl.index.map(lambda i: i.hour < 19)
criterion3 = (tbl['weekday'] == 4)
tbl[criterion1 & criterion2 & criterion3]
Is there something more elegant?
A more elegant solution (credit to @tobip) using boolean indexing:
import numpy as np
# get the integer positions of the rows that fall in the given time range
idx = tbl.index.indexer_between_time("8:00", "19:00", include_end=False)
# convert the array of positions to a boolean mask
criterion1 = np.zeros(tbl.shape[0], dtype=bool)
criterion1[idx] = True
# one more boolean index
criterion2 = (tbl['weekday'] == 4)
# combine boolean indices using logical and
tbl[criterion1 & criterion2]
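The full set of rules from the question (Monday to Friday 6am-7pm, Saturday 6am-5pm, Sunday 8am-5pm) could be expressed the same way, as one boolean mask per rule combined with |. The following is a minimal sketch rather than a tested answer from the thread: the sample frame is made up, and it assumes the end hour is excluded, as in the include_end=False call above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one week of 5-minute readings (2011-01-03 is a Monday).
index = pd.date_range("2011-01-03", "2011-01-09 23:55", freq="5min")
tbl = pd.DataFrame({"var1": np.arange(len(index))}, index=index)

dow = tbl.index.dayofweek  # Monday=0, ..., Sunday=6
hour = tbl.index.hour      # whole-hour bounds, so the hour alone is enough

# One boolean mask per rule; the end hour is excluded (assumption).
weekdays = (dow <= 4) & (hour >= 6) & (hour < 19)
saturday = (dow == 5) & (hour >= 6) & (hour < 17)
sunday = (dow == 6) & (hour >= 8) & (hour < 17)

# A row is selected if it satisfies any one of the rules.
selection = tbl[weekdays | saturday | sunday]
```

Because every mask has the same length as tbl, they combine directly with & and |, which avoids the type errors that come from comparing Timestamps with datetime.time objects.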