Pandas rebuild free time slots from all possible time slots and booked time slots
Question:
As the title says, I have a list of available (possible) time slots and a list of booked slots. I'm looking for a streamlined way, using pandas, to subtract the booked time slots from the possible ones and rebuild the free time slots DataFrame.
Thank you very much
#1 Possible slots:
>>> df1
start end
0 2023-02-28 08:00:00 2023-02-28 08:30:00
1 2023-02-28 08:30:00 2023-02-28 09:00:00
2 2023-02-28 09:00:00 2023-02-28 09:30:00
3 2023-02-28 09:30:00 2023-02-28 10:00:00
4 2023-02-28 10:00:00 2023-02-28 10:30:00
5 2023-02-28 10:30:00 2023-02-28 11:00:00
6 2023-02-28 11:00:00 2023-02-28 11:30:00
7 2023-02-28 11:30:00 2023-02-28 12:00:00
8 2023-02-28 12:00:00 2023-02-28 12:30:00
9 2023-02-28 12:30:00 2023-02-28 13:00:00
10 2023-02-28 13:00:00 2023-02-28 13:30:00
11 2023-02-28 13:30:00 2023-02-28 14:00:00
12 2023-02-28 14:00:00 2023-02-28 14:30:00
13 2023-02-28 14:30:00 2023-02-28 15:00:00
14 2023-02-28 15:00:00 2023-02-28 15:30:00
15 2023-02-28 15:30:00 2023-02-28 16:00:00
>>>
#2 Booked slots:
>>> df2
start end
0 2023-02-28 08:00:00 2023-02-28 08:15:00
1 2023-02-28 08:15:00 2023-02-28 08:30:00
2 2023-02-28 09:00:00 2023-02-28 09:30:00
3 2023-02-28 12:00:00 2023-02-28 12:45:00
4 2023-02-28 13:15:00 2023-02-28 14:45:00
>>>
#3 The result should be:
>>> df3
start end
0 2023-02-28 08:30:00 2023-02-28 09:00:00
1 2023-02-28 09:30:00 2023-02-28 10:00:00
2 2023-02-28 10:00:00 2023-02-28 10:30:00
3 2023-02-28 10:30:00 2023-02-28 11:00:00
4 2023-02-28 11:00:00 2023-02-28 11:30:00
5 2023-02-28 11:30:00 2023-02-28 12:00:00
6 2023-02-28 12:45:00 2023-02-28 13:00:00
7 2023-02-28 13:00:00 2023-02-28 13:15:00
8 2023-02-28 14:45:00 2023-02-28 15:00:00
9 2023-02-28 15:00:00 2023-02-28 15:30:00
10 2023-02-28 15:30:00 2023-02-28 16:00:00
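For anyone who wants to reproduce the frames shown above, here is a minimal sketch that rebuilds them. The names df1 and df2 follow the question; the construction itself (date_range plus a 30-minute offset) is only one assumed way of creating them.

```python
import pandas as pd

# Possible slots: sixteen 30-minute slots from 08:00 to 16:00
starts = pd.date_range('2023-02-28 08:00', '2023-02-28 15:30', freq='30min')
df1 = pd.DataFrame({'start': starts, 'end': starts + pd.Timedelta(minutes=30)})

# Booked slots, copied from the listing in the question
df2 = pd.DataFrame({
    'start': pd.to_datetime(['2023-02-28 08:00', '2023-02-28 08:15',
                             '2023-02-28 09:00', '2023-02-28 12:00',
                             '2023-02-28 13:15']),
    'end': pd.to_datetime(['2023-02-28 08:15', '2023-02-28 08:30',
                           '2023-02-28 09:30', '2023-02-28 12:45',
                           '2023-02-28 14:45']),
})

print(df1)
print(df2)
```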
Answers:
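You can expand both DataFrames into 5-minute steps: repeat each row according to the difference between its start and end columns, then create a helper column new by adding 5-minute timedeltas, counted with GroupBy.cumcount, to the start column. Filter df1 with Series.isin and boolean indexing to drop the steps that also appear in df2, then aggregate each original slot to its first and last remaining step (or min and max) and add 5 minutes to the end column. This assumes every start and end falls on a multiple of N minutes: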
import pandas as pd

# 5-minute steps, change if necessary
N = 5
df1['start'] = pd.to_datetime(df1['start'])
df1['end'] = pd.to_datetime(df1['end'])
# repeat each row once per N-minute step (step count cast to int explicitly)
df1 = df1.loc[df1.index.repeat(df1['end'].sub(df1['start']).dt.total_seconds().div(N*60).astype(int))]
# number the repeated rows 0, 1, 2, ... within each original slot
counter1 = df1.groupby(level=0).cumcount()
df1['new'] = df1['start'].add(pd.to_timedelta(counter1, unit='min').mul(N))
df2['start'] = pd.to_datetime(df2['start'])
df2['end'] = pd.to_datetime(df2['end'])
df2 = df2.loc[df2.index.repeat(df2['end'].sub(df2['start']).dt.total_seconds().div(N*60).astype(int))]
counter2 = df2.groupby(level=0).cumcount()
df2['new'] = df2['start'].add(pd.to_timedelta(counter2, unit='min').mul(N))
# keep the steps that are not booked, then collapse each slot back to one row
df3 = (df1[~df1['new'].isin(df2['new'])].groupby(level=0).agg(start=('new', 'first'),
                                                              end=('new', 'last'))
       .assign(end = lambda x: x['end'].add(pd.Timedelta(minutes=N)))
       .reset_index(drop=True))
print (df3)
start end
0 2023-02-28 08:30:00 2023-02-28 09:00:00
1 2023-02-28 09:30:00 2023-02-28 10:00:00
2 2023-02-28 10:00:00 2023-02-28 10:30:00
3 2023-02-28 10:30:00 2023-02-28 11:00:00
4 2023-02-28 11:00:00 2023-02-28 11:30:00
5 2023-02-28 11:30:00 2023-02-28 12:00:00
6 2023-02-28 12:45:00 2023-02-28 13:00:00
7 2023-02-28 13:00:00 2023-02-28 13:15:00
8 2023-02-28 14:45:00 2023-02-28 15:00:00
9 2023-02-28 15:00:00 2023-02-28 15:30:00
10 2023-02-28 15:30:00 2023-02-28 16:00:00
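To see what the expansion step does in isolation, here is a minimal sketch on a single 30-minute slot. The frame slot and the variables n_ticks, ticks and offset are illustrative names, not part of the answer; N plays the same role as above.

```python
import pandas as pd

N = 5  # step size in minutes

slot = pd.DataFrame({
    'start': pd.to_datetime(['2023-02-28 12:30']),
    'end': pd.to_datetime(['2023-02-28 13:00']),
})

# Number of N-minute steps in each row, as integer counts
n_ticks = (slot['end'] - slot['start']).dt.total_seconds().div(N * 60).astype(int)

# Repeat each row once per step; every copy keeps the original index label
ticks = slot.loc[slot.index.repeat(n_ticks)].copy()

# cumcount() numbers the copies 0, 1, 2, ... within each original row
offset = pd.to_timedelta(ticks.groupby(level=0).cumcount() * N, unit='min')
ticks['new'] = ticks['start'] + offset

print(ticks['new'].dt.strftime('%H:%M').tolist())
# ['12:30', '12:35', '12:40', '12:45', '12:50', '12:55']
```

Because every copy shares the index label of its source row, groupby(level=0) can later collapse the surviving steps back into one row per slot.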
You can use pd.date_range() to generate all possible time slots and pd.IntervalIndex.from_arrays() to build an interval index for each set of slots. pd.IntervalIndex.difference() then removes the booked intervals from the possible ones. Note that difference() is a set difference on whole intervals: it drops only those possible slots that exactly match a booked interval and does not split partially booked slots, so a slot such as 12:30-13:00 is returned unchanged.
Here’s an example implementation:
import pandas as pd
# Create DataFrame of possible time slots
possible_slots = pd.DataFrame({
    # 16 slot starts (08:00 ... 15:30) paired with 16 slot ends (08:30 ... 16:00)
    'start': pd.date_range(start='2023-02-28 08:00:00', end='2023-02-28 15:30:00', freq='30min'),
    'end': pd.date_range(start='2023-02-28 08:30:00', end='2023-02-28 16:00:00', freq='30min')
})
# Create DataFrame of booked time slots
booked_slots = pd.DataFrame({
    'start': ['2023-02-28 08:00:00', '2023-02-28 08:15:00', '2023-02-28 08:15:00', '2023-02-28 09:00:00',
              '2023-02-28 09:30:00', '2023-02-28 12:00:00', '2023-02-28 12:45:00', '2023-02-28 13:15:00'],
    'end': ['2023-02-28 08:15:00', '2023-02-28 08:30:00', '2023-02-28 08:30:00', '2023-02-28 09:30:00',
            '2023-02-28 10:00:00', '2023-02-28 12:45:00', '2023-02-28 13:00:00', '2023-02-28 14:45:00']
})
booked_slots['start'] = pd.to_datetime(booked_slots['start'])
booked_slots['end'] = pd.to_datetime(booked_slots['end'])
# Create interval index of possible time slots
possible_intervals = pd.IntervalIndex.from_arrays(possible_slots['start'], possible_slots['end'])
# Create interval index of booked time slots
booked_intervals = pd.IntervalIndex.from_arrays(booked_slots['start'], booked_slots['end'])
# Get free intervals by removing booked intervals from possible intervals
free_intervals = possible_intervals.difference(booked_intervals)
# Convert free intervals back to DataFrame
free_slots = pd.DataFrame({'start': free_intervals.left, 'end': free_intervals.right})
print(free_slots)
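IntervalIndex.difference() compares whole intervals, so a booking that only partly covers a slot removes nothing. The sketch below first shows that behaviour on two slots, then subtracts bookings slot by slot with plain comparisons. subtract_booked is a hypothetical helper written for this illustration, not a pandas function, and it assumes half-open slots whose end is not part of the slot.

```python
import pandas as pd

# 1) difference() is an exact-match set difference
possible = pd.IntervalIndex.from_arrays(
    pd.to_datetime(['2023-02-28 12:00', '2023-02-28 12:30']),
    pd.to_datetime(['2023-02-28 12:30', '2023-02-28 13:00']))
booked = pd.IntervalIndex.from_arrays(
    pd.to_datetime(['2023-02-28 12:00']),
    pd.to_datetime(['2023-02-28 12:45']))
remaining = possible.difference(booked)
print(len(remaining))  # 2: neither slot equals the booking, so both survive


# 2) explicit subtraction, one possible slot at a time
def subtract_booked(possible_df, booked_df):
    """Return the parts of each possible slot not covered by any booking."""
    rows = []
    for s, e in zip(possible_df['start'], possible_df['end']):
        # Bookings that overlap the slot [s, e), earliest first
        mask = (booked_df['start'] < e) & (booked_df['end'] > s)
        hits = booked_df[mask].sort_values('start')
        cur = s  # left edge of the part of the slot not yet accounted for
        for bs, be in zip(hits['start'], hits['end']):
            if bs > cur:
                rows.append((cur, bs))  # free gap before this booking
            cur = max(cur, be)
        if cur < e:
            rows.append((cur, e))  # free tail after the last booking
    return pd.DataFrame(rows, columns=['start', 'end'])


# Data from the question
starts = pd.date_range('2023-02-28 08:00', '2023-02-28 15:30', freq='30min')
df1 = pd.DataFrame({'start': starts, 'end': starts + pd.Timedelta(minutes=30)})
df2 = pd.DataFrame({
    'start': pd.to_datetime(['2023-02-28 08:00', '2023-02-28 08:15',
                             '2023-02-28 09:00', '2023-02-28 12:00',
                             '2023-02-28 13:15']),
    'end': pd.to_datetime(['2023-02-28 08:15', '2023-02-28 08:30',
                           '2023-02-28 09:30', '2023-02-28 12:45',
                           '2023-02-28 14:45']),
})

free = subtract_booked(df1, df2)
print(free)
```

On the question's data this yields the 11 rows of the expected df3, including the trimmed slots 12:45-13:00, 13:00-13:15 and 14:45-15:00. The loop is quadratic in the worst case, which is fine for a day's worth of slots but may need a different approach for large inputs.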
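A time slot is treated as used if either:
- the start of a booked slot is ≥ the start of the slot and < its end
- the end of a booked slot is ≤ the end of the slot and > its start
Use a merge_asof with boolean indexing. This keeps or drops whole slots rather than trimming partially booked ones, and because neither condition holds for a slot that lies entirely inside a longer booking (for example 13:30-14:00 inside 13:15-14:45), such slots are still reported as free: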
import pandas as pd

# ensure datetime
df1[['start', 'end']] = df1[['start', 'end']].apply(pd.to_datetime)
df2[['start', 'end']] = df2[['start', 'end']].apply(pd.to_datetime)
# merge each slot with the latest booking that starts at or before it, then filter:
out = (
    pd.merge_asof(df1, df2.add_suffix('2'), left_on='start', right_on='start2')
      .loc[lambda d: ~((d['start2'].ge(d['start']) & d['start2'].lt(d['end']))
                      |(d['end2'].le(d['end']) & d['end2'].gt(d['start']))
                      ),
           ['start', 'end']]
)
Output:
start end
1 2023-02-28 08:30:00 2023-02-28 09:00:00
3 2023-02-28 09:30:00 2023-02-28 10:00:00
4 2023-02-28 10:00:00 2023-02-28 10:30:00
5 2023-02-28 10:30:00 2023-02-28 11:00:00
6 2023-02-28 11:00:00 2023-02-28 11:30:00
7 2023-02-28 11:30:00 2023-02-28 12:00:00
10 2023-02-28 13:00:00 2023-02-28 13:30:00
11 2023-02-28 13:30:00 2023-02-28 14:00:00
12 2023-02-28 14:00:00 2023-02-28 14:30:00
14 2023-02-28 15:00:00 2023-02-28 15:30:00
15 2023-02-28 15:30:00 2023-02-28 16:00:00
Intermediates:
start end start2 end2 start2_in_slot end2_in_slot available
0 2023-02-28 08:00:00 2023-02-28 08:30:00 2023-02-28 08:00:00 2023-02-28 08:15:00 True True False
1 2023-02-28 08:30:00 2023-02-28 09:00:00 2023-02-28 08:15:00 2023-02-28 08:30:00 False False True
2 2023-02-28 09:00:00 2023-02-28 09:30:00 2023-02-28 09:00:00 2023-02-28 09:30:00 True True False
3 2023-02-28 09:30:00 2023-02-28 10:00:00 2023-02-28 09:00:00 2023-02-28 09:30:00 False False True
4 2023-02-28 10:00:00 2023-02-28 10:30:00 2023-02-28 09:00:00 2023-02-28 09:30:00 False False True
5 2023-02-28 10:30:00 2023-02-28 11:00:00 2023-02-28 09:00:00 2023-02-28 09:30:00 False False True
6 2023-02-28 11:00:00 2023-02-28 11:30:00 2023-02-28 09:00:00 2023-02-28 09:30:00 False False True
7 2023-02-28 11:30:00 2023-02-28 12:00:00 2023-02-28 09:00:00 2023-02-28 09:30:00 False False True
8 2023-02-28 12:00:00 2023-02-28 12:30:00 2023-02-28 12:00:00 2023-02-28 12:45:00 True False False
9 2023-02-28 12:30:00 2023-02-28 13:00:00 2023-02-28 12:00:00 2023-02-28 12:45:00 False True False
10 2023-02-28 13:00:00 2023-02-28 13:30:00 2023-02-28 12:00:00 2023-02-28 12:45:00 False False True
11 2023-02-28 13:30:00 2023-02-28 14:00:00 2023-02-28 13:15:00 2023-02-28 14:45:00 False False True
12 2023-02-28 14:00:00 2023-02-28 14:30:00 2023-02-28 13:15:00 2023-02-28 14:45:00 False False True
13 2023-02-28 14:30:00 2023-02-28 15:00:00 2023-02-28 13:15:00 2023-02-28 14:45:00 False True False
14 2023-02-28 15:00:00 2023-02-28 15:30:00 2023-02-28 13:15:00 2023-02-28 14:45:00 False False True
15 2023-02-28 15:30:00 2023-02-28 16:00:00 2023-02-28 13:15:00 2023-02-28 14:45:00 False False True
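The two conditions above look only at where a booking starts or ends, and merge_asof pairs each slot with a single booking. A more general test for whether a slot is touched at all is the usual interval overlap check: a booking overlaps a slot when it starts before the slot ends and ends after the slot starts. The sketch below applies that check to every slot/booking pair with a cross join. It is only one possible approach, assumes half-open slots and pandas 1.2 or later for how='cross', and the names pairs, touched and untouched are illustrative.

```python
import pandas as pd

# Data from the question
starts = pd.date_range('2023-02-28 08:00', '2023-02-28 15:30', freq='30min')
df1 = pd.DataFrame({'start': starts, 'end': starts + pd.Timedelta(minutes=30)})
df2 = pd.DataFrame({
    'start': pd.to_datetime(['2023-02-28 08:00', '2023-02-28 08:15',
                             '2023-02-28 09:00', '2023-02-28 12:00',
                             '2023-02-28 13:15']),
    'end': pd.to_datetime(['2023-02-28 08:15', '2023-02-28 08:30',
                           '2023-02-28 09:30', '2023-02-28 12:45',
                           '2023-02-28 14:45']),
})

# Pair every slot with every booking; 'index' remembers the slot's row label
pairs = df1.reset_index().merge(df2.add_suffix('2'), how='cross')

# Two half-open intervals overlap when each starts before the other ends
pairs['overlap'] = (pairs['start2'] < pairs['end']) & (pairs['end2'] > pairs['start'])

# A slot is touched if it overlaps at least one booking
touched = pairs.groupby('index')['overlap'].any()
untouched = df1[~touched.reindex(df1.index, fill_value=False)]

print(untouched.index.tolist())
# [1, 3, 4, 5, 6, 7, 14, 15]
```

Slots 10 to 13 are flagged here because the 13:15-14:45 booking overlaps each of them. Like the merge_asof version, this only classifies whole slots; trimming partially booked slots to 12:45-13:00, 13:00-13:15 and 14:45-15:00 needs an extra subtraction step.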