Pandas rebuild free time slots from all possible time slots and booked time slots

Question:

As per the title there’s a list of available/possible time slots and then there’s a list of booked slots. What I need a helping hand with, is a streamlined way, using pandas, to extract booked time slots from the possible ones and rebuild the free time slots data frame.
Thank you very much

#1 Possible slots:

>>> df1
                  start                  end
0   2023-02-28 08:00:00  2023-02-28 08:30:00
1   2023-02-28 08:30:00  2023-02-28 09:00:00
2   2023-02-28 09:00:00  2023-02-28 09:30:00
3   2023-02-28 09:30:00  2023-02-28 10:00:00
4   2023-02-28 10:00:00  2023-02-28 10:30:00
5   2023-02-28 10:30:00  2023-02-28 11:00:00
6   2023-02-28 11:00:00  2023-02-28 11:30:00
7   2023-02-28 11:30:00  2023-02-28 12:00:00
8   2023-02-28 12:00:00  2023-02-28 12:30:00
9   2023-02-28 12:30:00  2023-02-28 13:00:00
10  2023-02-28 13:00:00  2023-02-28 13:30:00
11  2023-02-28 13:30:00  2023-02-28 14:00:00
12  2023-02-28 14:00:00  2023-02-28 14:30:00
13  2023-02-28 14:30:00  2023-02-28 15:00:00
14  2023-02-28 15:00:00  2023-02-28 15:30:00
15  2023-02-28 15:30:00  2023-02-28 16:00:00
>>> 

#2 Booked slots:

>>> df2
                 start                  end
0  2023-02-28 08:00:00  2023-02-28 08:15:00
1  2023-02-28 08:15:00  2023-02-28 08:30:00
2  2023-02-28 09:00:00  2023-02-28 09:30:00
3  2023-02-28 12:00:00  2023-02-28 12:45:00
4  2023-02-28 13:15:00  2023-02-28 14:45:00
>>> 

#3 The result should be:

>>> df3
                  start                  end
0   2023-02-28 08:30:00  2023-02-28 09:00:00
1   2023-02-28 09:30:00  2023-02-28 10:00:00
2   2023-02-28 10:00:00  2023-02-28 10:30:00
3   2023-02-28 10:30:00  2023-02-28 11:00:00
4   2023-02-28 11:00:00  2023-02-28 11:30:00
5   2023-02-28 11:30:00  2023-02-28 12:00:00
6   2023-02-28 12:45:00  2023-02-28 13:00:00
7   2023-02-28 13:00:00  2023-02-28 13:15:00
8   2023-02-28 14:45:00  2023-02-28 15:00:00
9   2023-02-28 15:00:00  2023-02-28 15:30:00
10  2023-02-28 15:30:00  2023-02-28 16:00:00
Asked By: MBPeregrine

||

Answers:

You can repeat values in 5 minutes intervals by difference of both columns start/end in both DataFrames, create helper column new by add 5 minutes timedeltas by GroupBy.cumcount to start column and filter in Series.isin with boolean indexing, last aggregate to first and last value (or min and max values) with add 5 minutes to end column:

#5 minutes intervals, change if necessary
N = 5
df1['start'] = pd.to_datetime(df1['start'])
df1['end'] = pd.to_datetime(df1['end'])

df1 = df1.loc[df1.index.repeat(df1['end'].sub(df1['start']).dt.total_seconds().div(N*60))]
counter1 = df1.groupby(level=0).cumcount()
df1['new'] = df1['start'].add(pd.to_timedelta(counter1, unit='Min').mul(N))

df2['start'] = pd.to_datetime(df2['start'])
df2['end'] = pd.to_datetime(df2['end'])

df2 = df2.loc[df2.index.repeat(df2['end'].sub(df2['start']).dt.total_seconds().div(N*60))]
counter2 = df2.groupby(level=0).cumcount()
df2['new'] = df2['start'].add(pd.to_timedelta(counter2, unit='Min').mul(N))

df3 = (df1[~df1['new'].isin(df2['new'])].groupby(level=0).agg(start=('new', 'first'),
                                                              end=('new', 'last'))
           .assign(end = lambda x: x['end'].add(pd.Timedelta(f'{N}Min')))
           .reset_index(drop=True))
print (df3)
                 start                 end
0  2023-02-28 08:30:00 2023-02-28 09:00:00
1  2023-02-28 09:30:00 2023-02-28 10:00:00
2  2023-02-28 10:00:00 2023-02-28 10:30:00
3  2023-02-28 10:30:00 2023-02-28 11:00:00
4  2023-02-28 11:00:00 2023-02-28 11:30:00
5  2023-02-28 11:30:00 2023-02-28 12:00:00
6  2023-02-28 12:45:00 2023-02-28 13:00:00
7  2023-02-28 13:00:00 2023-02-28 13:15:00
8  2023-02-28 14:45:00 2023-02-28 15:00:00
9  2023-02-28 15:00:00 2023-02-28 15:30:00
10 2023-02-28 15:30:00 2023-02-28 16:00:00
Answered By: jezrael

You can use the pd.date_range() function to generate all possible time slots, and then use the pd.IntervalIndex() function to create an interval index that represents these time slots. Then you can use the pd.IntervalIndex.difference() method to remove the booked time slots from the interval index and get the free time slots.

Here’s an example implementation:

import pandas as pd

# Create DataFrame of possible time slots
possible_slots = pd.DataFrame({
    'start': pd.date_range(start='2023-02-28 08:00:00', end='2023-02-28 16:00:00', freq='30min'),
    'end': pd.date_range(start='2023-02-28 08:30:00', end='2023-02-28 16:00:00', freq='30min')
})

# Create DataFrame of booked time slots
booked_slots = pd.DataFrame({
    'start': ['2023-02-28 08:00:00', '2023-02-28 08:15:00', '2023-02-28 08:15:00', '2023-02-28 09:00:00', 
              '2023-02-28 09:30:00', '2023-02-28 12:00:00', '2023-02-28 12:45:00', '2023-02-28 13:15:00'],
    'end':   ['2023-02-28 08:15:00', '2023-02-28 08:30:00', '2023-02-28 08:30:00', '2023-02-28 09:30:00',
              '2023-02-28 10:00:00', '2023-02-28 12:45:00', '2023-02-28 13:00:00', '2023-02-28 14:45:00']
})

booked_slots['start'] = pd.to_datetime(booked_slots['start'])
booked_slots['end'] = pd.to_datetime(booked_slots['end'])

# Create interval index of possible time slots
possible_intervals = pd.IntervalIndex.from_arrays(possible_slots['start'], possible_slots['end'])

# Create interval index of booked time slots
booked_intervals = pd.IntervalIndex.from_arrays(booked_slots['start'], booked_slots['end'])

# Get free intervals by removing booked intervals from possible intervals
free_intervals = possible_intervals.difference(booked_intervals)

# Convert free intervals back to DataFrame
free_slots = pd.DataFrame({'start': free_intervals.left, 'end': free_intervals.right})

print(free_slots)

It’s actually quite simple. A time slot is used if either:

  • the start of a booked slot is ≥ the start of the slot and < its end
  • the end of a booked slot is ≤ the end of the slot and > its start

Use a merge_asof with boolean indexing:

# ensure datetime
df1[['start', 'end']] = df1[['start', 'end']].apply(pd.to_datetime)
df2[['start', 'end']] = df2[['start', 'end']].apply(pd.to_datetime)

# merge and filter:
out = (
 pd.merge_asof(df1, df2.add_suffix('2'), left_on='start', right_on='start2')
   .loc[lambda d: ~((d['start2'].ge(d['start']) & d['start2'].lt(d['end']))
                   |(d['end2'].le(d['end']) & d['end2'].gt(d['start']))
                   ),
        ['start', 'end']]
)

Output:

                 start                 end
1  2023-02-28 08:30:00 2023-02-28 09:00:00
3  2023-02-28 09:30:00 2023-02-28 10:00:00
4  2023-02-28 10:00:00 2023-02-28 10:30:00
5  2023-02-28 10:30:00 2023-02-28 11:00:00
6  2023-02-28 11:00:00 2023-02-28 11:30:00
7  2023-02-28 11:30:00 2023-02-28 12:00:00
10 2023-02-28 13:00:00 2023-02-28 13:30:00
11 2023-02-28 13:30:00 2023-02-28 14:00:00
12 2023-02-28 14:00:00 2023-02-28 14:30:00
14 2023-02-28 15:00:00 2023-02-28 15:30:00
15 2023-02-28 15:30:00 2023-02-28 16:00:00

Intermediates:

                 start                 end              start2                end2  start2_in_slot  end2_in_slot  available
0  2023-02-28 08:00:00 2023-02-28 08:30:00 2023-02-28 08:00:00 2023-02-28 08:15:00            True          True      False
1  2023-02-28 08:30:00 2023-02-28 09:00:00 2023-02-28 08:15:00 2023-02-28 08:30:00           False         False       True
2  2023-02-28 09:00:00 2023-02-28 09:30:00 2023-02-28 09:00:00 2023-02-28 09:30:00            True          True      False
3  2023-02-28 09:30:00 2023-02-28 10:00:00 2023-02-28 09:00:00 2023-02-28 09:30:00           False         False       True
4  2023-02-28 10:00:00 2023-02-28 10:30:00 2023-02-28 09:00:00 2023-02-28 09:30:00           False         False       True
5  2023-02-28 10:30:00 2023-02-28 11:00:00 2023-02-28 09:00:00 2023-02-28 09:30:00           False         False       True
6  2023-02-28 11:00:00 2023-02-28 11:30:00 2023-02-28 09:00:00 2023-02-28 09:30:00           False         False       True
7  2023-02-28 11:30:00 2023-02-28 12:00:00 2023-02-28 09:00:00 2023-02-28 09:30:00           False         False       True
8  2023-02-28 12:00:00 2023-02-28 12:30:00 2023-02-28 12:00:00 2023-02-28 12:45:00            True         False      False
9  2023-02-28 12:30:00 2023-02-28 13:00:00 2023-02-28 12:00:00 2023-02-28 12:45:00           False          True       True
10 2023-02-28 13:00:00 2023-02-28 13:30:00 2023-02-28 12:00:00 2023-02-28 12:45:00           False         False       True
11 2023-02-28 13:30:00 2023-02-28 14:00:00 2023-02-28 13:15:00 2023-02-28 14:45:00           False         False       True
12 2023-02-28 14:00:00 2023-02-28 14:30:00 2023-02-28 13:15:00 2023-02-28 14:45:00           False         False       True
13 2023-02-28 14:30:00 2023-02-28 15:00:00 2023-02-28 13:15:00 2023-02-28 14:45:00           False          True       True
14 2023-02-28 15:00:00 2023-02-28 15:30:00 2023-02-28 13:15:00 2023-02-28 14:45:00           False         False       True
15 2023-02-28 15:30:00 2023-02-28 16:00:00 2023-02-28 13:15:00 2023-02-28 14:45:00           False         False       True
Answered By: mozway
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.