Calculate overlapping times in weekly room schedule

Question:

I have a DataFrame which contains some room scheduling data.

Here is a sample of the data for the first few entries for Thursday and Friday morning:

   DAYS BEGIN_TIME END_TIME
0    R      09:00    10:15
1    R      08:30    09:45
2    R      11:30    12:20
3    R      11:30    12:45
4    F      08:00    10:30
5    F      07:00    08:15
6    F      08:00    10:30

As a Python definition:

df = pd.DataFrame({'DAYS': {0: 'R', 1: 'R', 2: 'R', 3: 'R', 4: 'F', 5: 'F', 6: 'F'},
                   'BEGIN_TIME': {0: '09:00', 1: '08:30', 2: '11:30', 3: '11:30', 4: '08:00', 5: '07:00', 6: '08:00'},
                   'END_TIME': {0: '10:15', 1: '09:45', 2: '12:20', 3: '12:45', 4: '10:30', 5: '08:15', 6: '10:30'}}
                  )

R represents Thursday and F represents Friday. There are also M, T, and W in this column.

BEGIN_TIME and END_TIME are the start and end times of a booking, given as 24-hour HH:MM strings.
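Since the times are zero-padded HH:MM strings, they can also be handled as integer minutes since midnight, which sorts and compares correctly without attaching an arbitrary calendar date. A minimal sketch; the helper name `to_minutes` is hypothetical:

```python
import pandas as pd

def to_minutes(hhmm: pd.Series) -> pd.Series:
    """Convert zero-padded 'HH:MM' strings to integer minutes since midnight (hypothetical helper)."""
    parts = hhmm.str.split(":", expand=True).astype(int)
    return parts[0] * 60 + parts[1]

minutes = to_minutes(pd.Series(["09:00", "08:30", "12:20"]))
print(minutes.tolist())  # [540, 510, 740]
```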

I would like to determine on which days and at what times the room has collisions (multiple people trying to use the room at the same time).

For the sample data, I’d like to receive something like:

   DAYS BEGIN_TIME END_TIME  USERS
0    R      08:30    09:00      1
1    R      09:00    09:45      2
2    R      09:45    10:15      1
3    R      11:30    12:20      2
4    R      12:20    12:45      1
5    F      07:00    08:00      1
6    F      08:00    08:15      3
7    F      08:15    10:30      2


So far, in my research, I found this answer to “Count overlapping time frames in a pandas dataframe, grouped by person”.

import pandas as pd

df = pd.DataFrame({'DAYS': {0: 'R', 1: 'R', 2: 'R', 3: 'R', 4: 'F', 5: 'F', 6: 'F'},
                   'BEGIN_TIME': {0: '09:00', 1: '08:30', 2: '11:30', 3: '11:30', 4: '08:00', 5: '07:00', 6: '08:00'},
                   'END_TIME': {0: '10:15', 1: '09:45', 2: '12:20', 3: '12:45', 4: '10:30', 5: '08:15', 6: '10:30'}}
                  )

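# Convert to datetime; only the time-of-day part is meaningful here
df["BEGIN_TIME"] = df["BEGIN_TIME"].astype("datetime64[ns]")
df["END_TIME"] = df["END_TIME"].astype("datetime64[ns]")

# Code from linked SO answer.
# ends[i, j]: booking j begins before booking i ends
ends = df['BEGIN_TIME'].values < df['END_TIME'].values[:, None]
# starts[i, j]: booking j begins strictly after booking i begins
starts = df['BEGIN_TIME'].values > df['BEGIN_TIME'].values[:, None]
# same_group[i, j]: both bookings are on the same day
same_group = (df['DAYS'].values == df['DAYS'].values[:, None])
# Number of same-day bookings that start inside booking i
df['OVERLAP'] = (ends & starts & same_group).sum(1)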

print(df)

While this does tell me about some collisions, it doesn’t tell me the specific time ranges in which the conflicts occur.

I also looked through “Pandas: Count time interval intersections over a group by”, but the answers there also only count overlaps; they don’t break the ranges out into the specific overlapping time periods.

I don’t know where to go from here. Can someone point me in the right direction?

Asked By: Henry Ecker


Answers:

Given that you have a relatively small dataset, a pseudo-code algorithm for this could look something like this:

collisions = dict()  # {room: [(collision_start, collision_end)]}
for each reservation R1:
    for each other reservation R2 where R2.room == R1.room:
        if R2.end_time > R1.start_time and R2.start_time < R1.end_time:
            # COLLISION: create collisions[R1.room] first if the key doesn't exist yet
            collisions[R1.room].append((start_of_collision, end_of_collision))

Determining start_of_collision and end_of_collision takes a bit more work, since you need to check for 3 cases:

  • case 1: R1 starts before R2 starts, and ends before R2 ends.

    (start_of_collision, end_of_collision) = (R2.start, R1.end)
    
  • case 2: R1 starts after R2 starts, and ends before R2 ends.

    (start_of_collision, end_of_collision) = (R1.start, R1.end)
    
  • case 3: R1 starts after R2 starts, and ends after R2 ends.

    (start_of_collision, end_of_collision) = (R1.start, R2.end)
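The pseudo-code above can be made runnable roughly as follows. This is a sketch, not the answerer's own code, and it makes a few assumptions: DAYS plays the role of the room (the question has a single room, so collisions are grouped by day), zero-padded HH:MM strings compare correctly as plain text, and each pair of bookings is visited only once. It also relies on the fact that every overlap case above, together with the unlisted one where R1 starts before R2 and ends after it, reduces to the later of the two starts and the earlier of the two ends:

```python
import pandas as pd

df = pd.DataFrame({'DAYS': ['R', 'R', 'R', 'R', 'F', 'F', 'F'],
                   'BEGIN_TIME': ['09:00', '08:30', '11:30', '11:30', '08:00', '07:00', '08:00'],
                   'END_TIME': ['10:15', '09:45', '12:20', '12:45', '10:30', '08:15', '10:30']})

# Assumption: DAYS stands in for "room"; 'HH:MM' strings compare correctly as text.
collisions = {}  # {day: [(collision_start, collision_end), ...]}
rows = list(df.itertuples(index=False))
for i, r1 in enumerate(rows):
    for r2 in rows[i + 1:]:  # visit each unordered pair once
        if r1.DAYS != r2.DAYS:
            continue
        if r2.END_TIME > r1.BEGIN_TIME and r2.BEGIN_TIME < r1.END_TIME:
            # All overlap cases reduce to: later start, earlier end.
            start = max(r1.BEGIN_TIME, r2.BEGIN_TIME)
            end = min(r1.END_TIME, r2.END_TIME)
            collisions.setdefault(r1.DAYS, []).append((start, end))

print(collisions)
```

Because the results are pairwise, the same window can be reported more than once (for the sample data, F 08:00-08:15 appears twice), so this answers "which bookings collide, and when" rather than "how many users are in the room at each time".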
    
Answered By: Frank Bryce

It seems to me the right way to do this is to reshape your data into a series of events (either ‘start’ or ‘end’), each with a single time column. You can then sort by time and keep a simple running counter:

import pandas as pd

df = pd.DataFrame({'DAYS': {0: 'R', 1: 'R', 2: 'R', 3: 'R', 4: 'F', 5: 'F', 6: 'F'},
                   'BEGIN_TIME': {0: '09:00', 1: '08:30', 2: '11:30', 3: '11:30', 4: '08:00', 5: '07:00', 6: '08:00'},
                   'END_TIME': {0: '10:15', 1: '09:45', 2: '12:20', 3: '12:45', 4: '10:30', 5: '08:15', 6: '10:30'}}
                  )

days="MTWRF"

# Convert to DateTime
df["BEGIN_TIME"] = df["BEGIN_TIME"].astype("datetime64[ns]")
df["END_TIME"] = df["END_TIME"].astype("datetime64[ns]")

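# Convert to a list of (day, event type, time) events.
newdata = []
for _, row in df.iterrows():
    newdata.append((row["DAYS"], "start", row["BEGIN_TIME"]))
    newdata.append((row["DAYS"], "end", row["END_TIME"]))

# Sort by weekday, then time; at equal times 'end' sorts before 'start'
# (False < True), so back-to-back bookings are not counted as a collision.
newdata.sort(key=lambda r: (days.index(r[0]), r[2], r[1] == "start"))

# Running count of bookings in progress after each event.
count = 0
for day, kind, time in newdata:
    if kind == "start":
        count += 1
    else:
        count -= 1
    print(day, time.strftime("%H:%M"), count)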

The output isn’t exactly what you wanted, but hopefully you can see how to get there from here.

R 08:30 1   
R 09:00 2   
R 09:45 1   
R 10:15 0   
R 11:30 1   
R 11:30 2   
R 12:20 1   
R 12:45 0   
F 07:00 1   
F 08:00 2   
F 08:00 3   
F 08:15 2   
F 10:30 1   
F 10:30 0   
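To get from these running counts to the table in the question, each event time can be paired with the next event time on the same day. A minimal pandas sketch of that step (not part of the original answer), which swaps the Python loop for groupby/cumsum and assumes the question's zero-padded HH:MM strings and MTWRF day order:

```python
import pandas as pd

df = pd.DataFrame({'DAYS': ['R', 'R', 'R', 'R', 'F', 'F', 'F'],
                   'BEGIN_TIME': ['09:00', '08:30', '11:30', '11:30', '08:00', '07:00', '08:00'],
                   'END_TIME': ['10:15', '09:45', '12:20', '12:45', '10:30', '08:15', '10:30']})

# One +1 event per booking start and one -1 event per booking end.
events = pd.concat([
    pd.DataFrame({'DAYS': df['DAYS'], 'TIME': df['BEGIN_TIME'], 'DELTA': 1}),
    pd.DataFrame({'DAYS': df['DAYS'], 'TIME': df['END_TIME'], 'DELTA': -1}),
])

# Net change per (day, time); groupby sorts by day, then by time within each day.
events = events.groupby(['DAYS', 'TIME'])['DELTA'].sum().reset_index()

# Users in the room from this event until the next event on the same day.
events['USERS'] = events.groupby('DAYS')['DELTA'].cumsum()
events['END_TIME'] = events.groupby('DAYS')['TIME'].shift(-1)

# Keep only occupied segments and order the days Monday to Friday.
result = (events[events['USERS'] > 0]
          .rename(columns={'TIME': 'BEGIN_TIME'})[['DAYS', 'BEGIN_TIME', 'END_TIME', 'USERS']]
          .sort_values(['DAYS', 'BEGIN_TIME'],
                       key=lambda s: s.map('MTWRF'.index) if s.name == 'DAYS' else s)
          .reset_index(drop=True))
print(result)
```

Summing the +1/-1 changes at a shared timestamp treats bookings as half-open intervals, so one booking ending at 09:45 and another starting at 09:45 would not be reported as a collision.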
Answered By: Tim Roberts

Create a DataFrame of all 15-minute intervals for each day (the cadence of the appointments; a time that falls between slots, such as 12:20, is only approximated). Then we can use NumPy’s broadcasting to see how many users are using the room at a given time on each day.

import pandas as pd
import numpy as np

# Convert the times to Timedeltas (time elapsed since midnight).
for col in ['BEGIN_TIME', 'END_TIME']:
    df[col] = pd.to_datetime(df[col])
    df[col] = df[col] - df[col].dt.normalize()

# 15-min blocks Monday-Friday
df1 = (pd.concat([pd.DataFrame({'Time': pd.timedelta_range('00:00:00', '23:45:00', freq='15min')})]*5,
                 keys=list('MTWRF'), names=['Days', 'to_drop'])
         .reset_index()
         .drop(columns='to_drop'))
    
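# For each day, count the bookings that cover each 15-minute slot. Both ends
# are inclusive (BEGIN <= slot <= END), so a booking still counts at END_TIME.
l = []
for day, gp in df1.groupby('Days'):
    gp = gp.copy()  # work on a copy rather than a slice of df1
    gp['users'] = ((gp['Time'].to_numpy() >= df.loc[df.DAYS.eq(day), 'BEGIN_TIME'].to_numpy()[:, None])
                     & (gp['Time'].to_numpy() <= df.loc[df.DAYS.eq(day), 'END_TIME'].to_numpy()[:, None])).sum(axis=0)
    l.append(gp['users'])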

# Join the results back to our 15 minute skeleton
df1 = pd.concat([df1, pd.concat(l)], axis=1)

Now we can, for example, look at Thursday between 07:00 and 14:00:

df1.loc[df1.Days.eq('R') & df1.Time.between('07:00:00', '14:00:00')]

    Days            Time  users
316    R 0 days 07:00:00      0
317    R 0 days 07:15:00      0
318    R 0 days 07:30:00      0
319    R 0 days 07:45:00      0
320    R 0 days 08:00:00      0
321    R 0 days 08:15:00      0
322    R 0 days 08:30:00      1
323    R 0 days 08:45:00      1
324    R 0 days 09:00:00      2
325    R 0 days 09:15:00      2
326    R 0 days 09:30:00      2
327    R 0 days 09:45:00      2
328    R 0 days 10:00:00      1
329    R 0 days 10:15:00      1
330    R 0 days 10:30:00      0
331    R 0 days 10:45:00      0
332    R 0 days 11:00:00      0
333    R 0 days 11:15:00      0
334    R 0 days 11:30:00      2
335    R 0 days 11:45:00      2
336    R 0 days 12:00:00      2
337    R 0 days 12:15:00      2
338    R 0 days 12:30:00      1
339    R 0 days 12:45:00      1
340    R 0 days 13:00:00      0
341    R 0 days 13:15:00      0
342    R 0 days 13:30:00      0
343    R 0 days 13:45:00      0
344    R 0 days 14:00:00      0
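The 15-minute grid cannot represent 12:20 exactly, and the `<=` test counts a booking as present in the slot at its END_TIME. A hedged variation on this answer (not the answerer's code) uses a 1-minute grid and a half-open `BEGIN <= t < END` test, then collapses consecutive minutes with the same count into ranges, which yields the table format from the question. It assumes zero-padded HH:MM strings; `to_minutes` and `fmt` are hypothetical helper names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'DAYS': ['R', 'R', 'R', 'R', 'F', 'F', 'F'],
                   'BEGIN_TIME': ['09:00', '08:30', '11:30', '11:30', '08:00', '07:00', '08:00'],
                   'END_TIME': ['10:15', '09:45', '12:20', '12:45', '10:30', '08:15', '10:30']})

def to_minutes(s):
    """Zero-padded 'HH:MM' strings -> NumPy array of minutes since midnight."""
    parts = s.str.split(':', expand=True).astype(int)
    return (parts[0] * 60 + parts[1]).to_numpy()

def fmt(m):
    """Minutes since midnight -> 'HH:MM'."""
    return f'{m // 60:02d}:{m % 60:02d}'

grid = np.arange(24 * 60)  # one slot per minute of the day
pieces = []
for day, gp in df.groupby('DAYS', sort=False):  # keep days in order of appearance
    begin = to_minutes(gp['BEGIN_TIME'])
    end = to_minutes(gp['END_TIME'])
    # Broadcast to (bookings x minutes); half-open test BEGIN <= t < END.
    users = ((grid >= begin[:, None]) & (grid < end[:, None])).sum(axis=0)
    # A new run starts wherever the user count changes.
    run = np.concatenate(([0], np.cumsum(users[1:] != users[:-1])))
    runs = (pd.DataFrame({'minute': grid, 'users': users, 'run': run})
            .groupby('run')
            .agg(start=('minute', 'min'), stop=('minute', 'max'), USERS=('users', 'first')))
    runs = runs[runs['USERS'] > 0]
    pieces.append(pd.DataFrame({
        'DAYS': day,
        'BEGIN_TIME': [fmt(int(m)) for m in runs['start']],
        'END_TIME': [fmt(int(m) + 1) for m in runs['stop']],  # stop is the last occupied minute
        'USERS': runs['USERS'].to_numpy(),
    }))

result = pd.concat(pieces, ignore_index=True)
print(result)
```

The 1-minute grid costs 1,440 slots per day, which is small for a weekly schedule; the half-open test means a booking that ends exactly when another begins is not counted as a collision.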
Answered By: ALollz