Calculate overlapping times in weekly room schedule
Question:
I have a DataFrame which contains some room scheduling data.
Here is a sample of the data for the first few entries for Thursday and Friday morning:
DAYS BEGIN_TIME END_TIME
0 R 09:00 10:15
1 R 08:30 09:45
2 R 11:30 12:20
3 R 11:30 12:45
4 F 08:00 10:30
5 F 07:00 08:15
6 F 08:00 10:30
As a Python definition:
df = pd.DataFrame({'DAYS': {0: 'R', 1: 'R', 2: 'R', 3: 'R', 4: 'F', 5: 'F', 6: 'F'},
'BEGIN_TIME': {0: '09:00', 1: '08:30', 2: '11:30', 3: '11:30', 4: '08:00', 5: '07:00', 6: '08:00'},
'END_TIME': {0: '10:15', 1: '09:45', 2: '12:20', 3: '12:45', 4: '10:30', 5: '08:15', 6: '10:30'}}
)
R represents Thursday and F represents Friday. There are also M, T, and W in this column.
BEGIN_TIME and END_TIME represent the start and end time for someone to be using the room, in hours and minutes, in 24-hour notation, HH:MM.
I would like to determine on which days and at which times the room has collisions (multiple people trying to use the room at the same time).
For the sample data, I’d like to receive something like:
DAYS BEGIN_TIME END_TIME USERS
0 R 08:30 09:00 1
1 R 09:00 09:45 2
2 R 09:45 10:15 1
3 R 11:30 12:20 2
4 R 12:20 12:45 1
5 F 07:00 08:00 1
6 F 08:00 08:15 3
7 F 08:15 10:30 2
So far, in my research, I found this answer to "Count overlapping time frames in a pandas dataframe, grouped by person", which I adapted as follows:
import pandas as pd
df = pd.DataFrame({'DAYS': {0: 'R', 1: 'R', 2: 'R', 3: 'R', 4: 'F', 5: 'F', 6: 'F'},
'BEGIN_TIME': {0: '09:00', 1: '08:30', 2: '11:30', 3: '11:30', 4: '08:00', 5: '07:00', 6: '08:00'},
'END_TIME': {0: '10:15', 1: '09:45', 2: '12:20', 3: '12:45', 4: '10:30', 5: '08:15', 6: '10:30'}}
)
# Convert to DateTime
df["BEGIN_TIME"] = df["BEGIN_TIME"].astype("datetime64[ns]")
df["END_TIME"] = df["END_TIME"].astype("datetime64[ns]")
# Code from linked SO Answer
ends = df['BEGIN_TIME'].values < df['END_TIME'].values[:, None]
starts = df['BEGIN_TIME'].values > df['BEGIN_TIME'].values[:, None]
same_group = (df['DAYS'].values == df['DAYS'].values[:, None])
df['OVERLAP'] = (ends & starts & same_group).sum(1)
print(df)
While this does tell me about certain collisions, it doesn’t help when trying to find specifically which times have a conflict.
I also looked through "Pandas: Count time interval intersections over a group by", but the answers there also only count overlaps; they don’t break the ranges out into specific overlapping times.
I don’t know where to go from here. Can someone point me in the right direction?
Answers:
A pseudocode algorithm for this, given that you have a relatively small dataset, could look something like this:
collisions = dict()  # {room: [(collision_start, collision_end)]}
for each reservation R1:
    for each other reservation R2 where R2.room == R1.room:
        if R2.end_time > R1.start_time and R2.start_time < R1.end_time:
            # COLLISION... you need to edit the code below to make sure the key exists
            collisions[R1.room].append((start_of_collision, end_of_collision))
Determining start_of_collision and end_of_collision takes a bit more work, since you need to check for 3 cases.
- case 1: R1 starts before R2 starts, and ends before R2 ends.
  (start_of_collision, end_of_collision) = (R2.start, R1.end)
- case 2: R1 starts after R2 starts, and ends before R2 ends.
  (start_of_collision, end_of_collision) = (R1.start, R1.end)
- case 3: R1 starts after R2 starts, and ends after R2 ends.
  (start_of_collision, end_of_collision) = (R1.start, R2.end)
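As a rough, runnable version of the pseudocode above: the case analysis collapses into a single rule, because the overlap always runs from the later start to the earlier end. That rule also handles the situation where R2 lies entirely inside R1. This is only a sketch under some assumptions: the sample data has a single room, so collisions are keyed by day instead of by room; `find_collisions` is a made-up helper name; and times stay as zero-padded "HH:MM" strings, which compare correctly as plain text.

```python
from collections import defaultdict
from itertools import combinations

# Sample reservations from the question: (day, begin, end).
# Assumption: zero-padded "HH:MM" strings, so string comparison matches time order.
reservations = [
    ("R", "09:00", "10:15"),
    ("R", "08:30", "09:45"),
    ("R", "11:30", "12:20"),
    ("R", "11:30", "12:45"),
    ("F", "08:00", "10:30"),
    ("F", "07:00", "08:15"),
    ("F", "08:00", "10:30"),
]


def find_collisions(reservations):
    """Return {day: [(collision_start, collision_end), ...]}, one entry per overlapping pair."""
    collisions = defaultdict(list)
    # combinations() visits each unordered pair once, so no pair is reported twice.
    for (day1, start1, end1), (day2, start2, end2) in combinations(reservations, 2):
        if day1 != day2:
            continue
        if end2 > start1 and start2 < end1:
            # Overlap = later of the two starts up to the earlier of the two ends.
            collisions[day1].append((max(start1, start2), min(end1, end2)))
    return dict(collisions)


collisions = find_collisions(reservations)
print(collisions)
```

Note that this reports pairwise overlaps, so on Friday the 08:00 to 08:15 window shows up twice (once for each booking that clashes with the 07:00 one). It does not tell you directly that three people are in the room at once.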
It seems to me the right way to do this is to reformat your data so you have a series of events (either ‘start’ or ‘end’), with a single date column. You can then sort by the timestamp, and do a simple counter:
import pandas as pd

df = pd.DataFrame({'DAYS': {0: 'R', 1: 'R', 2: 'R', 3: 'R', 4: 'F', 5: 'F', 6: 'F'},
                   'BEGIN_TIME': {0: '09:00', 1: '08:30', 2: '11:30', 3: '11:30', 4: '08:00', 5: '07:00', 6: '08:00'},
                   'END_TIME': {0: '10:15', 1: '09:45', 2: '12:20', 3: '12:45', 4: '10:30', 5: '08:15', 6: '10:30'}}
                  )
days = "MTWRF"

# Convert to DateTime
df["BEGIN_TIME"] = df["BEGIN_TIME"].astype("datetime64[ns]")
df["END_TIME"] = df["END_TIME"].astype("datetime64[ns]")

# Convert to a more useful format: one (day, kind, time) tuple per event.
newdata = []
for _, row in df.iterrows():
    newdata.append((row["DAYS"], "start", row["BEGIN_TIME"]))
    newdata.append((row["DAYS"], "end", row["END_TIME"]))

# Sort by weekday, then by timestamp.
newdata.sort(key=lambda r: (days.index(r[0]), r[2]))
print(newdata)

# Walk the events in order, keeping a running count of people in the room.
count = 0
for row in newdata:
    if row[1] == 'start':
        count += 1
    else:
        count -= 1
    print(row[0], row[2].strftime("%H:%M"), count)
The output isn’t exactly what you wanted, but hopefully you can see how to get there from here.
R 08:30 1
R 09:00 2
R 09:45 1
R 10:15 0
R 11:30 1
R 11:30 2
R 12:20 1
R 12:45 0
F 07:00 1
F 08:00 2
F 08:00 3
F 08:15 2
F 10:30 1
F 10:30 0
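One possible way to get from this event list to the table in the question is to do the same counting inside pandas: give every start a +1 and every end a -1, add up the changes that share a timestamp, take a running total per day, and use the next boundary as the end of each interval. The following is a sketch along those lines, not a tested general solution. The column names TIME, DELTA and DAY_ORDER are made up for the example, and it assumes the times are zero-padded "HH:MM" strings so that they sort correctly as text.

```python
import pandas as pd

df = pd.DataFrame({
    "DAYS": list("RRRRFFF"),
    "BEGIN_TIME": ["09:00", "08:30", "11:30", "11:30", "08:00", "07:00", "08:00"],
    "END_TIME": ["10:15", "09:45", "12:20", "12:45", "10:30", "08:15", "10:30"],
})

# One row per event: +1 when a booking starts, -1 when it ends.
starts = df[["DAYS", "BEGIN_TIME"]].rename(columns={"BEGIN_TIME": "TIME"}).assign(DELTA=1)
ends = df[["DAYS", "END_TIME"]].rename(columns={"END_TIME": "TIME"}).assign(DELTA=-1)
events = pd.concat([starts, ends], ignore_index=True)

# Helper sort key so days come out in M, T, W, R, F order instead of alphabetically.
day_order = {day: i for i, day in enumerate("MTWRF")}
events["DAY_ORDER"] = events["DAYS"].map(day_order)

# Net change in occupancy at each distinct (day, time); groupby sorts by its keys.
net = (events.groupby(["DAY_ORDER", "DAYS", "TIME"])["DELTA"]
             .sum()
             .reset_index())

# Running total per day = number of users from this boundary until the next one.
net["USERS"] = net.groupby("DAYS")["DELTA"].cumsum()
net["END_TIME"] = net.groupby("DAYS")["TIME"].shift(-1)

# Drop the stretches where the room is empty (this also drops each day's final boundary).
result = (net[net["USERS"] > 0]
          .rename(columns={"TIME": "BEGIN_TIME"})
          [["DAYS", "BEGIN_TIME", "END_TIME", "USERS"]]
          .reset_index(drop=True))
print(result)
```

For the sample data this yields the eight rows asked for in the question. Filtering on `result["USERS"] > 1` afterwards would leave only the intervals with an actual conflict.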
Create a DataFrame of all 15-minute intervals for each day (the cadence of appointments). Then we can use NumPy’s broadcasting features to see how many users are using the room at a given time for each day.
import pandas as pd
import numpy as np

# Convert your times to a numeric type (timedeltas since midnight).
for col in ['BEGIN_TIME', 'END_TIME']:
    df[col] = pd.to_datetime(df[col])
    df[col] = df[col] - df[col].dt.normalize()

# 15-min blocks Monday-Friday
df1 = (pd.concat([pd.DataFrame({'Time': pd.timedelta_range('00:00:00', '23:45:00', freq='15min')})]*5,
                 keys=list('MTWRF'), names=['Days', 'to_drop'])
         .reset_index()
         .drop(columns='to_drop'))

# For each day determine the overlap (both BEGIN_TIME and END_TIME are treated as inclusive)
l = []
for day, gp in df1.groupby('Days'):
    gp['users'] = ((gp['Time'].to_numpy() >= df.loc[df.DAYS.eq(day), 'BEGIN_TIME'].to_numpy()[:, None])
                   & (gp['Time'].to_numpy() <= df.loc[df.DAYS.eq(day), 'END_TIME'].to_numpy()[:, None])).sum(axis=0)
    l.append(gp['users'])

# Join the results back to our 15 minute skeleton
df1 = pd.concat([df1, pd.concat(l)], axis=1)
Now we can, for example, check and see the times on Thursday:
df1.loc[df1.Days.eq('R') & df1.Time.between('07:00:00', '14:00:00')]
Days Time users
316 R 0 days 07:00:00 0
317 R 0 days 07:15:00 0
318 R 0 days 07:30:00 0
319 R 0 days 07:45:00 0
320 R 0 days 08:00:00 0
321 R 0 days 08:15:00 0
322 R 0 days 08:30:00 1
323 R 0 days 08:45:00 1
324 R 0 days 09:00:00 2
325 R 0 days 09:15:00 2
326 R 0 days 09:30:00 2
327 R 0 days 09:45:00 2
328 R 0 days 10:00:00 1
329 R 0 days 10:15:00 1
330 R 0 days 10:30:00 0
331 R 0 days 10:45:00 0
332 R 0 days 11:00:00 0
333 R 0 days 11:15:00 0
334 R 0 days 11:30:00 2
335 R 0 days 11:45:00 2
336 R 0 days 12:00:00 2
337 R 0 days 12:15:00 2
338 R 0 days 12:30:00 1
339 R 0 days 12:45:00 1
340 R 0 days 13:00:00 0
341 R 0 days 13:15:00 0
342 R 0 days 13:30:00 0
343 R 0 days 13:45:00 0
344 R 0 days 14:00:00 0
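If you only care about the conflicts, one way to condense a grid like this is to flag the slots with more than one user and then merge consecutive flagged slots into runs. The sketch below illustrates the idea on a hypothetical stand-in frame called `thursday`, with the users column copied from the Thursday rows printed above. The names `busy`, `run_id`, `first_slot` and `last_slot` are invented for the example.

```python
import pandas as pd

# Hypothetical stand-in for Thursday's slice of df1 (08:30 to 13:00),
# with the users column copied from the output above.
thursday = pd.DataFrame({
    "Days": "R",
    "Time": pd.timedelta_range("08:30:00", "13:00:00", freq="15min"),
    "users": [1, 1, 2, 2, 2, 2, 1, 1, 0, 0, 0, 0, 2, 2, 2, 2, 1, 1, 0],
})

# Flag the slots where more than one person is in the room.
busy = thursday["users"] > 1

# A new run starts wherever the flag differs from the previous row,
# so the cumulative sum gives every run of equal flags its own id.
run_id = (busy != busy.shift()).cumsum().rename("run")

# Keep only the busy slots and summarise each run by its first and last slot.
# Grouping by Days as well stops a run from spilling over into the next day.
conflicts = (thursday[busy]
             .groupby(["Days", run_id[busy]])["Time"]
             .agg(first_slot="min", last_slot="max")
             .reset_index(level="run", drop=True)
             .reset_index())
print(conflicts)
```

Keep in mind that the result is only as precise as the grid: the booking that ends at 12:20 shows up with 12:15 as its last conflicting slot, so this approach fits best when every booking starts and ends on a 15-minute boundary.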