Overlap of dates in pandas

Question

I have a dataframe as the following one :

Start	End	Unit	Nominal power	Unavailable power
2022-07-28 15:00:00	2022-09-23 17:00:00	A	485.0	0
2022-07-29 06:00:00	2022-09-23 17:00:00	A	485.0	0
2022-09-23 17:00:00	2022-10-12 17:00:00	A	485.0	0
2022-10-02 17:00:00	2023-01-13 17:00:00	A	485.0	0
2023-03-29 17:00:00	2022-03-30 05:00:00	A	485.0	150
2023-03-30 05:00:00	2023-04-02 17:00:00	A	485.0	290
2022-05-17 10:00:00	2022-05-26 10:00:00	B	512.0	0
2023-05-19 05:00:00	2023-05-24 17:00:00	B	512.0	0

I wrote a code to detect and drop the overlaps between start dates and end dates on different rows for each unit. But I can note achieve to include a criteria on unavailable power.

What I would like is the following one :

Start	End	Unit	Nominal power	Unavailable power
2022-07-28 15:00:00	2023-01-13 17:00:00	A	485.0	0
2023-03-29 17:00:00	2022-03-30 05:00:00	A	485.0	150
2023-03-30 05:00:00	2023-04-02 17:00:00	A	485.0	290
2022-05-17 10:00:00	2022-05-26 10:00:00	B	512.0	0

I wrote this code to succeed :

def overlapping(data):
    i=100000000
    j=0
    df = data.sort_values(['Unit', 'Start'])
    g = df['Start'].gt(df['End'].shift()).groupby(df['Unit']).cumsum()
    df_out = df.groupby(['Unit', g], as_index=False).agg({'Start': 'min', 'End': 'max'})
    while i > j:
        i = len(df_out)
        df = data.sort_values(['Unit', 'Start'])
        g = df['Start'].gt(df['End'].shift()).groupby(df['Unit']).cumsum()
        df_out = df.groupby(['Unit', g], as_index=False).agg({'Start': 'min', 'End': 'max'})
        j = len(df_out) 
    return df_out

But unfortunately it doesn’t take into account the "Unavailable criteria". It gives the following table :

Start	End	Unit
2022-07-28 15:00:00	2023-01-13 17:00:00	A
2023-03-29 17:00:00	2023-04-02 17:00:00	A
2022-05-17 10:00:00	2023-05-24 10:00:00	B

Any idea of this?

Asked By: nicolax9777

||

Source

Answer 1

Check Below code, What is not clear is how in your output last row has END date as 2022-05-26 10:00:00 instead of 2023-05-24 17:00:00

df['Group_Col'] = (df['Unavailable power'].shift().bfill() != df['Unavailable power']).cumsum()

(df.groupby(['Unit','Nominal power','Group_Col','Unavailable power']).
 agg({'Start':'min','End':'max'}).
 reset_index()[['Start',    'End',  'Unit', 'Nominal power', 'Unavailable power']]
 )

Output:

Answered By: Abhishek

Overlap of dates in pandas

Question:

Answers: