Pandas: mark rows that fit multiple conditions

Question:

I have a simple dataset, and I want to mark the rows that meet all of the following conditions:

  • both the Close and End dates are earlier than today

  • and column Stage is in [One, Two, Three]

  • and column Project Number is not blank

I came up with the lines below, but they don't work.

import pandas as pd
import datetime
import numpy as np
from io import StringIO


csvfile = StringIO("""
ID,Stage,Close,Project Number,End
A899,One,26/08/2019,KL1468,30/08/2019
A572,Two,31/12/2020,KL1493,17/12/2019
A778,Three,26/08/2019,,16/08/2019
A704,Four,31/12/2020,KL1036,01/12/2019
A650,One,31/12/2020,KL1522,23/12/2019
A830,Two,31/08/2021,KL1535,03/08/2021
A669,Three,18/08/2021,KL1536,03/08/2021
A892,Four,31/08/2021,KL1534,03/08/2021
A789,One,31/05/2021,KL1537,04/08/2021
A821,Two,31/12/2020,KL1578,03/11/2019
A992,Three,29/07/2019,KL1609,26/06/2019
A550,Four,31/12/2020,KL1243,30/11/2019
A707,One,31/12/2020,KL1523,29/11/2019
A740,,,,
A917,Three,31/07/2021,KL1072,29/07/2021
A627,Four,30/06/2021,KL1577,15/06/2021
""")

df = pd.read_csv(csvfile)

def condition_1(s):
    if (df['Project Number'].any() and s['Expiry_1'] < datetime.datetime.now() and s['Close_1'] < datetime.datetime.now() and np.where(df['Stage'].isin(['One','Two','Three']))):
        return "Overdue"
    else:
        return ''

df['Expiry_1'] = pd.to_datetime(df['End'].str[3:5] + '/' + df['End'].str[:2] + '/' + df['End'].str[-4:])
df['Close_1'] = pd.to_datetime(df['Close'].str[3:5] + '/' + df['Close'].str[:2] + '/' + df['Close'].str[-4:])
df["Overdue Project"] = df.apply(condition_1, axis=1)

df.to_excel(r"c:\Projects\output.xlsx", index=False)

What went wrong, and what’s the right way to achieve it?

Asked By: Mark K

Answers:

First, convert the columns to datetimes with to_datetime, passing the format parameter:

df['Expiry_1'] = pd.to_datetime(df['End'], format='%d/%m/%Y')
df['Close_1'] = pd.to_datetime(df['Close'], format='%d/%m/%Y')
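To see why the format parameter matters: these dates are day-first, while pandas reads an ambiguous string such as 03/08/2021 month-first when no format is given. A minimal sketch comparing the explicit format with the question's manual re-slicing (which reaches the same dates in a roundabout way) and with an unformatted parse; the two sample strings are taken from the question's data.

```python
import pandas as pd

# Day-first strings in the same style as the question's End column
end = pd.Series(['03/08/2021', '26/08/2019'])

# Explicit format: always read as day/month/year
explicit = pd.to_datetime(end, format='%d/%m/%Y')

# The question's workaround: rebuild the text as month/day/year, then parse
sliced = pd.to_datetime(end.str[3:5] + '/' + end.str[:2] + '/' + end.str[-4:])

# With no format, an ambiguous single string is read month-first
ambiguous = pd.to_datetime('03/08/2021')

print(explicit.tolist())
print(ambiguous)  # 2021-03-08 00:00:00
```

Passing format states the intent once and avoids building intermediate strings.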

Then build three boolean masks. For the first, m1, compare both date columns with the current time using DataFrame.lt (less than; DataFrame.gt would test greater than), and require both comparisons to be True with DataFrame.all(axis=1). For the second use Series.isin, and for the third Series.notna. Finally, chain the masks with & (bitwise AND) and pass the result to numpy.where:

m1 = df[['Expiry_1','Close_1']].lt(pd.to_datetime('now')).all(axis=1)
m2 = df['Stage'].isin(['One','Two','Three'])
m3 = df['Project Number'].notna()

df['Overdue Project'] = np.where(m1 & m2 & m3,  "Overdue", '')
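As for what went wrong in the question's condition_1: two of its tests look at the whole DataFrame instead of the row s that apply passes in. df['Project Number'].any() asks whether any value in the entire column is truthy, and np.where(...) with a single argument returns a tuple of index arrays, and a non-empty tuple is always truthy. In effect only the two date comparisons filter rows. For comparison, a minimal sketch of a row-wise version on a hypothetical four-row frame (condition_fixed is an illustrative name, not from the question); the vectorized masks above do the same job and are usually much faster on large frames.

```python
import datetime

import numpy as np
import pandas as pd

# Hypothetical sample rows with already-converted date columns
df = pd.DataFrame({
    'Stage': ['One', 'Three', 'Four', None],
    'Project Number': ['KL1468', None, 'KL1036', None],
    'Expiry_1': pd.to_datetime(['2019-08-30', '2019-08-16', '2019-12-01', None]),
    'Close_1': pd.to_datetime(['2019-08-26', '2019-08-26', '2020-12-31', None]),
})

# The question's Stage test: np.where returns a 1-tuple of arrays,
# and a non-empty tuple is truthy, so this never filters anything
print(bool(np.where(df['Stage'].isin(['One', 'Two', 'Three']))))  # True

def condition_fixed(s):
    # Every test below looks only at the current row `s`
    now = datetime.datetime.now()
    if (pd.notna(s['Project Number'])
            and s['Expiry_1'] < now
            and s['Close_1'] < now
            and s['Stage'] in ['One', 'Two', 'Three']):
        return 'Overdue'
    return ''

df['Overdue Project'] = df.apply(condition_fixed, axis=1)
print(df['Overdue Project'].tolist())  # ['Overdue', '', '', '']
```

Because `and` short-circuits, the missing Project Number in the last row stops evaluation before the NaT dates are compared.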

print (df)
      ID  Stage       Close Project Number         End   Expiry_1    Close_1  
0   A899    One  26/08/2019         KL1468  30/08/2019 2019-08-30 2019-08-26   
1   A572    Two  31/12/2020         KL1493  17/12/2019 2019-12-17 2020-12-31   
2   A778  Three  26/08/2019            NaN  16/08/2019 2019-08-16 2019-08-26   
3   A704   Four  31/12/2020         KL1036  01/12/2019 2019-12-01 2020-12-31   
4   A650    One  31/12/2020         KL1522  23/12/2019 2019-12-23 2020-12-31   
5   A830    Two  31/08/2021         KL1535  03/08/2021 2021-08-03 2021-08-31   
6   A669  Three  18/08/2021         KL1536  03/08/2021 2021-08-03 2021-08-18   
7   A892   Four  31/08/2021         KL1534  03/08/2021 2021-08-03 2021-08-31   
8   A789    One  31/05/2021         KL1537  04/08/2021 2021-08-04 2021-05-31   
9   A821    Two  31/12/2020         KL1578  03/11/2019 2019-11-03 2020-12-31   
10  A992  Three  29/07/2019         KL1609  26/06/2019 2019-06-26 2019-07-29   
11  A550   Four  31/12/2020         KL1243  30/11/2019 2019-11-30 2020-12-31   
12  A707    One  31/12/2020         KL1523  29/11/2019 2019-11-29 2020-12-31   
13  A740    NaN         NaN            NaN         NaN        NaT        NaT   
14  A917  Three  31/07/2021         KL1072  29/07/2021 2021-07-29 2021-07-31   
15  A627   Four  30/06/2021         KL1577  15/06/2021 2021-06-15 2021-06-30   

   Overdue Project  
0          Overdue  
1          Overdue  
2                   
3                   
4          Overdue  
5                   
6                   
7                   
8                   
9          Overdue  
10         Overdue  
11                  
12         Overdue  
13                  
14                  
15       
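The marks in this output depend on the day the code runs, because m1 compares against pd.to_datetime('now'); run today, rows 5, 6, 8 and 14 would also be Overdue. A minimal sketch that makes the reference date explicit, assuming "older than today" means before midnight at the start of today, and that also treats whitespace-only strings in Project Number as blank (useful if empty cells are not read as NaN). mark_overdue and its today parameter are hypothetical names for illustration; the first four sample rows come from the question, and the fifth (X001) is made up to show the whitespace case.

```python
import numpy as np
import pandas as pd

def mark_overdue(df, today=None):
    """Hypothetical helper: the same three masks as the answer,
    but with an explicit reference date so results are reproducible."""
    if today is None:
        # Midnight at the start of today, so today's dates are not "older"
        today = pd.Timestamp.today().normalize()
    today = pd.Timestamp(today)
    end = pd.to_datetime(df['End'], format='%d/%m/%Y')
    close = pd.to_datetime(df['Close'], format='%d/%m/%Y')
    # NaT compares as False, so rows without dates are never marked
    m1 = end.lt(today) & close.lt(today)
    m2 = df['Stage'].isin(['One', 'Two', 'Three'])
    # Treat NaN and whitespace-only strings as blank
    m3 = df['Project Number'].fillna('').astype(str).str.strip().ne('')
    return np.where(m1 & m2 & m3, 'Overdue', '')

df = pd.DataFrame({
    'ID': ['A899', 'A778', 'A917', 'A704', 'X001'],  # X001 is made up
    'Stage': ['One', 'Three', 'Three', 'Four', 'Two'],
    'Close': ['26/08/2019', '26/08/2019', '31/07/2021', '31/12/2020', '31/12/2020'],
    'Project Number': ['KL1468', np.nan, 'KL1072', 'KL1036', '  '],
    'End': ['30/08/2019', '16/08/2019', '29/07/2021', '01/12/2019', '17/12/2019'],
})

print(mark_overdue(df, '2021-07-30').tolist())  # ['Overdue', '', '', '', '']
print(mark_overdue(df, '2021-08-01').tolist())  # ['Overdue', '', 'Overdue', '', '']
```

Passing a fixed date makes the result testable; omitting it falls back to the start of the current day.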
Answered By: jezrael