Pandas: mark rows that fit conditions
Question:
I have a simple dataset, and I want to mark the rows that meet all of the following conditions:
- both columns Close and End are earlier than today
- and column Stage is in [One, Two, Three]
- and column Project Number is not blank
I came up with the lines below, but they don't work.
import pandas as pd
import datetime
import numpy as np
from io import StringIO

# Tab-separated sample data (A778 has no Project Number; A740 has only an ID)
csvfile = StringIO(
    "ID\tStage\tClose\tProject Number\tEnd\n"
    "A899\tOne\t26/08/2019\tKL1468\t30/08/2019\n"
    "A572\tTwo\t31/12/2020\tKL1493\t17/12/2019\n"
    "A778\tThree\t26/08/2019\t\t16/08/2019\n"
    "A704\tFour\t31/12/2020\tKL1036\t01/12/2019\n"
    "A650\tOne\t31/12/2020\tKL1522\t23/12/2019\n"
    "A830\tTwo\t31/08/2021\tKL1535\t03/08/2021\n"
    "A669\tThree\t18/08/2021\tKL1536\t03/08/2021\n"
    "A892\tFour\t31/08/2021\tKL1534\t03/08/2021\n"
    "A789\tOne\t31/05/2021\tKL1537\t04/08/2021\n"
    "A821\tTwo\t31/12/2020\tKL1578\t03/11/2019\n"
    "A992\tThree\t29/07/2019\tKL1609\t26/06/2019\n"
    "A550\tFour\t31/12/2020\tKL1243\t30/11/2019\n"
    "A707\tOne\t31/12/2020\tKL1523\t29/11/2019\n"
    "A740\n"
    "A917\tThree\t31/07/2021\tKL1072\t29/07/2021\n"
    "A627\tFour\t30/06/2021\tKL1577\t15/06/2021\n"
)
df = pd.read_csv(csvfile, sep='\t', engine='python')

def condition_1(s):
    if (df['Project Number'].any() and s['Expiry_1'] < datetime.datetime.now() and s['Close_1'] < datetime.datetime.now() and np.where(df['Stage'].isin(['One','Two','Three']))):
        return "Overdue"
    else:
        return ''

df['Expiry_1'] = pd.to_datetime(df['End'].str[3:5] + '/' + df['End'].str[:2] + '/' + df['End'].str[-4:])
df['Close_1'] = pd.to_datetime(df['Close'].str[3:5] + '/' + df['Close'].str[:2] + '/' + df['Close'].str[-4:])
df["Overdue Project"] = df.apply(condition_1, axis=1)
df.to_excel(r"c:\Projects\output.xlsx", index=False)
What went wrong, and what’s the right way to achieve it?
Answers:
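As for what went wrong: in condition_1 only the two date checks use the current row s. df['Project Number'].any() and np.where(df['Stage'].isin([...])) are computed over the whole DataFrame, so they give the same result for every row, and np.where called with a single argument returns a tuple of index arrays, which is truthy even when nothing matches. So rows with Stage Four or with no project number can still come out as Overdue. The sketch below shows this on a small made-up frame (not the question's data); condition_rowwise is a hypothetical name for a version that reads every value from s, in case you want to keep apply. The vectorized masks that follow avoid calling a Python function per row altogether.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame: both dates are in the past for every row,
# but row 1 has Stage "Four" and row 2 has no project number.
df = pd.DataFrame({
    "Stage": ["One", "Four", "Two"],
    "Project Number": pd.Series(["KL1", "KL2", None], dtype=object),
    "Expiry_1": pd.to_datetime(["2019-01-01", "2019-01-01", "2019-01-01"]),
    "Close_1": pd.to_datetime(["2019-01-02", "2019-01-02", "2019-01-02"]),
})

# np.where with a single argument returns a tuple of index arrays.
# A non-empty tuple is truthy, even when no element matched.
stage_check = np.where(df["Stage"].isin(["One", "Two", "Three"]))
no_match = np.where(np.array([False, False]))
print(type(stage_check).__name__, bool(stage_check), bool(no_match))  # tuple True True

# .any() summarizes the whole column, so it is the same for every row.
print(bool(df["Project Number"].any()))  # True


def condition_1(s):
    # The question's logic: only the date checks look at the current row s.
    now = pd.Timestamp.now()
    if (df["Project Number"].any() and s["Expiry_1"] < now and s["Close_1"] < now
            and np.where(df["Stage"].isin(["One", "Two", "Three"]))):
        return "Overdue"
    return ""


def condition_rowwise(s):
    # Hypothetical row-wise fix: every check reads from the current row s.
    now = pd.Timestamp.now()
    if (pd.notna(s["Project Number"]) and s["Expiry_1"] < now and s["Close_1"] < now
            and s["Stage"] in ("One", "Two", "Three")):
        return "Overdue"
    return ""


wrong = df.apply(condition_1, axis=1).tolist()
right = df.apply(condition_rowwise, axis=1).tolist()
print(wrong)  # ['Overdue', 'Overdue', 'Overdue']
print(right)  # ['Overdue', '', '']
```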
First, convert the columns to datetimes with to_datetime, passing the format parameter:
df['Expiry_1'] = pd.to_datetime(df['End'], format='%d/%m/%Y')
df['Close_1'] = pd.to_datetime(df['Close'], format='%d/%m/%Y')
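A quick check of the day-first format string, on a few values taken from the question's End column plus a missing value (as in row A740): '%d/%m/%Y' reads 03/08/2021 as 3 August, not 8 March, and the missing entry becomes NaT, which compares as False against any timestamp. This replaces the string slicing in the question, which rearranged each date into month/day/year before parsing. The errors="coerce" call at the end is an optional addition, not part of the answer, for data that may contain malformed dates.

```python
import numpy as np
import pandas as pd

# Sample values from the question's End column, plus a missing value like row A740's.
end = pd.Series(["30/08/2019", "03/08/2021", np.nan])

# Explicit day-first format: "03/08/2021" is parsed as 3 August 2021, not 8 March.
parsed = pd.to_datetime(end, format="%d/%m/%Y")
print(parsed)

# The missing value becomes NaT, and NaT < <any timestamp> is False,
# so a row with a missing date never passes an "earlier than now" check.
print(parsed.lt(pd.Timestamp.now()).tolist())  # [True, True, False]

# Optional: errors="coerce" turns malformed dates (here an impossible 31 February)
# into NaT instead of raising a ValueError.
coerced = pd.to_datetime(pd.Series(["31/02/2021", "15/06/2021"]),
                         format="%d/%m/%Y", errors="coerce")
print(coerced)
```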
Then compare both date columns with the current time using DataFrame.lt (less than; DataFrame.gt would test greater than). Because both comparisons must be True, reduce them row-wise with DataFrame.all(axis=1) to get the first mask, m1. For the second mask use Series.isin, and for the third Series.notna. Finally, chain the masks with & (bitwise AND) and pass the result to numpy.where:
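# m1: both dates are earlier than now (NaT compares as False)
m1 = df[['Expiry_1','Close_1']].lt(pd.to_datetime('now')).all(axis=1)
# m2: Stage is One, Two or Three
m2 = df['Stage'].isin(['One','Two','Three'])
# m3: Project Number is not missing
m3 = df['Project Number'].notna()
# "Overdue" where all three masks are True, otherwise an empty string
df['Overdue Project'] = np.where(m1 & m2 & m3, "Overdue", '')
print(df)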
      ID  Stage       Close  Project Number         End    Expiry_1     Close_1  Overdue Project
0   A899    One  26/08/2019          KL1468  30/08/2019  2019-08-30  2019-08-26          Overdue
1   A572    Two  31/12/2020          KL1493  17/12/2019  2019-12-17  2020-12-31          Overdue
2   A778  Three  26/08/2019             NaN  16/08/2019  2019-08-16  2019-08-26
3   A704   Four  31/12/2020          KL1036  01/12/2019  2019-12-01  2020-12-31
4   A650    One  31/12/2020          KL1522  23/12/2019  2019-12-23  2020-12-31          Overdue
5   A830    Two  31/08/2021          KL1535  03/08/2021  2021-08-03  2021-08-31
6   A669  Three  18/08/2021          KL1536  03/08/2021  2021-08-03  2021-08-18
7   A892   Four  31/08/2021          KL1534  03/08/2021  2021-08-03  2021-08-31
8   A789    One  31/05/2021          KL1537  04/08/2021  2021-08-04  2021-05-31
9   A821    Two  31/12/2020          KL1578  03/11/2019  2019-11-03  2020-12-31          Overdue
10  A992  Three  29/07/2019          KL1609  26/06/2019  2019-06-26  2019-07-29          Overdue
11  A550   Four  31/12/2020          KL1243  30/11/2019  2019-11-30  2020-12-31
12  A707    One  31/12/2020          KL1523  29/11/2019  2019-11-29  2020-12-31          Overdue
13  A740    NaN         NaN             NaN         NaN         NaT         NaT
14  A917  Three  31/07/2021          KL1072  29/07/2021  2021-07-29  2021-07-31
15  A627   Four  30/06/2021          KL1577  15/06/2021  2021-06-15  2021-06-30
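Because m1 compares against the current time, the flags depend on when the code runs: the output above was produced before the end of July 2021, and running the same code later marks more rows (for example A830 and A917, whose dates have since passed). For reproducible results, one option is to compare against a fixed cutoff instead of pd.to_datetime('now'). The sketch below rebuilds the question's table as a dict and uses an arbitrary cutoff of 2021-07-30, chosen only because it reproduces the flags shown above. It also includes m3_strict, a hypothetical stricter variant of m3 for data where blank cells arrive as empty or whitespace-only strings rather than NaN; on this data the two masks agree.

```python
import numpy as np
import pandas as pd

# The question's data; None marks the blank cells (A778's project number, A740's row).
df = pd.DataFrame({
    "ID": ["A899", "A572", "A778", "A704", "A650", "A830", "A669", "A892",
           "A789", "A821", "A992", "A550", "A707", "A740", "A917", "A627"],
    "Stage": ["One", "Two", "Three", "Four", "One", "Two", "Three", "Four",
              "One", "Two", "Three", "Four", "One", None, "Three", "Four"],
    "Close": ["26/08/2019", "31/12/2020", "26/08/2019", "31/12/2020", "31/12/2020",
              "31/08/2021", "18/08/2021", "31/08/2021", "31/05/2021", "31/12/2020",
              "29/07/2019", "31/12/2020", "31/12/2020", None, "31/07/2021", "30/06/2021"],
    "Project Number": ["KL1468", "KL1493", None, "KL1036", "KL1522", "KL1535", "KL1536",
                       "KL1534", "KL1537", "KL1578", "KL1609", "KL1243", "KL1523", None,
                       "KL1072", "KL1577"],
    "End": ["30/08/2019", "17/12/2019", "16/08/2019", "01/12/2019", "23/12/2019",
            "03/08/2021", "03/08/2021", "03/08/2021", "04/08/2021", "03/11/2019",
            "26/06/2019", "30/11/2019", "29/11/2019", None, "29/07/2021", "15/06/2021"],
})

df["Expiry_1"] = pd.to_datetime(df["End"], format="%d/%m/%Y")
df["Close_1"] = pd.to_datetime(df["Close"], format="%d/%m/%Y")

# Hypothetical fixed cutoff in place of pd.to_datetime('now'), so the result does not drift.
cutoff = pd.Timestamp("2021-07-30")

m1 = df[["Expiry_1", "Close_1"]].lt(cutoff).all(axis=1)
m2 = df["Stage"].isin(["One", "Two", "Three"])
m3 = df["Project Number"].notna()

# Hypothetical stricter alternative to m3: missing, empty, or whitespace-only counts as blank.
m3_strict = df["Project Number"].fillna("").str.strip().ne("")

df["Overdue Project"] = np.where(m1 & m2 & m3, "Overdue", "")

flagged = df.loc[df["Overdue Project"] == "Overdue", "ID"].tolist()
print(flagged)  # ['A899', 'A572', 'A650', 'A821', 'A992', 'A707']
```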