Select all rows that are on or after a certain date
Question:
I have the following dataframe:
Code-
import pandas as pd
df = {'sample_received': {1: 'NaN',
2: 'NaN',
17: 'NaN',
3: 'NaN',
4: 'NaN',
5: 'NaN',
6: 'NaN',
7: 'NaN',
8: 'NaN',
9: 'NaN',
10: 'NaN',
11: 'NaN',
12: 'NaN',
13: 'NaN',
14: '2022-08-01 20:15:28',
15: '2022-08-01 20:12:56',
16: '2022-08-01 20:18:19'},
'result_received': {1: '2022-07-28 12:25:37',
2: '2022-07-30 12:37:37',
17: '2022-07-28 12:45:37',
3: '2022-07-28 12:48:37',
4: '2022-07-28 12:49:37',
5: '2022-07-28 12:50:37',
6: '2022-07-28 12:21:37',
7: '2022-07-28 12:52:37',
8: '2022-07-28 12:54:37',
9: '2022-08-01 11:55:40',
10: '2022-08-01 13:56:15',
11: '2022-08-01 13:57:15',
12: '2022-08-01 13:58:28',
13: '2022-08-01 13:59:28',
14: '2022-08-02 08:33:39',
15: '2022-08-02 08:35:39',
16: '2022-08-02 08::39'},
'status': {1: 'Failed',
2: 'Failed',
17: 'Approved',
3: 'Approved',
4: 'Approved',
5: 'Approved',
6: 'Approved',
7: 'Approved',
8: 'Approved',
9: 'Approved',
10: 'Approved',
11: 'Approved',
12: 'Approved',
13: 'Approved',
14: 'Approved',
15: 'Approved',
16: 'Approved'}}
df = pd.DataFrame(df)
I would like to select all the rows in which sample_received or result_received is on or after the 1st of August. What would be the most efficient way to do this? The main problem is that the ‘sample_received’ column may have no date at all. However, when the ‘result_received’ column contains a date that is on or after the 1st of August (in this case), I want the row to be included, and the other way around.
Thank you in advance.
Answers:
This should do it.
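# Convert both columns to datetimes first; 'NaN' and unparseable strings become NaT, which never satisfies >=.
dates = df[['sample_received', 'result_received']].apply(pd.to_datetime, errors='coerce')
mask = dates.apply(lambda x: x.sample_received >= pd.Timestamp("2022-08-01") or x.result_received >= pd.Timestamp("2022-08-01"), axis=1)
df[mask]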
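A row-wise apply can get slow on larger frames, so here is a vectorized sketch of the same idea. It is only an illustration: the three-row frame is a cut-down stand-in for the data in the question, and the names cutoff, date_cols, dates, mask and selected are made up for this example. It assumes the date columns are stored as strings, with 'NaN' marking a missing date.

```python
import pandas as pd

# Cut-down stand-in for the question's data (assumption: dates are stored as strings).
df = pd.DataFrame(
    {
        "sample_received": ["NaN", "NaN", "2022-08-01 20:15:28"],
        "result_received": [
            "2022-07-28 12:25:37",
            "2022-08-01 11:55:40",
            "2022-08-02 08:33:39",
        ],
        "status": ["Failed", "Approved", "Approved"],
    },
    index=[1, 9, 14],
)

cutoff = pd.Timestamp("2022-08-01")
date_cols = ["sample_received", "result_received"]

# Parse each column; missing or malformed values are coerced to NaT.
dates = df[date_cols].apply(pd.to_datetime, errors="coerce")

# True where at least one of the two dates is on or after the cutoff.
# Comparisons against NaT evaluate to False, so missing dates never match.
mask = dates.ge(cutoff).any(axis=1)

selected = df[mask]
print(selected)
```

Because the comparison is done on the whole frame at once, the missing dates need no special handling: a row is kept as soon as either column has a date on or after the cutoff.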