Pandas: assign a datetime by condition

Question:

I need to assign a given datetime value to the timestamp column, in rows that hold NaT, when a condition on another column is met. All values in timestamp are either datetime64[ns] or NaT.

EDIT:

sample data:

import numpy as np
import pandas as pd

dates = [pd.to_datetime('2022-10-14 10:13:52', format = "%Y-%m-%d %H:%M:%S"),
         pd.to_datetime('2022-10-14 17:43:52', format = "%Y-%m-%d %H:%M:%S"),
         pd.to_datetime('2022-10-14 09:00:10', format = "%Y-%m-%d %H:%M:%S")]
# The Series only covers index 1, 3 and 5, so the other rows become NaT.
data = {'A': [-0.5, -0.5, 0.7, 1, 0.65, 0.5], 'timestamp': pd.Series(dates, index=[1, 3, 5])}
df = pd.DataFrame(data = data, index=[0, 1, 2, 3, 4, 5])

Output:

      A           timestamp
0 -0.50                 NaT
1 -0.50 2022-10-14 10:13:52
2  0.70                 NaT
3  1.00 2022-10-14 17:43:52
4  0.65                 NaT
5  0.50 2022-10-14 09:00:10

I then do the following:

threshold = 0.65
null_date = pd.to_datetime('2022-09-01 09:00:00', format = "%Y-%m-%d %H:%M:%S")
df.timestamp = np.where(df.A >= threshold, null_date, df.timestamp)

However, this converts the whole timestamp column to object dtype:

      A            timestamp
0 -0.50                 None
1 -0.50  1665742432000000000
2  0.70  2022-09-01 09:00:00
3  1.00  2022-09-01 09:00:00
4  0.65  2022-09-01 09:00:00
5  0.50  1665738010000000000

That is, in rows where the condition is not met, NaT is replaced with None and the existing datetimes are replaced with integer nanosecond values. Only the rows where the condition is met hold a datetime.

Does anyone have suggestions on how to replace NaT with a given datetime by condition?

EDIT-2:

It worked with a lambda function:

df.timestamp = df[['A', 'timestamp']].apply(lambda x: null_date if x['A'] >= threshold else x['timestamp'], axis=1)

Output:

      A           timestamp
0 -0.50                 NaT
1 -0.50 2022-10-14 10:13:52
2  0.70 2022-09-01 09:00:00
3  1.00 2022-09-01 09:00:00
4  0.65 2022-09-01 09:00:00
5  0.50 2022-10-14 09:00:10
Asked By: Elena Savinova


Answers:

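The type of null_date does not match the dtype of df['timestamp']. null_date is a pandas Timestamp, which np.where treats as a generic Python object, so the result falls back to object dtype. Both must be datetime64. Use this: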

threshold = 0.65
null_date = pd.to_datetime('2022-09-01 09:00:00', format = "%Y-%m-%d %H:%M:%S")
null_date = np.datetime64(null_date)
df['timestamp'] = np.where(df['A'] >= threshold, null_date, df['timestamp'])

Answered By: Clegane
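
The dtype difference described in this answer can be checked directly. The following is a minimal sketch that rebuilds the question's sample data; the names as_object and as_datetime are made up for illustration and do not appear in the original posts.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question.
dates = [pd.Timestamp('2022-10-14 10:13:52'),
         pd.Timestamp('2022-10-14 17:43:52'),
         pd.Timestamp('2022-10-14 09:00:10')]
df = pd.DataFrame({'A': [-0.5, -0.5, 0.7, 1, 0.65, 0.5],
                   'timestamp': pd.Series(dates, index=[1, 3, 5])},
                  index=[0, 1, 2, 3, 4, 5])

threshold = 0.65
null_date = pd.Timestamp('2022-09-01 09:00:00')

# Passing the pandas Timestamp: NumPy sees a generic Python object.
as_object = np.where(df['A'] >= threshold, null_date, df['timestamp'])

# Passing a NumPy datetime64 scalar: both branches are datetime64.
as_datetime = np.where(df['A'] >= threshold, np.datetime64(null_date), df['timestamp'])

print(as_object.dtype)    # dtype of the original attempt
print(as_datetime.dtype)  # dtype after converting null_date

# Assigning the datetime64 result back keeps a datetime column.
df['timestamp'] = as_datetime
print(df.dtypes)
```

The exact datetime64 resolution printed may depend on the installed pandas and NumPy versions, so the sketch only relies on the result being a datetime64 dtype rather than object.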

I think this will work:

import pandas as pd

dates = [pd.to_datetime('2022-10-14 10:13:52', format = "%Y-%m-%d %H:%M:%S"), 
         pd.to_datetime('2022-10-14 17:43:52', format = "%Y-%m-%d %H:%M:%S"),
         pd.to_datetime('2022-10-14 09:00:10', format = "%Y-%m-%d %H:%M:%S")]
data = {'A': [-0.5, -0.5, 0.7, 1, 0.65, 0.5], 'timestamp': pd.Series(dates, index=[1, 3, 5])}
df = pd.DataFrame(data = data, index=[0, 1, 2, 3, 4, 5])

threshold = 0.65
null_date = pd.to_datetime('2022-09-01 09:00:00', format = "%Y-%m-%d %H:%M:%S")
#df.timestamp = np.where(df.A >= threshold, null_date, df.timestamp)
df.loc[df.A >= threshold, 'timestamp'] = null_date


>>> df
      A           timestamp
0 -0.50                 NaT
1 -0.50 2022-10-14 10:13:52
2  0.70 2022-09-01 09:00:00
3  1.00 2022-09-01 09:00:00
4  0.65 2022-09-01 09:00:00
5  0.50 2022-10-14 09:00:10
Answered By: misterhuge
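
Note that the .loc assignment above overwrites every row that meets the condition, including row 3, which already had a timestamp. That matches the output the asker accepted. If only the NaT rows should be filled, as the first sentence of the question describes, the condition can be combined with isna(). This is a sketch under that assumption; the name fill_mask is made up for illustration.

```python
import pandas as pd

# Rebuild the sample frame from the question.
dates = [pd.Timestamp('2022-10-14 10:13:52'),
         pd.Timestamp('2022-10-14 17:43:52'),
         pd.Timestamp('2022-10-14 09:00:10')]
df = pd.DataFrame({'A': [-0.5, -0.5, 0.7, 1, 0.65, 0.5],
                   'timestamp': pd.Series(dates, index=[1, 3, 5])},
                  index=[0, 1, 2, 3, 4, 5])

threshold = 0.65
null_date = pd.Timestamp('2022-09-01 09:00:00')

# Select rows that meet the threshold AND currently hold NaT.
fill_mask = (df['A'] >= threshold) & df['timestamp'].isna()

# Boolean-mask assignment keeps the column as a datetime dtype.
df.loc[fill_mask, 'timestamp'] = null_date

print(df)
```

With this mask, rows 2 and 4 receive null_date, row 3 keeps its original timestamp, and row 0 stays NaT because its A value is below the threshold.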