Conditional Shift in Pandas [calculate last win date]

Question:

Help with conditional shifting in Pandas

I have the following dataframe:

rng = pd.date_range('2015-02-24', periods=10, freq='D')
win_list = ['Won', 'Lost', 'Won', 'Won', 'Lost', 'Won', 'Lost', 'Lost', 'Lost', 'Won']
cust_list = ['A', 'A', 'B', 'C', 'C', 'C', 'B', 'A', 'A', 'B']
output = ['2015-02-24', '2015-02-24', '2015-02-26', '2015-02-27', '2015-02-27', 
          '2015-03-01', '2015-02-26', '2015-02-24', '2015-02-24', '2015-03-05']
df = pd.DataFrame({'TodayDate': rng, 'Status': win_list, 'Customer': cust_list, 'Last Win Datetime' : output})
df

which looks like

TodayDate   Status  Customer    Last Win Datetime
2015-02-24  Won     A           2015-02-24
2015-02-25  Lost    A           2015-02-24
2015-02-26  Won     B           2015-02-26
2015-02-27  Won     C           2015-02-27
2015-02-28  Lost    C           2015-02-27
2015-03-01  Won     C           2015-03-01
2015-03-02  Lost    B           2015-02-26
2015-03-03  Lost    A           2015-02-24
2015-03-04  Lost    A           2015-02-24
2015-03-05  Won     B           2015-03-05

I am trying to compute the ‘Last Win Datetime’ column. This column should return the most recent TodayDate when a Customer had a Won status.

I have tried the solution posted Pandas and Python Dataframes and Conditional Shift Function but this seems to return the most recent date for a matching Status (or car type from the example).

Any help is much appreciated!

Asked By: idealunique

||

Answers:

Use Series.where for replace rows with no Won to missing values, so possible use GroupBy.ffill:

df['Last Win Datetime'] = (df['TodayDate'].where(df['Status'].eq('Won'))
                                          .groupby(df['Customer'])
                                          .ffill())
print(df)
   TodayDate Status Customer Last Win Datetime
0 2015-02-24    Won        A        2015-02-24
1 2015-02-25   Lost        A        2015-02-24
2 2015-02-26    Won        B        2015-02-26
3 2015-02-27    Won        C        2015-02-27
4 2015-02-28   Lost        C        2015-02-27
5 2015-03-01    Won        C        2015-03-01
6 2015-03-02   Lost        B        2015-02-26
7 2015-03-03   Lost        A        2015-02-24
8 2015-03-04   Lost        A        2015-02-24
9 2015-03-05    Won        B        2015-03-05

Details:

print(df['TodayDate'].where(df['Status'].eq('Won')))
0   2015-02-24
1          NaT
2   2015-02-26
3   2015-02-27
4          NaT
5   2015-03-01
6          NaT
7          NaT
8          NaT
9   2015-03-05
Name: TodayDate, dtype: datetime64[ns]
Answered By: jezrael