Maintain a default of True/False for each date

Question:

During the day, new investment possibilities are registered, but the results (lay column) are only registered at midnight each day.

So let’s assume this CSV:

clock_now,competition,market_name,lay
2022/12/30,A,B,-1
2022/12/31,A,B,1.28
2023/01/01,A,B,-1
2023/01/02,A,B,1
2023/01/03,A,B,1
2023/01/04,A,B,
2023/01/04,A,B,
2023/01/04,A,B,

Until yesterday, 2023/01/03, the sum of the lines that have the value A in competition and B in market_name, was +1.28

I only invest if it is above 0, so during today, every time this combination of values comes, the answer will be True to invest.

At the end of the day, when the lay values are registered, I look at the total result:

clock_now,competition,market_name,lay
2022/12/30,A,B,-1
2022/12/31,A,B,1.28
2023/01/01,A,B,-1
2023/01/02,A,B,1
2023/01/03,A,B,1
2023/01/04,A,B,-1
2023/01/04,A,B,-1
2023/01/04,A,B,-1

End of the day: -1,72

This means that tomorrow, if that same combination of values appears in the columns, I will not invest once because it will always be negative because it only calculates the values that it has until the previous day.

I’m trying to create a column to show where it was True and where it was False:

df = pd.read_csv('example.csv')
combinations = [['market_name', 'competition']]
for cbnt in combinations:
    df['invest'] = (df.groupby(cbnt)['lay']
                      .apply(lambda s: s.cumsum().shift())
                      .gt(df['lay'])
                   )

    df['cumulative'] = (df.groupby(cbnt)['lay']
                      .apply(lambda s: s.cumsum().shift())
                   )

    print(df[['clock_now','invest','cumulative']])

But the result is this:

    clock_now  invest  cumulative
0  2022/12/30   False         NaN
1  2022/12/31   False       -1.00
2  2023/01/01    True        0.28
3  2023/01/02   False       -0.72
4  2023/01/03   False        0.28
5  2023/01/04    True        1.28
6  2023/01/04    True        0.28
7  2023/01/04    True       -0.72

The expected result would be this:

    clock_now  invest  cumulative
0  2022/12/30   False         NaN
1  2022/12/31   False       -1.00
2  2023/01/01    True        0.28
3  2023/01/02   False       -0.72
4  2023/01/03    True        0.28
5  2023/01/04    True        1.28
6  2023/01/04    True        0.28
7  2023/01/04    True       -1.72

How should I proceed so that cumsum can understand that attention must be paid to maintaining a daily pattern according to the results of previous days?

Example Two:

clock_now,competition,market_name,lay
2022/08/09,A,B,-1.0
2022/08/12,A,B,1.28
2022/09/07,A,B,-1.0
2022/10/15,A,B,1.0
2022/10/15,A,B,-1.0
2022/11/20,A,B,1.0

Note that on 2022/10/15, it is delivering one False and one True, so in fact it is not tracking according to the date which is how I want it to happen:

    clock_now  invest  cumulative
0  2022/08/09   False         NaN
1  2022/08/12   False       -1.00
2  2022/09/07    True        0.28
3  2022/10/15   False       -0.72
4  2022/10/15    True        0.28
5  2022/11/20   False       -0.72

The correct would be always or all False or all True when on equal dates. Like this:

    clock_now  invest  cumulative
0  2022/08/09   False         NaN
1  2022/08/12   False       -1.00
2  2022/09/07    True        0.28
3  2022/10/15   False       -0.72
4  2022/10/15   False        0.28
5  2022/11/20   False       -0.72
Asked By: Digital Farmer

||

Answers:

(df.join(
    # Count market&competition specific cumsum for each row
    # and join back with df
    df.groupby(['market_name', 'competition']).lay.cumsum().rename('lay_cumsum') > 0
)
# Group by market&comp&date to get last cumsum within each day
.groupby(['market_name', 'competition', 'clock_now']) 
# Get cumsum Series for each group
.lay_cumsum
# Getting last cumsum within group
.last()
# Group by market&comp
.groupby(['market_name', 'competition'])
# Shift by one to assign to each date prev date's cumsum
.shift(1)
.rename('lay_cumsum')
.reset_index()
# Merge back with original df
.merge(df, on=['clock_now', 'market_name', 'competition']))

This will output

  market_name competition   clock_now lay_cumsum   lay
0           B           A  2022/12/30        NaN -1.00
1           B           A  2022/12/31      False  1.28
2           B           A  2023/01/01       True -1.00
3           B           A  2023/01/02      False  1.00
4           B           A  2023/01/03       True  1.00
5           B           A  2023/01/04       True -1.00
6           B           A  2023/01/04       True -1.00
7           B           A  2023/01/04       True -1.00
Answered By: Alex Bochkarev