Pandas: update previous records because peeking into the future is not possible

Question:

This is what I have so far:

import numpy as np
import pandas

df = pandas.DataFrame({"color": [None, None, 'blue', None, None, None, 'orange', None, None, None, None],
                       'bottom': [1, 2, 7, 5, 9, 9, 5, 4, 5, 5, 3],
                       'top': [5, 5, 11, 8, 10, 10, 9, 7, 10, 6, 7]})

print(df)

"""
     color  bottom  top
0     None       1    5
1     None       2    5
2     blue       7   11
3     None       5    8
4     None       9   10
5     None       9   10
6   orange       5    9
7     None       4    7
8     None       5   10
9     None       5    6
10    None       3    7
"""

# lookback period
N = 3

# Pivot each color to own column and shift
df2 = (df.pivot(columns='color', values=['top', 'bottom'])
         .drop(columns=np.nan, level=1)
         .ffill(limit=N-1).shift()
       )


# compare the current top with the bottom & top of the latest color occurrence
out = df.join((df2['bottom'].le(df['top'], axis=0)
               & df2['top'].ge(df['top'], axis=0)).astype(int))
print(out)


"""
     color  bottom  top  blue  orange
0     None       1    5     0       0
1     None       2    5     0       0
2     blue       7   11     0       0
3     None       5    8     1       0
4     None       9   10     1       0
5     None       9   10     1       0
6   orange       5    9     0       0
7     None       4    7     0       1
8     None       5   10     0       0
9     None       5    6     0       1
10    None       3    7     0       0
"""

Question:

I only want to consume each color once. That means that for every blue or orange occurrence there can be at most one 1 in the following 3 rows.
(Two blues in a row will result in two 1s: one 1 for every blue.)

"""
     color  bottom  top  blue  orange
0     None       1    5     0       0
1     None       2    5     0       0
2     blue       7   11     0       0
3     None       5    8     1       0
4     None       9   10     1       0 --> this should be 0, blue already consumed on row 3
5     None       9   10     1       0 --> this should be 0, blue already consumed on row 3
6   orange       5    9     0       0
7     None       4    7     0       1
8     None       5   10     0       0
9     None       5    6     0       1 --> this should be 0, orange already consumed on row 7
10    None       3    7     0       0
"""

One constraint is that, for this to work correctly, I am not allowed to peek into the future. So I cannot use .shift(-3) or iloc[-1], for example.

That more or less rules out my initial idea of keeping track of a consumed state with something like .rolling(-3).max() == 1.

Asked By: Florian

||

Answers:

You can post-process the output to only keep the first 1 per group:

# lookback period
N = 3

# Pivot each color to own column and shift
df2 = (df.pivot(columns='color', values=['top', 'bottom'])
         .drop(columns=np.nan, level=1)
         .ffill(limit=N-1).shift()
       )

# compare the current top with the bottom & top of the latest color occurrence
out = df.join((df2['bottom'].le(df['top'], axis=0)
               & df2['top'].ge(df['top'], axis=0)).astype(int))

# post process the output to keep only the first 1
cols = list(df['color'].dropna().unique())

out[cols] = out[cols].mask(out[cols].ne(out.groupby(df['color'].notna().cumsum())[cols].cumsum()), 0)
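
To see why comparing against the grouped cumulative sum keeps only the first 1, here is a minimal sketch on a made-up signal column. The names `flags`, `group`, `running`, and `first_only` are illustrative and do not come from the code above. Within each group the running sum equals the flag only up to and including the first 1, so every later row gets masked to 0. A cumulative sum only reads the current and earlier rows, so this step should not involve any look-ahead.

```python
import pandas as pd

# Hypothetical raw signals and group ids (a new group starts at each color row)
flags = pd.Series([0, 1, 1, 0, 1, 0, 1, 1])
group = pd.Series([1, 1, 1, 1, 1, 2, 2, 2])

# Running count of 1s inside each group, using only past and current rows
running = flags.groupby(group).cumsum()

# The flag equals the running count only before and at the first 1;
# everywhere else the value is replaced by 0
first_only = flags.mask(flags.ne(running), 0)

print(running.tolist())     # [0, 1, 2, 2, 3, 0, 1, 2]
print(first_only.tolist())  # [0, 1, 0, 0, 0, 0, 1, 0]
```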

Or with a loop:

cols = list(df['color'].dropna().unique())

g = out.groupby(df['color'].notna().cumsum())
for c in cols:
    out[c] = np.where(out[c].eq(1) & df.index.isin(g[c].idxmax()), 1, 0)
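
The loop variant relies on `idxmax` returning the label of the first maximum in each group. The sketch below uses made-up data (the names `flags`, `group`, and `first_max` are illustrative) to show why the extra `eq(1)` test is needed: a group that contains no 1 at all still reports its first row as the maximum.

```python
import numpy as np
import pandas as pd

# Hypothetical raw signals; group 2 contains no 1 at all
flags = pd.Series([0, 1, 1, 0, 0, 0, 1, 1])
group = pd.Series([1, 1, 1, 2, 2, 3, 3, 3])

# Label of the first maximum per group; for the all-zero group
# this is simply the first row of that group
first_max = flags.groupby(group).idxmax()

# Keep a 1 only where the row is the first maximum AND really is a 1
first_only = np.where(flags.eq(1) & flags.index.isin(first_max), 1, 0)

print(first_max.tolist())   # [1, 3, 6]
print(first_only.tolist())  # [0, 1, 0, 0, 0, 0, 1, 0]
```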

Output:

     color  bottom  top  blue  orange
0     None       1    5     0       0
1     None       2    5     0       0
2     blue       7   11     0       0
3     None       5    8     1       0
4     None       9   10     0       0
5     None       9   10     0       0
6   orange       5    9     0       0
7     None       4    7     0       1
8     None       5   10     0       0
9     None       5    6     0       0
10    None       3    7     0       0
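
For comparison, the same rule can be written as an explicit forward pass that keeps a "consumed" state per color and only ever reads the current row. This is a sketch under assumptions, not the method used above: the function name `consume_once` is hypothetical, only the most recent occurrence of each color is tracked, and the consumed state resets when the same color appears again (the groupby version resets on any color). It reproduces the output above for the sample data, but the two approaches may differ when occurrences of different colors overlap.

```python
def consume_once(colors, bottoms, tops, n=3):
    """Row-by-row pass: each color occurrence may fire at most once
    within the n rows that follow it (hypothetical helper)."""
    names = list(dict.fromkeys(c for c in colors if c is not None))
    active = {}                        # color -> [bottom, top, rows_left]
    signals = {c: [] for c in names}
    for color, bottom, top in zip(colors, bottoms, tops):
        for c in names:
            hit = 0
            state = active.get(c)
            if state is not None:
                low, high, rows_left = state
                if low <= top <= high:
                    hit = 1
                    del active[c]      # consumed: no further 1s
                elif rows_left == 1:
                    del active[c]      # lookback window expired
                else:
                    state[2] = rows_left - 1
            signals[c].append(hit)
        # Register the occurrence after the comparison, so a color row
        # is never compared against itself (same effect as the shift)
        if color is not None:
            active[color] = [bottom, top, n]
    return signals


colors = [None, None, "blue", None, None, None, "orange", None, None, None, None]
bottoms = [1, 2, 7, 5, 9, 9, 5, 4, 5, 5, 3]
tops = [5, 5, 11, 8, 10, 10, 9, 7, 10, 6, 7]

result = consume_once(colors, bottoms, tops, n=3)
print(result["blue"])    # [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(result["orange"])  # [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
```

Because the loop never indexes ahead of the current row, it can also be fed one row at a time as new data arrives, at the cost of being slower than the vectorized version on large frames.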
Answered By: mozway