Collect values by internal numbering while taking into account the seconds between them

Question

I have a Data Frame that looks like this:

Answers,all_answers,Score
1.0,1,1
0.0,0,1
0.0,0,2
0.0,0,3
-1.0,1,1
0.0,0,1
0.0,0,2
1.0,1,1
-1.0,1,2
0.0,0,1
1.0,1,1

Each line is a second

The answers column means with which sign the calculation was made at this second: 1 is a positive number -1 is a negative number

The All_Answers column is the answers column modulo that is, without negative signs. It was made for the third column of the internal Score account. Score counts how many times the value 0 or 1 went in a row.

Data Frame can consist of 5000+ rows

And the internal account column of the value 0 can reach an average of 300 values

I want to collect the values of All_answers in a new column with the condition that between them the value 0 did not exceed 5.While observing the Answers sign.

For example

Answers,all_answers,Score,New
1.0,1,1,1
0.0,0,1,0
0.0,0,2,0
0.0,0,3,0
0.0,0,4,0
0.0,0,5,0
0.0,0,6,0
-1.0,1,1,0
0.0,0,1,0
0.0,0,2,0
1.0,1,1,1
-1.0,1,2,-1
0.0,0,1,0
1.0,1,1,1

There may be such values:

Answers,all_answers,Score,New
1.0,1,1,1
0.0,0,1,0
-1.0,1,1,-1
0.0,0,1,0
1.0,1,1,1
0.0,0,1,0
-1.0,1,1,-1
0.0,0,1,0
1.0,1,1,1
0.0,0,1,0
-1.0,1,1,-1
0.0,0,1,0
0.0,0,2,0
1.0,1,1,1

And it can be

Answers,all_answers,Score,New
0.0,0,19,0
0.0,0,20,0
0.0,0,21,0
0.0,0,22,0
0.0,0,23,0
0.0,0,24,0
0.0,0,25,0
0.0,0,26,0
-1.0,1,1,0
0.0,0,1,0
0.0,0,2,0
0.0,0,3,0
0.0,0,4,0
0.0,0,5,0
0.0,0,6,0
1.0,1,1,0

And very rarely can this happen. But this is exactly the aggregate I’m trying to find and I want to see it on the chart.

Answers,all_answers,Score,New
1.0,1,1,0
0.0,0,1,0
0.0,0,2,0
0.0,0,3,0
0.0,0,4,0
0.0,0,5,0
-1.0,1,1,-1
1.0,1,2,1
-1.0,1,3,-1
1.0,1,4,1
-1.0,1,5,-1
0.0,0,1
0.0,0,2
0.0,0,3
0.0,0,4
1.0,1,1,1
0.0,0,1
0.0,0,2
0.0,0,3
0.0,0,4
0.0,0,5
0.0,0,6
0.0,0,7

I already have a script that finds values that have been in a row five times
That is, he just collects such values but does not see the values that were two seconds later

Here it is

    s = np.sign(df['all_answers'])
    group = s.ne(s.shift()).cumsum()
    df['Score'] = s.groupby(group).cumcount().add(1)
    g = df['all_answers'].ne(df['all_answers'].shift()).cumsum()
    X1 = int(x)
    m = df.groupby(g)['all_answers'].transform('size').ge(5)
    df['New'] = df['Answers'].where(m, 0)

That’s what it outputs

Answers,all_answers,Score,New
1.0,1,1,0.0
0.0,0,1,0.0
0.0,0,2,0.0
0.0,0,3,0.0
0.0,0,4,0.0
0.0,0,5,0.0
-1.0,1,1,-1
1.0,1,2,1.0
-1.0,1,3,-1
1.0,1,4,1.0
-1.0,1,5,-1

The logic of creating a new column is as follows: that he will collect the values 1 and -1 from Answers between which there is no distance where the Score column has raised 5

Asked By: Serega

||

Source

Answer 1

IIUC, something as follows should work.

Data

import pandas as pd
import numpy as np

data = {'Answers': {0: 1, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: -1, 8: 0, 
                    9: 0, 10: 1, 11: -1, 12: 0, 13: 1}, 
        'all_answers': {0: 1, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 1, 
                        8: 0, 9: 0, 10: 1, 11: 1, 12: 0, 13: 1}, 
        'Score': {0: 1, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 1, 8: 1, 
                  9: 2, 10: 6, 11: 2, 12: 1, 13: 1}}

df = pd.DataFrame(data)

# notice that I've given Answer `1` at index value `10` a Score of `6`. 
# This should *not* affect `New` for the next answer.
print(df.iloc[10:12])
    Answers  all_answers  Score
10        1            1      6
11       -1            1      2

Problem

Each answer (-1,1) in col Answers may be preceded by one of more rows where df.Answers == 0. If, in such an event the preceding rows with 0 contain an associated Score value that exceeds 5, then the following answer (so: either -1 or 1) should get a 0 in col New. In all other scenarios, col New should simply take the value from col Answers.

Solution

# add col `groups` with each consecutive value in `Answers` as one group
df['groups'] = (df.Answers != df.Answers.shift()).cumsum()

# add col `mask` with 1 for each `Answers` value `(-1,1)`, 
# and the actual df.Score for all `0` vals
df['mask'] = np.where(df.Answers.ne(0),1,df.Score)

# now groupby `groups` and get the max `mask` val for each group,
# check if this val exceeds 5 (`gt(5)`) to get a boolean Series
# we shift the series so that it can be mapped onto the *following* group
# (and we `fillna` for the very first group: 
    # it will always need a `False` (never preceded by `0s` of course))
# finally, we map this result to col `groups`, and use the boolean Series
# inside `np.where` for evaluation. If `False` we want `0`, else `df.Answers`
df['New'] = np.where(
    df['groups'].map(
        df.groupby('groups')['mask'].max().gt(5).shift().fillna(False)),
    0,
    df.Answers)

# drop the temp cols `groups` and `mask`
df.drop(['groups','mask'], axis=1, inplace=True)

print(df)

    Answers  all_answers  Score  New
0         1            1      1    1
1         0            0      1    0
2         0            0      2    0
3         0            0      3    0
4         0            0      4    0
5         0            0      5    0
6         0            0      6    0
7        -1            1      1    0
8         0            0      1    0
9         0            0      2    0
10        1            1      6    1
11       -1            1      2   -1
12        0            0      1    0
13        1            1      1    1

Check:

print(df.loc[df.Answers.ne(0)])

    Answers  all_answers  Score  New
0         1            1      1    1
7        -1            1      1    0
10        1            1      6    1
11       -1            1      2   -1
13        1            1      1    1

So, all answers should be OK in New, except -1 at index value 7, since this is preceded by consecutive 0 rows with a value exceeding 5, namely:

print(df.iloc[1:8])

   Answers  all_answers  Score  New
1        0            0      1    0
2        0            0      2    0
3        0            0      3    0
4        0            0      4    0
5        0            0      5    0
6        0            0      6    0
7       -1            1      1    0

Answered By: ouroboros1

Collect values by internal numbering while taking into account the seconds between them

Question:

Answers: