How to implement Nelson's third rule with Pandas?
Question:
I am trying to implement Nelson’s rules using Pandas. One of them is giving me grief, specifically rule 3: six (or more) points in a row are continually increasing (or decreasing).
Using some example data:
data = pd.DataFrame({"values":[1,2,3,4,5,6,7,5,6,5,3]})
| | values |
|---|---|
| 0 | 1 |
| 1 | 2 |
| 2 | 3 |
| 3 | 4 |
| 4 | 5 |
| 5 | 6 |
| 6 | 7 |
| 7 | 5 |
| 8 | 6 |
| 9 | 5 |
| 10 | 3 |
My first approach was to use a rolling window to check whether the values are increasing or decreasing (here with diff() > 0) and use this to identify "hits" on the rule:
(data.diff()>0).rolling(6).sum()==6
This correctly identifies the end values (1=True, 0=False):
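| | values | correct/desired |
|---|---|---|
| 0 | 0 | 0 |
| 1 | 0 | 1 |
| 2 | 0 | 1 |
| 3 | 0 | 1 |
| 4 | 0 | 1 |
| 5 | 0 | 1 |
| 6 | 1 | 1 |
| 7 | 0 | 0 |
| 8 | 0 | 0 |
| 9 | 0 | 0 |
| 10 | 0 | 0 |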
This misses the earlier points of the run because rolling() only looks backward. Since the rule requires 6 points in a row, for any given point I essentially need to evaluate the 6 possible windows it can fall in and mark it as True if it is part of any window in which the points are consecutively increasing or decreasing.
I can think of how to do this with custom Python code using iterrows() or apply(). I am, however, keen to keep this performant, so I want to limit myself to the pandas API.
How can this be achieved?
Answers:
With the following toy dataframe (an extended version of yours):
import pandas as pd
df = pd.DataFrame({"values": [1, 2, 3, 4, 5, 6, 7, 5, 6, 5, 3, 11, 12, 13, 14, 15, 16, 4, 3, 8, 9, 10, 2]})
Here is one way to do it with Pandas rolling and interpolate:
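# Count the positive differences in each trailing window of 6 rows
df["check"] = (df["values"].diff() > 0).rolling(6).sum()
# Keep 1 where the whole window is increasing, NaN everywhere else
df["check"] = df["check"].mask(df["check"] < 6).mask(df["check"] >= 6, 1)
# Spread each hit back over the 5 preceding rows, then fill the rest with 0
df = df.interpolate(limit_direction="backward", limit=5).fillna(0)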
Then:
print(df)
# Output
    values  check
0        1      0
1        2      1
2        3      1
3        4      1
4        5      1
5        6      1
6        7      1
7        5      0
8        6      0
9        5      0
10       3      0
11      11      1
12      12      1
13      13      1
14      14      1
15      15      1
16      16      1
17       4      0
18       3      0
19       8      0
20       9      0
21      10      0
22       2      0
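The approach above flags increasing runs only, while the question also asks about decreasing ones. As a hedged sketch (not part of the original answer), the "any of the 6 possible windows" idea from the question can be written as a second rolling pass over the reversed series, run once per direction. The helper name `flag_monotonic_runs` and its `n_steps` parameter are made up for illustration, and the sketch assumes the question's convention that a hit needs six consecutive differences of the same sign, which is why index 0 stays unflagged.

```python
import pandas as pd


def flag_monotonic_runs(s: pd.Series, n_steps: int = 6) -> pd.Series:
    """Hypothetical helper: return 1 for rows that belong to a run of
    n_steps consecutive rises (or n_steps consecutive falls), else 0."""
    d = s.diff()
    flagged = pd.Series(False, index=s.index)
    # Handle rising runs first, then falling runs
    for step in ((d > 0).astype(int), (d < 0).astype(int)):
        # 1 where the last n_steps differences all have the same sign
        hit = (step.rolling(n_steps).sum() == n_steps).astype(int)
        # Reverse, roll, reverse back: is there a hit on this row
        # or on any of the next n_steps - 1 rows?
        ahead = hit[::-1].rolling(n_steps, min_periods=1).max()[::-1]
        flagged = flagged | ahead.astype(bool)
    return flagged.astype(int)


df = pd.DataFrame({"values": [1, 2, 3, 4, 5, 6, 7, 5, 6, 5, 3, 11, 12, 13,
                              14, 15, 16, 4, 3, 8, 9, 10, 2]})
df["check"] = flag_monotonic_runs(df["values"])
print(df)
```

On the toy dataframe this should reproduce the check column shown above (as integers), and a series that falls six times in a row is flagged in the same way. If you count a rule-3 violation as six points rather than six differences, adjusting `n_steps` and the look-ahead window is left as an assumption to verify against your own definition.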