How to remove duplicated buy/sell signals
Question:
I'm testing my stock trading logic and have created a position column that holds the buy/sell signals (1.0 = buy, -1.0 = sell, 0.0 = no signal):
```python
import pandas as pd

df = pd.DataFrame({'position': [1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -1.0, 1.0, 0.0, -1.0, -1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]})
```
I want to replace every 1.0 that occurs after a 1.0 and before the next -1.0 with 0.0, and likewise replace every -1.0 that occurs after a -1.0 and before the next 1.0 with 0.0.
Here is the desired output:
```python
df = pd.DataFrame({'position': [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -1.0, 1.0, 0.0, -1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]})
```
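NOTE: the output keeps only the first signal of each run of identical signals (ignoring the 0.0 rows in between); repeats of the same 1.0 or -1.0 are set to 0.0.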
Answers:
Can you show us what you tried and what didn't work, so we can help?
Anyway, you could start with a simple solution:
- Loop over the array.
- Keep track of the most recent non-zero signal you saw: -1 or 1.
- Set every entry that matches that most recent signal to 0.
- Deal with edge cases (e.g. loop only from the first non-zero entry to the last non-zero entry).
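The steps above could be sketched as a plain-Python function on a list. This is a minimal sketch, not the answerer's code: the name `dedupe_signals` is hypothetical, and it assumes the signals are exactly 1.0, -1.0 or 0.0. Because zeros are never changed, leading and trailing zeros need no special handling here.

```python
def dedupe_signals(signals):
    """Keep only the first signal of each run of identical non-zero signals."""
    last_seen = 0.0  # most recent non-zero signal; 0.0 means "none yet"
    cleaned = []
    for value in signals:
        if value != 0 and value == last_seen:
            cleaned.append(0.0)  # same direction as the previous signal: drop it
        else:
            cleaned.append(value)  # zero, or a change of direction: keep as-is
            if value != 0:
                last_seen = value  # remember the new direction
    return cleaned


positions = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -1.0,
             1.0, 0.0, -1.0, -1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
print(dedupe_signals(positions))
```

Working on a plain list keeps the logic independent of pandas; the result can be written back with `df['position'] = dedupe_signals(df['position'].tolist())`.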
Here is a basic implementation based on the approach described by the previous answer:
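```python
lastseen = 0  # most recent non-zero signal (0 = nothing seen yet)
for idx, el in df["position"].items():
    # A sell before any buy is treated as invalid input (see the note below).
    if lastseen == 0 and el == -1:
        raise ValueError("Inconsistent data: sell signal before any buy")
    if (el in (1, -1) and el != lastseen) or lastseen == 0:
        lastseen = el  # new signal direction (or still waiting for the first one)
    else:
        # Repeated signal or zero: write 0 via .at, because chained assignment
        # like df["position"][n] = 0 may not modify df at all.
        df.at[idx, "position"] = 0.0
```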
I added the first check (raising an error when a sell signal appears before any buy) based on the domain you described. If that assumption doesn't hold for your problem, feel free to remove it.
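If you want to keep the check but be able to switch it off, one option is to make it a parameter. This is a sketch under assumptions: the function name `drop_repeated_signals` and the `strict` flag are hypothetical, and unlike the loop above it returns a new Series instead of modifying `df` in place.

```python
import pandas as pd


def drop_repeated_signals(position: pd.Series, strict: bool = True) -> pd.Series:
    """Return a copy of `position` keeping only the first signal of each run.

    strict=True reproduces the loop's check that the first signal must be a buy (1).
    """
    result = position.copy()
    last_seen = 0.0  # most recent non-zero signal; 0.0 means none yet
    for idx, value in position.items():
        if value == 0:
            continue  # zeros are left untouched
        if strict and last_seen == 0 and value == -1:
            raise ValueError("Inconsistent data: sell signal before any buy")
        if value == last_seen:
            result.at[idx] = 0.0  # repeated signal: drop it
        else:
            last_seen = value  # direction changed: keep it and remember it
    return result


positions = pd.Series([1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -1.0,
                       1.0, 0.0, -1.0, -1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0])
print(drop_repeated_signals(positions).tolist())

# With strict=False a leading sell is simply kept as the first signal.
print(drop_repeated_signals(pd.Series([-1.0, 0.0, -1.0, 1.0]), strict=False).tolist())
# [-1.0, 0.0, 0.0, 1.0]
```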
A vectorized solution that makes full use of pandas:
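```python
import pandas as pd

s = pd.Series([1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -1.0, 1.0, 0.0, -1.0, -1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0])
# Replace zeros with NaN, then forward-fill: each row holds the most recent non-zero signal
s_ = s.mask(s == 0).ffill()
# Keep a value only where that most recent signal changed; everything else becomes 0
result = s.where(s_ != s_.shift(), 0)
print(pd.DataFrame({'input': s, 'result': result}))
```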
Output:
```
    input  result
0     1.0     1.0
1     0.0     0.0
2     0.0     0.0
3     1.0     0.0
4     0.0     0.0
5     0.0     0.0
6     0.0     0.0
7     0.0     0.0
8     0.0     0.0
9     0.0     0.0
10   -1.0    -1.0
11    1.0     1.0
12    0.0     0.0
13   -1.0    -1.0
14   -1.0     0.0
15    0.0     0.0
16    0.0     0.0
17    0.0     0.0
18    0.0     0.0
19    0.0     0.0
20    1.0     1.0
```
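To see why the vectorized version works, it can help to print the intermediate Series side by side. This is the same technique split into named steps; the names `last_signal`, `prev_signal` and `is_new` are illustrative and not part of the answer above.

```python
import pandas as pd

s = pd.Series([1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -1.0,
               1.0, 0.0, -1.0, -1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0])

last_signal = s.mask(s == 0).ffill()  # most recent non-zero signal on each row
prev_signal = last_signal.shift()     # most recent signal as of the previous row
is_new = last_signal != prev_signal   # True where the signal direction changes
result = s.where(is_new, 0.0)         # keep those values, zero out everything else

print(pd.DataFrame({
    'input': s,
    'last_signal': last_signal,
    'prev_signal': prev_signal,
    'is_new': is_new,
    'result': result,
}))
```

Row 0 is kept because `NaN != NaN` evaluates to True; the same applies to any leading zeros, which stay 0 since `where` falls back to the original value there. A row where the signal repeats (for example row 14) has `last_signal == prev_signal`, so it is replaced with 0.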