Consecutive Green Days
Question:
I am trying to find the number of consecutive green closing prices from a dataframe before a certain date.
Input:
  Ticker      Date  Close
0   AAPL  20200501    1.5
1   AAPL  20200502    1.2
2   AAPL  20200503    1.3
3   AAPL  20200504    1.3
4   AAPL  20200505    1.4
5   AAPL  20200506    1.5
In this example, I want to know how many consecutive closing prices, as of 20200507, were higher than the previous day’s closing price.
Desired output:
2
Here is the code for the example dataframe:
import pandas as pd
df1 = pd.DataFrame({'Ticker': ['AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL'],
                    'Date': [20200501, 20200502, 20200503, 20200504, 20200505, 20200506],
                    'Close': [1.5, 1.2, 1.3, 1.3, 1.4, 1.5]})
print(df1)
Answers:
@Quang Hoang's answer will work if you want the length of the longest streak.
If you instead want the length of the current streak of consecutive green days, the following will work:
import pandas as pd
df = pd.DataFrame({'Close': [1.5, 1.2, 1.3, 1.3, 1.4, 1.5]})
# Streak id: goes up by one every time the close is flat or lower
streaks = (df['Close'].diff() <= 0).cumsum()
# Rows in the last streak, minus the row that broke the previous one
res = (streaks == streaks.iloc[-1]).sum() - 1
Edit:
If your data has rows dated 20200507 or later, restrict it to earlier dates first (this assumes the dataframe has the Date column, as df1 in the question does):
df_subview = df[df['Date'] < 20200507]
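As a rough sketch, the date filter and the streak count can be combined into one helper. The function name current_green_streak and its before_date parameter are made up for illustration and are not part of pandas; the sketch also assumes a single ticker and integer dates in YYYYMMDD form, as in the question.

```python
import pandas as pd


def current_green_streak(df: pd.DataFrame, before_date: int) -> int:
    """Count consecutive higher closes ending at the last row before `before_date`.

    Hypothetical helper: assumes one ticker and integer YYYYMMDD dates.
    """
    # Keep only rows strictly before the cutoff, in chronological order.
    closes = df.loc[df["Date"] < before_date].sort_values("Date")["Close"]
    if closes.empty:
        return 0
    # The streak id goes up whenever the close is flat or lower.
    streaks = (closes.diff() <= 0).cumsum()
    # Rows in the final group, minus the group's first row (not a green day).
    return int((streaks == streaks.iloc[-1]).sum() - 1)


df1 = pd.DataFrame({
    "Ticker": ["AAPL"] * 6,
    "Date": [20200501, 20200502, 20200503, 20200504, 20200505, 20200506],
    "Close": [1.5, 1.2, 1.3, 1.3, 1.4, 1.5],
})

print(current_green_streak(df1, 20200507))  # 2
print(current_green_streak(df1, 20200504))  # 1
```

With several tickers in one dataframe, the same idea would need to be applied per ticker, for example after filtering on the Ticker column.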
Edit2:
Why df['Close'].diff() <= 0 instead of df['Close'].diff() > 0?
This is what we want to achieve:
>>> streaks
0 0
1 1
2 1
3 2
4 2
5 2
There are three streaks at play here: the first ends after day 1 (index 0), the second ends after day 3 (indices 1-2), and the last runs all the way to the end (indices 3-5).
To achieve this, we need a number that goes up every time a streak is broken. We can use .cumsum on an array of 0s and 1s to get this behaviour: every day on which a streak is broken should be marked with a 1.
How do we test whether a streak is broken? By taking the opposite of df['Close'].diff() > 0, i.e. df['Close'].diff() <= 0. That produces the following result:
>>> df['Close'].diff() <= 0
0 False
1 True
2 False
3 True
4 False
5 False
Name: Close, dtype: bool
Because True is internally a 1 and False a 0, we already have the required array of zeros and ones.
>>> (df['Close'].diff() <= 0).astype(int)
0 0
1 1
2 0
3 1
4 0
5 0
Name: Close, dtype: int64
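Now we can apply .cumsum to label each streak, then count the rows carrying the last label (minus the row that broke the previous streak) to get the length of the last streak. Because True is already treated as a 1, we can omit the .astype(int) step and call (df['Close'].diff() <= 0).cumsum() directly.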
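To see the steps above side by side, here is a small sketch that puts the intermediate values into one table and then derives the length of every streak, not just the last one. The column names (diff, broken, streak_id) are chosen here for illustration only.

```python
import pandas as pd

close = pd.Series([1.5, 1.2, 1.3, 1.3, 1.4, 1.5], name="Close")

steps = pd.DataFrame({
    "close": close,
    "diff": close.diff(),
    # True on days where the close is flat or lower, i.e. a streak is broken.
    "broken": close.diff() <= 0,
})
# Running count of broken days gives each streak its own id.
steps["streak_id"] = steps["broken"].cumsum()

# The first row of each group is never a green day (it either broke the
# previous streak or has no previous close), so subtract it from the size.
group_sizes = steps.groupby("streak_id").size() - 1
current = int(group_sizes.iloc[-1])

print(steps)
print(group_sizes.tolist())  # [0, 1, 2]
print(current)  # 2
```

Taking the maximum of group_sizes instead of its last entry would give the longest streak rather than the current one.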
df1['Close'].diff().gt(0)[::-1].cumprod().sum()
out:
2
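The one-liner above comes without explanation, so here is one possible reading of it, broken into steps. The intermediate variable names are made up for illustration, and the explicit .astype(int) and .iloc are additions to keep the dtype and the slicing unambiguous.

```python
import pandas as pd

close = pd.Series([1.5, 1.2, 1.3, 1.3, 1.4, 1.5])

# 1 where the close is higher than the previous day's close, else 0.
green = close.diff().gt(0).astype(int)
# Reverse the order so the most recent day comes first.
newest_first = green.iloc[::-1]
# The running product stays 1 until the first non-green day, then stays 0.
still_green = newest_first.cumprod()
# Summing the leading 1s gives the length of the current streak.
result = int(still_green.sum())

print(still_green.tolist())  # [1, 1, 0, 0, 0, 0]
print(result)  # 2
```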