Consecutive Green Days

Question

I am trying to find the number of consecutive green closing prices from a dataframe before a certain date.

Input:

  Ticker      Date  Close
0   AAPL  20200501    1.5
1   AAPL  20200502    1.2
2   AAPL  20200503    1.3
3   AAPL  20200504    1.3
4   AAPL  20200505    1.4
5   AAPL  20200506    1.5

In this example, I want to know, as of 20200507, how many consecutive closing prices were each higher than the previous day’s closing price.

Desired output:

Here is the code for the example dataframe

import pandas as pd

df1 = pd.DataFrame({'Ticker': ['AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL'],
                    'Date': [20200501, 20200502, 20200503, 20200504, 20200505, 20200506],
                    'Close': [1.5, 1.2, 1.3, 1.3, 1.4, 1.5]})
print(df1)

Asked By: Jackey12345

Answer 1

@Quang Hoang's answer works if you want the length of the longest streak.
If you instead want the length of the current streak of consecutive green days, the following will work:

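import pandas as pd

df = pd.DataFrame({'Close': [1.5, 1.2, 1.3, 1.3, 1.4, 1.5]})

# The counter goes up by one each time the close fails to rise (a streak breaks)
streaks = (df['Close'].diff() <= 0).cumsum()
# Rows in the last group, minus the row that started it
res = (streaks == streaks.iloc[-1]).sum() - 1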
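
To make the difference between the longest streak and the current streak concrete, here is a minimal sketch. The referenced answer is not reproduced on this page, so longest_green_streak is only one possible way to compute the longest streak, not necessarily that answer's method. Both function names and the sample prices are hypothetical.

```python
import pandas as pd

def longest_green_streak(close):
    """Longest run of closes that were each higher than the previous close."""
    up = close.diff() > 0             # True on days that closed higher
    groups = (~up).cumsum()           # new group id on every non-green day
    return int(up.groupby(groups).sum().max())

def current_green_streak(close):
    """Length of the run of higher closes that ends at the last row."""
    streaks = (close.diff() <= 0).cumsum()
    return int((streaks == streaks.iloc[-1]).sum() - 1)

# Hypothetical prices: three green days, one drop, then one green day
close = pd.Series([1.0, 1.1, 1.2, 1.3, 1.0, 1.1])
print(longest_green_streak(close))  # 3
print(current_green_streak(close))  # 1
```

On the data from the question both functions return 2, because the longest streak happens to be the one that is still running.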

Edit:
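If the dataframe also contains rows dated 20200507 or later, filter them out first and run the streak calculation on the result (this assumes df has a Date column, as df1 in the question does):

df_subview = df[df['Date'] < 20200507]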
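
Putting the date filter and the streak count together might look like the sketch below. The helper name green_streak_before is hypothetical, and it assumes Date holds sortable values such as the integers used in the question.

```python
import pandas as pd

def green_streak_before(df, cutoff):
    """Count consecutive higher closes ending at the last row dated before cutoff."""
    # Keep only rows strictly before the cutoff, in chronological order
    sub = df[df['Date'] < cutoff].sort_values('Date')
    if sub.empty:
        return 0  # nothing before the cutoff, so no streak
    # The counter goes up on every day that did not close higher
    streaks = (sub['Close'].diff() <= 0).cumsum()
    # Rows in the last group, minus the row that started the group
    return int((streaks == streaks.iloc[-1]).sum() - 1)

df1 = pd.DataFrame({'Ticker': ['AAPL'] * 6,
                    'Date': [20200501, 20200502, 20200503, 20200504, 20200505, 20200506],
                    'Close': [1.5, 1.2, 1.3, 1.3, 1.4, 1.5]})

print(green_streak_before(df1, 20200507))  # 2
print(green_streak_before(df1, 20200504))  # 1
```

The empty check matters because streaks.iloc[-1] raises an IndexError when no rows survive the filter.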

Edit2:
Why df['Close'].diff() <= 0 instead of df['Close'].diff() > 0?

This is what we want to achieve:

>>> streaks
0    0
1    1
2    1
3    2
4    2
5    2

There are three streaks at play here: the first ends after the first day (index 0), the second ends after the third day (index 2), and the last runs all the way to the end (indices 3 to 5).

To achieve this we need a number that goes up every time a streak is lost. We can use .cumsum on an array of 0s and 1s to get this behaviour: every day on which a streak is lost should be marked with a 1.

How do we test if a streak is broken? By doing the opposite of df['Close'].diff() > 0, i.e. df['Close'].diff() <= 0. That produces the following result:

>>> df['Close'].diff() <= 0
0    False
1     True
2    False
3     True
4    False
5    False
Name: Close, dtype: bool

Because True is internally a 1 and False a 0, we already have the required array of zeros and ones.

>>> (df['Close'].diff() <= 0).astype(int)
0    0
1    1
2    0
3    1
4    0
5    0
Name: Close, dtype: int64

Now we can apply .cumsum and get the length of the last streak. Because True is already treated as a 1 we can omit the step .astype(int) and directly call (df['Close'].diff() <= 0).cumsum().

Answered By: Dames

Answer 2

df1['Close'].diff().gt(0)[::-1].cumprod().sum()

Output:

2
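
The one-liner can be unpacked step by step, which may make it easier to follow. This is only a sketch: the intermediate variable names are hypothetical, .iloc[::-1] is used for the reversal, and an explicit .astype(int) is added so that the cumulative product clearly operates on 0s and 1s.

```python
import pandas as pd

close = pd.Series([1.5, 1.2, 1.3, 1.3, 1.4, 1.5])

# True where the close rose; the first row is False because its diff is NaN
green = close.diff().gt(0)
# Reverse the order so the most recent day comes first
newest_first = green.iloc[::-1]
# Cumulative product stays 1 until the first non-green day, then is 0 from there on
alive = newest_first.astype(int).cumprod()
# The number of leading 1s is the length of the current streak
streak = int(alive.sum())
print(streak)  # 2
```

Unlike the cumsum approach, this needs no subtraction at the end, because only green days are ever counted.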

Answered By: G.G