Is there a way to do last_valid_index() in a rolling window?
Question:
last_valid_index() only applies to the entire DataFrame, and rolling() does not allow last_valid_index(). Is there a way to find the last valid index (the last index where the value is True) in a column of booleans within a rolling window?
For instance:
import pandas as pd
d = {'col': [True, False, True, True, False, False]}
df = pd.DataFrame(data=d)
The expected outcome for a rolling window of 3 is:
0 NaN
1 NaN
2 2.0
3 3.0
4 3.0
5 3.0
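For reference, the expected output can be reproduced with a direct but slow definition: for each window of 3 rows, take the last (rightmost) index label whose value is True. The sketch below uses rolling().apply with raw=False so that each window keeps its index labels. The helper name last_true_index is hypothetical, and the column is cast to float before rolling.

```python
import numpy as np
import pandas as pd

d = {'col': [True, False, True, True, False, False]}
df = pd.DataFrame(data=d)

def last_true_index(window: pd.Series) -> float:
    # window holds 1.0 for True and 0.0 for False; because raw=False,
    # it is a Series that keeps the original index labels of the window.
    hits = window.index[window.to_numpy() > 0]
    # Last label with a True value, or NaN if the window has no True at all.
    return float(hits[-1]) if len(hits) else np.nan

result = df['col'].astype(float).rolling(3).apply(last_true_index, raw=False)
print(result)
```

Because it calls a Python function once per window, this is much slower than the vectorized approaches in the answers on large frames, but it is useful for checking them. Note that a window with no True value yields NaN here.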
Answers:
There is a workaround:
df['new'] = df.index
# 'yourcol' is a placeholder for a column where "valid" means non-null
df['new'].mask(df['yourcol'].isnull()).ffill().rolling(3).max()
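A self-contained run of the workaround above, for the case it is written for: a column where "valid" means non-null. The column name yourcol and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'yourcol' stands in for any column with missing values.
df = pd.DataFrame({'yourcol': [1.0, np.nan, 5.0, 7.0, np.nan, np.nan]})

df['new'] = df.index
# Hide the index wherever the value is missing, carry the last valid index
# forward, then take the maximum over each window of 3 rows.
result = df['new'].mask(df['yourcol'].isnull()).ffill().rolling(3).max()
print(result)
```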
From the comments, adapted to the boolean column (where "valid" means True):
df['new'] = df.index
df['new'] = df['new'].where(df.col).ffill().rolling(3).max()
0 NaN
1 NaN
2 2.0
3 3.0
4 3.0
5 3.0
Name: new, dtype: float64
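To see why this works, here is a sketch that keeps each intermediate step in its own variable (the names masked, filled and result are only for illustration):

```python
import pandas as pd

d = {'col': [True, False, True, True, False, False]}
df = pd.DataFrame(data=d)

idx = pd.Series(df.index, index=df.index)
masked = idx.where(df['col'])      # index where col is True, NaN elsewhere
filled = masked.ffill()            # most recent True index at or before each row
result = filled.rolling(3).max()   # maximum over the last 3 rows

print(pd.DataFrame({'col': df['col'], 'masked': masked,
                    'filled': filled, 'result': result}))
```

With an ascending index, filled never decreases, so the rolling max simply reproduces filled with the first two rows set to NaN. This is the behaviour questioned in the answer that follows.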
As I mentioned in a comment here, I think the current accepted solution has a bug. A lot of the beginning of this post is taken word-for-word from my comment there.
Change the example to:
d = {'col': [True, False, True, True, False, False, False]}
df = pd.DataFrame(data=d)
Then the last 3 entries make up the entire final rolling window of 3, and all of them are False. But the current accepted solution returns index 3 for the last entry, even though I would expect NaN (otherwise, what is the point of the rolling window at all, other than setting the first 2 observations to NaN?).
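A minimal reproduction of the problem, running the accepted approach unchanged on the extended 7-row example:

```python
import pandas as pd

# Extended example: the final window (rows 4, 5, 6) is all False.
d = {'col': [True, False, True, True, False, False, False]}
df = pd.DataFrame(data=d)

df['new'] = df.index
df['new'] = df['new'].where(df.col).ffill().rolling(3).max()
print(df['new'].iloc[-1])  # prints 3.0, although index 3 is outside the last window
```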
Here is my proposed fix:
import numpy as np
df['new'] = df.index
df['new'] = df['new'].where(df['col'], -1).rolling(3).max().replace(-1, np.nan)
Instead of replacing the indices where df['col'] is False with NaN and then using ffill() to fill them with the previous index, it replaces those indices with -1. The rolling max then picks the largest index in each window where df['col'] is True. If every index in a window is -1, the entire window has df['col'] as False, so the result for that window is replaced with np.nan. This relies on the index being non-negative (as with the default RangeIndex), so that -1 can never be a real index value.
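A sketch of the proposed fix with each step kept in its own variable (the names sentinel, window_max and result are only for illustration), again on the 7-row example:

```python
import numpy as np
import pandas as pd

d = {'col': [True, False, True, True, False, False, False]}
df = pd.DataFrame(data=d)

idx = pd.Series(df.index, index=df.index)
sentinel = idx.where(df['col'], -1)       # False rows become -1 instead of NaN
window_max = sentinel.rolling(3).max()    # -1 only if the whole window is False
result = window_max.replace(-1, np.nan)   # turn "no True in window" into NaN

print(pd.DataFrame({'sentinel': sentinel, 'window_max': window_max,
                    'result': result}))
```

On the question's original 6-row data, this gives the same output as the expected result, because every window there contains at least one True.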