How to add a column with indices of the last occurrence of a value in pandas

Question:

I am trying to add a new column to a dataframe that contains the index of the last occurrence of a value in a certain column.

Say we have the dataframe df

   0 
0  1
1  5
2  4
3  1
4  7
5  9

I want to add a 2nd column that contains the index of the last occurrence of 1 in column 0.

So the desired output becomes:

   0  1
0  1  0
1  5  0
2  4  0
3  1  3
4  7  3
5  9  3

I have a working solution with a loop:

for i in df.index:
   sub_df = df[0: i + 1]
   recent_1_index = sub_df[(sub_df[0] == 1)].index[-1]
   df.at[i, 1] = recent_1_index

However, this solution is slow for a large dataframe: I believe it runs in O(n*m), where n is the length of df and m is the length of sub_df. Does anyone have a solution that avoids the loop and is significantly faster?

Thanks!

Asked By: whaddaplaya


Answers:

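You can keep the index value only in the rows where column 0 equals 1, convert the other rows to NaN, and then forward fill the missing values (if the first rows do not match, they stay NaN). The sample output here uses a different dataframe with columns 0, 1 and 2; the result is written to a new column 4: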

df[4] = df.index.to_series().where(df[0].eq(1)).ffill()
print (df)
   0  1  2  4
0  1  2  1  0
1  4  7  7  0
2  4  3  9  0
3  1  9  4  3
4  4  2  6  3
5  4  5  9  3

Explanation:

print (df[0].eq(1))
0     True
1    False
2    False
3     True
4    False
5    False
dtype: bool

print (df.index.to_series().where(df[0].eq(1)))
0    0.0
1    NaN
2    NaN
3    3.0
4    NaN
5    NaN
dtype: float64

print (df.index.to_series().where(df[0].eq(1)).ffill())
0    0.0
1    0.0
2    0.0
3    3.0
4    3.0
5    3.0
dtype: float64
Answered By: jezrael
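
For reference, here is a minimal self-contained sketch of the approach in the answer above, applied to the dataframe from the question. The helper name last_index_of and the cast to the nullable "Int64" dtype are not part of the original answer; the cast assumes integer index labels and is only there so the new column shows 0 and 3 instead of 0.0 and 3.0.

```python
import pandas as pd


def last_index_of(col: pd.Series, value) -> pd.Series:
    """Hypothetical helper: for each row, return the index label of the
    most recent row (at or before it) where `col` equals `value`."""
    # Keep the index label on matching rows, NaN everywhere else.
    hits = col.index.to_series().where(col.eq(value))
    # Forward-fill so every row carries the latest match seen so far.
    # Assumption: index labels are integers, so the nullable Int64 dtype
    # can hold them; rows before the first match become <NA>.
    return hits.ffill().astype("Int64")


# Data from the question: a single column labelled 0.
df = pd.DataFrame({0: [1, 5, 4, 1, 7, 9]})
df[1] = last_index_of(df[0], 1)
print(df)

# A column whose first row does not match: that row has no earlier
# occurrence, so it stays missing.
no_leading_match = last_index_of(pd.Series([5, 1, 4]), 1)
print(no_leading_match)
```

This replaces the row-by-row loop with a few vectorized passes over the column. If missing values are not acceptable for rows before the first match, they would need an explicit fillna with whatever sentinel suits the data.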