Substracting row values based on conditions in python pandas

Question:

I have the following similar dataframe. I want to create a new variable time since eligibility that calculates the day diffence since the asset becomes eligible (the dummy eligible turns from 0 to 1). I added the variable TSE by hand that I would like to create.

import pandas as pd

df1 = pd.DataFrame(columns=['asset', 'eligible', "date"])
df1['asset'] = ['003', '003', '003', '003', '003', '003', '002', '002', '002', '002', '002', '002', '002', '003', '003', '003']
df1['eligible'] = [0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
df1['date'] = ['1.1.2016', '2.1.2016', '3.1.2016', '4.1.2016', '5.1.2016', '6.1.2016', '1.1.2016', '2.1.2016', '3.1.2016', 
               '4.1.2016', '5.1.2016', '6.1.2016', '7.1.2016', '1.1.2016', '2.1.2016', '3.1.2016']
df1["TSE"] = [0, 0, 1, 2, 3, 4, 0, 0, 0, 0, 1, 2, 3, 1, 2,3 ]
   asset  eligible      date  TSE
0    003         0  1.1.2016    0
1    003         0  2.1.2016    0
2    003         1  3.1.2016    1
3    003         1  4.1.2016    2
4    003         1  5.1.2016    3
5    003         1  6.1.2016    4
6    002         0  1.1.2016    0
7    002         0  2.1.2016    0
8    002         0  3.1.2016    0
9    002         0  4.1.2016    0
10   002         1  5.1.2016    1
11   002         1  6.1.2016    2
12   002         1  7.1.2016    3
13   003         1  1.1.2016    1
14   003         1  2.1.2016    2
15   003         1  3.1.2016    3

How would you tackle the problem? I tried the following loop but it does not work.

for i in range(len(df1)-1):
    for j in range(len(df1)-1):
        row1, row2 = df1.iloc[i], df1.iloc[i+j]
        if row1["eligible"] - row2["eligible"] != 0:
            df1.iloc[i] = df1['date'].diff()
Asked By: Lisa

||

Answers:

Use cumsum with groupby.transform to get the desired result:

grp = (df1.eligible.diff()<0).cumsum()
df1["TSE"] = df1.groupby([grp, 'asset'], as_index=False)['eligible'].transform('cumsum')
Answered By: Space Impact
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.