Is there a way to cumsum value based on the flag and add the value to other flag consecutive rows without iteration in pandas
Question:
I’m trying to cumsum ‘value’ for ‘flag’==2, and add those cummulative values to consecutive rows of ‘flag’==1 and dropped flag 2 rows.
Input
Index_Date flag value
========================
2020-01-31 1 10
2020-02-01 2 5
2020-02-02 1 10
2020-02-03 2 5
2020-03-04 1 20
Desired output
Index_Date flag value
========================
2020-01-31 1 10
2020-02-02 1 15
2020-03-04 1 30
achieved the above result by iteration.
import pandas as pd
df = df.sort_values(by=['Index_Date'])
adjustments = 0
for index, row in df.iterrows():
if row['flag'] == 2:
adjustments += row['value']
else:
row['value'] += adjustments
df.drop(df[df['flag'] == 2].index, inplace = True)
Is there a way to achieve this without iteration?
Thanks.
Answers:
Try:
df['value'] += np.where(df['flag'] == 2, df['value'], 0).cumsum()
df = df[df['flag']==1]
Output:
Index_Date flag value
0 2020-01-31 1 10
2 2020-02-02 1 15
4 2020-03-04 1 30
def function1(dd:pd.DataFrame):
return dd.tail(1).assign(value=dd.value.sum())
col1=df1.flag.shift().eq(1).cumsum()
df1.groupby(col1,group_keys=False).apply(function1)
Output:
Index_Date flag value
0 2020-01-31 1 10
2 2020-02-02 1 15
4 2020-03-04 1 30
I’m trying to cumsum ‘value’ for ‘flag’==2, and add those cummulative values to consecutive rows of ‘flag’==1 and dropped flag 2 rows.
Input
Index_Date flag value
========================
2020-01-31 1 10
2020-02-01 2 5
2020-02-02 1 10
2020-02-03 2 5
2020-03-04 1 20
Desired output
Index_Date flag value
========================
2020-01-31 1 10
2020-02-02 1 15
2020-03-04 1 30
achieved the above result by iteration.
import pandas as pd
df = df.sort_values(by=['Index_Date'])
adjustments = 0
for index, row in df.iterrows():
if row['flag'] == 2:
adjustments += row['value']
else:
row['value'] += adjustments
df.drop(df[df['flag'] == 2].index, inplace = True)
Is there a way to achieve this without iteration?
Thanks.
Try:
df['value'] += np.where(df['flag'] == 2, df['value'], 0).cumsum()
df = df[df['flag']==1]
Output:
Index_Date flag value
0 2020-01-31 1 10
2 2020-02-02 1 15
4 2020-03-04 1 30
def function1(dd:pd.DataFrame):
return dd.tail(1).assign(value=dd.value.sum())
col1=df1.flag.shift().eq(1).cumsum()
df1.groupby(col1,group_keys=False).apply(function1)
Output:
Index_Date flag value
0 2020-01-31 1 10
2 2020-02-02 1 15
4 2020-03-04 1 30