Updating pandas column faster than for loop
Question:
I need to update a DataFrame column based on the previous row's value plus an additional comparison. Currently I do this with a for loop and a couple of conditions:
import pandas as pd
df = pd.DataFrame({'Signal':[1,1,1,0,0,0,0,0,0,1],'F1':[5,5,5,5,5,5,5,5,5,5],'F2':[5,5,5,5,5,6,4,4,4,4]})
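for i in range(1, len(df)):
    # If the previous row's Signal is 1 and F1 <= F2 on this row, switch this row on.
    if df['Signal'].iloc[i-1] == 1 and df['F1'].iloc[i] <= df['F2'].iloc[i]:
        # Assign through df.loc to avoid chained assignment (assumes the default integer index).
        df.loc[i, 'Signal'] = 1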
The for loop checks the previous row's state, then checks my condition, and updates the "Signal" column. On a dataframe with a few thousand rows, this operation begins to take longer than I’d like. I’m looking to optimize the code, but I’m not sure how.
So far I have this list comprehension, which gives me the values to write but not the positions where they belong. I’m also unsure whether it is faster than my loop:
[1 for i in range(1,len(df)) if (df['Signal'].iloc[i-1] == 1) & (df['F1'].iloc[i]<=(df['F2']).iloc[i]) ]
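A minimal sketch of how the comprehension could return positions rather than values. The names `signal`, `f1`, `f2` and `positions` are illustrative and not from the original post. Note that it reads the original Signal values, so it never sees a 1 written earlier in the same pass and can report fewer rows than the loop actually switches on.

```python
import pandas as pd

df = pd.DataFrame({
    'Signal': [1, 1, 1, 0, 0, 0, 0, 0, 0, 1],
    'F1': [5, 5, 5, 5, 5, 5, 5, 5, 5, 5],
    'F2': [5, 5, 5, 5, 5, 6, 4, 4, 4, 4],
})

# Pull the columns out once as NumPy arrays; indexing an array is much
# cheaper than calling .iloc on every iteration.
signal = df['Signal'].to_numpy()
f1 = df['F1'].to_numpy()
f2 = df['F2'].to_numpy()

# Collect the row positions i (not the constant 1) where the test passes.
positions = [i for i in range(1, len(df))
             if signal[i - 1] == 1 and f1[i] <= f2[i]]
print(positions)  # [1, 2, 3] for the sample data
```

On the sample data the loop also switches on rows 4 and 5, because each of them follows a row that the loop itself just set to 1. The comprehension cannot reproduce that chaining.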
Answers:
Code
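# Rows where the comparison holds.
cond1 = df['F1'] <= df['F2']
# Label each run of consecutive equal values in cond1.
grp = cond1.ne(cond1.shift()).cumsum()
s1 = df['Signal']
# Rows where Signal is 1 on this row or on the previous row.
cond2 = (s1.eq(1) | s1.shift().eq(1))
# Within each run, carry those rows forward; everything else becomes 0.
s2 = cond2.where(cond2).groupby(grp).ffill().fillna(0).astype('int')
# Use s2 where the comparison holds; keep the original Signal elsewhere.
df.assign(Signal=s1.mask(cond1, s2))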
output:
   Signal  F1  F2
0       1   5   5
1       1   5   5
2       1   5   5
3       1   5   5
4       1   5   5
5       1   5   6
6       0   5   4
7       0   5   4
8       0   5   4
9       1   5   4
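One way to gain confidence in a vectorized rewrite like this is to compare it against a plain loop on the same data. The sketch below does that under stated assumptions. The function names `update_loop` and `update_vectorized` and the variables `ok`, `run_id`, `seed` and `carried` are hypothetical. The vectorized variant swaps the `where`/`ffill`/`fillna` chain for a grouped `cummax`, which should give the same result when Signal holds only 0 and 1.

```python
import numpy as np
import pandas as pd


def update_loop(df):
    """Reference version: the question's row-by-row rule, run on NumPy arrays."""
    signal = df['Signal'].to_numpy().copy()
    ok = (df['F1'] <= df['F2']).to_numpy()
    for i in range(1, len(signal)):
        # The previous row may already have been switched on by this loop.
        if signal[i - 1] == 1 and ok[i]:
            signal[i] = 1
    return pd.Series(signal, index=df.index, name='Signal')


def update_vectorized(df):
    """Same rule without a Python loop; assumes Signal holds only 0 and 1."""
    ok = df['F1'] <= df['F2']
    # Label each run of consecutive equal values in `ok`.
    run_id = ok.ne(ok.shift()).cumsum()
    s = df['Signal']
    # A row can start a streak if it, or the row before it, is already 1.
    seed = (s.eq(1) | s.shift(fill_value=0).eq(1)).astype(int)
    # Within each run, carry a seed forward to every later row.
    carried = seed.groupby(run_id).cummax()
    # Only rows where F1 <= F2 may be switched on; the rest keep their value.
    return s.mask(ok, carried)


df = pd.DataFrame({
    'Signal': [1, 1, 1, 0, 0, 0, 0, 0, 0, 1],
    'F1': [5, 5, 5, 5, 5, 5, 5, 5, 5, 5],
    'F2': [5, 5, 5, 5, 5, 6, 4, 4, 4, 4],
})

# Both versions should agree on the sample data from the question.
assert update_loop(df).tolist() == update_vectorized(df).tolist()
print(df.assign(Signal=update_vectorized(df)))
```

If Signal can hold values other than 0 and 1, the two versions can disagree, so the loop is the safer reference in that case. Moving the loop onto NumPy arrays, as in `update_loop`, removes the per-row `.iloc` overhead and may already be fast enough for a few thousand rows.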