Python – compare previous row value and fill upwards when max is reached
Question:
I have this dataset:
import pandas as pd
col1 = [1,2,3,4,5,6,7,8]
col2 = [2,3,5,1,4,3,4,5]
df = pd.DataFrame({'Column1': col1, 'Column2': col2})
Column1 Column2
1 2
2 3
3 5
4 1
5 4
6 3
7 4
8 5
Whenever Column2 stops increasing, I want the value it reached to be filled back over the preceding rows of that increasing run, so the expected output would be:
Column1 Column2
1 5
2 5
3 5
4 4
5 4
6 5
7 5
8 5
I tried doing this with a for loop that compares each value to the previous one, but that would require a lot of for loops.
Is there an efficient way of doing this?
Answers:
Use groupby on the increasing stretches and transform with the last value:
df['Column2'] = (df.groupby(df['Column2'].diff().lt(0).cumsum())['Column2']
                   .transform('last')
                )
Output:
Column1 Column2
0 1 5
1 2 5
2 3 5
3 4 4
4 5 4
5 6 5
6 7 5
7 8 5
Intermediate Series used to define the groups:
df['Column2'].diff().lt(0).cumsum()
0 0
1 0
2 0
3 1
4 1
5 2
6 2
7 2
Name: Column2, dtype: int64
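To make the grouping key above easier to follow, here is a minimal sketch that splits the same chain into named steps. The helper name `fill_runs_with_last` and the intermediate variable names are illustrative assumptions, not part of the answer itself.

```python
import pandas as pd


def fill_runs_with_last(s: pd.Series) -> pd.Series:
    """Replace each value with the last value of its non-decreasing run.

    Hypothetical helper; the steps mirror the groupby one-liner.
    """
    # True where the value dropped compared with the previous row
    # (the first row compares against NaN, which yields False).
    drops = s.diff().lt(0)
    # Every drop starts a new run, so the running count of drops labels the runs.
    run_id = drops.cumsum()
    # Broadcast the last value of each run back over the whole run.
    return s.groupby(run_id).transform('last')


df = pd.DataFrame({'Column1': [1, 2, 3, 4, 5, 6, 7, 8],
                   'Column2': [2, 3, 5, 1, 4, 3, 4, 5]})
filled = fill_runs_with_last(df['Column2'])
print(filled.tolist())
```

Because no NaN is introduced along the way, the result should keep the integer dtype of the input.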
Another solution:
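# Keep only the rows whose next value does not increase (plus the last row); the rest become NaN on assignment
df.Column2 = df.Column2[(df.Column2.diff() <= 0).shift(-1).fillna(True)]
# Back-fill each NaN with the next kept value
df.Column2 = df.Column2.bfill()
print(df)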
Prints (the values are floats because of the intermediate NaN):
Column1 Column2
0 1 5.0
1 2 5.0
2 3 5.0
3 4 4.0
4 5 4.0
5 6 5.0
6 7 5.0
7 8 5.0
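If the float output is unwanted, one possible variant is sketched below. It assumes Column2 has no missing values to begin with, builds the mask with `shift(-1, fill_value=True)` so it stays boolean, blanks the other rows with `where`, and casts back to the original dtype after `bfill`. The helper name `bfill_run_ends` is hypothetical.

```python
import pandas as pd


def bfill_run_ends(s: pd.Series) -> pd.Series:
    # True on rows whose *next* row does not increase; the final row is
    # always kept because fill_value=True covers the shifted-in slot.
    is_run_end = s.diff().le(0).shift(-1, fill_value=True)
    # Keep run ends, turn everything else into NaN, then back-fill.
    filled = s.where(is_run_end).bfill()
    # Assumption: the input had no NaN, so casting back is safe.
    return filled.astype(s.dtype)


df = pd.DataFrame({'Column1': [1, 2, 3, 4, 5, 6, 7, 8],
                   'Column2': [2, 3, 5, 1, 4, 3, 4, 5]})
result = bfill_run_ends(df['Column2'])
print(result.tolist())
```

Note that `le(0)` treats a repeated value as the end of a run, while the groupby approach with `lt(0)` does not, so the two can give different results when Column2 contains plateaus such as 3, 3, 5.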