Change the value of a column to the maximum value above it in the same column
Question:
This is my dataframe:
df = pd.DataFrame({'a': [100, 103, 101, np.nan, 105, 107, 100]})
And this is the output that I want:
       a    b
0  100.0  100
1  103.0  103
2  101.0  103
3    NaN  103
4  105.0  105
5  107.0  107
6  100.0  107
I want to create column b, which takes the values of column a and replaces each one with the largest value seen so far in that column (the current row included).
For example, once 103 appears in a, I want every following value to become 103 until a greater number appears in column a. That is why rows 2 and 3 are changed to 103. Since row 4 holds a number greater than 103, I want to put that number in column b until a greater number appears in column a.
I have tried the approaches from a couple of posts on Stack Overflow, one of which was this answer, but I still couldn’t figure out how to do it.
Answers:
Use Series.cummax after replacing the missing values with the previous non-NaN value using ffill:
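import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [100, 103, 101, np.nan, 105, 107, 100]})
# Forward-fill the NaN, take the running maximum, then cast the floats to int
df['b'] = df['a'].ffill().cummax().astype(int)
# Alternative (the downcast argument is deprecated in newer pandas versions):
# df['b'] = df['a'].ffill(downcast='int').cummax()
print(df)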
       a    b
0  100.0  100
1  103.0  103
2  101.0  103
3    NaN  103
4  105.0  105
5  107.0  107
6  100.0  107
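To see why the two calls are chained, it can help to look at each intermediate result. The following is a minimal sketch on the question's data; the variable names (a, filled, running_max, swapped) are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

a = pd.Series([100, 103, 101, np.nan, 105, 107, 100])

# Step 1: carry the last seen value forward over the NaN in row 3.
filled = a.ffill()

# Step 2: take the running maximum of the gap-free series.
running_max = filled.cummax()

# cummax skips NaN by default (skipna=True) but leaves the NaN in place,
# so filling after the cumulative maximum gives the same values here.
swapped = a.cummax().ffill()

print(filled.tolist())
print(running_max.tolist())
print(swapped.equals(running_max))
```

Either order gives the same column for this input. Filling first keeps the NaN out of the cumulative step entirely, which is the order used in the answer.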
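If the first value in the real data can be NaN, there is nothing to forward-fill it from, so it stays missing. Cast to the nullable integer dtype Int64 instead of int, because a plain int column cannot hold a missing value: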
df = pd.DataFrame({'a': [np.nan, 103, 101, np.nan, 105, 107, 100]})
# 'Int64' (capital I) is the nullable integer dtype; missing values show as <NA>
df['b'] = df['a'].ffill().cummax().astype('Int64')
print(df)
       a     b
0    NaN  <NA>
1  103.0   103
2  101.0   103
3    NaN   103
4  105.0   105
5  107.0   107
6  100.0   107
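The difference between the two casts can be checked directly. This is a small sketch, assuming a reasonably recent pandas in which casting NaN to a plain integer dtype raises an error derived from ValueError; the names running_max, int_cast_failed and nullable are hypothetical.

```python
import numpy as np
import pandas as pd

a = pd.Series([np.nan, 103, 101, np.nan, 105, 107, 100])

# Row 0 is still NaN after ffill: there is no earlier value to copy.
running_max = a.ffill().cummax()

# A plain int dtype has no representation for a missing value,
# so this cast is expected to fail.
try:
    running_max.astype(int)
    int_cast_failed = False
except ValueError:
    int_cast_failed = True

# The nullable Int64 dtype stores the gap as pd.NA instead.
nullable = running_max.astype('Int64')

print(int_cast_failed)
print(nullable.isna().tolist())
```

Using 'Int64' for both cases is also an option if the same code has to handle data with and without a leading NaN.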