Comparing the value of a column with the previous value of a new column using Apply in Python (Pandas)
Question:
I have a dataframe with these values in column A:
df = pd.DataFrame(A,columns =['A'])
A
0 0
1 5
2 1
3 7
4 0
5 2
6 1
7 3
8 0
I need to create a new column (called B) and populate it using the following conditions:
Condition 1: If the value of A is equal to 0, then the value of B must be 0.
Condition 2: If the value of A is not 0, then I compare it to the previous value of B. If A is higher than the previous value of B, I take A; otherwise I take the previous value of B.
The result should be this:
A B
0 0 0
1 5 5
2 1 5
3 7 7
4 0 0
5 2 2
6 1 2
7 3 3
8 0 0
The dataset is huge and using loops would be too slow. I need to solve this without loops and without the pandas "loc" indexer. Could anyone help me solve this using the apply function? I have tried different things without success.
Thanks a lot.
Answers:
Use .shift() to shift the column down by one cell and check whether the previous value is larger than the current one and the current one is not 0. Then use .mask() to replace those values with the previous value wherever the condition holds.
from io import StringIO
import pandas as pd
wt = StringIO("""A
0 0
1 2
2 3
3 1
4 2
5 7
6 0
""")
df = pd.read_csv(wt, sep=r'\s+')
df
A
0 0
1 2
2 3
3 1
4 2
5 7
6 0
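def func(df, col):
    # One pass: where the previous value is larger and the current one is not 0, take the previous value
    df['B'] = df[col].mask(cond=((df[col].shift(1) > df[col]) & (df[col] != 0)), other=df[col].shift(1))
    if col == 'B':
        # Repeat on B until no row is smaller than its predecessor (zeros excepted)
        while ((df[col].shift(1) > df[col]) & (df[col] != 0)).any():
            df['B'] = df[col].mask(cond=((df[col].shift(1) > df[col]) & (df[col] != 0)), other=df[col].shift(1))
    return df
(df.pipe(func, 'A').pipe(func, 'B'))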
Output:
A B
0 0 0
1 2 2
2 3 3
3 1 3
4 2 3
5 7 7
6 0 0
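To see what the two building blocks do on their own, here is a minimal sketch of a single shift/mask pass on the same sample column. The names prev, cond and one_pass are only illustrative and do not appear in the answer.

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 2, 3, 1, 2, 7, 0]})

# Previous row's A; the first row has no predecessor, so it becomes NaN
prev = df['A'].shift(1)

# True where the previous value is larger and the current value is not 0
cond = (prev > df['A']) & (df['A'] != 0)

# Replace A with the previous value wherever cond holds
one_pass = df['A'].mask(cond, other=prev)

print(cond.tolist())
print(one_pass.tolist())
```

Only row 3 is replaced in this pass. Row 4 still holds 2, because its predecessor in A is 1; it is only raised to 3 once the comparison runs against the updated column, which is why the answer pipes the frame through func a second time on 'B'.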
Try this:
df['B'] = df['A'].shift()
df['B'] = df.apply(lambda x: 0 if x.A == 0 else (x.A if x.A > x.B else x.B), axis=1)
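Note that the snippet above compares A with the previous value of A rather than of B, so it can diverge from the stated rule when a value has to be carried forward more than one row (for example A = 0, 5, 1, 1). The rule in the question reads as a running maximum that restarts at every zero, so one possible loop-free sketch uses groupby with cummax. This is not from the original answers; the name block is illustrative, and the sketch assumes the values in A are non-negative, as in the question's data.

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 5, 1, 7, 0, 2, 1, 3, 0]})

# Every zero in A starts a new block; the cumulative sum of the
# zero flags gives each block its own label.
block = (df['A'] == 0).cumsum()

# Running maximum inside each block. A zero row opens its block,
# so its running maximum is 0, which satisfies Condition 1.
df['B'] = df['A'].groupby(block).cummax()

print(df)
```

Because both steps are vectorized, no Python-level function is called per row, which may matter on a large dataset.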
One way to do this, I guess, could be the following:
def do_your_stuff(row):
    global value
    # fancy stuff here
    value = row["b"]
    [...]
value = df.iloc[0]['B']
df["C"] = df.apply(lambda row: do_your_stuff(row), axis=1)
Using the solution of Achille I solved it this way:
import pandas as pd
A = [0,2,3,0,2,7,2,3,2,20,1,0,2,5,4,3,1]
df = pd.DataFrame(A, columns=['A'])
df['B'] = 0
def function(row):
    # value and prev carry the previous row's B from one call to the next
    global value
    global prev
    if row['A'] == 0:
        value = 0
    elif row['A'] > value:
        value = row['A']
    else:
        value = prev
    prev = value
    return value
# Start from the initial B (0) before applying the function row by row
value = df.iloc[0]['B']
prev = value
df["B"] = df.apply(lambda row: function(row), axis=1)
df
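The same row-by-row state can be carried without global variables. The following is a minimal sketch using itertools.accumulate from the standard library (the initial argument needs Python 3.8 or later); the helper name step is hypothetical and the starting value of 0 mirrors the initial B used above.

```python
from itertools import accumulate

import pandas as pd

A = [0, 2, 3, 0, 2, 7, 2, 3, 2, 20, 1, 0, 2, 5, 4, 3, 1]
df = pd.DataFrame(A, columns=['A'])

def step(prev_b, a):
    """Return the next B given the previous B and the current A."""
    return 0 if a == 0 else max(prev_b, a)

# accumulate yields the initial value first, so drop it with [1:]
df['B'] = list(accumulate(df['A'].tolist(), step, initial=0))[1:]

print(df)
```

This still visits every row in Python, so it is a tidier form of the same scan rather than a vectorized solution.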