How do I add a column with the value being the result of a function including max( )?
Question:
If I have a data frame consisting of the following values (exact values don’t matter):
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0,100,size=(5, 4)), columns=list('ABCD'))
df
How do I add a fifth column ‘E’ that compares the value in column A with the values in columns B, C, and D? The result should be 1 if A is greater than the max of the B, C, D values in that row, and 0 if A is less than that max.
I tried the following:
df['E']= np.where( df['A'] > max(df['B'],df['C'],df['D']), 1, 0)
I receive the following error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Thanks in advance!
Answers:
Here is one way to do it, using pandas max with axis=1 to take the row-wise maximum of B, C, and D:
df['E']=np.where(df['A']> df[['B','C','D']].max(axis=1),
1,
0)
df
    A   B   C   D  E
0  92  23   7  68  1
1  23  79  79  38  0
2  66  19  29  92  0
3  13  40   4  36  0
4  39  28  51  90  0
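A self-contained sketch of the same approach, using the values printed above instead of random numbers so the result is reproducible. It also shows why the built-in max fails here: it compares whole Series with `>` and then needs a single True/False from the result. The names `row_max` and `builtin_error` are only for illustration.

```python
import numpy as np
import pandas as pd

# Fixed values copied from the output above, so the result is reproducible.
df = pd.DataFrame(
    {
        "A": [92, 23, 66, 13, 39],
        "B": [23, 79, 19, 40, 28],
        "C": [7, 79, 29, 4, 51],
        "D": [68, 38, 92, 36, 90],
    }
)

# The built-in max() evaluates "Series > Series" and then asks for one bool,
# which pandas refuses to give for a whole Series.
builtin_error = None
try:
    max(df["B"], df["C"], df["D"])
except ValueError as exc:
    builtin_error = str(exc)

# Row-wise maximum of B, C, D: one value per row.
row_max = df[["B", "C", "D"]].max(axis=1)

# 1 where A is strictly greater than the row maximum, otherwise 0.
df["E"] = np.where(df["A"] > row_max, 1, 0)
print(df)
```

Because the comparison is strict, a row where A equals the largest of B, C, D gets 0.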
An alternative way: do the comparison directly and convert the resulting boolean Series to int:
df['E']=(df['A']> df[['B','C','D']].max(axis=1)).astype(int)
df
    A   B   C   D  E
0  94   8  31  82  1
1  68  23   9  76  0
2  52  66  42  78  0
3  43  18  21   3  1
4  21  39  95  29  0
For reference, the DataFrame before column E was added:
df
    A   B   C   D
0  94   8  31  82
1  68  23   9  76
2  52  66  42  78
3  43  18  21   3
4  21  39  95  29
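To check that the two approaches give the same column, here is a minimal sketch using the values from the second output above. The names `via_where`, `via_astype`, and `same` are illustrative only.

```python
import numpy as np
import pandas as pd

# Values copied from the second example above.
df = pd.DataFrame(
    {
        "A": [94, 68, 52, 43, 21],
        "B": [8, 23, 66, 18, 39],
        "C": [31, 9, 42, 21, 95],
        "D": [82, 76, 78, 3, 29],
    }
)

row_max = df[["B", "C", "D"]].max(axis=1)

# Approach 1: np.where returns a NumPy array of 1s and 0s.
via_where = np.where(df["A"] > row_max, 1, 0)

# Approach 2: cast the boolean Series to int; the result stays a Series.
via_astype = (df["A"] > row_max).astype(int)

# Element-wise comparison of the two results.
same = (via_where == via_astype.to_numpy()).all()
print(same)
```

The practical difference is the return type: np.where gives an array and generalizes to values other than 1 and 0, while astype(int) keeps the index and is shorter when only 1 and 0 are needed.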