How to get condition in Pandas DataFrame
Question:
I try to apply an IF condition in Pandas DataFrame.
DataFrame which appears as following:
a = {'Col1': [0,1,1,0],
'Col2': [0,1,0,1]}
df = pd.DataFrame(data = a)
Desired result is the following:
| col1 | col2 | result
--------------------------------------------------
| 0 | 0 | other
| 1 | 1 | col1 + col2
| 1 | 0 | col1
| 0 | 1 | col2
By using np.where(), I am blocked how to get columns name suppose I have multiple columns
Answers:
In [127]: df.dot(df.columns + " + ").str[:-3].replace("", "other")
Out[127]:
0 other
1 Col1 + Col2
2 Col1
3 Col2
dtype: object
- dot product the values of the frame with the columns but " + " added as the joiner
- since values are 1/0, dot product is like a selector
- strip off the trailing " + "s with [:-3]
- if empty at this point, means came from 0-0; so replace with "other"
I would go for a custom function for this use case converting the dataframe to Boolean, then using the row values as masking for indices:
def func_(row):
cols = row.index[row]
if not cols.empty:
return " + ".join(cols)
else:
return "other"
df['result']=df.astype(bool).apply(func_, axis=1)
# df
Col1 Col2 result
0 0 0 other
1 1 1 Col1 + Col2
2 1 0 Col1
3 0 1 Col2
In case if you want one-liner solution, you can replace above function by a lambda function using named expression
if you are using Python 3.8+:
df['result'] = (df
.astype(bool)
.apply(lambda row: "other" if (cols := row.index[row]).empty else " + ".join(cols),
axis=1)
)
(
df.join(
df.melt(ignore_index=False, var_name='result')
.query('value==1')
.groupby(level=0)['result']
.apply(lambda x: ' + '.join(x)))
.fillna('other')
)
Here’s the np.where() method for multiple value conditions:
import pandas as pd
import numpy as np
a = {'Col1': [0,1,1,0],
'Col2': [0,1,0,1]}
df = pd.DataFrame(data = a)
df["result"]=np.where((df["Col1"]==0) & (df["Col2"]==0), "other",
np.where((df["Col1"]==1) & (df["Col2"]==1), "col1 + col2",
np.where(df["Col1"]==1, "col1", "col2")))
print(df)
Result:
Col1 Col2 result
0 0 0 other
1 1 1 col1 + col2
2 1 0 col1
3 0 1 col2
Using nested np.where(), however, can make your code quite ugly and convoluted…
I try to apply an IF condition in Pandas DataFrame.
DataFrame which appears as following:
a = {'Col1': [0,1,1,0],
'Col2': [0,1,0,1]}
df = pd.DataFrame(data = a)
Desired result is the following:
| col1 | col2 | result
--------------------------------------------------
| 0 | 0 | other
| 1 | 1 | col1 + col2
| 1 | 0 | col1
| 0 | 1 | col2
By using np.where(), I am blocked how to get columns name suppose I have multiple columns
In [127]: df.dot(df.columns + " + ").str[:-3].replace("", "other")
Out[127]:
0 other
1 Col1 + Col2
2 Col1
3 Col2
dtype: object
- dot product the values of the frame with the columns but " + " added as the joiner
- since values are 1/0, dot product is like a selector
- strip off the trailing " + "s with [:-3]
- if empty at this point, means came from 0-0; so replace with "other"
I would go for a custom function for this use case converting the dataframe to Boolean, then using the row values as masking for indices:
def func_(row):
cols = row.index[row]
if not cols.empty:
return " + ".join(cols)
else:
return "other"
df['result']=df.astype(bool).apply(func_, axis=1)
# df
Col1 Col2 result
0 0 0 other
1 1 1 Col1 + Col2
2 1 0 Col1
3 0 1 Col2
In case if you want one-liner solution, you can replace above function by a lambda function using named expression
if you are using Python 3.8+:
df['result'] = (df
.astype(bool)
.apply(lambda row: "other" if (cols := row.index[row]).empty else " + ".join(cols),
axis=1)
)
(
df.join(
df.melt(ignore_index=False, var_name='result')
.query('value==1')
.groupby(level=0)['result']
.apply(lambda x: ' + '.join(x)))
.fillna('other')
)
Here’s the np.where() method for multiple value conditions:
import pandas as pd
import numpy as np
a = {'Col1': [0,1,1,0],
'Col2': [0,1,0,1]}
df = pd.DataFrame(data = a)
df["result"]=np.where((df["Col1"]==0) & (df["Col2"]==0), "other",
np.where((df["Col1"]==1) & (df["Col2"]==1), "col1 + col2",
np.where(df["Col1"]==1, "col1", "col2")))
print(df)
Result:
Col1 Col2 result
0 0 0 other
1 1 1 col1 + col2
2 1 0 col1
3 0 1 col2
Using nested np.where(), however, can make your code quite ugly and convoluted…