How to get condition in Pandas DataFrame

Question:

I try to apply an IF condition in Pandas DataFrame.

DataFrame which appears as following:

a =  {'Col1': [0,1,1,0],
      'Col2': [0,1,0,1]}
df = pd.DataFrame(data = a)

Desired result is the following:

|   col1 |  col2 |    result
--------------------------------------------------
|   0    |  0    |   other
|   1    |  1    |   col1 + col2 
|   1    |  0    |   col1
|   0    |  1    |   col2

By using np.where(), I am blocked how to get columns name suppose I have multiple columns

Asked By: wysouf

||

Answers:

In [127]: df.dot(df.columns + " + ").str[:-3].replace("", "other")
Out[127]:
0          other
1    Col1 + Col2
2           Col1
3           Col2
dtype: object
  • dot product the values of the frame with the columns but " + " added as the joiner
    • since values are 1/0, dot product is like a selector
  • strip off the trailing " + "s with [:-3]
  • if empty at this point, means came from 0-0; so replace with "other"
Answered By: Mustafa Aydın

I would go for a custom function for this use case converting the dataframe to Boolean, then using the row values as masking for indices:

def func_(row):
    cols = row.index[row]
    if not cols.empty:
        return " + ".join(cols)
    else:
        return "other"
    

df['result']=df.astype(bool).apply(func_, axis=1)

# df
   Col1  Col2     result
0     0     0      other
1     1     1  Col1 + Col2
2     1     0       Col1
3     0     1       Col2

In case if you want one-liner solution, you can replace above function by a lambda function using named expression if you are using Python 3.8+:

df['result'] = (df
                .astype(bool)
                .apply(lambda row: "other" if (cols := row.index[row]).empty else " + ".join(cols),
                       axis=1)
                )
Answered By: ThePyGuy
(
    df.join(
        df.melt(ignore_index=False, var_name='result')
          .query('value==1')
          .groupby(level=0)['result']
          .apply(lambda x: ' + '.join(x)))
    .fillna('other')
)
Answered By: Chris

Here’s the np.where() method for multiple value conditions:

import pandas as pd
import numpy as np

a =  {'Col1': [0,1,1,0],
      'Col2': [0,1,0,1]}
df = pd.DataFrame(data = a)

df["result"]=np.where((df["Col1"]==0) & (df["Col2"]==0), "other",
                np.where((df["Col1"]==1) & (df["Col2"]==1), "col1 + col2", 
                    np.where(df["Col1"]==1, "col1", "col2")))

print(df)

Result:

   Col1  Col2       result
0     0     0        other
1     1     1  col1 + col2
2     1     0         col1
3     0     1         col2

Using nested np.where(), however, can make your code quite ugly and convoluted…

Answered By: Liutprand
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.