How to find rows of a DataFrame that matches a given set of conditions for all columns?

Question:

Given the following Pandas DataFrame, how can I get all the rows where only Test1 and Test2 failed? In my true dataset I have about 70 different tests and need an easy way to filter devices based on different tests they failed.

import pandas as pd

data = [['SN-01', 'Fail', 'Pass', 'Pass'], 
        ['SN-02', 'Pass', 'Fail', 'Pass'], 
        ['SN-03', 'Fail', 'Fail', 'Pass'],
        ['SN-04', 'Fail', 'Fail', 'Fail']]

df = pd.DataFrame(data, columns = ['Serial', 'Test1', 'Test2', 'Test3'])

Out[133]: 
  Serial Test1 Test2 Test3
0  SN-01  Fail  Pass  Pass
1  SN-02  Pass  Fail  Pass
2  SN-03  Fail  Fail  Pass
3  SN-04  Fail  Fail  Fail

If I try the following

df.loc[(df.Test1 == "Fail") & (df.Test2 == "Fail"), 'Serial']
Out[132]: 
2    SN-03
3    SN-04
Name: Serial, dtype: object

I get SN-03 and SN-04 as output but the correct answer is only SN-03. Is there a way to create a boolean mask and apply to each row and filter them if they are a true match? What is the proper Pythonic/Panda way to get the results?

Update
I found a solution, but it feel non Pythonic. Is there no built-in solution to filter rows by comparing each row with a mask?

#Replace pass/fail string with boolean, note fail is True here
df = df.replace({"Pass":False, "Fail":True})
#get all rows which have exactly 2 failed tests
mask = df.sum(axis = 1)==2
#use the mask to get subset of rows.
df2 = df[mask]
#now filter for 2 or more specific tests we care about.    
z = df2.loc[(df2['Test1'] == True) & (df2['Test2'] == True), 'Serial' ]
Asked By: Brijesh Sirpatil

||

Answers:

Here is one way:

cols = ['Test1','Test2']
d = df.set_index('Serial').eq('Pass')
d[cols].sum(axis=1).eq(0) & d.sum(axis=1).eq(len(d.columns)-len(cols))

or

df[['Test1','Test2','Test3']].apply(tuple,axis=1).isin([('Fail', 'Fail', 'Pass')])

or

df[['Test1','Test2','Test3']].eq('Fail').sum(axis=1).eq(2) & df['Test3'].eq('Pass')
Answered By: rhug123

You can iterate through the dataframe checking if the columns Test1, Test2, and Test3 are equal to "Fail", "Fail", "Pass", respectively. Then, use loc to get the rows in compliance with that condition. Finally, select the "Serial" column:

df.loc[df.apply(lambda x: np.equal(x.loc["Test1":], ['Fail', 'Fail', 'Pass']).all(), axis=1) == True]["Serial"]

Output:

2    SN-03
Name: Serial, dtype: object
Answered By: Chicodelarose
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.