How to retrieve rows based on mismatched condition on particular columns?

Question:

I need to do the following tasks.

I have 9 columns plus the original label. Each of the 9 columns holds a probability value, and each group of 3 values is the prediction of one model. In total there are 3 classifier models and 3 classes.

Now I have to apply the max rule.

For each class, I pick the maximum probability across the models, which gives me three max values. The final prediction is the class whose max value is the largest of those three.

My code and sample

import numpy as np
df['Covid_max'] = np.where(df.columns == 'Covid',df.values,0).max(axis=1)
df['Normal_max'] = np.where(df.columns == 'Normal',df.values,0).max(axis=1)
df['Pneumonia_max'] = np.where(df.columns == 'Pneumonia',df.values,0).max(axis=1)

df['pred'] = df[['Covid_max','Normal_max','Pneumonia_max']].idxmax(axis=1)
new_label = {"pred": {"Covid_max": 0, "Normal_max": 1,"Pneumonia_max": 2,}}
df.replace(new_label , inplace = True)

I have done everything up to this point, but now I am stuck. I only need the records where the class and pred columns do not match (here, only the 2nd row should be printed). How can I do that?

Also, if anybody has another solution, I would be happy to learn it.

TIA

Asked By: XYZ

Answers:

Try this.

df_mismatch = df.loc[~(df['Class'] == df['pred'])]
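
For a fuller picture, here is a minimal end-to-end sketch of the max rule followed by the mismatch filter. The sample values and the prefixed column names (m1_Covid, m2_Covid, and so on) are assumptions made up for illustration, since the original data was only shown as an image. It also assumes Class uses the same 0/1/2 encoding as pred. Note that df['Class'] != df['pred'] selects the same rows as ~(df['Class'] == df['pred']).

```python
import pandas as pd

# Hypothetical sample data: three models (m1, m2, m3), three classes each.
# Column names and values are assumptions for illustration only.
classes = ["Covid", "Normal", "Pneumonia"]
df = pd.DataFrame(
    {
        "m1_Covid": [0.7, 0.2, 0.1],
        "m1_Normal": [0.2, 0.5, 0.3],
        "m1_Pneumonia": [0.1, 0.3, 0.6],
        "m2_Covid": [0.6, 0.6, 0.2],
        "m2_Normal": [0.3, 0.3, 0.2],
        "m2_Pneumonia": [0.1, 0.1, 0.6],
        "m3_Covid": [0.8, 0.3, 0.1],
        "m3_Normal": [0.1, 0.4, 0.1],
        "m3_Pneumonia": [0.1, 0.3, 0.8],
        "Class": [0, 1, 2],
    }
)

# Max rule, step 1: for each class, take the maximum across the three models.
class_max = pd.DataFrame(
    {c: df.filter(regex=f"_{c}$").max(axis=1) for c in classes}
)

# Max rule, step 2: pick the class with the largest of those maxima,
# encoded as 0 (Covid), 1 (Normal), 2 (Pneumonia).
df["pred"] = class_max.to_numpy().argmax(axis=1)

# Keep only the rows where the true label and the prediction disagree.
df_mismatch = df[df["Class"] != df["pred"]]
print(df_mismatch)
```

With this made-up data, only the 2nd row (index 1) is kept. Giving every column a unique name avoids having to compare against duplicated column labels.
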
Answered By: Amartya