New dataframe column using existing column names as values

Question:

I have the following pandas dataframe:

ID Class LR XG SV BEST_R2
1 Class1 .76 .78 .99 .99
2 Class2 .92 .89 .91 .92
3 Class3 .87 .95 .87 .95

This is a dataframe with the R2 of each of a series of machine learning models (LR/XG/SV) for each ID. The column "BEST_R2" represents the best R2 score for that ID across models (.max(axis=1)). I need another column with the model name for best score. For example, the dataframe below. Any tips on how to achieve this programmatically?

ID Class LR XG SV BEST_R2 BEST MODEL
1 Class1 .76 .78 .99 .99 SV
2 Class2 .92 .89 .91 .92 LR
3 Class3 .87 .95 .87 .95 XG
Asked By: Dr.Data

||

Answers:

Assuming that ID is the index, you can do

df["Best Model"] = df.idxmax(axis=1)

Result:

      LR    XG    SV  BEST_R2 Best Model
ID                                      
1   0.76  0.78  0.99     0.99         SV
2   0.92  0.89  0.91     0.92         LR
3   0.87  0.95  0.87     0.95         XG
Answered By: fsimonjetz

Here is a way to do it:

best_models = []
for row in df.itertuples():
    tuples = sorted([('LR',row.LR), ('XG',row.XG), ('SV',row.SV)],reverse = True, key = lambda x:x[1])
    best_models.append(tuples[0][0])
df['Best Models'] = best_models
df

enter image description here

Answered By: flbyrne
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.