New dataframe column using existing column names as values

Question

I have the following pandas dataframe:

ID	Class	LR	XG	SV	BEST_R2
1	Class1	.76	.78	.99	.99
2	Class2	.92	.89	.91	.92
3	Class3	.87	.95	.87	.95

This is a dataframe with the R2 of each of a series of machine learning models (LR/XG/SV) for each ID. The column "BEST_R2" represents the best R2 score for that ID across models (.max(axis=1)). I need another column with the model name for best score. For example, the dataframe below. Any tips on how to achieve this programmatically?

ID	Class	LR	XG	SV	BEST_R2	BEST MODEL
1	Class1	.76	.78	.99	.99	SV
2	Class2	.92	.89	.91	.92	LR
3	Class3	.87	.95	.87	.95	XG

Asked By: Dr.Data

||

Source

Answer 1

Assuming that ID is the index, you can do

df["Best Model"] = df.idxmax(axis=1)

Result:

      LR    XG    SV  BEST_R2 Best Model
ID                                      
1   0.76  0.78  0.99     0.99         SV
2   0.92  0.89  0.91     0.92         LR
3   0.87  0.95  0.87     0.95         XG

Answered By: fsimonjetz

Answer 2

Here is a way to do it:

best_models = []
for row in df.itertuples():
    tuples = sorted([('LR',row.LR), ('XG',row.XG), ('SV',row.SV)],reverse = True, key = lambda x:x[1])
    best_models.append(tuples[0][0])
df['Best Models'] = best_models
df

Answered By: flbyrne

New dataframe column using existing column names as values

Question:

Answers: