Data Science Data Analysis

Question:

I have a dataset with people's characteristics, and I need to predict their breakfast. Here's an example of df.

I am training a CatBoost classifier for that.

Is it possible in my case to predict not only one kind of breakfast, but also an additional one?

By additional I mean the second most appealing type of breakfast for a person.

# I started with this:

from sklearn.model_selection import train_test_split

from catboost import CatBoostClassifier

df_train, df_test = train_test_split(df, test_size=0.15, random_state=42)

df_train, df_valid = train_test_split(df_train, test_size=0.15, random_state=42)

features_train = df_train.drop(['breakfast'], axis=1)

target_train = df_train['breakfast']

features_valid = df_valid.drop(['breakfast'], axis=1)

target_valid = df_valid['breakfast']

features_test = df_test.drop(['breakfast'], axis=1)

target_test = df_test['breakfast']

model_cat = CatBoostClassifier(random_state=42)

model_cat.fit(features_train, target_train)

valid_predictions_tree = model_cat.predict(features_valid)

# But this trains the model to output a single category, whereas I need the two best results, not one.

Answers:

Using predict_proba instead will return the probability for every class of your target:

valid_predictions_tree = model_cat.predict_proba(features_valid)

To get the probabilities labeled by class name for an input dt (with pandas imported as pd), you can do this:

proba = pd.DataFrame(model_cat.predict_proba(dt), columns=model_cat.classes_)

Output example:

Class1   Class2   Class3
0.2      0.5      0.3
0.7      0.2      0.1

Each row sums to 1 (100%).

Answered By: Mattravel

You could use the predict_proba method and, for each row, take the two classes with the highest probabilities as the 1st and 2nd most probable predictions.
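
As a minimal sketch of that idea (not part of the original answer), the snippet below picks the two most probable classes per row with NumPy. The classes and proba arrays are hypothetical stand-ins that reuse the output example above. In the question's setting they would presumably come from model_cat.classes_ and model_cat.predict_proba(features_valid).

```python
import numpy as np

# Hypothetical stand-ins for model_cat.classes_ and
# model_cat.predict_proba(features_valid); values copied from the
# output example above.
classes = np.array(["Class1", "Class2", "Class3"])
proba = np.array([
    [0.2, 0.5, 0.3],
    [0.7, 0.2, 0.1],
])

# Sort the column indices of each row by descending probability
# and keep the first two.
top2_idx = np.argsort(-proba, axis=1)[:, :2]

# Map the indices back to class labels.
first_choice = classes[top2_idx[:, 0]]
second_choice = classes[top2_idx[:, 1]]

print(first_choice)   # ['Class2' 'Class1']
print(second_choice)  # ['Class3' 'Class2']
```

The first column of top2_idx matches what predict would return, and the second column is the "additional" breakfast the question asks about. This assumes the columns of proba follow the same order as classes.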

Answered By: Carlos Melus