How to apply mutual information to categorical features


I am using Scikit-learn to train a classification model. I have both discrete and continuous features in my training data.

I want to do feature selection using mutual information.

Features 1, 2, and 3 are discrete. To this end, I tried the code below:

mutual_info_classif(x, y, discrete_features=[1, 2, 3])

but it did not work; it gives me the error:

 ValueError: could not convert string to float: 'INT'
Asked By: samira



A simple example using mutual_info_classif with discrete features:

import numpy as np
from sklearn.feature_selection import mutual_info_classif
X = np.array([[0, 0, 0],
              [1, 1, 0],
              [2, 0, 1],
              [2, 0, 1],
              [2, 0, 1]])
y = np.array([0, 1, 2, 2, 1])
mutual_info_classif(X, y, discrete_features=True)
# result: array([0.67301167, 0.22314355, 0.39575279])
Answered By: silgon

There is a difference between 'discrete' and 'categorical'.
In this case, the function requires the data to be numerical. You can use a label encoder if you have ordinal features; otherwise, you would have to use one-hot encoding for nominal features. You can use pd.get_dummies for this purpose.
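A minimal sketch of both encodings, assuming a hypothetical DataFrame with an ordinal column "size", a nominal column "proto", and a continuous column "duration". The column names, values, labels, and the order small < medium < large are all made up for illustration. The ordinal column is mapped to ordered integers, the nominal column is expanded with pd.get_dummies, and a boolean mask tells mutual_info_classif which resulting columns are discrete.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_classif

# Hypothetical toy data: one ordinal, one nominal, and one continuous feature
df = pd.DataFrame({
    "size": ["small", "large", "medium", "small", "large", "medium"],  # ordinal
    "proto": ["udp", "tcp", "udp", "icmp", "tcp", "udp"],             # nominal
    "duration": [0.1, 2.3, 0.4, 0.2, 1.9, 0.5],                       # continuous
})
y = np.array([0, 1, 0, 0, 1, 1])

# Ordinal feature: map categories to integers that respect their natural order
df["size"] = df["size"].map({"small": 0, "medium": 1, "large": 2})

# Nominal feature: expand into one 0/1 indicator column per category
X = pd.get_dummies(df, columns=["proto"], dtype=int)

# Every column except "duration" is now a discrete integer column
discrete_mask = np.array([col != "duration" for col in X.columns])
mi = mutual_info_classif(X, y, discrete_features=discrete_mask, random_state=0)
print(dict(zip(X.columns, mi)))
```

pd.get_dummies keeps the untouched columns first and appends the indicator columns (proto_icmp, proto_tcp, proto_udp), so the mask is built from the final column names rather than from positions in the original frame.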

Answered By: Parul Singh

mutual_info_classif can only take numeric data, so you need to label-encode the categorical features first (here the encoded array is called x1).
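A minimal sketch of that encoding step, assuming x is a NumPy object array whose column 0 is continuous and whose columns 1 to 3 hold strings (the rows and labels below are hypothetical). It swaps in scikit-learn's OrdinalEncoder, which assigns sorted integer codes column by column in the same way LabelEncoder does for a single column, but is intended for feature matrices.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical data: column 0 is continuous, columns 1-3 hold string categories
x = np.array([
    [0.5, "udp", "-", "INT"],
    [1.2, "tcp", "http", "FIN"],
    [0.7, "udp", "-", "INT"],
    [3.1, "tcp", "ftp", "CON"],
    [2.8, "tcp", "http", "FIN"],
    [0.4, "udp", "-", "INT"],
], dtype=object)
y = np.array([0, 1, 0, 1, 1, 0])

# Replace the string categories in columns 1-3 with integer codes (0, 1, 2, ...)
x1 = x.copy()
x1[:, 1:4] = OrdinalEncoder().fit_transform(x[:, 1:4])
x1 = x1.astype(float)  # now a purely numeric matrix

# Columns 1-3 are flagged as discrete; column 0 uses the continuous estimator
mi = mutual_info_classif(x1, y, discrete_features=[1, 2, 3], random_state=0)
print(mi)
```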


Then run the exact same code you were running.

mutual_info_classif(x1, y, discrete_features=[1, 2, 3])
Answered By: Jatin

Mutual information measures the information two variables share, and the ordering of the categories does not matter. That being said, it should not matter whether the categorical data is ordered or not when you label-encode it.
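A quick check of this claim on a hypothetical integer-coded feature: relabeling the categories in a different arbitrary order leaves the estimate unchanged (up to floating-point rounding). This relies on the column being flagged with discrete_features; a column left to the continuous nearest-neighbor estimator can be sensitive to the numeric codes.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

# Hypothetical categorical feature, already label-encoded as 0, 1, 2
codes = np.array([0, 1, 2, 2, 0, 1, 2, 0])
y = np.array([0, 1, 1, 1, 0, 0, 1, 0])

# Relabel the same categories in another arbitrary order: 0->2, 1->0, 2->1
relabeled = np.array([2, 0, 1])[codes]

mi_a = mutual_info_classif(codes.reshape(-1, 1), y, discrete_features=True)
mi_b = mutual_info_classif(relabeled.reshape(-1, 1), y, discrete_features=True)
print(mi_a, mi_b)  # the two estimates agree
```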

So to answer the question:

Categorical values (like "udp", "-", "INT", which you mentioned in your comment) can be label-encoded in order to calculate the mutual information, even though sklearn recommends not using LabelEncoder on features. Of course, you can dummy-code or one-hot encode the categorical features, but then you lose the ability to look at the mutual information of the variable as a whole.
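A sketch of that difference, using a hypothetical column built from the values quoted above and made-up labels: the integer-coded column yields a single score for the variable, while one-hot encoding spreads it over one score per category.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_classif

# Hypothetical categorical feature using the values from the question's comments
proto = pd.Series(["udp", "-", "INT", "udp", "-", "INT", "udp", "INT"])
y = np.array([0, 1, 1, 0, 1, 1, 0, 0])

# Whole variable: one integer-coded column, one MI score
codes, _ = pd.factorize(proto)
mi_whole = mutual_info_classif(codes.reshape(-1, 1), y, discrete_features=True)

# One-hot: one 0/1 column per category, one MI score per column
dummies = pd.get_dummies(proto, dtype=int)
mi_dummies = mutual_info_classif(dummies, y, discrete_features=True)

print("whole variable:", mi_whole)
print(dict(zip(dummies.columns, mi_dummies)))
```

Because each indicator column is a function of the original variable, its score cannot exceed the whole-variable score on the same data, but the per-category scores do not in general add up to it either.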

Answered By: Jan