How to get feature names corresponding to scores for chi square feature selection in scikit

Question:

I am using scikit-learn for feature selection, and I want to get the score values for all the unigrams in the text. I get the scores, but how do I map them to the actual feature names?

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2

Texts = ["should schools have uniform", "schools discipline", "legalize marriage", "marriage culture"]
labels = ["3", "3", "7", "7"]
vectorizer = CountVectorizer()
term_doc = vectorizer.fit_transform(Texts)
# k must be passed by keyword in recent scikit-learn releases
ch2 = SelectKBest(chi2, k="all")
X_train = ch2.fit_transform(term_doc, labels)
print(ch2.scores_)

This gives the scores, but how do I know which feature name maps to which score?

Asked By: AMisra

Answers:

It’s right there in the documentation:

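vectorizer.get_feature_names()

In scikit-learn 1.0 and later the method is get_feature_names_out(); the old name was removed in 1.2.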
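As a minimal sketch of how the names and scores line up, using the question's own data: the vectorizer's feature names are in the same column order as `scores_`, so the two can be zipped together. The variable names `selector` and `name_to_score` are illustrative, and the `hasattr` check is only there to cover both older and newer scikit-learn releases.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2

texts = ["should schools have uniform", "schools discipline",
         "legalize marriage", "marriage culture"]
labels = ["3", "3", "7", "7"]

vectorizer = CountVectorizer()
term_doc = vectorizer.fit_transform(texts)

# k="all" keeps every feature, so scores_ has one entry per vocabulary term
selector = SelectKBest(chi2, k="all")
selector.fit(term_doc, labels)

# Newer releases expose get_feature_names_out(); older ones get_feature_names()
if hasattr(vectorizer, "get_feature_names_out"):
    feature_names = vectorizer.get_feature_names_out()
else:
    feature_names = vectorizer.get_feature_names()

# Column i of term_doc corresponds to feature_names[i] and selector.scores_[i]
name_to_score = dict(zip(feature_names, selector.scores_))

# Print terms from highest to lowest chi-square score
for name, score in sorted(name_to_score.items(), key=lambda kv: kv[1], reverse=True):
    print(name, round(float(score), 3))
```

If only the selected features are needed (for example with `k=2`), `selector.get_support()` returns a boolean mask that can be used to index `feature_names` in the same way.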

Answered By: cfh

To get the feature names, first compute the chi-square scores for all features, then match the scores to your column names. You can then remove features based on their p-values.

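import pandas as pd
from sklearn.feature_selection import chi2

# df is assumed to be a pandas DataFrame with non-negative feature columns
# and a target column named "outcome"
X = df.drop("outcome", axis=1)
y = df["outcome"]

# chi2 returns a tuple (chi-square statistics, p-values), one entry per column
chi_scores = chi2(X, y)

# Label each p-value with its column name and sort from highest to lowest
p_values = pd.Series(chi_scores[1], index=X.columns)
p_values.sort_values(ascending=False, inplace=True)

p_values.plot.bar(figsize=(20, 10))

# True marks features whose p-value is at or above 0.5 (candidates for removal)
print(p_values >= 0.5)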
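Since `df` is not defined in the snippet above, here is a self-contained sketch of the same idea on a made-up DataFrame. The column names (`f_a`, `f_b`, `f_c`), the data, and the `alpha = 0.05` cutoff are all assumptions for illustration, not values from the original answer.

```python
import pandas as pd
from sklearn.feature_selection import chi2

# Hypothetical toy data: chi2 requires non-negative feature values
df = pd.DataFrame({
    "f_a": [1, 2, 0, 0, 3, 0],
    "f_b": [0, 0, 2, 3, 0, 1],
    "f_c": [1, 1, 1, 1, 1, 1],
    "outcome": [0, 0, 1, 1, 0, 1],
})

X = df.drop("outcome", axis=1)
y = df["outcome"]

# chi2 returns two arrays aligned with the columns of X
scores, p_values = chi2(X, y)

# Index both arrays by column name so each score is tied to its feature
result = pd.DataFrame(
    {"chi2": scores, "p_value": p_values}, index=X.columns
).sort_values("p_value")
print(result)

# Keep features whose p-value falls below the assumed significance level
alpha = 0.05
keep = result.index[result["p_value"] < alpha].tolist()
print(keep)
```

A low p-value suggests the feature and the target are not independent, so a small cutoff keeps the informative columns; the right threshold depends on the data.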

Answered By: Pankaj Kumar Yadav