text-classification

Get data from .pickle

Question: I have a MultinomialNB() model: text_clf_NB = Pipeline([('vect', CountVectorizer()), ('tfidf', TfidfTransformer()), ('clf', MultinomialNB()), ]) text_clf_NB.fit(Train_X_NB, Train_Y_NB) I save it to .pickle: pickle.dump(text_clf_NB, open("NB_classification.pickle", "wb")) In another case I load this model: clf = pickle.load(open("NB_classification.pickle", "rb")) Can you help me, please: how can I get the sparse matrix of Train …

Total answers: 1
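The excerpt is cut off, but it reads as asking how to recover the sparse TF-IDF matrix of the training texts from the reloaded pipeline. A minimal sketch, assuming the original training texts are still available (placeholder data and file name below): slice off the final MultinomialNB step and transform the texts through the remaining steps.

```python
import pickle
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

train_texts = ["first training document", "second training document", "another one"]  # stands in for Train_X_NB
train_labels = [0, 1, 0]                                                              # stands in for Train_Y_NB

text_clf_NB = Pipeline([("vect", CountVectorizer()),
                        ("tfidf", TfidfTransformer()),
                        ("clf", MultinomialNB())])
text_clf_NB.fit(train_texts, train_labels)
pickle.dump(text_clf_NB, open("NB_classification.pickle", "wb"))

# Later: reload the pipeline and run the texts through every step except the
# classifier; the result is the sparse TF-IDF matrix the model was trained on.
clf = pickle.load(open("NB_classification.pickle", "rb"))
X_sparse = clf[:-1].transform(train_texts)   # CountVectorizer + TfidfTransformer only
print(type(X_sparse), X_sparse.shape)
```

Pipeline slicing (`clf[:-1]`) requires scikit-learn 0.21 or newer; on older versions, `clf.named_steps["vect"]` and `clf.named_steps["tfidf"]` can be applied one after the other instead.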

Why does SMOTE raise "Found input variables with inconsistent numbers of samples"?

Question: I am trying to classify emotion from tweets with a dataset of 4401 tweets. When I use a smaller sample of the data (around 15 tweets) everything works fine, but when I use the full dataset it raises the error Found input variables with inconsistent …

Total answers: 2
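The error almost always means that the feature matrix and the label array handed to SMOTE have different numbers of rows, for example because the features were built from a filtered subset of the tweets while the labels were not. A minimal sketch with placeholder data, assuming imbalanced-learn's SMOTE on TF-IDF features:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from imblearn.over_sampling import SMOTE

tweets = ["happy tweet", "sad tweet", "angry tweet", "happy again"] * 50   # placeholder tweets
labels = ["joy", "sadness", "anger", "joy"] * 50                           # placeholder labels

X = TfidfVectorizer().fit_transform(tweets)

# SMOTE requires X.shape[0] == len(labels); any mismatch raises
# "Found input variables with inconsistent numbers of samples".
assert X.shape[0] == len(labels), (X.shape[0], len(labels))

X_res, y_res = SMOTE(random_state=42).fit_resample(X, labels)
print(X_res.shape, len(y_res))
```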

Cosine similarity of two columns in a DataFrame

Question: I have a dataframe with 2 columns and I am trying to get a cosine similarity score for each pair of sentences. Dataframe (df) A B 0 Lorem ipsum ta lorem ipsum 1 Excepteur sint occaecat excepteur 2 Duis aute irure aute irure some of the code …

Total answers: 1
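One common approach, sketched below with data mirroring the excerpt: fit a single TfidfVectorizer on both columns so they share a vocabulary, then take the row-wise cosine similarity.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

df = pd.DataFrame({
    "A": ["Lorem ipsum ta", "Excepteur sint occaecat", "Duis aute irure"],
    "B": ["lorem ipsum", "excepteur", "aute irure"],
})

# One vectorizer fitted on both columns keeps A and B in the same vector space
vec = TfidfVectorizer().fit(pd.concat([df["A"], df["B"]]))
a = vec.transform(df["A"])
b = vec.transform(df["B"])

# Row-wise similarity is the diagonal of the pairwise similarity matrix
df["cosine"] = cosine_similarity(a, b).diagonal()
print(df)
```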

Save TextVectorization Model to load it later

Question: I'm not used to the TextVectorization encoder layer; I created my vocabulary manually before. I was wondering how one can save a Keras model which uses the TextVectorization layer. When I tried to do it simply with model.save() and later models.load_model(), I was prompted with this error: …

Total answers: 1
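The full error is cut off, but saving a model that contains a TextVectorization layer has historically been fragile. One workaround, sketched below for recent TensorFlow versions, is to persist the layer's learned vocabulary and rebuild the layer when loading (file name and corpus are illustrative):

```python
import json
import tensorflow as tf

texts = ["save the vocabulary", "load it later", "text vectorization layer"]  # placeholder corpus

vectorizer = tf.keras.layers.TextVectorization(output_mode="int")
vectorizer.adapt(texts)

# Persist the learned vocabulary instead of the whole layer
with open("vocab.json", "w") as f:
    json.dump(vectorizer.get_vocabulary(), f)

# Later, in another process: rebuild the layer and restore the vocabulary
with open("vocab.json") as f:
    vocab = json.load(f)

restored = tf.keras.layers.TextVectorization(output_mode="int")
restored.set_vocabulary(vocab)

print(restored(["load it later"]))
```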

ValueError: X has 3 features, but LinearSVC is expecting 64852 features as input

Question: I get the following error when I try to deploy this model. ValueError: X has 3 features, but LinearSVC is expecting 64852 features as input Example of the data below: data = [[3409, False, 'Lorum Ipsum'], [0409, True, 'dolor sit amet consectetuer'], …

Total answers: 1
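The mismatch typically happens when the deployed model receives raw rows like [id, flag, text] while it was trained on 64852 TF-IDF features. One way to avoid it, sketched below with placeholder data, is to pickle the vectorizer and the classifier together as one Pipeline so that predict() accepts raw text:

```python
import pickle
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

texts = ["Lorum Ipsum", "dolor sit amet consectetuer", "lorem dolor", "ipsum amet"]  # placeholder text column
y = [0, 1, 0, 1]                                                                     # placeholder labels

# Fitting vectorizer and classifier together keeps their feature spaces in sync
model = Pipeline([("tfidf", TfidfVectorizer()), ("svc", LinearSVC())])
model.fit(texts, y)
pickle.dump(model, open("svc_pipeline.pickle", "wb"))

# At deployment time, pass the raw text, not the untransformed [id, flag, text] rows
loaded = pickle.load(open("svc_pipeline.pickle", "rb"))
print(loaded.predict(["dolor sit amet"]))
```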

Can't backward pass two losses in Classification Transformer Model

Question: For my model I'm using a RoBERTa transformer model and the Trainer from the Hugging Face transformers library. I calculate two losses: lloss is a cross-entropy loss and dloss calculates the loss between hierarchy layers. The total loss is the sum of lloss and dloss. …

Total answers: 2
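The truncated part presumably describes the backward error; calling backward() separately on two losses fails once the first call has freed the graph. The usual fix is to add the two loss tensors and backpropagate the sum once (inside the Hugging Face Trainer this is normally done by overriding compute_loss). A minimal plain-PyTorch sketch with a stand-in model, keeping the question's lloss/dloss names:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 3)                                   # stand-in for the RoBERTa classifier
x, target = torch.randn(4, 10), torch.tensor([0, 1, 2, 1])

logits = model(x)
lloss = nn.functional.cross_entropy(logits, target)        # cross-entropy term
dloss = logits.pow(2).mean()                               # placeholder for the hierarchy loss

# Summing keeps both terms in a single graph, so one backward() suffices;
# calling lloss.backward() and then dloss.backward() would need retain_graph=True.
total_loss = lloss + dloss
total_loss.backward()
```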

How to make text classification give a None category

Question: I'm doing text classification for dialects. After I trained it on 3 types of dialects, I tested it with the test data I have. However, now suppose I extract a tweet from Twitter and ask the classifier to output the corresponding dialect, but …

Total answers: 2
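Standard classifiers always return one of the classes they were trained on, so an out-of-scope tweet still gets a dialect. A common workaround, sketched below with placeholder dialect data and an illustrative threshold, is to look at predict_proba and return None when the model is not confident:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

texts = ["dialect egyptian phrase", "dialect gulf phrase", "dialect levantine phrase"] * 10
labels = ["egyptian", "gulf", "levantine"] * 10

clf = Pipeline([("tfidf", TfidfVectorizer()), ("nb", MultinomialNB())]).fit(texts, labels)

def predict_or_none(text, threshold=0.6):
    """Return the predicted dialect, or None when the top probability is too low."""
    proba = clf.predict_proba([text])[0]
    if proba.max() < threshold:
        return None
    return clf.classes_[np.argmax(proba)]

print(predict_or_none("dialect egyptian phrase"))     # a dialect label
print(predict_or_none("something else entirely"))     # None: all words unseen, probabilities near uniform
```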

Sklearn Pipeline ValueError: could not convert string to float

Question: I'm playing around with sklearn and NLP for the first time, and thought I understood everything I was doing until I ran into this error and didn't know how to fix it. Here is the relevant code (largely adapted from http://zacstewart.com/2015/04/28/document-classification-with-scikit-learn.html): from sklearn.feature_extraction.text import TfidfVectorizer from sklearn.pipeline import …

Total answers: 2
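The error usually means raw strings reached an estimator that expects numbers, for instance because a whole DataFrame (or a 2-D array with non-text columns) was fed to the pipeline instead of the single text column that TfidfVectorizer expects. A minimal sketch with placeholder data (column names are illustrative):

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

df = pd.DataFrame({
    "text": ["free prize inside", "meeting at noon", "win money now", "lunch tomorrow?"],
    "label": ["spam", "ham", "spam", "ham"],
})

pipe = Pipeline([("tfidf", TfidfVectorizer()), ("clf", MultinomialNB())])

# TfidfVectorizer expects a 1-D sequence of strings; if raw strings reach a
# numeric estimator instead (e.g. the vectorizer is skipped or given the wrong
# input), sklearn raises "could not convert string to float".
pipe.fit(df["text"], df["label"])
print(pipe.predict(["win a free prize"]))
```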

Information Gain calculation with Scikit-learn

Question: I am using scikit-learn for text classification. I want to calculate the Information Gain for each attribute with respect to a class in a (sparse) document-term matrix. The Information Gain is defined as H(Class) − H(Class | Attribute), where H is the entropy. In Weka, this would be calculated …

Total answers: 3
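H(Class) − H(Class | Attribute) is exactly the mutual information between the attribute and the class, so on a discrete (e.g. binary presence/absence) document-term matrix it can be computed with mutual_info_classif. A minimal sketch with placeholder documents; note that scikit-learn's mutual information is in nats, while Weka typically reports bits:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import mutual_info_classif

docs = ["cheap pills online", "meeting agenda attached", "cheap online offer", "agenda for meeting"]
y = ["spam", "ham", "spam", "ham"]

# Binary term presence/absence, the discrete view Weka's InfoGainAttributeEval uses
vec = CountVectorizer(binary=True)
X = vec.fit_transform(docs)

# Information gain IG(Class; term) = H(Class) - H(Class | term),
# i.e. the mutual information between each term and the class label
ig = mutual_info_classif(X, y, discrete_features=True)

for term, score in sorted(zip(vec.get_feature_names_out(), ig), key=lambda t: -t[1]):
    print(f"{term:10s} {score:.3f}")
```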

ROC for multiclass classification

Question: I'm doing different text classification experiments. Now I need to calculate the AUC-ROC for each task. For the binary classifications, I already made it work with this code: scaler = StandardScaler(with_mean=False) enc = LabelEncoder() y = enc.fit_transform(labels) feat_sel = SelectKBest(mutual_info_classif, k=200) clf = linear_model.LogisticRegression() pipe = Pipeline([('vectorizer', DictVectorizer()), ('scaler', StandardScaler(with_mean=False)), …

Total answers: 4
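For the multiclass case, roc_auc_score accepts per-class probability scores directly with multi_class="ovr"; per-class curves come from binarized labels. A minimal sketch with random placeholder features (the real pipeline with DictVectorizer and SelectKBest would slot in where LogisticRegression is fitted):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.preprocessing import label_binarize

rng = np.random.RandomState(0)
X = rng.randn(300, 20)               # placeholder features
y = rng.randint(0, 3, size=300)      # three classes

clf = LogisticRegression(max_iter=1000).fit(X, y)
y_score = clf.predict_proba(X)       # shape (n_samples, n_classes)

# Macro-averaged one-vs-rest AUC for the multiclass setting
print("OvR macro AUC:", roc_auc_score(y, y_score, multi_class="ovr", average="macro"))

# Per-class ROC curves from binarized labels
y_bin = label_binarize(y, classes=clf.classes_)
for i, cls in enumerate(clf.classes_):
    fpr, tpr, _ = roc_curve(y_bin[:, i], y_score[:, i])
    print(f"class {cls}: {len(fpr)} threshold points")
```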