scikit-learn

Label encoding pandas dataframe, same label for same value

Question: Here is a snippet of my df:

         0    1    2    3    4    5  …   11   12   13    14    15    16
    0  BSO  PRV  BSI  TUR  WSP  ACP  …  HLR  HEX  HEX  None  None  None
    1  BSO  PRV  BSI  TUR  WSP  ACP  …  HLF  HLR  HEX  …

Total answers: 2
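One possible way to give the same value the same label in every column is to build a single vocabulary over the whole frame. The sketch below uses `pd.factorize` on the flattened values; the small frame is hypothetical and only loosely modeled on the snippet in the question, and mapping missing cells to -1 is an assumption about the desired behavior.

```python
import pandas as pd

# Hypothetical frame; the real data in the question is truncated.
df = pd.DataFrame({
    0: ["BSO", "BSO"],
    1: ["PRV", "PRV"],
    2: ["HEX", "HLR"],
    3: [None, "HEX"],
})

# Flatten every cell into one 1-D array so all columns share one vocabulary.
flat = df.to_numpy().ravel()

# factorize assigns one integer per distinct value; missing cells get -1.
codes, uniques = pd.factorize(flat, sort=True)

# Restore the original shape, index and column labels.
encoded = pd.DataFrame(codes.reshape(df.shape), index=df.index, columns=df.columns)

print(list(uniques))  # ['BSO', 'HEX', 'HLR', 'PRV']
print(encoded)        # 'HEX' is 1 wherever it appears, regardless of column
```

Fitting a separate encoder per column would not guarantee this, since each column would get its own numbering.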

Why does adding duplicated features improve Logistic Regression accuracy?

Question:

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    X, y = load_iris(return_X_y=True)
    for i in range(5):
        X_redundant = np.c_[X, X[:, :i]]  # repeating redundant features
        print(X_redundant.shape)
        clf = LogisticRegression(random_state=0, max_iter=1000).fit(X_redundant, y)
        print(clf.score(X_redundant, y))

Output:

    (150, 4)
    0.9733333333333334
    (150, 5)
    0.98
    (150, 6)
    0.98
    (150, 7)
    0.9866666666666667
    (150, 8)
    …

Total answers: 1

How to make 3 circles in Python

Question: So I was given an assignment in which I had to make a graph with 3 circles. I tried to make it by using ‘make_circles’ from sklearn.datasets. I used the code

    from sklearn.datasets import make_circles
    X_3, Y_3 = make_circles(n_samples=5609, noise=0.1, …

Total answers: 1

How to run sklearn.preprocessing.OrdinalEncoder on several columns?

Question: This code raises an error:

    import pandas as pd
    from sklearn.compose import ColumnTransformer
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OrdinalEncoder

    # Define categorical columns and mapping dictionary
    categorical_cols = ['color', 'shape', 'size']
    mapping = {'red': 0, 'green': 1, 'blue': 2, 'circle': 0, 'square': 1, 'triangle': 2, 'small': 0, …

Total answers: 2
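The excerpt is cut off before the failing call, so the cause of the error is not known. As a minimal sketch of one way to encode several columns at once, `OrdinalEncoder` accepts a `categories` list with one ordered list per column. The sample rows and the `'medium'` and `'large'` sizes below are assumptions for illustration, since the mapping in the question is truncated after `'small'`.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

categorical_cols = ["color", "shape", "size"]

# Hypothetical rows; the original data is not shown in the question.
df = pd.DataFrame({
    "color": ["green", "red", "blue"],
    "shape": ["circle", "triangle", "square"],
    "size": ["large", "small", "medium"],
})

# One ordered category list per column, in the same order as categorical_cols.
# 'medium' and 'large' are assumed values, not taken from the question.
encoder = OrdinalEncoder(categories=[
    ["red", "green", "blue"],
    ["circle", "square", "triangle"],
    ["small", "medium", "large"],
])

# fit_transform returns a float array with one column per input column.
encoded = encoder.fit_transform(df[categorical_cols])
encoded_df = pd.DataFrame(encoded, columns=categorical_cols, index=df.index).astype(int)
print(encoded_df)
```

Each column is encoded against its own list, so `'red'`, `'circle'` and `'small'` all map to 0, matching the per-column numbering the question's dictionary appears to aim for.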

After dropping columns with missing values, sklearn still throwing ValueError

Question: I am currently taking the Intermediate Machine Learning course on Kaggle and am quite new to machine learning. I’m trying to create a Random Forest model and implement one-hot (OH) encoding on my data, but as it is my first time, I have been struggling …

Total answers: 2

NaN values created when joining two dataframes

Question: I am trying to one-hot encode data with the scikit-learn library, using a dataset from Kaggle (https://www.kaggle.com/datasets/rkiattisak/salaly-prediction-for-beginer). X is a two-column dataframe holding the age and years-of-experience columns, with the rows containing null values removed by dropna(). My goal is to one-hot encode …

Total answers: 1

Data Science Data Analysis

Question: I have a dataset with people’s characteristics and I need to predict their breakfast. Here’s an example of the df. I am training a CatBoost model for that. Is it possible in my case to predict not only one kind of breakfast, but also an additional one? By additional I …

Total answers: 2

SKLearn Linear Regression on Grouped Pandas Dataframe without aggregation?

Question: I am trying to perform a linear regression over a set of grouped columns and put the coefficient results on each row without performing an aggregation (the equivalent of a window function in SQL). I’m banging my head against a wall here. In a for loop this works …

Total answers: 1

AttributeError: 'CountVectorizer' object has no attribute 'get_feature_names'

Question: The code was working before without showing any errors. It’s for a sentiment analysis machine learning project. The code is for a logistic regression model on word counts:

    c = CountVectorizer(stop_words='english')

    def text_fit(X, y, model, clf_model, coef_show=1):
        X_c = model.fit_transform(X)
        print('# features: {}'.format(X_c.shape[1]))
        X_train, X_test, y_train, y_test = …

Total answers: 1

NotFittedError (instance is not fitted yet) after invoking cross_validate

Question: This is my minimal reproducible example:

    x = np.array([
        [1, 2],
        [3, 4],
        [5, 6],
        [6, 7]
    ])
    y = [1, 0, 0, 1]
    model = GaussianNB()
    scores = cross_validate(model, x, y, cv=2, scoring=("accuracy"))
    model.predict([8,9])

What I intended to do is instantiate a Gaussian Naive …

Total answers: 1
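For context, `cross_validate` fits clones of the estimator on each fold and leaves the object passed in unfitted, which is consistent with the `NotFittedError` in the title. A minimal sketch of one way around it, reusing the data from the question, is to fit the model explicitly after scoring. Passing the sample as a 2-D array (`[[8, 9]]`) is an additional change, since `predict` expects one row per sample.

```python
import numpy as np
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB

x = np.array([[1, 2], [3, 4], [5, 6], [6, 7]])
y = [1, 0, 0, 1]

model = GaussianNB()

# Scores come from clones fitted per fold; `model` itself stays unfitted.
scores = cross_validate(model, x, y, cv=2, scoring="accuracy")

# Fit the original estimator on all the data before predicting.
model.fit(x, y)

# predict expects a 2-D array: one row per sample, one column per feature.
prediction = model.predict([[8, 9]])

print(scores["test_score"])
print(prediction)
```

An alternative is `cross_validate(..., return_estimator=True)`, which returns the per-fold fitted clones under the `"estimator"` key; those models have each seen only part of the data.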