one-hot-encoding

Using Scikit-Learn OneHotEncoder with a Pandas DataFrame

Question: I’m trying to replace a column within a Pandas DataFrame containing strings with a one-hot encoded equivalent using Scikit-Learn’s OneHotEncoder. My code below doesn’t work:

    from sklearn.preprocessing import OneHotEncoder

    # data is a Pandas DataFrame
    jobs_encoder = OneHotEncoder()
    jobs_encoder.fit(data['Profession'].unique().reshape(1, -1))
    data['Profession'] = jobs_encoder.transform(data['Profession'].to_numpy().reshape(-1, 1))

It produces the …

Total answers: 5

One hot encoding of multi label images in Keras

Question: I am using the PASCAL VOC 2012 dataset for image classification. A few images have multiple labels, whereas a few of them have a single label, as shown below.

    0  2007_000027.jpg  {'person'}
    1  2007_000032.jpg  {'aeroplane', 'person'}
    2  2007_000033.jpg  {'aeroplane'}
    3  2007_000039.jpg  {'tvmonitor'}
    4  2007_000042.jpg  {'train'}

I …

Total answers: 1
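
For the multi-label question above, the target is usually a "multi-hot" vector (several ones per row) rather than a strict one-hot vector. The following is a rough sketch in plain NumPy, not a Keras API; the label sets are the five shown in the question, and the names `classes`, `class_index` and `targets` are introduced here for illustration only.

```python
import numpy as np

# Label sets copied from the listing in the question.
labels = [
    {"person"},
    {"aeroplane", "person"},
    {"aeroplane"},
    {"tvmonitor"},
    {"train"},
]

# Fix a column order by sorting every class name that occurs.
classes = sorted(set().union(*labels))
class_index = {name: i for i, name in enumerate(classes)}

# One row per image, one column per class; 1.0 marks a label that is present.
targets = np.zeros((len(labels), len(classes)), dtype=np.float32)
for row, label_set in enumerate(labels):
    for name in label_set:
        targets[row, class_index[name]] = 1.0

print(classes)
print(targets)
```

An array like `targets` is typically paired with a sigmoid output layer and a binary cross-entropy loss, though whether that suits the asker's model is an assumption.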

One-hot-encoding multiple columns in sklearn and naming columns

Question: I have the following code to one-hot-encode two columns.

    # encode city labels using one-hot encoding scheme
    city_ohe = OneHotEncoder(categories='auto')
    city_feature_arr = city_ohe.fit_transform(df[['city']]).toarray()
    city_feature_labels = city_ohe.categories_
    city_features = pd.DataFrame(city_feature_arr, columns=city_feature_labels)

    phone_ohe = OneHotEncoder(categories='auto')
    phone_feature_arr = phone_ohe.fit_transform(df[['phone']]).toarray()
    phone_feature_labels = phone_ohe.categories_
    phone_features = pd.DataFrame(phone_feature_arr, columns=phone_feature_labels)

What …

Total answers: 4

Logistic regression on One-hot encoding

Question: I have a DataFrame (data) for which the head looks like the following:

              status      datetime country  amount    city
    601766  received  1.453916e+09  France     4.5   Paris
    669244  received  1.454109e+09   Italy     6.9  Naples

I would like to predict the status given datetime, country, amount and city. Since status, country, city are string, …

Total answers: 3

How to interpret results of Spark OneHotEncoder

Question: I read the OHE entry from the Spark docs: “One-hot encoding maps a column of label indices to a column of binary vectors, with at most a single one-value. This encoding allows algorithms which expect continuous features, such as Logistic Regression, to use categorical features.” But sadly they …

Total answers: 1

One Hot Encoding using numpy

Question: If the input is zero, I want to make an array which looks like this: [1,0,0,0,0,0,0,0,0,0] and if the input is 5: [0,0,0,0,0,1,0,0,0,0]. For the above I wrote: np.put(np.zeros(10),5,1) but it did not work. Is there any way in which this can be implemented in one line? Asked By: …

Total answers: 9
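
For the NumPy question above: `np.put` modifies its array argument in place and returns `None`, so calling it on a temporary `np.zeros(10)` discards the result. A small sketch of two alternatives follows; the helper name `one_hot` and its `size` parameter are hypothetical, chosen only for this illustration.

```python
import numpy as np

def one_hot(index, size=10):
    """Return a length-`size` vector with a 1 at `index` (hypothetical helper)."""
    vec = np.zeros(size, dtype=int)
    vec[index] = 1
    return vec

# One-line alternative: pick a row of the identity matrix.
row = np.eye(10, dtype=int)[5]

print(one_hot(0))
print(one_hot(5))
print(row)
```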

Convert array of indices to one-hot encoded array in NumPy

Question: Given a 1D array of indices:

    a = array([1, 0, 3])

I want to one-hot encode this as a 2D array:

    b = array([[0,1,0,0], [1,0,0,0], [0,0,0,1]])

Asked by: James Atwood

Answers: Create a zeroed array b with enough columns, i.e. a.max() + …

Total answers: 22
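
The answer excerpt above is cut off, so the following is only a sketch of one way the zeroed-array idea can be carried through. It assumes the indices start at 0, so that `a.max() + 1` columns are enough; that column count is an assumption made here, chosen to match the four columns of `b` in the question.

```python
import numpy as np

a = np.array([1, 0, 3])

# Assumption: indices start at 0, so a.max() + 1 columns cover every class.
b = np.zeros((a.size, a.max() + 1), dtype=int)
# Row i gets a 1 in column a[i].
b[np.arange(a.size), a] = 1
print(b)
```

If some classes never occur in `a`, the number of columns would need to be passed in explicitly rather than inferred from the maximum.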

Can sklearn random forest directly handle categorical features?

Question: Say I have a categorical feature, color, which takes the values ['red', 'blue', 'green', 'orange'], and I want to use it to predict something in a random forest. If I one-hot encode it (i.e. I change it to four dummy variables), how do I tell sklearn …

Total answers: 6

Running get_dummies on several DataFrame columns?

Question: How can one idiomatically run a function like get_dummies, which expects a single column and returns several, on multiple DataFrame columns?

Asked by: Emre

Answers: Since pandas version 0.15.0, pd.get_dummies can handle a DataFrame directly (before that, it could only handle a single Series, and see …

Total answers: 5
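
As a brief illustration of the get_dummies answer above, here is a sketch of calling `pd.get_dummies` on a whole DataFrame and limiting it to chosen columns with the `columns` argument. The frame, its column names and its values are invented for this example.

```python
import pandas as pd

# Hypothetical frame; column names and values are invented for illustration.
df = pd.DataFrame({
    "color": ["red", "blue", "red"],
    "size": ["S", "M", "S"],
    "price": [10, 12, 9],
})

# `columns` restricts encoding to the listed columns; the others pass through.
dummies = pd.get_dummies(df, columns=["color", "size"])
print(dummies.columns.tolist())
```

The dtype of the indicator columns depends on the pandas version (boolean in recent releases), so cast with `astype(int)` if 0/1 integers are needed.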

Adding dummy columns to the original dataframe

Question: I have a dataframe that looks like this:

                  EXEC_FULLNAME  YEAR  BECAMECEO
    CO_PER_ROL
    5622         Ira A. Eichner  1992   19550101
    5622         Ira A. Eichner  1993   19550101
    5622         Ira A. Eichner  1994   19550101
    5623        David P. Storch  1994   19961009
    5623        David P. Storch  1995   19961009
    5623        David P. Storch  1996   19961009

…

Total answers: 2
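
For the dummy-columns question above, one possible pattern is to build the indicator columns separately and concatenate them onto the original frame. The excerpt does not show which column the asker wants encoded, so using `YEAR` here is an assumption, and only a few rows and two columns from the listing are reproduced.

```python
import pandas as pd

# A few rows modelled on the question's listing; only two columns are kept.
df = pd.DataFrame({
    "EXEC_FULLNAME": ["Ira A. Eichner", "Ira A. Eichner", "David P. Storch"],
    "YEAR": [1992, 1993, 1994],
})

# Build indicator columns for YEAR (an assumed choice), then attach them
# alongside the original columns so nothing is dropped.
year_dummies = pd.get_dummies(df["YEAR"], prefix="YEAR")
df_with_dummies = pd.concat([df, year_dummies], axis=1)
print(df_with_dummies)
```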