Can you passthrough a specific column in a scikit-learn ColumnTransformer?

Question:

I have a fairly large DataFrame (300 columns) and I'm using sklearn to encode/scale some fields. I like that I can choose the specific columns I want and have it drop the rest. My problem is that two columns in my large DataFrame now hold numpy arrays, and I would like those passed through while the other columns I don't list in the sklearn pipeline are dropped.

For example:

from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.compose import ColumnTransformer
ct = ColumnTransformer([("Country", OneHotEncoder(), [1])], remainder = 'passthrough')

This would one-hot encode the country column and pass through everything else. What if I have a column called "numpy_array"? How can I get only that one passed through?

Asked By: Lostsoul


Answers:

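What if I have a column called "numpy_array" how can I get that one only passed through?

Use the string 'passthrough' in place of a transformer for that column, and set remainder='drop' (which is also the default) so that every column you don't list is dropped: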

from sklearn.compose import ColumnTransformer

ct = ColumnTransformer(
    transformers=[
        ('np_array_transform', 'passthrough', ['numpy_array']),
    ],
    remainder='drop',
)
Answered By: Sanjar Adilov
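
For the case in the question, where one column is encoded and another is passed through, the two entries can sit side by side in the same ColumnTransformer. The following is only a minimal sketch under assumptions: the toy DataFrame and the column names "country" and "unused" are made up for illustration, and sparse_threshold=0 is set so the one-hot output is densified before it is stacked next to the object-typed array column.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: "country" and "unused" are illustrative names only.
df = pd.DataFrame({
    "country": ["US", "FR", "US"],
    "numpy_array": [np.array([1, 2]), np.array([3, 4]), np.array([5, 6])],
    "unused": [10, 20, 30],
})

ct = ColumnTransformer(
    transformers=[
        # Encode the categorical column.
        ("country", OneHotEncoder(), ["country"]),
        # Forward this column untouched.
        ("np_array_transform", "passthrough", ["numpy_array"]),
    ],
    remainder="drop",    # every column not listed above is dropped
    sparse_threshold=0,  # always return a dense array (assumption, see above)
)

out = ct.fit_transform(df)
print(out.shape)
```

The result should contain the one-hot columns followed by the passed-through column, with "unused" absent. Because the passed-through cells hold arrays, the stacked output has object dtype.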