pipeline

Custom Transformer to add additional column

Question: I am trying to replicate my lambda function into my pipeline: def determine_healthy(_list): if ('no' in _list['smoker'] and (_list['bmi'] >= 18.5) and (_list['bmi'] <= 24.9)): return True else: return False df['healthy'] = df.apply(lambda row: determine_healthy(row), axis=1) The problem comes when I am integrating it into my pipeline, I'm not …

Total answers: 1
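
The excerpt above is cut off, so the asker's actual pipeline is unknown. As a minimal sketch only, the row-wise rule could be wrapped in a class that follows the fit/transform method convention that scikit-learn pipelines expect. The class name `HealthyFlagAdder` is hypothetical, the rows are plain dicts rather than a DataFrame so that the example runs with the standard library alone, and the smoker check uses equality instead of the original substring test `'no' in ...`, which is an assumption about the intent.

```python
# Hypothetical sketch: a transformer-style class that adds a 'healthy' flag.
# It mirrors the fit/transform convention used by scikit-learn pipelines,
# but works on plain dict rows so it needs neither pandas nor scikit-learn.


def determine_healthy(row):
    """Return True for a non-smoker whose BMI lies in the 18.5 to 24.9 range."""
    return row["smoker"] == "no" and 18.5 <= row["bmi"] <= 24.9


class HealthyFlagAdder:
    """Adds a boolean 'healthy' key to every row (class name is an assumption)."""

    def fit(self, rows, y=None):
        # Nothing is learned from the data; return self so calls can be chained.
        return self

    def transform(self, rows):
        # Build new dicts so the caller's rows are left unmodified.
        return [{**row, "healthy": determine_healthy(row)} for row in rows]


rows = [
    {"smoker": "no", "bmi": 22.0},
    {"smoker": "yes", "bmi": 22.0},
    {"smoker": "no", "bmi": 30.1},
]
flagged = HealthyFlagAdder().fit(rows).transform(rows)
print([row["healthy"] for row in flagged])  # [True, False, False]
```

With a real DataFrame, the same class would typically subclass `BaseEstimator` and `TransformerMixin` from `sklearn.base` and return a copy of the frame with the new column; that variant is not shown here because the asker's pipeline code is truncated.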

ColumnTransformer & Pipeline with OHE – Is the OHE encoded field retained or removed after ct is performed?

Question: Doc on CT: remainder{'drop', 'passthrough'} or estimator, default='drop'. By default, only the specified columns in transformers are transformed and combined in the output, and the non-specified columns are dropped (default of 'drop'). By specifying remainder='passthrough', all …

Total answers: 2

Python Question about getting data from a list

Question: My Python is not very good and I would like to see if my code makes sense or how I can improve it 🙂 So there is some data coming from an API, and the pipeline is bringing this data to Postgres using Python. I didn't create …

Total answers: 1

how to pass parameter to python script from a pipeline

Question: I am building an Azure Data Factory pipeline and I would like to know how to get this parameter into the Python script. The Python script is located in Databricks (DBFS) and is run from Azure Data Factory. So, in my ADF pipeline, I have …

Total answers: 1

How do I add external features to my pipeline?

Question: There is a similar question asked here on SO many years back but there was no answer. I have the same question. I would like to add in new column(s) of data, in my case 3 columns for dummy variables, to a sparse matrix (from …

Total answers: 1

Sklearn: Is there a way to define a specific score type to pipeline?

Question: I can do this: model=linear_model.LogisticRegression(solver='lbfgs',max_iter=10000) kfold = model_selection.KFold(n_splits=number_splits,shuffle=True, random_state=random_state) scalar = StandardScaler() pipeline = Pipeline([('transformer', scalar), ('estimator', model)]) results = model_selection.cross_validate(pipeline, X, y, cv=kfold, scoring=score_list,return_train_score=True) where score_list can be something like ['accuracy','balanced_accuracy','precision','recall','f1']. I also can do this: kfold = model_selection.KFold(n_splits=number_splits,shuffle=True, random_state=random_state) …

Total answers: 1

TransformedTargetRegressor save and load error

Question: I'm defining my custom regressor using the TransformedTargetRegressor, adding it to the pipeline and saving the model in the 'joblib' file. However, when I try to load the model, I get the error: module '__main__' has no attribute 'transform_targets', where transform_targets is one of the functions defined for the …

Total answers: 1

Sklearn Pipeline: Get feature names after OneHotEncode In ColumnTransformer

Question: I want to get feature names after I fit the pipeline. categorical_features = ['brand', 'category_name', 'sub_category'] categorical_transformer = Pipeline(steps=[ ('imputer', SimpleImputer(strategy='constant', fill_value='missing')), ('onehot', OneHotEncoder(handle_unknown='ignore'))]) numeric_features = ['num1', 'num2', 'num3', 'num4'] numeric_transformer = Pipeline(steps=[ ('imputer', SimpleImputer(strategy='median')), ('scaler', StandardScaler())]) preprocessor = ColumnTransformer( transformers=[ ('num', numeric_transformer, numeric_features), ('cat', …

Total answers: 5

Allow duplicate downloads with Scrapy Image Pipeline?

Question: Please see below an example version of my code, which uses the Scrapy Image Pipeline to download/scrape images from a site: import scrapy from scrapy_splash import SplashRequest from imageExtract.items import ImageextractItem class ExtractSpider(scrapy.Spider): name = 'extract' start_urls = ['url'] def parse(self, response): image = ImageextractItem() titles = …

Total answers: 4

How to do Onehotencoding in Sklearn Pipeline

Question: I am trying to one-hot encode the categorical variables of my Pandas dataframe, which includes both categorical and continuous variables. I realise this can be done easily with the pandas .get_dummies() function, but I need to use a pipeline so I can generate a PMML file later on. This …

Total answers: 2