sklearn-pandas

Are the indices returned by KFold's split method for a DataFrame iloc or loc?

Question: When we use KFold.split(X), where X is a DataFrame, are the indices generated to split the data into training and test sets iloc (purely integer-location based indexing for selection by position) or loc (loc of group …

Total answers: 1

decision tree repeating class names

Question: I have a very simple sample of data/labels. The problem I’m having is that the generated decision tree (PDF) repeats the class name: from sklearn import tree from sklearn.externals.six import StringIO import pydotplus features_names = ['weight', 'texture'] features = [[140, 1], [130, 1], [150, 0], [110, 0]] labels …

Total answers: 2

How to one-hot-encode from a pandas column containing a list?

Question: I would like to break down a pandas column consisting of a list of elements into as many columns as there are unique elements, i.e. one-hot-encode them (with value 1 representing a given element existing in a row and 0 in the case of …

Total answers: 6

How to do one-hot encoding in an sklearn Pipeline

Question: I am trying to one-hot encode the categorical variables of my pandas DataFrame, which includes both categorical and continuous variables. I realise this can be done easily with the pandas .get_dummies() function, but I need to use a pipeline so I can generate a PMML file later on. This …

Total answers: 2

Error when trying to import sklearn modules : ImportError: DLL load failed: The specified module could not be found

Question: I tried the following imports for a machine learning project: from sklearn import preprocessing, cross_validation, svm from sklearn.linear_model import LinearRegression I got this error message: Traceback (most recent call last): File "C:/Users/Abdelhalim/PycharmProjects/ML/stock pricing.py", …

Total answers: 8

No module named 'pandas' in Pycharm

Question: I have read all the related topics, but I cannot solve my problem: Traceback (most recent call last): File "/home/…/…/…/reading_data.py", line 1, in <module> import pandas as pd ImportError: No module named pandas This is my environment: Ubuntu 14.04 Pycharm version: 2016.1.4 Python version: 2.7.10 Pandas version: 0.18.1 Pandas …

Total answers: 5

sklearn stratified sampling based on a column

Question: I have a fairly large CSV file containing Amazon review data, which I read into a pandas DataFrame. I want to split the data 80-20 (train-test), but while doing so I want to ensure that the split data proportionally represents the values of one column (Categories), …

Total answers: 4