random-forest

Using the predict_proba() function of RandomForestClassifier in the safe and right way

Question: I’m using Scikit-learn. Sometimes I need the probabilities of labels/classes instead of the labels/classes themselves. Instead of having Spam/Not Spam as labels of emails, I want, for example, a 0.78 probability that a given email is Spam. For such …

Total answers: 3
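As a rough sketch of what the question above describes, the snippet below fits a RandomForestClassifier on synthetic data and reads class probabilities from predict_proba(). The dataset, the variable names, and the hyperparameters are illustrative assumptions, not taken from the original question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic two-class data standing in for "Spam" / "Not Spam" (assumption).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X, y)

# One row per sample, one column per class; columns follow clf.classes_.
proba = clf.predict_proba(X[:5])

# Probability of the class labelled 1 (treated here as "Spam").
spam_column = list(clf.classes_).index(1)
spam_proba = proba[:, spam_column]
```

Looking up the column through clf.classes_ rather than hard-coding an index avoids silently reading the probability of the wrong class.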

RandomForestClassifier.fit(): ValueError: could not convert string to float

Question: Given a simple CSV file: A,B,C Hello,Hi,0 Hola,Bueno,1 Obviously the real dataset is far more complex than this, but this one reproduces the error. I’m attempting to build a random forest classifier for it, like so: cols = ['A','B','C'] col_types = {'A': str, 'B': str, …

Total answers: 8
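One common way around this kind of error is to encode the string columns as numbers before fitting. The sketch below uses the two-row CSV from the question and one-hot encodes columns A and B with pandas.get_dummies. Treating C as the target column is an assumption, and the variable names are illustrative.

```python
import io

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# The toy CSV from the question, read from an in-memory string.
csv_text = "A,B,C\nHello,Hi,0\nHola,Bueno,1\n"
df = pd.read_csv(io.StringIO(csv_text))

# One-hot encode the string columns so every feature is numeric.
X = pd.get_dummies(df[["A", "B"]])

# Assumption: column C holds the class label.
y = df["C"]

clf = RandomForestClassifier(n_estimators=10, random_state=0)
clf.fit(X, y)
predictions = clf.predict(X)
```

For a real pipeline, an encoder fitted on the training data (for example sklearn.preprocessing.OneHotEncoder) is usually preferable, because it applies the same columns to unseen data.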

How to get Best Estimator on GridSearchCV (Random Forest Classifier Scikit)

Question: I’m running GridSearchCV to optimize the parameters of a classifier in scikit. Once I’m done, I’d like to know which parameters were chosen as the best. Whenever I do so I get an AttributeError: 'RandomForestClassifier' object has no attribute 'best_estimator_', and can’t …

Total answers: 2
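The error message in the question suggests the attribute was read from the classifier itself. In scikit-learn, best_estimator_ and best_params_ belong to the fitted GridSearchCV object, not to the estimator passed into it. A minimal sketch, with a made-up dataset and parameter grid:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data and a small grid, both chosen only for illustration.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
param_grid = {"n_estimators": [10, 20], "max_depth": [2, 4]}

search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
search.fit(X, y)

# Read the results from the search object, not from the classifier.
best_params = search.best_params_
best_model = search.best_estimator_
```

best_estimator_ is only available when refit is left enabled, which is the default.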

RandomForestClassifier import

Question: I’ve installed the Anaconda Python distribution with scikit-learn. While importing RandomForestClassifier: from sklearn.ensemble import RandomForestClassifier I get the following error: File "C:\Anaconda\lib\site-packages\sklearn\tree\tree.py", line 36, in <module> from . import _tree ImportError: cannot import name _tree What could the problem be? Asked By: Ilya Zinkovich Answers: The problem was that I …

Total answers: 3

Using GridSearchCV for RandomForestRegressor

Question: I’m trying to use GridSearchCV for RandomForestRegressor, but always get ValueError: Found array with dim 100. Expected 500. Consider this toy example: import numpy as np from sklearn import ensemble from sklearn.cross_validation import train_test_split from sklearn.grid_search import GridSearchCV from sklearn.metrics import r2_score if __name__ == '__main__': X = np.random.rand(1000, 2) …

Total answers: 2

Can sklearn random forest directly handle categorical features?

Question: Say I have a categorical feature, color, which takes the values ['red', 'blue', 'green', 'orange'], and I want to use it to predict something in a random forest. If I one-hot encode it (i.e. I change it to four dummy variables), how do I tell sklearn …

Total answers: 6

Save python random forest model to file

Question: In R, after running a "random forest" model, I can use save.image("***.RData") to store the model. Afterwards, I can just load the model to do predictions directly. Can you do a similar thing in Python? I separate the Model and Prediction into two files. And in the Model file: …

Total answers: 5
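One way to get a similar workflow in Python is to serialize the fitted model with joblib and load it again in a separate script. The sketch below does both steps in one place for brevity. The file name, the synthetic data, and the temporary directory are illustrative assumptions.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# "Model" step: train on synthetic data (assumption) and write to disk.
X, y = make_classification(n_samples=100, n_features=5, random_state=0)
model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "forest.joblib")  # hypothetical file name
    joblib.dump(model, path)

    # "Prediction" step: load the stored model and use it directly.
    restored = joblib.load(path)

same_predictions = bool((restored.predict(X) == model.predict(X)).all())
```

Files written this way are pickle-based, so they should only be loaded from trusted sources and ideally with the same scikit-learn version that produced them.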

How to extract the decision rules from scikit-learn decision-tree?

Question: Can I extract the underlying decision rules (or 'decision paths') from a trained decision tree as a textual list? Something like: if A>0.4 then if B<0.2 then if C>0.8 then class='X' Asked By: Dror Hilman Answers: from StringIO import StringIO out …

Total answers: 25
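In recent scikit-learn releases (0.21 and later), sklearn.tree.export_text produces this kind of textual rule listing. The sketch below is one possible approach, not the method from the truncated answer above. It uses the bundled iris dataset and a shallow tree purely for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A shallow tree keeps the printed rules short.
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# Render the fitted tree as indented if/else style rules.
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
```

For a random forest, the same call can be applied to each tree in the fitted model's estimators_ list.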

Unbalanced classification using RandomForestClassifier in sklearn

Question: I have a dataset where the classes are unbalanced. The classes are either '1' or '0', where the ratio of class '1':'0' is 5:1. How do you calculate the prediction error for each class and rebalance the weights accordingly in sklearn with Random Forest, kind of like in …

Total answers: 4

Numpy Array Get row index searching by a row

Question: I am new to numpy and I am implementing clustering with random forest in python. My question is: How could I find the index of the exact row in an array? For example [[ 0. 5. 2.] [ 0. 0. 3.] [ 0. 0. 0.]] …

Total answers: 2
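A common idiom for this kind of lookup is to compare every row against the target and keep the rows where all elements match. The sketch below uses the array from the question. The target row is an assumption, since the excerpt is cut off before it names one.

```python
import numpy as np

arr = np.array([[0.0, 5.0, 2.0],
                [0.0, 0.0, 3.0],
                [0.0, 0.0, 0.0]])

# Hypothetical row to search for.
target = np.array([0.0, 0.0, 3.0])

# Broadcast the comparison across rows, then require every column to match.
row_matches = (arr == target).all(axis=1)
indices = np.where(row_matches)[0]
```

Exact equality works for values like these. For floats produced by arithmetic, np.isclose(arr, target).all(axis=1) is the safer comparison.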