Got continuous is not supported error in RandomForestRegressor
Question:
I'm just trying to do a simple RandomForestRegressor example, but while testing the accuracy I get this error:
/Users/noppanit/anaconda/lib/python2.7/site-packages/sklearn/metrics/classification.pyc in accuracy_score(y_true, y_pred, normalize, sample_weight)
    177
    178     # Compute accuracy for each possible representation
--> 179     y_type, y_true, y_pred = _check_targets(y_true, y_pred)
    180     if y_type.startswith('multilabel'):
    181         differing_labels = count_nonzero(y_true - y_pred, axis=1)

/Users/noppanit/anaconda/lib/python2.7/site-packages/sklearn/metrics/classification.pyc in _check_targets(y_true, y_pred)
     90     if (y_type not in ["binary", "multiclass", "multilabel-indicator",
     91                        "multilabel-sequences"]):
---> 92         raise ValueError("{0} is not supported".format(y_type))
     93
     94     if y_type in ["binary", "multiclass"]:

ValueError: continuous is not supported
This is a sample of the data layout. I can't show the real data.
target, func_1, func_2, func_2, ... func_200
float, float, float, float, ... float
Here's my code:
import pandas as pd
import numpy as np
from sklearn.preprocessing import Imputer
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor, ExtraTreesRegressor, GradientBoostingRegressor
from sklearn.cross_validation import train_test_split
from sklearn.metrics import accuracy_score
from sklearn import tree
train = pd.read_csv('data.txt', sep='\t')
labels = train.target
train.drop('target', axis=1, inplace=True)
cat = ['cat']
train_cat = pd.get_dummies(train[cat])
train.drop(train[cat], axis=1, inplace=True)
train = np.hstack((train, train_cat))
imp = Imputer(missing_values='NaN', strategy='mean', axis=0)
imp.fit(train)
train = imp.transform(train)
x_train, x_test, y_train, y_test = train_test_split(train, labels.values, test_size = 0.2)
clf = RandomForestRegressor(n_estimators=10)
clf.fit(x_train, y_train)
y_pred = clf.predict(x_test)
accuracy_score(y_test, y_pred) # This is where I get the error.
Answers:
This happens because accuracy_score is for classification tasks only.
For regression you should use a different metric, for example:
clf.score(x_test, y_test)
Here x_test holds the test samples and y_test the corresponding ground-truth values. The method computes the predictions internally, so you do not pass y_pred.
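For reference, here is a minimal sketch of scoring a regressor with regression metrics instead of accuracy_score. It assumes a recent scikit-learn (where train_test_split lives in sklearn.model_selection rather than sklearn.cross_validation) and uses synthetic data from make_regression as a stand-in, since the real data is not shown.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the asker's data (an assumption, not the real dataset).
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

clf = RandomForestRegressor(n_estimators=10, random_state=0)
clf.fit(x_train, y_train)
y_pred = clf.predict(x_test)

# score() predicts internally and returns R-squared for regressors.
r2_from_score = clf.score(x_test, y_test)
# The same value, computed from explicit predictions.
r2_from_metric = r2_score(y_test, y_pred)

# Other common regression metrics that accept continuous targets.
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)

print("R^2:", r2_from_score, "MSE:", mse, "MAE:", mae)
```

Which metric to report depends on the problem; R-squared is simply what score() returns by default for regressors.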
Since you are doing a regression task, you should use the R-squared metric (coefficient of determination) instead of
the accuracy score, which is meant for classification problems.
R-squared can be computed by calling the score function provided by RandomForestRegressor, for example:
rfr.score(X_test, Y_test)
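To make it clearer what that number means, here is a small sketch that computes R-squared by hand using the usual definition, 1 - SS_res / SS_tot. The helper name r_squared is hypothetical and only for illustration; in practice you would call score() or sklearn.metrics.r2_score.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (illustrative helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: how far predictions are from the true values.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: how far the true values are from their own mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when y_true is constant")
    return 1 - ss_res / ss_tot


perfect = r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])        # perfect fit
mean_only = r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])      # always predict the mean
```

A perfect prediction gives 1, always predicting the mean gives 0, and a model worse than the mean gives a negative value.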
Try:
tree_clf.score(x_train, y_train)
You can't use a confusion matrix either, since it is also a classification metric.
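As a quick check, the sketch below feeds continuous targets to both accuracy_score and confusion_matrix and records what happens. The values are made up for illustration, and the exact error text may differ between scikit-learn versions, so the code only catches ValueError without relying on the message.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up continuous targets and predictions, as a regressor would produce.
y_test = [0.5, 1.2, 3.4]
y_pred = [0.4, 1.3, 3.1]

errors = {}
for name, metric in (
    ("accuracy_score", accuracy_score),
    ("confusion_matrix", confusion_matrix),
):
    try:
        metric(y_test, y_pred)
    except ValueError as exc:
        # Both metrics validate the target type and reject continuous values.
        errors[name] = str(exc)

for name, message in errors.items():
    print(name, "->", message)
```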