AttributeError: 'DecisionTreeClassifier' object has no attribute 'feature_names_in_'

Question

I am a beginner in this field and I am trying to train a decision tree on a dataset. When I run my function, I get this error:

AttributeError: 'DecisionTreeClassifier' object has no attribute 'feature_names_in_'

According to this link, however, the attribute should be available on DecisionTreeClassifier() objects.

Here is my function, along with the packages I import:

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix
from sklearn.tree import export_graphviz
from IPython.display import Image
from sklearn import tree
import gdown
from graphviz import Source

def decision_tree(data):
  X = data.drop(['VendorID', 'VendorID_zscore', 'VendorID_boxwhiskerscore', 'VendorID_normalized',
                 'VendorID_zscore_normalized', 'VendorID_boxwhiskerscore_normalized', 'cluster'], axis=1)
  y = data['cluster']

  X_train, X_test, y_train, y_test =  train_test_split(X, y, test_size = 0.25, random_state= 0)

  sc_X = StandardScaler()
  X_train = sc_X.fit_transform(X_train)
  X_test = sc_X.transform(X_test)

  classifier = DecisionTreeClassifier()
  classifier.fit(X_train, y_train)

  # Prediction
  y_pred = classifier.predict(X_test)

  # Accuracy
  print('Accuracy Score:', accuracy_score(y_test, y_pred))

  #  Confusion Matrix
  cm = confusion_matrix(y_test, y_pred)
  print('Confusion Matrix: ', cm)

  # visualization
  export_graphviz(
        classifier,
        out_file="tree.dot",
        feature_names = classifier.feature_names_in_,
        class_names=['cluster'],
        rounded=True,
        filled=True
    )

Here is the full traceback from my Jupyter notebook:

<ipython-input-19-51196bcefa11> in decision_tree(data)
     25         classifier,
     26         out_file="tree.dot",
---> 27         feature_names = classifier.feature_names_in_,
     28         class_names=['cluster'],
     29         rounded=True,

AttributeError: 'DecisionTreeClassifier' object has no attribute 'feature_names_in_'

Edit:

I have also tried the plot_tree function from sklearn.tree. With it, I can save the tree to a file and display it in the notebook. Here is my new code:

import matplotlib.pyplot as plt
from sklearn.tree import plot_tree

def decision_tree(data):
  X = data.drop(['VendorID', 'VendorID_zscore', 'VendorID_boxwhiskerscore', 'VendorID_normalized',
                 'VendorID_zscore_normalized', 'VendorID_boxwhiskerscore_normalized', 'cluster'], axis=1)
  y = data['cluster']

  X_train, X_test, y_train, y_test =  train_test_split(X, y, test_size = 0.25, random_state= 0)

  sc_X = StandardScaler()
  X_train = sc_X.fit_transform(X_train)
  X_test = sc_X.transform(X_test)

  plt.figure(dpi=1200, figsize=(8, 6))
  classifier = DecisionTreeClassifier().fit(X_train, y_train)
  plot_tree(classifier, filled=True, max_depth=4)
  plt.title("Decision tree trained on all the NYC Taxi Trips features")
  plt.savefig('decision_tree.png', dpi=1200)
  plt.show()

  # Prediction
  y_pred = classifier.predict(X_test)

  # Accuracy
  print('Accuracy Score:', accuracy_score(y_test, y_pred))

  #  Confusion Matrix
  cm = confusion_matrix(y_test, y_pred)
  print('Confusion Matrix: ', cm)
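
Because the classifier above is fitted on the scaled NumPy array, plot_tree has no column names to show and falls back to generic labels for the features. A minimal sketch of one possible workaround, assuming a made-up toy DataFrame (the columns `a` and `b` and the labels are hypothetical, not the taxi data): store the column names before scaling and pass them through the `feature_names` parameter of plot_tree. Converting the names to a plain list is a cautious choice here rather than a documented requirement.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Hypothetical toy data standing in for the real dataset.
X = pd.DataFrame({"a": [0.0, 1.0, 2.0, 3.0], "b": [1.0, 0.0, 1.0, 0.0]})
y = [0, 0, 1, 1]

# Remember the column names before scaling turns the DataFrame into an array.
feature_names = list(X.columns)
X_scaled = StandardScaler().fit_transform(X)

clf = DecisionTreeClassifier(random_state=0).fit(X_scaled, y)

fig, ax = plt.subplots(figsize=(6, 4))
# plot_tree returns the matplotlib annotations it drew, one per visible node.
annotations = plot_tree(clf, feature_names=feature_names, filled=True, ax=ax)
node_texts = [ann.get_text() for ann in annotations]
plt.close(fig)

print(node_texts[0])
```

In the function above, the equivalent change would be to keep `list(X.columns)` around and pass it to plot_tree.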

Asked By: Aylin Naebzadeh

Answer 1

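I solved the problem with a few small changes. The main one is dropping the StandardScaler step: fit_transform returns a NumPy array, so the classifier was fitted on data without column names and feature_names_in_ was never set. Fitting directly on the DataFrame keeps the names, and a decision tree does not need scaled features anyway. I also replace NaN and infinite values with 0, limit the tree depth, and pass one class name per cluster in class_names.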

# Additional imports on top of those in the question
import numpy as np
from sklearn.metrics import classification_report

def decision_tree(data):

  X = data.drop(['VendorID', 'VendorID_zscore', 'VendorID_boxwhiskerscore', 'VendorID_normalized',
                 'VendorID_zscore_normalized', 'VendorID_boxwhiskerscore_normalized', 'cluster'], axis=1)

  y = data['cluster']

  X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)

  # Replace NaN and infinite values with 0; X_train stays a DataFrame, so the column names are kept
  X_train = X_train.replace((np.inf, -np.inf, np.nan), 0).reset_index(drop=True)
  X_test = X_test.replace((np.inf, -np.inf, np.nan), 0).reset_index(drop=True)

  classifier = DecisionTreeClassifier(max_depth=5)
  classifier.fit(X_train, y_train)

  y_pred = classifier.predict(X_test)

  # validation
  print(confusion_matrix(y_test, y_pred))
  print(classification_report(y_test, y_pred))

  # visualization
  export_graphviz(
        classifier,
        out_file="tree.dot",
        feature_names = classifier.feature_names_in_,
        class_names=["0", "1", "2", "3"],
        rounded=True,
        filled=True
    )

  # Load the .dot file, render it to tree.jpg, and open the result
  view = Source.from_file("tree.dot")
  view.render('tree', format='jpg', view=True)
  view.view()

Answered By: Aylin Naebzadeh
