LightGBMError: Do not support special JSON characters in feature name – The same code is working in jupyter but doesn't work in Spyder

Question:

I have the following code:

    most_important = features_importance_chi(importance_score_tresh,
                                             df_user.drop(columns='CHURN'), churn)
    X = df_user.drop(columns = 'CHURN')
    churn[churn==2] = 1
    y = churn

    # handle undersample problem
    X,y = handle_undersampe(X,y)

    # train the model

    X=X.loc[:,X.columns.isin(most_important)].values
    y=y.values

    parameters = {
    'application': 'binary',
    'objective': 'binary',
    'metric': 'auc',
    'is_unbalance': 'true',
    'boosting': 'gbdt',
    'num_leaves': 31,
    'feature_fraction': 0.5,
    'bagging_fraction': 0.5,
    'bagging_freq': 20,
    'learning_rate': 0.05,
    'verbose': 0
    }

    # split data
    x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

    train_data = lightgbm.Dataset(x_train, label=y_train)
    test_data = lightgbm.Dataset(x_test, label=y_test)
    model = lightgbm.train(parameters,
                       train_data,
                       valid_sets=[train_data, test_data], 
                       feature_name=most_important,
                       num_boost_round=5000,
                       early_stopping_rounds=100) 

and this is the function that returns the most_important list:

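    import pandas as pd
    from sklearn.ensemble import ExtraTreesClassifier

    def features_importance_chi(importance_score_tresh, X, Y):
        # Rank features with a small ExtraTrees ensemble
        model = ExtraTreesClassifier(n_estimators=10)
        model.fit(X, Y.values.ravel())
        feature_list = pd.Series(model.feature_importances_,
                                 index=X.columns)
        # Keep only the features whose importance exceeds the threshold
        feature_list = feature_list[feature_list > importance_score_tresh]
        feature_list = feature_list.index.values.tolist()
        return feature_list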

The funny thing is that in Spyder this code raises the following error:

LightGBMError: Do not support special JSON characters in feature name.

but in Jupyter it works fine. In both cases I am able to print the list of most important features.

Any idea what could be the reason for this error?

Asked By: zdz


Answers:

This message comes up often with LGBMClassifier() models, i.e. LightGBM.
Simply add these lines at the beginning, right after you load the data into pandas, and the problem goes away:

    import re
    df = df.rename(columns=lambda x: re.sub('[^A-Za-z0-9_]+', '', x))
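Before stripping anything, it may help to see which names actually trigger the error. The check below is only a sketch: `JSON_SPECIAL` and `find_offending_names` are hypothetical names, and the character set is an assumption about what LightGBM rejects (the JSON structural characters), so verify it against your installed version.

```python
# Assumption: LightGBM rejects the JSON structural characters " , : [ ] { }
JSON_SPECIAL = set('",:[]{}')

def find_offending_names(names):
    """Map each problematic name to the sorted list of offending characters it contains."""
    offenders = {}
    for name in names:
        bad = set(str(name)) & JSON_SPECIAL
        if bad:
            offenders[str(name)] = sorted(bad)
    return offenders

# Made-up column names, purely for illustration
sample = ["age", "plan[premium]", "ratio: a/b", "city_NY"]
offenders = find_offending_names(sample)
print(offenders)
```

With the code from the question, you could call `find_offending_names(most_important)` in both Spyder and Jupyter and compare the results.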

Here is an alternative answer, taken from "LightGBM error special JSON characters in feature name #399":

    import re

    # Rename columns (fixes "[LightGBM] Do not support special JSON characters in feature name.")
    new_names = {col: re.sub(r'[^A-Za-z0-9_]+', '', col) for col in df.columns}
    new_n_list = list(new_names.values())
    # Append the position to repeated names (fixes "[LightGBM] Feature appears more than one time.")
    new_names = {
        col: f'{new_col}_{i}' if new_col in new_n_list[:i] else new_col
        for i, (col, new_col) in enumerate(new_names.items())
    }
    df = df.rename(columns=new_names)
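If the feature names live in a plain list (as `most_important` does in the question) rather than in a DataFrame, the same idea can be sketched without pandas. This is a variant, not the snippet above: `sanitize_names` is a hypothetical helper, and it tracks the names already used so that an added suffix cannot collide with an existing name.

```python
import re

def sanitize_names(names):
    """Strip everything except [A-Za-z0-9_] and make the results unique."""
    seen = set()
    result = []
    for i, name in enumerate(names):
        clean = re.sub(r'[^A-Za-z0-9_]+', '', str(name))
        candidate = clean
        suffix = i
        # Add a numeric suffix while the name is empty or already taken
        while candidate == '' or candidate in seen:
            candidate = f'{clean}_{suffix}'
            suffix += 1
        seen.add(candidate)
        result.append(candidate)
    return result

# Made-up names; the first three collapse to the same string once cleaned
cols = ["a[b]", "ab", "a:b", "x y"]
clean_cols = sanitize_names(cols)
print(clean_cols)
```

The returned list keeps the original order, so it could be passed as `feature_name` or assigned to `df.columns`.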
Answered By: ah bon

After searching for the problem, I found that the feature column names had been generated automatically, because one-hot encoding was used when processing the categorical features.

In fact, the generated names contain special characters such as (), which leads to this error. (An underscore on its own should be fine; the regex fix above deliberately keeps it.)

  1. You can work around it by installing an older version of lightgbm, as follows:

    pip install lightgbm==2.2.3 -i https://pypi.tuna.tsinghua.edu.cn/simple

  2. You can also rename the features in the data you pass in.
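
Since the same code behaves differently in Spyder and Jupyter, one possible explanation (an assumption, not something confirmed in the question) is that the two run different Python environments with different lightgbm versions. A quick way to compare is to run a sketch like this in both; `installed_version` is a hypothetical helper built on the standard library.

```python
import sys
from importlib import metadata

def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

# Run this in both Spyder and Jupyter and compare the two outputs
print("interpreter:", sys.executable)
print("lightgbm:", installed_version("lightgbm"))
```

If the interpreter paths or versions differ, aligning the two environments is likely a cleaner fix than pinning an old release.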
Answered By: ah bon