LGBMRanker model: "Found input variables with inconsistent numbers of samples"

Question:

X and y each have 44,980 rows.

groups has 3 entries.

The data is a time series. X contains the features plus the item being ranked:

Date Item Feature
9/27 1 1
9/27 2 1
9/27 3 1
9/28 1 0
9/28 2 0
9/28 3 0

y contains the rank of each item on each date:

Date Rank
9/27 3
9/27 2
9/27 1
9/28 2
9/28 3
9/28 1

But when I run LGBMRanker on this data, I get the following error:

Traceback (most recent call last):
  File "code.py", line 62, in <module>
    score = cross_val_score(model, X=X, y=y,
  File "sklearn\model_selection\_validation.py", line 515, in cross_val_score
    cv_results = cross_validate(
  File "sklearn\model_selection\_validation.py", line 252, in cross_validate
    X, y, groups = indexable(X, y, groups)
  File "sklearn\utils\validation.py", line 429, in indexable
    check_consistent_length(*result)
  File "sklearn\utils\validation.py", line 383, in check_consistent_length
    raise ValueError(
ValueError: Found input variables with inconsistent numbers of samples: [44980, 44980, 3]

Code:

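import pandas as pd
from lightgbm import LGBMRanker
from sklearn.metrics import make_scorer, ndcg_score
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

paths_dict = {'1': '../../1.csv',
              '2': '../../2.csv',
              '3': '../../3.csv'}

def load_paths(paths_dict):
    # One column of closing prices per item, indexed by date
    df = pd.DataFrame()
    for key, value in paths_dict.items():
        df[key] = pd.read_csv(value, index_col=0, parse_dates=True)['Close']
    df = df.iloc[::-1]  # reverse the row order
    return df

df = load_paths(paths_dict)
# Reshape to long format: one row per (Date, Item)
df = df.stack().reset_index()
df.columns = ['Date', 'Item', 'Target']
df['Item'] = df['Item'].astype('int')
# Rank the items within each date (1 = highest Close)
df['Target'] = df.groupby('Date')['Target'].rank('dense', ascending=False).astype(int)
df.set_index('Date', inplace=True)

y = df['Target']
X = df.drop(['Target'], axis=1)
model = LGBMRanker(n_jobs=-1)
# This call raises the ValueError: the groupby object has length 3, not 44980
score = cross_val_score(model, X=X, y=y,
                        groups=X.groupby('Item'),
                        cv=TimeSeriesSplit(n_splits=24),
                        scoring=make_scorer(ndcg_score))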
Asked By: Tomward Matthias

Answers:

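scikit-learn's cross-validation helpers don't support ranking models such as LGBMRanker. cross_val_score expects groups to be an array with one label per sample and uses it only for splitting, so passing a groupby object with 3 groups fails the length check. By default, nothing is forwarded to LGBMRanker.fit as the group argument the ranker requires.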
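
One possible workaround is to write the cross-validation loop by hand and pass per-date group sizes to LGBMRanker.fit yourself. The sketch below is a minimal illustration, not a tested recipe for this dataset. The helper names group_sizes, expanding_date_folds and cross_val_ndcg are hypothetical, the folds are a hand-rolled expanding window over dates rather than scikit-learn's TimeSeriesSplit, and it assumes the rows are sorted by date and that rank 1 means best.

```python
import numpy as np


def group_sizes(dates):
    """Sizes of consecutive runs of equal dates, e.g. [d1, d1, d1, d2, d2] -> [3, 2]."""
    dates = np.asarray(dates)
    # Row positions where a new date starts (rows must be grouped by date).
    starts = np.flatnonzero(np.r_[True, dates[1:] != dates[:-1]])
    return np.diff(np.r_[starts, len(dates)])


def expanding_date_folds(dates, n_splits):
    """Yield (train_idx, test_idx) row indices; a date is never split across folds."""
    dates = np.asarray(dates)
    unique_dates = np.unique(dates)  # sorted ascending
    if len(unique_dates) < n_splits + 1:
        raise ValueError("need at least n_splits + 1 distinct dates")
    # Chunk k is the test set for fold k; all earlier dates form the training set.
    chunks = np.array_split(unique_dates, n_splits + 1)
    for k in range(1, n_splits + 1):
        train_idx = np.flatnonzero(dates < chunks[k][0])
        test_idx = np.flatnonzero(np.isin(dates, chunks[k]))
        yield train_idx, test_idx


def cross_val_ndcg(X, y, dates, n_splits=5):
    """Manual time-series CV for LGBMRanker; returns the mean NDCG of each fold."""
    # Imported here so the two helpers above work without these packages.
    from lightgbm import LGBMRanker
    from sklearn.metrics import ndcg_score

    X, y, dates = np.asarray(X), np.asarray(y), np.asarray(dates)
    # Assumption: y holds ranks with 1 = best. Flip them into relevance
    # labels where larger = better, which the ranker and NDCG expect.
    relevance = y.max() - y
    scores = []
    for train_idx, test_idx in expanding_date_folds(dates, n_splits):
        model = LGBMRanker(n_jobs=-1)
        # group = number of rows in each query (here: each date), in row order.
        model.fit(X[train_idx], relevance[train_idx],
                  group=group_sizes(dates[train_idx]))
        pred = model.predict(X[test_idx])
        # Score every test date as its own query, then average over dates.
        bounds = np.cumsum(group_sizes(dates[test_idx]))[:-1]
        per_date = [ndcg_score([t], [p])
                    for t, p in zip(np.split(relevance[test_idx], bounds),
                                    np.split(pred, bounds))]
        scores.append(float(np.mean(per_date)))
    return scores
```

With the frame from the question, where Date is the index of X, the call would look something like cross_val_ndcg(X, y, dates=X.index, n_splits=24). The key difference from cross_val_score is that the group sizes are recomputed for each training fold, so they always add up to the number of training rows.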

Answered By: Tomward Matthias