ValueError: Length of values (1) does not match length of index (11123)


I’m trying to create a new column in a dataset (a CSV file) that combines the contents of existing columns.

import numpy as np
import pandas as pd

df = pd.read_csv('books.csv', encoding='unicode_escape', error_bad_lines=False)

#List of columns to keep
columns =['title', 'authors', 'publisher']

#Function to combine the columns/features
def combine_features(data):
  features = []
  for i in range(0, data.shape[0]):
    features.append( data['title'][i] +' '+data['authors'][i]+' '+data['publisher'][i])
    return features

#Column to store the combined features
df['combined_features'] =combine_features(df)

#Show data

I expected the new column to contain the title, author and publisher combined into one string. Instead, I received the error "ValueError: Length of values (1) does not match length of index (11123)".

To fix this, I tried the command "df.reset_index(inplace=True,drop=True)", which was a suggested solution, but that did not work and I still receive the same error.

Below is the whole error message:

ValueError                                Traceback (most recent call last)
<ipython-input-24-40cc76d3cd85> in <module>
      1 #Create a column to store the combined features
----> 2 df['combined_features'] =combine_features(df)
      3 df

3 frames
/usr/local/lib/python3.8/dist-packages/pandas/core/ in __setitem__(self, key, value)
   3610         else:
   3611             # set column
-> 3612             self._set_item(key, value)
   3614     def _setitem_slice(self, key: slice, value):

/usr/local/lib/python3.8/dist-packages/pandas/core/ in _set_item(self, key, value)
   3782         ensure homogeneity.
   3783         """
-> 3784         value = self._sanitize_column(value)
   3786         if (

/usr/local/lib/python3.8/dist-packages/pandas/core/ in _sanitize_column(self, value)
   4508         if is_list_like(value):
-> 4509             com.require_length_match(value, self.index)
   4510         return sanitize_array(value, self.index, copy=True, allow_2d=True)

/usr/local/lib/python3.8/dist-packages/pandas/core/ in require_length_match(data, index)
    529     """
    530     if len(data) != len(index):
--> 531         raise ValueError(
    532             "Length of values "
    533             f"({len(data)}) "

ValueError: Length of values (1) does not match length of index (11123)
Asked By: mudgey



Surprisingly, I am unable to reproduce the error, and the program works as expected for me. Try printing the shape of df and inspecting the CSV file!
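
One way to run that check is sketched below. It uses a small made-up DataFrame in place of books.csv (the three rows are hypothetical, not from the real file) and keeps the function indented as shown in the question, then compares the number of rows with the number of values the function returns.

```python
import pandas as pd

# Hypothetical three-row stand-in for books.csv (not the real data)
df = pd.DataFrame({
    'title': ['Book A', 'Book B', 'Book C'],
    'authors': ['Ann', 'Bob', 'Cy'],
    'publisher': ['P1', 'P2', 'P3'],
})

def combine_features(data):
    features = []
    for i in range(0, data.shape[0]):
        features.append(data['title'][i] + ' ' + data['authors'][i] + ' ' + data['publisher'][i])
        return features  # indented as in the question, i.e. inside the loop

values = combine_features(df)
print(df.shape)     # (rows, columns) of the DataFrame
print(len(values))  # number of values the function produced
```

If the two numbers differ, the function is not producing one value per row, which is exactly what the ValueError complains about.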

Answered By: user31934

The reason is that the return statement in the function is inside the for loop, when it should come after it. Because of that, the function returns after the first iteration, so the list of values has length 1 rather than 11123. Unindent the return by one level.
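
A minimal sketch of the corrected function, assuming a small made-up DataFrame instead of books.csv (the sample rows are hypothetical). The last line shows a vectorized alternative that is not part of the original code but should give the same result without an explicit loop, provided the three columns contain strings.

```python
import pandas as pd

# Hypothetical three-row stand-in for books.csv (not the real data)
df = pd.DataFrame({
    'title': ['Book A', 'Book B', 'Book C'],
    'authors': ['Ann', 'Bob', 'Cy'],
    'publisher': ['P1', 'P2', 'P3'],
})

def combine_features(data):
    features = []
    for i in range(0, data.shape[0]):
        features.append(data['title'][i] + ' ' + data['authors'][i] + ' ' + data['publisher'][i])
    # return only after the loop has visited every row
    return features

# The list now has one entry per row, so the assignment succeeds
df['combined_features'] = combine_features(df)

# Alternative (an assumption, not from the question): element-wise string concatenation
df['combined_vectorized'] = df['title'] + ' ' + df['authors'] + ' ' + df['publisher']
```

Note that the loop version indexes by label (data['title'][i]), so it relies on the default 0..n-1 index; the vectorized version does not depend on that.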

Answered By: juanpethes
