ValueError: Length of values (1) does not match length of index (11123)
Question:
I’m trying to create a new column in a dataset (a CSV file) that combines the contents of pre-existing columns.
import numpy as np
import pandas as pd
df = pd.read_csv('books.csv', encoding='unicode_escape', error_bad_lines=False)
#List of columns to keep
columns =['title', 'authors', 'publisher']
#Function to combine the columns/features
def combine_features(data):
    features = []
    for i in range(0, data.shape[0]):
        features.append( data['title'][i] +' '+data['authors'][i]+' '+data['publisher'][i])
        return features
#Column to store the combined features
df['combined_features'] =combine_features(df)
#Show data
df
I expected the new column to contain the title, author and publisher combined into one string, but I received the error "ValueError: Length of values (1) does not match length of index (11123)".
To fix this, I tried the suggested command "df.reset_index(inplace=True,drop=True)", but that did not work and I still receive the same error.
Below is the whole error message:
ValueError Traceback (most recent call last)
<ipython-input-24-40cc76d3cd85> in <module>
1 #Create a column to store the combined features
----> 2 df['combined_features'] =combine_features(df)
3 df
3 frames
/usr/local/lib/python3.8/dist-packages/pandas/core/frame.py in __setitem__(self, key, value)
3610 else:
3611 # set column
-> 3612 self._set_item(key, value)
3613
3614 def _setitem_slice(self, key: slice, value):
/usr/local/lib/python3.8/dist-packages/pandas/core/frame.py in _set_item(self, key, value)
3782 ensure homogeneity.
3783 """
-> 3784 value = self._sanitize_column(value)
3785
3786 if (
/usr/local/lib/python3.8/dist-packages/pandas/core/frame.py in _sanitize_column(self, value)
4507
4508 if is_list_like(value):
-> 4509 com.require_length_match(value, self.index)
4510 return sanitize_array(value, self.index, copy=True, allow_2d=True)
4511
/usr/local/lib/python3.8/dist-packages/pandas/core/common.py in require_length_match(data, index)
529 """
530 if len(data) != len(index):
--> 531 raise ValueError(
532 "Length of values "
533 f"({len(data)}) "
ValueError: Length of values (1) does not match length of index (11123)
Answers:
Surprisingly, I am unable to reproduce the error; the program works as expected for me. Try printing the shape of df and inspecting the CSV file.
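A quick check along these lines might compare the length of the list the function returns with the number of rows in the frame. This is only a sketch: the three-row DataFrame stands in for books.csv, and combine_features_early_return is a hypothetical name for a version of the function whose return sits inside the loop.

```python
import pandas as pd

# Hypothetical stand-in for books.csv; the rows are made up for illustration.
df = pd.DataFrame({
    'title': ['Book A', 'Book B', 'Book C'],
    'authors': ['Author 1', 'Author 2', 'Author 3'],
    'publisher': ['Pub X', 'Pub Y', 'Pub Z'],
})

def combine_features_early_return(data):
    features = []
    for i in range(data.shape[0]):
        features.append(
            data['title'].iloc[i] + ' ' + data['authors'].iloc[i] + ' ' + data['publisher'].iloc[i]
        )
        # return is inside the loop, so the function exits after the first row
        return features

values = combine_features_early_return(df)

print(df.shape)              # (rows, columns) of the frame
print(len(values), len(df))  # these two numbers must be equal for the assignment to work
lengths_match = len(values) == len(df)
```

If the two lengths differ, assigning the list to a new column raises the "Length of values ... does not match length of index" error.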
The reason is that the return statement in the function is inside the for loop, where it should not be. Because of that, the function returns after the first iteration, so the length of values is 1 rather than 11123. Unindent the return by one level.
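A minimal sketch of the corrected function, run on a small made-up DataFrame in place of books.csv (the sample rows are assumptions for illustration only). The return sits at function level, so the loop visits every row before the list is handed back. The last line shows a vectorized alternative that should give the same result, assuming all three columns hold strings with no missing values.

```python
import pandas as pd

# Hypothetical stand-in for books.csv; the rows are made up for illustration.
df = pd.DataFrame({
    'title': ['Book A', 'Book B', 'Book C'],
    'authors': ['Author 1', 'Author 2', 'Author 3'],
    'publisher': ['Pub X', 'Pub Y', 'Pub Z'],
})

def combine_features(data):
    features = []
    for i in range(data.shape[0]):
        features.append(
            data['title'].iloc[i] + ' ' + data['authors'].iloc[i] + ' ' + data['publisher'].iloc[i]
        )
    # return is outside the loop, so one entry per row is collected first
    return features

# The list now has as many entries as the frame has rows
df['combined_features'] = combine_features(df)

# Vectorized alternative: element-wise string concatenation of the columns
combined_vectorized = df['title'] + ' ' + df['authors'] + ' ' + df['publisher']
```

The vectorized form avoids the explicit loop altogether, which also removes the chance of misplacing the return.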