Must have equal len keys and value when setting with an iterable
Question:
I have two dataframes as follows:
leader:
0 11
1 8
2 5
3 9
4 8
5 6
[6065 rows x 2 columns]
DatasetLabel:
0 1 .... 7 8 9 10 11 12
0 A J .... 1 2 5 NaN NaN NaN
1 B K .... 3 4 NaN NaN NaN NaN
[4095 rows x 14 columns]
Columns 0 to 6 of DatasetLabel are information about the data, and columns 7 to 12 are indexes that refer to the first column of the leader dataframe.
I want to create a dataset where, instead of the indexes in the DatasetLabel dataframe, I have the value each index points to in the leader dataframe, i.e. leader.iloc[index, 1].
How can I do this using pandas?
The output should look like:
DatasetLabel:
0 1 .... 7 8 9 10 11 12
0 A J .... 8 5 6 NaN NaN NaN
1 B K .... 9 8 NaN NaN NaN NaN
I have come up with the following, but I get an error:
for column in DatasetLabel.ix[:, 8:13]:
    DatasetLabel[DatasetLabel[column].notnull()] = leader.iloc[DatasetLabel[DatasetLabel[column].notnull()][column].values, 1]
Error:
ValueError: Must have equal len keys and value when setting with an iterable
Answers:
You can use apply to index into leader and exchange values with DatasetLabel, although it's not very pretty.
One issue is that Pandas won't let us index with NaN. Converting to str provides a workaround, but that creates a second issue: column 9 is of type float (because NaN is a float), so 5 becomes 5.0, and as a string that's "5.0", which will fail to match the index values in leader. We can strip the .0, and then this solution will work, but it's a bit of a hack.
With DatasetLabel as:
Unnamed:0 0 1 7 8 9 10 11 12
0 0 A J 1 2 5.0 NaN NaN NaN
1 1 B K 3 4 NaN NaN NaN NaN
And leader as:
0 1
0 0 11
1 1 8
2 2 5
3 3 9
4 4 8
5 5 6
Then:
cols = ["7", "8", "9", "10", "11", "12"]
updated = DatasetLabel[cols].apply(
    lambda x: leader.loc[x.astype(str).str.split(".").str[0], 1].values, axis=1)
updated
7 8 9 10 11 12
0 8.0 5.0 6.0 NaN NaN NaN
1 9.0 8.0 NaN NaN NaN NaN
Now we can concat the unmodified columns (which we'll call original) with updated:
original_cols = DatasetLabel.columns[~DatasetLabel.columns.isin(cols)]
original = DatasetLabel[original_cols]
pd.concat([original, updated], axis=1)
Output:
Unnamed:0 0 1 7 8 9 10 11 12
0 0 A J 8.0 5.0 6.0 NaN NaN NaN
1 1 B K 9.0 8.0 NaN NaN NaN NaN
Note: It may be clearer to use concat here, but here's another, cleaner way of merging original and updated, using assign:
DatasetLabel.assign(**updated)
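The approach above relies on string-converted indexes matching leader's index labels. As a self-contained sketch on hypothetical toy frames, the same replacement can be done NaN-safely with Series.map (a swapped-in technique, not the code above), since map simply leaves NaN untouched:

```python
import numpy as np
import pandas as pd

# Hypothetical toy versions of the frames from the question.
leader = pd.DataFrame({0: [0, 1, 2, 3, 4, 5],
                       1: [11, 8, 5, 9, 8, 6]})
DatasetLabel = pd.DataFrame({
    "0": ["A", "B"], "1": ["J", "K"],
    "7": [1, 3], "8": [2, 4], "9": [5.0, np.nan],
    "10": [np.nan, np.nan], "11": [np.nan, np.nan], "12": [np.nan, np.nan],
})

cols = ["7", "8", "9", "10", "11", "12"]

# Build a lookup Series mapping leader's first column to its second.
lookup = leader.set_index(0)[1]

# Series.map leaves NaN as NaN, so no string conversion is needed.
updated = DatasetLabel[cols].apply(lambda c: c.map(lookup))
result = DatasetLabel.assign(**updated)
print(result)
```

This sidesteps the "5.0" string-matching hack entirely, at the cost of departing from the original row-wise apply.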
The source code shows that this error occurs when you try to broadcast a list-like object (a numpy array, list, set, tuple, etc.) to multiple columns or rows without specifying the index correctly. Since list-like objects don't carry an index the way pandas objects do, this mismatch is what usually triggers the error.
Solutions to common cases:
- You want to assign the same values across multiple columns at once. In other words, you want to change the values of certain columns using a list-like object whose (a) length doesn't match the number of columns or rows and (b) dtype doesn't match the dtype of the columns being assigned to.1 If you try to make such a transformation using code similar to the following, this error occurs:
df = pd.DataFrame({'A': [1, 5, 9], 'B': [2, 6, 10], 'C': [3, 7, 11], 'D': [4, 8, 12]})
df.loc[:2, ['C','D']] = [100, 200.2, 300]
Solution: Duplicate the list/array/tuple, transpose it (using either T or zip()), and assign it to the relevant rows/columns.2
df.loc[:2, ['C','D']] = np.tile([100, 200.2, 300], (len(['C','D']), 1)).T
# if you don't fancy numpy, use zip() on a list
# df.loc[:2, ['C','D']] = list(zip(*[[100, 200.2, 300]]*len(['C','D'])))
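As a self-contained check of the tile fix on a hypothetical df (with C and D defined as floats up front, so the assignment doesn't also have to upcast int columns; see footnote 1):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 5, 9], 'B': [2, 6, 10],
                   'C': [3.0, 7.0, 11.0], 'D': [4.0, 8.0, 12.0]})

# np.tile repeats the 3 values once per target column -> shape (2, 3);
# transposing gives shape (3, 2), matching the selection df.loc[:2, ['C', 'D']].
df.loc[:2, ['C', 'D']] = np.tile([100, 200.2, 300], (2, 1)).T
print(df)
```

Both C and D end up as [100.0, 200.2, 300.0], one copy of the values per column.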
- You want to assign the same values to multiple rows at once. If you try to make such a transformation using code similar to the following:
df = pd.DataFrame({'A': [1, 5, 9], 'B': [2, 6, 10], 'C': [3, 7, 11], 'D': [4, 8, 12]})
df.loc[[0, 1], ['A', 'B', 'C']] = [100, 200.2]
Solution: To make it work as expected, we must convert the list/array into a Series with the correct index:
df.loc[[0, 1], ['A', 'B', 'C']] = pd.Series([100, 200.2], index=[0, 1])
A common sub-case is when the row indices come from a boolean mask (N.B. this is the case in the OP). Then just use the mask to filter df.index:
msk = df.index < 2
df.loc[msk, ['A', 'B', 'C']] = [100, 200.2] # <--- error
df.loc[msk, ['A', 'B', 'C']] = pd.Series([100, 200.2], index=df.index[msk]) # <--- OK
- You want to store the same list in some rows of a column.
Solution: Explicitly construct a Series with the correct indices.
# store the same list in every row of column D
df['D'] = pd.Series([[100, 200.2]]*len(df), index=df.index)
# store the list in just row 1
df.loc[[1], 'D'] = pd.Series([[100, 200.2]], index=df.index[[1]])
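As a runnable sketch of this case on a hypothetical df (the column is cast to object before the single-cell assignment, so pandas stores the list as-is rather than trying to coerce it into an int column):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 5, 9], 'B': [2, 6, 10],
                   'C': [3, 7, 11], 'D': [4, 8, 12]})

# Store the same list in every row of a new column.
df['E'] = pd.Series([[100, 200.2]] * len(df), index=df.index)

# Store a list in a single cell: cast the column to object first so
# the list isn't unpacked into the int column.
df['D'] = df['D'].astype(object)
df.loc[[1], 'D'] = pd.Series([[100, 200.2]], index=df.index[[1]])
print(df)
```

Without the Series wrapper (or with a bare list on the right-hand side), pandas tries to broadcast the list's elements instead of storing the list itself.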
1: Here, we tried to assign a list containing a float to int dtype columns, which contributed to this error being raised. If we tried to assign a list of ints (so that the dtypes match), we'd get a different error, ValueError: shape mismatch: value array of shape (2,) could not be broadcast to indexing result of shape (2,3), which can also be solved by the same method as above.
2: A related error, ValueError: Must have equal len keys and value when setting with an ndarray, occurs when the object being assigned is a numpy array and there's a shape mismatch. That one is often solved either by using np.tile or simply by transposing the array.