Must have equal len keys and value when setting with an iterable

Question:

I have two dataframes as follows:

leader:

0 11
1  8
2  5
3  9
4  8
5  6
[6065 rows x 2 columns]

DatasetLabel:

   0  1 ....  7  8    9   10   11   12  
0  A  J ....  1  2    5  NaN  NaN  NaN  
1  B  K ....  3  4  NaN  NaN  NaN  NaN
[4095 rows x 14 columns]

In the DatasetLabel dataframe, columns 0 to 6 hold label information about the data, while columns 7 to 12 hold indexes that refer to the first column of the leader dataframe.

I want to create a dataset where, instead of the indexes stored in the DatasetLabel dataframe, I have the value each index refers to in the leader dataframe, i.e. leader.iloc[index, 1].

How can I do this in Python?

The output should look like:

DatasetLabel:

   0  1 ....  7  8    9   10   11   12  
0  A  J ....  8  5    6  NaN  NaN  NaN  
1  B  K ....  9  8  NaN  NaN  NaN  NaN  

I have come up with the following, but I get an error:

for column in DatasetLabel.ix[:, 8:13]:
    DatasetLabel[DatasetLabel[column].notnull()] = leader.iloc[DatasetLabel[DatasetLabel[column].notnull()][column].values, 1]

Error:

ValueError: Must have equal len keys and value when setting with an iterable
Asked By: user3806649


Answers:

You can use apply to index into leader and swap its values into DatasetLabel, although it’s not very pretty.

One issue is that Pandas won’t let us index with NaN, so converting to str provides a workaround. But that creates a second issue: column 9 is of type float (because NaN is a float), so 5 becomes 5.0, and as a string that’s "5.0", which will fail to match the index values in leader. We can strip the .0, and then this solution works, but it’s a bit of a hack.
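
To see the ".0" issue in isolation, here is a small throwaway illustration with made-up values (not taken from the frames above):

import pandas as pd

s = pd.Series([1, 2, 5.0, None])              # float dtype, because NaN forces the column to float
s.astype(str).tolist()                        # ['1.0', '2.0', '5.0', 'nan'] -- "5.0" won't match the label "5"
s.astype(str).str.split(".").str[0].tolist()  # ['1', '2', '5', 'nan'] -- with the ".0" stripped, the labels match again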

With DatasetLabel as:

   Unnamed:0  0  1  7  8    9  10  11  12
0          0  A  J  1  2  5.0 NaN NaN NaN
1          1  B  K  3  4  NaN NaN NaN NaN

And leader as:

   0   1
0  0  11
1  1   8
2  2   5
3  3   9
4  4   8
5  5   6

Then:

cols = ["7","8","9","10","11","12"]
updated = DatasetLabel[cols].apply(
    lambda x: leader.loc[x.astype(str).str.split(".").str[0], 1].values, axis=1)

updated
     7    8    9  10  11  12
0  8.0  5.0  6.0 NaN NaN NaN
1  9.0  8.0  NaN NaN NaN NaN

Now we can concat the unmodified columns (which we’ll call original) with updated:

original_cols = DatasetLabel.columns[~DatasetLabel.columns.isin(cols)]
original = DatasetLabel[original_cols]
pd.concat([original, updated], axis=1)

Output:

   Unnamed:0  0  1    7    8    9  10  11  12
0          0  A  J  8.0  5.0  6.0 NaN NaN NaN
1          1  B  K  9.0  8.0  NaN NaN NaN NaN

Note: concat may be the clearer option here, but a more concise way of merging original and updated is to use assign:

DatasetLabel.assign(**updated)
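
As an aside, if leader’s columns really are labeled 0 and 1 as in the toy frames above, with column 0 holding the labels that the index columns refer to, then a Series.map lookup would avoid the string round-trip altogether. This is only a sketch under that assumption, not a replacement for the approach shown above:

lookup = leader.set_index(0)[1]    # label -> value mapping (column 0 holds the labels, column 1 the values)
cols = ["7", "8", "9", "10", "11", "12"]
updated = DatasetLabel[cols].apply(lambda s: s.map(lookup))   # unmatched or NaN labels simply stay NaN
DatasetLabel.assign(**updated)
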
Answered By: andrew_reece

The source code shows that this error occurs when you try to broadcast a list-like object (numpy array, list, set, tuple, etc.) to multiple columns or rows without specifying the index correctly. Because list-like objects carry no index of their own, pandas cannot align them the way it aligns its own objects, and that mismatch is usually what triggers this error.

Solutions to common cases:

  1. You want to assign the same values across multiple columns at once. In other words, you want to change the values of certain columns using a list-like object whose (a) length doesn’t match the number of columns or rows and (b) dtype doesn’t match the dtype of the columns they are being assigned to.[1] An illustration may make it clearer. If you try to make the transformation below:

    [image "first": before/after illustration of this case]

    using code similar to the one below, this error occurs:

    import pandas as pd

    df = pd.DataFrame({'A': [1, 5, 9], 'B': [2, 6, 10], 'C': [3, 7, 11], 'D': [4, 8, 12]})
    df.loc[:2, ['C','D']] = [100, 200.2, 300]   # raises: ValueError: Must have equal len keys and value when setting with an iterable
    

    Solution: Duplicate the list/array/tuple, transpose it (either using T or zip()) and assign to the relevant rows/columns.[2]

    import numpy as np

    df.loc[:2, ['C','D']] = np.tile([100, 200.2, 300], (len(['C','D']), 1)).T
    # if you don't fancy numpy, use zip() on a list
    # df.loc[:2, ['C','D']] = list(zip(*[[100, 200.2, 300]]*len(['C','D'])))
    

  2. You want to assign the same values to multiple rows at once. If you try to make the following transformation

    [image "second": before/after illustration of this case]

    using code similar to the following:

    df = pd.DataFrame({'A': [1, 5, 9], 'B': [2, 6, 10], 'C': [3, 7, 11], 'D': [4, 8, 12]})
    df.loc[[0, 1], ['A', 'B', 'C']] = [100, 200.2]
    

    Solution: To make it work as expected, we must convert the list/array into a Series with the correct index:

    df.loc[[0, 1], ['A', 'B', 'C']] = pd.Series([100, 200.2], index=[0, 1])
    

    A common sub-case is when the row indices come from a boolean mask (N.B. this is the case in the OP; a sketch applying this fix to the OP's loop follows the footnotes). In that case, just use the mask to filter df.index:

    msk = df.index < 2
    df.loc[msk, ['A', 'B', 'C']] = [100, 200.2]                                 # <--- error
    df.loc[msk, ['A', 'B', 'C']] = pd.Series([100, 200.2], index=df.index[msk]) # <--- OK
    

  3. You want to store the same list in some rows of a column. An illustration of this case is:

    [image "third": before/after illustration of this case]

    Solution: Explicitly construct a Series with the correct indices.

    # store the same list in every row of column D
    df['D'] = pd.Series([[100, 200.2]]*len(df), index=df.index)

    # store the list in row 1 of column D only
    df.loc[[1], 'D'] = pd.Series([[100, 200.2]], index=df.index[[1]])
    

[1]: Here, we tried to assign a list containing a float to int dtype columns, which contributed to this error being raised. If we tried to assign a list of ints (so that the dtypes match), we’d get a different error: ValueError: shape mismatch: value array of shape (2,) could not be broadcast to indexing result of shape (2,3) which can also be solved by the same method as above.

[2]: An error related to this one, ValueError: Must have equal len keys and value when setting with an ndarray, occurs if the object being assigned is a numpy array and there’s a shape mismatch. That one is often solved either using np.tile or simply by transposing the array.
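
To tie case 2 back to the question: below is a sketch of the OP's loop rewritten with the mask-plus-Series fix. It assumes the index columns are labeled with the integers 7 through 12 (adjust the list to the actual labels) and that, as in the question, leader.iloc[label, 1] holds the value to substitute:

import pandas as pd

for column in [7, 8, 9, 10, 11, 12]:               # iterate over the index columns directly (.ix is deprecated)
    msk = DatasetLabel[column].notnull()           # rows that actually contain an index
    labels = DatasetLabel.loc[msk, column].astype(int)
    DatasetLabel.loc[msk, column] = pd.Series(     # a Series whose index matches the masked rows,
        leader.iloc[labels.values, 1].values,      # so the assignment aligns and no ValueError is raised
        index=DatasetLabel.index[msk])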

Answered By: cottontail