Boolean Series key will be reindexed to match DataFrame index

Question:

Here is how I encountered the warning:

df.loc[a_list][df.a_col.isnull()]

The type of a_list is Int64Index, it contains a list of row indexes. All of these row indexes belong to df.

The df.a_col.isnull() part is a condition I need for filtering.

If I execute the following commands individually, I do not get any warnings:

df.loc[a_list]
df[df.a_col.isnull()]

But if I put them together df.loc[a_list][df.a_col.isnull()], I get the warning message (but I can see the result):

Boolean Series key will be reindexed to match DataFrame index

What is the meaning of this warning message? Does it affect the result that it returned?

Asked By: Cheng


Answers:

Your approach will work despite the warning, but it’s best not to rely on implicit, unclear behavior.

Solution 1, make the selection of indices in a_list a boolean mask:

df[df.index.isin(a_list) & df.a_col.isnull()]

Solution 2, do it in two steps:

df2 = df.loc[a_list]
df2[df2.a_col.isnull()]

Solution 3, if you want a one-liner, use the fact that NaN is not equal to itself, so a_col != a_col is true only where a_col is null:

df.loc[a_list].query('a_col != a_col')

The warning comes from the fact that the boolean vector df.a_col.isnull() has the same length (and index) as df, while df.loc[a_list] only has the length of a_list, i.e. it is shorter. Therefore, some indices in df.a_col.isnull() are not in df.loc[a_list].

What pandas does is reindex the boolean series on the index of the calling dataframe. In effect, it gets from df.a_col.isnull() the values corresponding to the indices in a_list. This works, but the behavior is implicit, and could easily change in the future, so that’s what the warning is about.
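To make the reindexing concrete, here is a minimal sketch on made-up data (the values in df and a_list are assumptions for illustration, not taken from the question). It records the warning raised by the chained expression and compares the result with an explicit alignment of the mask to the subset's index:

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical data: a 5-row frame with some missing values in a_col.
df = pd.DataFrame({"a_col": [1.0, np.nan, 3.0, np.nan, 5.0]})
a_list = pd.Index([1, 2, 4])        # row labels to keep
mask = df["a_col"].isnull()         # indexed like df (5 rows)

subset = df.loc[a_list]             # only 3 rows

# Capture warnings so we can inspect them instead of printing them.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    chained = subset[mask]          # mask index differs from subset index

# What pandas does implicitly: align the mask to the subset's index.
explicit = subset[mask.reindex(subset.index)]

warned = any("reindexed" in str(w.message) for w in caught)
same_result = chained.equals(explicit)
print(warned, same_result, list(chained.index))
```

On the pandas versions I am aware of, the chained form warns and both forms return the same rows, which is why the result in the question looks correct despite the warning.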

Answered By: IanS

If you got this warning, using .loc[] instead of [] suppresses it (see note 1 below).

df.loc[boolean_mask]           # <--------- OK
df[boolean_mask]               # <--------- warning

For the particular case in the OP, you can chain .loc[] indexers:

df.loc[a_list].loc[df['a_col'].isna()]

or chain all conditions using and inside query():

# if a_list is a list of indices of df
df.query("index in @a_list and a_col != a_col")

# if a_list is a list of values in some other column such as b_col
df.query("b_col in @a_list and a_col != a_col")

or chain all conditions using & inside [] (as in @IanS’s post).
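As a quick sanity check, the three alternatives above can be compared on a small frame. This is only a sketch: the data, the index labels and the contents of a_list are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a non-default index and some NaNs in a_col.
df = pd.DataFrame(
    {"a_col": [1.0, np.nan, 3.0, np.nan], "b_col": [10, 20, 30, 40]},
    index=[100, 101, 102, 103],
)
a_list = [101, 102]  # index labels to keep

# 1) chained .loc[] indexers: the second .loc aligns the mask silently
by_loc = df.loc[a_list].loc[df["a_col"].isna()]

# 2) a single query() call; NaN != NaN selects the null rows
by_query = df.query("index in @a_list and a_col != a_col")

# 3) a single boolean mask built on the full frame
by_mask = df[df.index.isin(a_list) & df["a_col"].isna()]

print(by_loc.equals(by_query), by_loc.equals(by_mask))
```

All three select the rows that are both in a_list and null in a_col, so which one to use is mostly a matter of readability.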


This warning occurs if

  • the index of the boolean mask is not in the same order as the index of the dataframe it is filtering.

    df = pd.DataFrame({'a_col':[1, 2, np.nan]}, index=[0, 1, 2])
    m1 = pd.Series([True, False, True], index=[2, 1, 0])
    df.loc[m1]       # <--------- OK
    df[m1]           # <--------- warning
    
  • the index of the boolean mask is a superset of the index of the dataframe it is filtering. For example:

    m2 = pd.Series([True, False, True, True], np.r_[df.index, 10])
    df.loc[m2]       # <--------- OK
    df[m2]           # <--------- warning
    

1: If we look at the source code of [] and loc[], literally the only difference when the index of the boolean mask is a (weak) superset of the index of the dataframe is that [] shows this warning (via the _getitem_bool_array method) and loc[] does not.

Answered By: cottontail

I came across this page after getting the same warning: I was building a boolean mask from the full dataframe but applying it to a subset of the data.

Create a subset of data and store in variable sub_df:

sub_df = df[df['a'] == 1]
sub_df = sub_df[df['b'] == 1] # Note "df" hiding here

Solution:

Be sure to use the same dataframe each time (in my case, only sub_df):

# Last line should instead be:
sub_df = sub_df[sub_df['b'] == 1]

Answered By: KJ Price

First of all, a warning is not an error. Pandas is simply informing you that it needs to reindex the boolean mask to perform this task, so that the mask has the same index as the target frame. I don’t know why this warning is not shown if you use loc or query, but I think Pandas reindexes in those cases as well. To avoid this warning, you can manually reindex your boolean mask (which Pandas otherwise does automatically for you) before applying it:

df[boolean_mask.reindex_like(df)]

df[boolean_mask.reindex(df.index)]
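A small sketch of the manual reindex, reusing the kind of out-of-order mask shown in an earlier answer (the data is illustrative). It checks that no reindexing warning is recorded and that the result matches what .loc returns:

```python
import warnings

import numpy as np
import pandas as pd

df = pd.DataFrame({"a_col": [1.0, 2.0, np.nan]}, index=[0, 1, 2])
# Mask whose index is in a different order from df's index.
boolean_mask = pd.Series([True, False, True], index=[2, 1, 0])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Align the mask to df's index ourselves before using [].
    aligned = df[boolean_mask.reindex(df.index)]

silent = not any("reindexed" in str(w.message) for w in caught)
matches_loc = aligned.equals(df.loc[boolean_mask])
print(silent, matches_loc, list(aligned.index))
```

One caveat: if the mask lacks some of df's labels, reindex fills them with NaN and the mask is no longer purely boolean, so something like boolean_mask.reindex(df.index, fill_value=False) may be needed in that case.
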
Answered By: Mykola Zotko