Initialize all-true boolean index for Pandas
Question:
I find myself sometimes building a boolean/mask iteratively, so something like:
mask = initialize_mask_to_true()
for condition in conditions:
    mask = mask & condition
df_masked = df.loc[mask, my_cols]
Where conditions might be a list of separate boolean masks or comparisons like df[some_col] > someVal
Is there a good way to do the initialize_mask_to_true()? Sometimes I’ll do something that feels ugly like:
mask = ~(df.loc[:, df.columns[0]] == np.nan)
which works because something == np.nan is always False, but it feels like there’s a cleaner way.
Answers:
I use numpy.ones for that:
np.ones(df.shape[0], dtype=bool)
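As a minimal sketch of how that all-True array fits into the loop from the question (the DataFrame, column names, and conditions here are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical example data; any DataFrame works the same way.
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]})
conditions = [df["a"] > 1, df["b"] < 40]

# Start from an all-True mask with one entry per row.
mask = np.ones(len(df), dtype=bool)
for condition in conditions:
    mask = mask & condition

# Select the rows where every condition held.
df_masked = df.loc[mask, ["a"]]
```

Note that the starting mask is a plain NumPy array, so it carries no index of its own; it is combined with each condition by position.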
If the index must be preserved:
mask = pd.DataFrame(True, index=df.index, columns=df.columns)
or
mask = pd.DataFrame(True, index=df.index, columns=[df.columns[0]])
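If the mask is meant for row selection with df.loc, a one-dimensional pd.Series may be easier to work with than a DataFrame. This is a sketch of that variant rather than part of the answers above; the data and conditions are again hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical example data with a non-default index.
df = pd.DataFrame(
    {"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]},
    index=list("wxyz"),
)
conditions = [df["a"] > 1, df["b"] < 40]

# All-True boolean Series that shares the DataFrame's index.
mask = pd.Series(True, index=df.index)
for condition in conditions:
    mask &= condition

df_masked = df.loc[mask, ["a", "b"]]

# Alternative for a non-empty list of conditions: reduce them in one call,
# which avoids the explicit initial mask altogether.
combined = np.logical_and.reduce(conditions)
```

Because the Series shares df.index, each & aligns on labels rather than on position.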