Assigning values with both boolean masking and indexing
Question:
Consider the following toy example:
import pandas as pd
df = pd.DataFrame([0,1,2,3,4,5,6], columns=['Value'])
df_subset = df.loc[[3,4,5]]
df.loc[df.Value % 2 == 0, 'Value'] = df_subset.Value * 10
df before assignment:
0. 0
1. 1
2. 2
3. 3
4. 4
5. 5
6. 6
df after assignment:
0. NaN
1. 1
2. NaN
3. 3
4. 40
5. 5
6. NaN
This happens for the following reasons:
- Only items for which the mask / boolean index is true are modified, i.e. only the even elements. This is why idx=1 is not set to NaN.
- Any indices that don't appear in the index of the right-hand side are set to NaN.
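The alignment behind the second point can be made visible with a minimal sketch. It assumes that reindexing the right-hand side to the masked labels is a fair approximation of what `.loc` does before writing; the names `mask`, `rhs` and `aligned` are introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame([0, 1, 2, 3, 4, 5, 6], columns=['Value'])
df_subset = df.loc[[3, 4, 5]]

mask = df.Value % 2 == 0      # True at labels 0, 2, 4, 6
rhs = df_subset.Value * 10    # only has labels 3, 4, 5

# Approximation of the alignment step: look up each masked label
# in the right-hand side; labels it does not contain become NaN.
aligned = rhs.reindex(df.index[mask.to_numpy()])
print(aligned)
```

Only label 4 is both selected by the mask and present in `rhs`, so it is the only masked entry that receives a real value.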
What I want to achieve, however, is the same behaviour without setting missing index entries to NaN, i.e.:
- Modify elements for which the mask is true
- For those elements: replace a value in df with that in df_subset if the particular index is part of df_subset.index
desired output:
0. 0
1. 1
2. 2
3. 3
4. 40
5. 5
6. 6
Answers:
The first idea is to chain both masks with & (bitwise AND); membership in the index is tested with Index.isin:
df = pd.DataFrame([0,1,2,3,4,5,6], columns=['Value'])
df_subset = df.loc[[3,4,5]]
mask = (df.Value % 2 == 0) & (df.index.isin([3,4,5]))
df.loc[mask, 'Value'] = df_subset.Value * 10
print (df)
Value
0 0
1 1
2 2
3 3
4 40
5 5
6 6
Or:
df = pd.DataFrame([0,1,2,3,4,5,6], columns=['Value'])
mask = (df.Value % 2 == 0) & (df.index.isin([3,4,5]))
df.loc[mask, 'Value'] *= 10
print (df)
Value
0 0
1 1
2 2
3 3
4 40
5 5
6 6
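Both variants hard-code the labels [3, 4, 5]. As a possible generalisation, the second mask could be derived from the index of the right-hand side itself. This is only a sketch; `new_values` is a name introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame([0, 1, 2, 3, 4, 5, 6], columns=['Value'])
df_subset = df.loc[[3, 4, 5]]

new_values = df_subset.Value * 10

# Keep only rows that satisfy the original condition AND
# have a matching label in the right-hand side.
mask = (df.Value % 2 == 0) & df.index.isin(new_values.index)
df.loc[mask, 'Value'] = new_values
print(df)
```

Because every row selected by the combined mask has a label in `new_values`, the alignment has nothing to fill with NaN.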
Another idea is to filter the subset by the original mask and use DataFrame.update:
df = pd.DataFrame([0,1,2,3,4,5,6], columns=['Value'])
df_subset = df.loc[[3,4,5]]
df.update(df_subset.loc[df.Value % 2 == 0, 'Value'] * 10)
#alternative
#df.update(df_subset.loc[df_subset.Value % 2 == 0, 'Value'] * 10)
print (df)
Value
0 0.0
1 1.0
2 2.0
3 3.0
4 40.0
5 5.0
6 6.0
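The output above shows the column converted to floats. If the integer dtype matters, one option is to cast back afterwards. This is a sketch that assumes no NaN remains in the column; whether the conversion to float happens at all may depend on the pandas version. `original_dtype` is a name introduced here.

```python
import pandas as pd

df = pd.DataFrame([0, 1, 2, 3, 4, 5, 6], columns=['Value'])
df_subset = df.loc[[3, 4, 5]]

# Remember the dtype before update() possibly changes it.
original_dtype = df['Value'].dtype

df.update(df_subset.loc[df_subset.Value % 2 == 0, 'Value'] * 10)

# Cast back; safe here because update() leaves no NaN in the column.
df['Value'] = df['Value'].astype(original_dtype)
print(df)
```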