Access pandas masks in a dictionary
Question:
I have a dictionary containing several pandas masks as strings for a specific dataframe, but I can’t find a way to use those masks.
Here is a short reproducible example:
import pandas as pd
df = pd.DataFrame({'age': [10, 24, 35, 67], 'strength': [0, 3, 9, 4]})
masks = {'old_strong': "(df['age'] > 18) & (df['strength'] > 5)",
         'young_weak': "(df['age'] < 18) & (df['strength'] < 5)"}
And I would like to do something like:
df[masks['young_weak']]
But since the mask is a string, I get the error:
KeyError: "(df['age'] < 18) & (df['strength'] < 5)"
Answers:
Use DataFrame.query with a changed dictionary, in which the strings refer to the columns by name instead of going through df:
masks = {'old_strong': "(age > 18) & (strength > 5)",
         'young_weak': "(age < 18) & (strength < 5)"}
print(df.query(masks['young_weak']))
   age  strength
0   10         0
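If the thresholds should not be hard-coded in the strings, query expressions can refer to Python variables with an @ prefix. The following is only a sketch: the names adult_age and strong_level are made up for illustration and are not part of the question.

```python
import pandas as pd

df = pd.DataFrame({'age': [10, 24, 35, 67], 'strength': [0, 3, 9, 4]})

# Hypothetical thresholds, not part of the original question.
adult_age = 18
strong_level = 5

masks = {
    'old_strong': "(age > @adult_age) & (strength > @strong_level)",
    'young_weak': "(age < @adult_age) & (strength < @strong_level)",
}

# Apply every named mask and keep the filtered frames by name.
# A plain loop is used so that query() can see the @ variables
# in the calling scope.
filtered = {}
for name, expr in masks.items():
    filtered[name] = df.query(expr)

print(filtered['old_strong'])
```

The loop is deliberate: inside a comprehension the @ variables may not be visible to query when the code runs inside a function.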
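Another way is to set up the masks as functions (lambda expressions) instead of strings. When a callable is used as the indexer, pandas calls it with the DataFrame itself, so the parameter named row below receives the whole frame, not a single row. This works: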
masks = {'old_strong' : lambda row: (row['age'] >18) & (row['strength'] >5),
'young_weak' : lambda row: (row['age'] <18) & (row['strength'] <5)}
df[masks['young_weak']]
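One benefit of callables is that the mask is computed only when it is applied, so the same dictionary can be reused on any frame with the same columns. A minimal sketch, where the parameter is renamed to frame and other_df is a made-up second DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'age': [10, 24, 35, 67], 'strength': [0, 3, 9, 4]})

masks = {
    'old_strong': lambda frame: (frame['age'] > 18) & (frame['strength'] > 5),
    'young_weak': lambda frame: (frame['age'] < 18) & (frame['strength'] < 5),
}

# A callable indexer is called with the whole DataFrame and must
# return a boolean mask; df.loc accepts it in the same way as df[...].
young_weak = df.loc[masks['young_weak']]

# The mask is evaluated lazily, so it also works on another frame
# with the same columns (other_df is hypothetical example data).
other_df = pd.DataFrame({'age': [5, 40], 'strength': [1, 8]})
old_strong_other = other_df[masks['old_strong']]

print(young_weak)
print(old_strong_other)
```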
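If the strings have to stay exactly as they are, the only way to use them is eval, although it is unsafe and very bad practice:
print(df[eval(masks['young_weak'])])
Output:
   age  strength
0   10         0
It is bad because eval runs whatever Python code the string contains, so it must never be used on strings from a source you do not fully trust.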
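If the strings can be rewritten in terms of column names, DataFrame.eval is one possible alternative to the built-in eval that returns the boolean mask itself. This is a sketch under that assumption; it should not be read as a guarantee that pandas expressions are safe for untrusted input.

```python
import pandas as pd

df = pd.DataFrame({'age': [10, 24, 35, 67], 'strength': [0, 3, 9, 4]})

# Column-name strings, as in the DataFrame.query answer.
masks = {
    'old_strong': "(age > 18) & (strength > 5)",
    'young_weak': "(age < 18) & (strength < 5)",
}

# DataFrame.eval parses the string with pandas' own expression parser
# and returns the boolean Series rather than the filtered frame.
young_weak_mask = df.eval(masks['young_weak'])

print(df[young_weak_mask])
```

Having the mask as a Series is convenient when it needs to be combined with other conditions before filtering.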
If you’re allowed to change the masks dictionary, the easiest way is to store the boolean filters themselves rather than strings, like this:
masks = {
'old_strong' : (df['age'] >18) & (df['strength'] >5),
'young_weak' : (df['age'] <18) & (df['strength'] <5)
}
Otherwise, keep the strings, written in terms of the column names, and use df.query(masks['young_weak']).
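Stored boolean masks can also be combined before filtering. A short sketch with the data from the question; the variable names either and neither are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'age': [10, 24, 35, 67], 'strength': [0, 3, 9, 4]})

masks = {
    'old_strong': (df['age'] > 18) & (df['strength'] > 5),
    'young_weak': (df['age'] < 18) & (df['strength'] < 5),
}

# Each value is a boolean Series aligned to df's index,
# so it can index df directly.
young_weak = df[masks['young_weak']]

# Stored masks compose with |, & and ~ like any boolean Series.
either = df[masks['old_strong'] | masks['young_weak']]
neither = df[~(masks['old_strong'] | masks['young_weak'])]

print(either)
print(neither)
```

Note that these masks are computed once, when the dictionary is built. If df changes afterwards, the dictionary has to be rebuilt, which is where the callable or string variants are more convenient.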