Python. List comprehensions with not in and or
Question:
I have a list of strings that I want to filter:
A = ['enc_1', 'enc_2', 'enc_lag', 'lag_1', 'lag_2', 'price', 'price_std']
If I need strings that contain EITHER ‘enc’ OR ‘lag’, I can do the following:
[_ for _ in A if ('enc' in _) or ('lag' in _)]
Output: ['enc_1', 'enc_2', 'enc_lag', 'lag_1', 'lag_2']
Everything is fine. However, if I need strings that contain NEITHER ‘enc’ NOR ‘lag’, a seemingly obvious solution doesn’t work:
[_ for _ in A if ('enc' not in _) or ('lag' not in _)]
Output: ['enc_1', 'enc_2', 'lag_1', 'lag_2', 'price', 'price_std']
Judging by the result, I would expect an expression with AND to produce such an output (‘enc_lag’ would be removed), but for whatever reason OR does it instead. I am starting to deeply question my understanding of the OR and AND operators… Any help is appreciated!
Answers:
What you actually want here is and. If the element must contain neither 'enc' nor 'lag', then it must not contain 'enc' AND must not contain 'lag'.
[_ for _ in A if ('enc' not in _) and ('lag' not in _)]
Alternatively, by applying De Morgan’s law, we have:
[_ for _ in A if not (('enc' in _) or ('lag' in _))]
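As a quick sanity check of the De Morgan rewrite, here is a minimal sketch that runs both forms on the list from the question and confirms they select the same elements. The variable names neither_and and neither_not_or are made up for this illustration.

```python
A = ['enc_1', 'enc_2', 'enc_lag', 'lag_1', 'lag_2', 'price', 'price_std']

# Form 1: both "not in" tests must hold for the string to be kept.
neither_and = [s for s in A if ('enc' not in s) and ('lag' not in s)]

# Form 2: negate the "contains either" test (De Morgan's law).
neither_not_or = [s for s in A if not (('enc' in s) or ('lag' in s))]

print(neither_and)                    # ['price', 'price_std']
print(neither_and == neither_not_or)  # True
```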
You just need an AND instead. Your condition checks whether each value:
- does not contain enc OR
- does not contain lag
So a value that has enc in it (first test false) but doesn’t have lag (second test true) still makes the condition come back true.
From what I understand, you want the not in version to return ['price', 'price_std'].
In your case, take for example 'enc_1': ('enc' not in _) is False, but then you OR it with ('lag' not in _), which is True. False or True is True, so 'enc_1' passes the filter.
Solution: [_ for _ in A if ('enc' not in _ and 'lag' not in _)]
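To make the reasoning above concrete, the following sketch evaluates both "not in" tests for every string in the question's list and shows what OR and AND make of them. The names rows, no_enc and no_lag are hypothetical helpers introduced only for this trace.

```python
A = ['enc_1', 'enc_2', 'enc_lag', 'lag_1', 'lag_2', 'price', 'price_std']

rows = []
print(f"{'string':10} {'no_enc':7} {'no_lag':7} {'OR':6} {'AND':6}")
for s in A:
    no_enc = 'enc' not in s   # True when the string lacks 'enc'
    no_lag = 'lag' not in s   # True when the string lacks 'lag'
    rows.append((s, no_enc, no_lag, no_enc or no_lag, no_enc and no_lag))
    print(f"{s:10} {no_enc!s:7} {no_lag!s:7} {(no_enc or no_lag)!s:6} {(no_enc and no_lag)!s:6}")

kept_by_or = [r[0] for r in rows if r[3]]
kept_by_and = [r[0] for r in rows if r[4]]
```

Only 'enc_lag' fails both tests, so OR drops that one string and nothing else. Only 'price' and 'price_std' pass both tests, so AND keeps exactly those two.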