Remove the first N items that match a condition in a Python list
Question:
If I have a function matchCondition(x), how can I remove the first n items in a Python list that match that condition?
One solution is to iterate over each item, mark it for deletion (e.g., by setting it to None), and then filter the list with a comprehension. This requires iterating over the list twice and mutates the data. Is there a more idiomatic or efficient way to do this?
n = 3

def condition(x):
    return x < 5

data = [1, 10, 2, 9, 3, 8, 4, 7]
out = do_remove(data, n, condition)
print(out)  # [10, 9, 8, 4, 7] (1, 2, and 3 are removed, 4 remains)
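For reference, the mark-then-filter approach described above could look roughly like the sketch below. The name remove_first_n_two_pass and the sentinel object are assumptions made for illustration; the sketch works on a copy and uses a unique sentinel instead of None so that a legitimate None in the data is not lost.

```python
def remove_first_n_two_pass(data, n, condition):
    """Two-pass sketch: mark the first n matches, then filter the marks out."""
    removed_marker = object()  # unique sentinel (assumption: used instead of None)
    marked = list(data)        # work on a copy so the caller's list is untouched
    remaining = n
    # Pass 1: overwrite the first n matching items with the sentinel.
    for i, item in enumerate(marked):
        if remaining <= 0:
            break
        if condition(item):
            marked[i] = removed_marker
            remaining -= 1
    # Pass 2: keep everything that was not marked.
    return [item for item in marked if item is not removed_marker]


result = remove_first_n_two_pass([1, 10, 2, 9, 3, 8, 4, 7], 3, lambda x: x < 5)
print(result)  # [10, 9, 8, 4, 7]
```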
Answers:
Write a generator that takes the iterable, a condition, and an amount to drop. Iterate over the data and yield items that don’t meet the condition. If the condition is met, increment a counter and don’t yield the value. Always yield items once the counter reaches the amount you want to drop.
def iter_drop_n(data, condition, drop):
    dropped = 0
    for item in data:
        if dropped >= drop:
            yield item
            continue
        if condition(item):
            dropped += 1
            continue
        yield item

data = [1, 10, 2, 9, 3, 8, 4, 7]
out = list(iter_drop_n(data, lambda x: x < 5, 3))
This does not require an extra copy of the list, iterates over the list only once, and calls the condition at most once per item. Unless you actually need the whole list, leave off the list call on the result and iterate over the returned generator directly.
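To illustrate consuming the generator lazily, here is a small sketch that feeds it an infinite iterator and takes only the first few results. The generator is restated in a condensed form so the block runs on its own; the even-number condition and the use of itertools.islice are choices made for this example only.

```python
from itertools import count, islice


def iter_drop_n(data, condition, drop):
    # Condensed restatement of the generator above.
    dropped = 0
    for item in data:
        if dropped < drop and condition(item):
            dropped += 1
            continue
        yield item


# count(1) yields 1, 2, 3, ... forever, so list() on the generator would never finish.
lazy = iter_drop_n(count(1), lambda x: x % 2 == 0, 2)
first_five = list(islice(lazy, 5))
print(first_five)  # [1, 3, 5, 6, 7]
```

Because nothing is materialized until it is requested, the same generator works on files, streams, or other iterables that are too large to hold in memory.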
If mutation is required:
def do_remove(ls, N, predicate):
    i, delete_count, l = 0, 0, len(ls)
    while i < l and delete_count < N:
        if predicate(ls[i]):
            ls.pop(i)  # remove item at i; the next item shifts into index i
            delete_count, l = delete_count + 1, l - 1
        else:
            i += 1
    return ls  # for convenience

data = [1, 10, 2, 9, 3, 8, 4, 7]
assert do_remove(data, 3, lambda x: x < 5) == [10, 9, 8, 4, 7]
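Each ls.pop(i) shifts every later element, so many removals from a long list can get expensive. One possible alternative, sketched here under the assumption that a temporary list is acceptable, builds the kept items in a single pass and writes them back with slice assignment, which still mutates the original list object. The name do_remove_inplace is hypothetical.

```python
def do_remove_inplace(ls, n, predicate):
    kept = []
    removed = 0
    for item in ls:
        if removed < n and predicate(item):
            removed += 1       # skip this item: it is one of the first n matches
        else:
            kept.append(item)
    ls[:] = kept               # slice assignment replaces the contents of the same list object
    return ls


data = [1, 10, 2, 9, 3, 8, 4, 7]
alias = data                   # a second reference to the same list
do_remove_inplace(data, 3, lambda x: x < 5)
print(alias)  # [10, 9, 8, 4, 7]
```

The trade-off is extra memory for the temporary list in exchange for a single linear pass.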
One way using itertools.filterfalse and itertools.count:
from itertools import count, filterfalse
data = [1, 10, 2, 9, 3, 8, 4, 7]
output = filterfalse(lambda L, c=count(): L < 5 and next(c) < 3, data)
Then list(output) gives you:
[10, 9, 8, 4, 7]
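One thing to keep in mind with this approach: the default argument c=count() is evaluated once, when the lambda is created, so the lambda carries its counter with it and a stored lambda would keep counting across reuses. A possible way around that, sketched below with a hypothetical wrapper named drop_first_n, is to create a fresh counter on every call.

```python
from itertools import count, filterfalse


def drop_first_n(iterable, n, condition):
    counter = count()  # fresh counter for each call: 0, 1, 2, ...
    # next(counter) only runs when condition(x) is true, so it counts matches.
    return filterfalse(lambda x: condition(x) and next(counter) < n, iterable)


data = [1, 10, 2, 9, 3, 8, 4, 7]
first = list(drop_first_n(data, 3, lambda x: x < 5))
second = list(drop_first_n(data, 3, lambda x: x < 5))
print(first)   # [10, 9, 8, 4, 7]
print(second)  # [10, 9, 8, 4, 7]
```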
The accepted answer was a little too magical for my liking. Here’s one where the flow is hopefully a bit clearer to follow:
def matchCondition(x):
    return x < 5

def my_gen(L, drop_condition, max_drops=3):
    count = 0
    iterator = iter(L)
    for element in iterator:
        if drop_condition(element):
            count += 1
            if count >= max_drops:
                break
        else:
            yield element
    yield from iterator

example = [1, 10, 2, 9, 3, 8, 4, 7]
print(list(my_gen(example, drop_condition=matchCondition)))
It’s similar to the logic in davidism’s answer, but instead of checking whether the drop count has been reached on every step, we break out of the loop once it is reached and yield the remaining items directly.
Note: If you don’t have yield from available, just replace it with another for loop over the remaining items in iterator.
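The variant without yield from (which was added in Python 3.3) might look like the following sketch. The name my_gen_compat is made up for this example; everything else mirrors the generator above.

```python
def my_gen_compat(iterable, drop_condition, max_drops=3):
    count = 0
    iterator = iter(iterable)
    for element in iterator:
        if drop_condition(element):
            count += 1
            if count >= max_drops:
                break          # quota reached: stop filtering
        else:
            yield element
    # Stands in for `yield from iterator`: pass the remaining items through unchanged.
    for element in iterator:
        yield element


result = list(my_gen_compat([1, 10, 2, 9, 3, 8, 4, 7], lambda x: x < 5))
print(result)  # [10, 9, 8, 4, 7]
```

As in the version above, max_drops is assumed to be at least 1; with max_drops=0 the first matching item would still be dropped before the loop breaks.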
Straightforward Python:
N = 3
data = [1, 10, 2, 9, 3, 8, 4, 7]

def matchCondition(x):
    return x < 5

c = 1
l = []
for x in data:
    if c > N or not matchCondition(x):
        l.append(x)
    else:
        c += 1
print(l)
This can easily be turned into a generator if desired:
def filter_first(n, func, iterable):
    c = 1
    for x in iterable:
        if c > n or not func(x):
            yield x
        else:
            c += 1

print(list(filter_first(N, matchCondition, data)))
Using list comprehensions:
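n = 3
data = [1, 10, 2, 9, 3, 8, 4, 7]
count = 0

def counter(x):
    # Record one more dropped item; returns True so it can sit in an `and` chain.
    global count
    count += 1
    return True

def condition(x):
    return x < 5

# An item is dropped only while fewer than n items have been dropped and it matches.
filtered = [x for x in data if not (count < n and condition(x) and counter(x))]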
This will also stop checking the condition after n elements are found thanks to boolean short-circuiting.
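To see the short-circuiting at work, the sketch below records every item the condition is actually evaluated on. It uses a plain loop rather than a comprehension for readability, and the names calls and drop_first_n are assumptions for this example.

```python
calls = []


def condition(x):
    calls.append(x)  # record each item the condition is evaluated on
    return x < 5


def drop_first_n(data, n):
    dropped = 0
    out = []
    for x in data:
        # `dropped < n` is tested first, so condition(x) is skipped once n items are gone.
        if dropped < n and condition(x):
            dropped += 1
        else:
            out.append(x)
    return out


result = drop_first_n([1, 10, 2, 9, 3, 8, 4, 7], 3)
print(result)  # [10, 9, 8, 4, 7]
print(calls)   # [1, 10, 2, 9, 3]
```

The condition is never called for 8, 4, or 7, because the third removal has already happened by then.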
Starting with Python 3.8 and the introduction of assignment expressions (PEP 572, the := operator), we can use and increment a variable within a list comprehension:
# items = [1, 10, 2, 9, 3, 8, 4, 7]
total = 0
[x for x in items if not (x < 5 and (total := total + 1) <= 3)]
# [10, 9, 8, 4, 7]
This:
- Initializes a variable total to 0, which will represent the number of previously matched occurrences within the list comprehension
- Checks for each item if it both:
  - matches the exclusion condition (x < 5)
  - and if we’ve not already discarded more than the number of items we wanted to filter out, by:
    - incrementing total (total := total + 1) via an assignment expression
    - and at the same time comparing the new value of total to the max number of items to discard (3)
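The steps above can be wrapped in a reusable function, as in this sketch (Python 3.8 or later). The function name drop_first_n and its parameters are hypothetical; inside a function, the assignment expression binds total in the enclosing function's scope, so no global state is needed.

```python
def drop_first_n(items, n, condition):
    total = 0  # number of matching items seen so far
    # A match is discarded only while the updated total is still <= n.
    return [x for x in items if not (condition(x) and (total := total + 1) <= n)]


result = drop_first_n([1, 10, 2, 9, 3, 8, 4, 7], 3, lambda x: x < 5)
print(result)  # [10, 9, 8, 4, 7]
```

Note that total keeps incrementing for every match, including those kept after the first n, so it ends up counting all matching items rather than only the discarded ones.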
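Note that itertools.dropwhile() does not solve this problem. It discards only the leading run of matching items and stops at the first item that fails the condition, regardless of n:
from itertools import dropwhile
result = list(dropwhile(lambda x: x < 5, data))
# [10, 2, 9, 3, 8, 4, 7] (only the leading 1 is removed)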