How to maintain a list of top results from for loop in Python
Question:
I am iterating through a list of ~10,000 items. For every item, I process it and get a value. I would like to return a list of tuples with the names and the top 10 values, in descending order of the values.
It looks something like this:
top_tuples = []
for item in itemlist:
    cur_value = compute_value(item)
    my_tuple = (item, cur_value)
    if cur_value is > the smallest value on my list:
        remove smallest value from top_tuples
        add tuple to top_tuples at appropriate index  # index is based on value
Thank you.
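For reference, here is a runnable version of that pseudocode, using bisect.insort to keep the list ordered by value. It is a sketch under stated assumptions: top_n_sorted_insert is a hypothetical helper name, and when two values tie the items themselves get compared, so they are assumed to be orderable (e.g. strings).

```python
import bisect


def top_n_sorted_insert(itemlist, compute_value, n=10):
    """Hypothetical helper: the pseudocode above, made runnable.

    `top` is kept in ascending order of value, so the smallest kept value
    is always at index 0. Ties fall back to comparing items (assumed orderable).
    """
    top = []  # ascending list of (value, item) pairs
    for item in itemlist:
        cur_value = compute_value(item)
        if len(top) < n:
            bisect.insort(top, (cur_value, item))
        elif top and cur_value > top[0][0]:
            top.pop(0)  # remove the smallest value
            bisect.insort(top, (cur_value, item))  # insert at the index given by its value
    # Return (item, value) tuples in descending order of value.
    return [(item, value) for value, item in reversed(top)]
```

Both insort and pop(0) shift list elements, which is negligible for n = 10; a heap avoids the shifting altogether.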
Answers:
EDIT
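After receiving a comment on this old answer, I realise the solution I gave is not really good: it builds and sorts all ~10,000 results just to keep 10 of them. A better way to do what OP requested is to use a heap with a fixed maximum size. The only requirement is that the value is put first in the tuple, so that tuples are compared by value. Python's heapq implements a min-heap, so top_tuples[0] is always the smallest value kept so far, which is exactly the one to discard when a larger value arrives; heapq.heappushpop pushes the new tuple and pops the smallest one in a single step. (If two values can be equal, the items themselves are compared next, so they must be orderable; strings are fine.)
import heapq
NUM_TOP = 10
top_tuples = []  # min-heap of (value, item); top_tuples[0] holds the smallest kept value
for item in itemlist:
    cur_value = compute_value(item)
    my_tuple = (cur_value, item)
    if len(top_tuples) < NUM_TOP:
        heapq.heappush(top_tuples, my_tuple)
    else:
        # Push the new tuple, then pop the smallest, so only the NUM_TOP largest remain
        heapq.heappushpop(top_tuples, my_tuple)
Note that even though the heap is just a list, it is not sorted by value: only top_tuples[0] is guaranteed to be the smallest. Popping repeatedly returns the tuples in ascending order, so you can convert the heap into the list of top elements, as (item, value) tuples in descending order of value, like this:
top_tuples_sorted = []
while top_tuples:
    item_value, item = heapq.heappop(top_tuples)
    top_tuples_sorted.append((item, item_value))
top_tuples_sorted.reverse()  # heappop yields ascending order; flip it to descending
If you do not want to empty the heap in the process, you could either copy it first, or use sorted to get a new list in descending order of value:
top_tuples_sorted = [(item, item_value) for item_value, item in sorted(top_tuples, reverse=True)]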
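Putting the pieces together, a self-contained sketch of the fixed-size heap as a function. The values are stored un-negated, so the heap root is the smallest of the kept values and is the one heappushpop evicts. The name top_n and the example data in the usage note are hypothetical, and the itertools.count tiebreaker is an extra beyond the loop above: it sits between the value and the item so equal values never fall through to comparing the items.

```python
import heapq
from itertools import count


def top_n(itemlist, compute_value, n=10):
    """Return the n (item, value) pairs with the largest values, descending.

    Hypothetical helper. The heap holds (value, tiebreak, item) triples; the
    counter makes every triple unique, so items are never compared.
    """
    heap = []
    tiebreak = count()
    for item in itemlist:
        entry = (compute_value(item), next(tiebreak), item)
        if len(heap) < n:
            heapq.heappush(heap, entry)
        else:
            # Push the new entry and pop the smallest, keeping the n largest.
            heapq.heappushpop(heap, entry)
    # Sort the (small) heap from largest to smallest and drop the tiebreaker.
    return [(item, value) for value, _, item in sorted(heap, reverse=True)]
```

For example, with scores = {"apple": 3, "banana": 7, "cherry": 5, "date": 1}, calling top_n(scores, scores.get, n=2) returns [("banana", 7), ("cherry", 5)].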
OLD ANSWER
Try this:
from operator import itemgetter
tuples_gen = ((item, compute_value(item)) for item in itemlist)
top_tuples = sorted(tuples_gen, key=itemgetter(1), reverse=True)[:10]
Without itemgetter, using a lambda as the sort key:
tuples_gen = ((item, compute_value(item)) for item in itemlist)
top_tuples = sorted(tuples_gen, key=lambda tup: tup[1], reverse=True)[:10]
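The standard library also packages the heap approach as heapq.nlargest, which the Python documentation describes as equivalent to sorted(iterable, key=key, reverse=True)[:n], the same result as the sort-and-slice above, but it only keeps n candidates while scanning. A minimal sketch, where top_n_nlargest is a hypothetical helper name:

```python
import heapq
from operator import itemgetter


def top_n_nlargest(itemlist, compute_value, n=10):
    """Hypothetical helper: n (item, value) pairs with the largest values, descending."""
    # A lazy generator, so the full list of pairs is never built in memory.
    pairs = ((item, compute_value(item)) for item in itemlist)
    # nlargest compares by the key (the value) and returns largest first.
    return heapq.nlargest(n, pairs, key=itemgetter(1))
```

The docs note that nlargest performs best for small n relative to the input; for large n, sorted() is usually the better choice.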