How to maintain a list of top results from a for loop in Python

Question:

I am iterating through a list of ~10,000 items. For every item, I process it and get a value. I would like to return a list of tuples with the names and the top 10 values, in descending order of the values.

It looks something like this:

top_tuples = []
for item in itemlist:
    cur_value = compute_value(item)
    my_tuple = (item, cur_value)
    # Pseudocode for the part I am unsure about:
    # if cur_value > smallest value in top_tuples:
    #     remove the tuple with the smallest value from top_tuples
    #     insert my_tuple into top_tuples at the index that keeps it sorted by value

Thank you.

Answers:

EDIT

After receiving a comment on this old answer, I realise the solution I gave is not really good. A better way to do what OP requested is to use a heap with a fixed maximum size. The only requirement is that the item value is put first in the tuple, so that the heap is ordered by value. The value must not be negated: Python heaps are "min heaps", so the smallest value kept so far is always at the front, and that is exactly the one to drop when a larger value comes in. (If two values tie, the tuples are compared by item, so the items must be comparable with each other.)

import heapq

NUM_TOP = 10
top_tuples = []  # min-heap of (value, item); top_tuples[0] holds the smallest kept value
for item in itemlist:
    cur_value = compute_value(item)
    my_tuple = (cur_value, item)
    if len(top_tuples) < NUM_TOP:
        heapq.heappush(top_tuples, my_tuple)
    else:
        # Push the new tuple, then pop the smallest one, so the NUM_TOP largest remain
        heapq.heappushpop(top_tuples, my_tuple)

Note that even though the heap is just a list, it is not sequentially sorted by value. You can convert the heap into the sorted list of top elements, as (item, value) tuples in descending order of value, like this:

top_tuples_sorted = []
while top_tuples:
    item_value, item = heapq.heappop(top_tuples)
    top_tuples_sorted.append((item, item_value))
top_tuples_sorted.reverse()  # heappop yields ascending values; reverse for descending

If you do not want to empty the heap in the process, you could either copy it first, or use sorted to get a new sequentially sorted list:

top_tuples_sorted = sorted(((item, item_value) for item_value, item in top_tuples), key=lambda a: a[1], reverse=True)
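For reference, the standard library also offers heapq.nlargest, which keeps only the largest entries internally and returns them already sorted in descending order. The following is a minimal sketch rather than part of the original answer: compute_value and itemlist are hypothetical stand-ins for the ones in the question, and NUM_TOP is set to 3 only to keep the example small.

```python
import heapq

def compute_value(item):
    # Hypothetical stand-in for the question's compute_value
    return len(item)

# Hypothetical sample data; the question's itemlist has ~10,000 items
itemlist = ["fig", "pear", "apple", "banana", "apricot", "clementine"]

NUM_TOP = 3
# Lazily pair each item with its value, as (item, value)
pairs = ((item, compute_value(item)) for item in itemlist)
# nlargest returns the NUM_TOP pairs with the largest value, in descending order
top_tuples = heapq.nlargest(NUM_TOP, pairs, key=lambda pair: pair[1])
print(top_tuples)
```

Because the key function looks only at the value, the items themselves never need to be comparable, and the result is already in (item, value) form.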

OLD ANSWER

Try this:

from operator import itemgetter

tuples_gen = ((item, compute_value(item)) for item in itemlist)
top_tuples = sorted(tuples_gen, key=itemgetter(1), reverse=True)[:10]
Answered By: jdehesa

The same approach without itemgetter:

tuples_gen = ((item, compute_value(item)) for item in itemlist)
top_tuples = sorted(tuples_gen, key=lambda tup: tup[1], reverse=True)[:10]
Answered By: mquantin