Insert items into a list based on their occurrence

Question:

Say I am continuously generating new data (e.g. integers) and want to collect them in a list.

import random

lst = []
for _ in range(50):
    num = random.randint(0, 10)
    lst.append(num)

When a new value is generated, I want it to be positioned in the list based on the count of occurrences of that value, so data with lower "current occurrence" should be placed before those with higher "current occurrence".

"Current occurrence" means "the number of duplicates of that value that have already been collected so far, up to this iteration". Values with the same occurrence should keep the order in which they were generated.

For example, if at iteration 10 the current list is [1,2,3,4,2,3,4,3,4] and a new value 1 is generated, then it should be inserted at index 7, resulting in [1,2,3,4,2,3,4,1,3,4]. Because it is the second occurrence of 1, it should be placed after all the values that occur only once, and also after all other existing items that occur twice: 2, 3 and 4 (hence preserving the order).
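
As a sanity check on the rule above, here is a minimal sketch that computes the target index directly. The helper name `insert_index` is hypothetical and not part of any existing code. It relies on one observation: a value arriving for the k-th time goes after every item that is a k-th or earlier occurrence, so the index is the sum of min(count, k) over the distinct values already in the list. It recounts the whole list on every call, so it only illustrates the rule and is not an efficient solution.

```python
from collections import Counter

def insert_index(lst, value):
    """Return the index at which `value` should be inserted (hypothetical helper)."""
    counts = Counter(lst)      # occurrences collected so far
    k = counts[value] + 1      # `value` is about to become its k-th occurrence
    # Each distinct value contributes its first min(count, k) occurrences,
    # all of which must come before the new item.
    return sum(min(c, k) for c in counts.values())

lst = [1, 2, 3, 4, 2, 3, 4, 3, 4]
idx = insert_index(lst, 1)
lst.insert(idx, 1)             # in-place insertion at the computed position
print(idx, lst)                # 7 [1, 2, 3, 4, 2, 3, 4, 1, 3, 4]
```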


This is my current code that can rearrange the list:

from collections import defaultdict

def rearrange(lst):
    d = defaultdict(list)
    count = defaultdict(int)
    for x in lst:
        count[x] += 1
        d[count[x]].append(x)
    res = []
    for k in sorted(d.keys()):
        res += d[k]
    return res

lst = rearrange(lst)

However, this is not giving my expected result.

I wrote a separate algorithm that keeps generating new data until some convergence criterion is met, where the list has the potential to become extremely large.

Therefore I want to rearrange my generated values on the fly, i.e. to constantly insert data into the list "in-place". Of course I can call my rearrange function in each iteration, but that would be super inefficient. What I want to do is insert new data into the correct position of the list, not replace it with a new list in each iteration.

Any suggestions?

Edit: the data structure doesn’t necessarily need to be a list, but it has to be ordered, and it should not require another data structure to hold information.

Asked By: Shaun Han

||

Answers:

I think the data structure that might work better for your purpose is a forest (in this case, a disjoint union of lists).

In summary, you keep one internal list per occurrence number: the first list holds first occurrences, the second holds second occurrences, and so on. When a new value arrives, you append it to the list that comes right after the one that received the previous occurrence of that same value.

In order to keep track of the counts of occurrences, you can use a built-in Counter.

Here is a sample implementation:

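from collections import Counter

def rearranged(iterable):
    # forest[i] holds, in arrival order, every value seen for the (i+1)-th time
    forest, counter = [], Counter()
    for x in iterable:
        c = counter[x]  # how many times x has been seen so far
        if c == len(forest):
            forest.append([x])  # first value to reach this occurrence number
        else:
            forest[c].append(x)
        counter[x] += 1
    # flatten the internal lists in order of occurrence number
    return [x for lst in forest for x in lst]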

rearranged([1,2,3,4,2,3,4,3,4,1])
# [1, 2, 3, 4, 2, 3, 4, 1, 3, 4]

For this to work better, your input iterable should be a generator (so the items can be generated on the fly).
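
To make the on-the-fly use more concrete, here is a minimal sketch of the same idea wrapped in a small class, so values can be added one at a time and the ordered view read at any point. The names `OccurrenceBuckets`, `add` and `generate` are hypothetical, chosen only for illustration, and the seeded random generator merely stands in for whatever produces your data.

```python
import random
from collections import Counter

class OccurrenceBuckets:
    """Hypothetical helper: one internal list per occurrence number."""

    def __init__(self):
        self.buckets = []        # buckets[i] holds (i+1)-th occurrences
        self.counts = Counter()  # how many times each value has been seen

    def add(self, x):
        c = self.counts[x]
        if c == len(self.buckets):
            self.buckets.append([])  # first value to reach this occurrence number
        self.buckets[c].append(x)    # appending to a list is amortized O(1)
        self.counts[x] += 1

    def __iter__(self):
        # Yield the values in the required order without building a new list.
        for bucket in self.buckets:
            yield from bucket

def generate(n, seed=0):
    """Stand-in data source (an assumption): n random integers in [0, 10]."""
    rng = random.Random(seed)
    for _ in range(n):
        yield rng.randint(0, 10)

store = OccurrenceBuckets()
for value in [1, 2, 3, 4, 2, 3, 4, 3, 4, 1]:
    store.add(value)
print(list(store))  # [1, 2, 3, 4, 2, 3, 4, 1, 3, 4]

random_store = OccurrenceBuckets()
for value in generate(50):
    random_store.add(value)
```

Each `add` only appends to one internal list, so nothing is shifted or rebuilt per iteration. The trade-off is that the flattened order is only materialized when you iterate over the object.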

Answered By: Rodrigo Rodrigues