Looping over lists of lists to count the occurrences of each pair of elements (in sublists)

Question:

Given the two lists of lists below, I need to count how many times each pair of elements (one element from the sublists of each list) appears.

For example, William_Delta appears 4 times.

The result is to be written to a text file.

processes = [['Iota', 'Gamma', 'Kappa'], ['Delta', 'Zeta', 'Beta'], ['Alpha', 'Zeta'], ['Alpha', 'Epsilon', 'Delta', 'Beta']]
staffs = [['William', 'James', 'Noah', 'Oliver'], ['Benjamin', 'Oliver', 'William'],['Oliver', 'Benjamin']]


list_output = []

for each_p in processes:
    for p in each_p:
        for each_s in staffs:
            for s in each_s:
                output = s + '_' + p
                list_output.append(output)

uniques = set(list_output)

# raw string so that \t in the path is not read as a tab character
with open(r'c:\temp\outfile.txt', 'a') as outfile:
    for ox in uniques:
        outfile.write(ox + '@' + str(list_output.count(ox)) + "\n")

Both 'processes' and 'staffs' are very long, so this takes a long time to complete.

Is there a better way to make it run faster?

Asked By: Mark K


Answers:

You can use collections.Counter and chain.from_iterable and product from itertools:

from collections import Counter
from itertools import product, chain

output = Counter(
    f"{s}_{p}" for p, s in 
    product(*map(chain.from_iterable, [processes, staffs]))
)

# 'file' is the output path; append mode so that write() is allowed
with open(file, 'a') as outfile:
    for name, count in output.items():
        outfile.write(f"{name}@{count}\n")

A little more verbose version would be:

all_processes = chain.from_iterable(processes)
all_staffs = chain.from_iterable(staffs)

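# product yields (process, staff) tuples, in the order of its arguments
name_counts = Counter(f"{s}_{p}" for p, s in product(all_processes, all_staffs))

# 'file' is the output path; append mode so that write() is allowed
with open(file, 'a') as outfile:
    for name, count in name_counts.items():
        outfile.write(f"{name}@{count}\n")

Answered By: Sayandip Dutta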

Use collections.Counter to count how many times each element appears across the sublists, then use itertools.product to form every pair of distinct names. The total count of a pair is the product of the two individual counts.
For example, "William" appears 2 times and "Delta" appears 2 times, therefore the total count of the pair "William_Delta" is 4 (2 * 2).

from collections import Counter
from itertools import product

processes = [['Iota', 'Gamma', 'Kappa'], ['Delta', 'Zeta', 'Beta'], ['Alpha', 'Zeta'], ['Alpha', 'Epsilon', 'Delta', 'Beta']]
staffs = [['William', 'James', 'Noah', 'Oliver'], ['Benjamin', 'Oliver', 'William'],['Oliver', 'Benjamin']]


count_staffs = Counter(st for staff in staffs for st in staff)
count_processes = Counter(pr for process in processes for pr in process)

with open('outfile.txt', 'a') as outfile:
    for (staff, cs), (process, cp) in product(count_staffs.items(), count_processes.items()):
        outfile.write(f"{staff}_{process}@{cs * cp}\n")

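This solution should be faster than generating all the pairs and counting them, because it makes a single pass over each list and then loops only over the unique staff/process combinations.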
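
As a quick sanity check, the sketch below compares the two approaches on the sample data from the question and confirms they give the same counts. The function names count_by_enumeration and count_by_multiplication are illustrative only and do not come from either answer.

```python
from collections import Counter
from itertools import chain, product

processes = [['Iota', 'Gamma', 'Kappa'], ['Delta', 'Zeta', 'Beta'],
             ['Alpha', 'Zeta'], ['Alpha', 'Epsilon', 'Delta', 'Beta']]
staffs = [['William', 'James', 'Noah', 'Oliver'],
          ['Benjamin', 'Oliver', 'William'], ['Oliver', 'Benjamin']]


def count_by_enumeration(processes, staffs):
    """Build every staff/process pair and count them one by one."""
    flat_p = chain.from_iterable(processes)
    flat_s = chain.from_iterable(staffs)
    return Counter(f"{s}_{p}" for p, s in product(flat_p, flat_s))


def count_by_multiplication(processes, staffs):
    """Count each name once, then multiply the two counts per unique pair."""
    count_p = Counter(chain.from_iterable(processes))
    count_s = Counter(chain.from_iterable(staffs))
    return {
        f"{s}_{p}": cs * cp
        for (s, cs), (p, cp) in product(count_s.items(), count_p.items())
    }


enumerated = count_by_enumeration(processes, staffs)
multiplied = count_by_multiplication(processes, staffs)

# Both approaches should agree on every pair.
assert dict(enumerated) == multiplied
print(multiplied["William_Delta"])  # 4
```

On this sample the enumeration builds 108 pairs (12 process entries times 9 staff entries), while the multiplication loops over only 40 unique pairs (8 processes times 5 staff names). The gap widens as the lists grow and names repeat more often.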

Answered By: Dani Mesejo