Counting key frequency and creating a list associated with that key

Question:

I’d like some help counting the frequency of a key and also listing the unique data associated with that key.

Imagine an input CSV file like this:

key1, owner1, owner2

key2, ownerA, ownerB
key2, ownerB, ownerB

key3, ownerJ, ownerK
key3, ownerJ, ownerK
key3, ownerL, ownerM

I’d like the output csv to be:

key   | Freq | List of owners with duplicates removed
key3,    3,    ownerJ, ownerK, ownerL, ownerM
key2,    2,    ownerA, ownerB
key1,    1,    owner1, owner2

I’ve written code to accomplish the frequency count.
But I don’t know how to create the list of unique owners.
Here is my code so far in Python:

import csv
import collections

multiOwner = collections.Counter()

with open('input.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')

    for row in csv_reader:
        multiOwner[row[0]] += 1

    print("\n".join(str(element_and_count)
                    for element_and_count in multiOwner.most_common()))

How can I build the list of owners and keep it associated with the right key?

Asked By: Brajesh


Answers:

Use nested dictionaries, and a set for the owners to remove duplicates. You can use defaultdict() to initialize the data for each key.

import csv
import collections

multiOwner = collections.defaultdict(lambda: {'freq': 0, 'owners': set()})

with open('input.csv', newline="") as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')

    for key, owner1, owner2, *_ in csv_reader:
        multiOwner[key]['freq'] += 1
        multiOwner[key]['owners'].add(owner1)
        multiOwner[key]['owners'].add(owner2)
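
To get from this dictionary to the output shown in the question, the entries still need to be sorted by frequency and printed. Here is a minimal sketch of one way to do that. It makes a few assumptions: every column after the key is treated as an owner, the question's sample is read from an in-memory string (SAMPLE) rather than input.csv, and summarize is just a hypothetical helper name. It also passes skipinitialspace=True and skips empty rows, because the sample has a space after each comma and blank lines between groups.

```python
import csv
import io
from collections import defaultdict

# Sample data copied from the question, including the spaces after the
# commas and the blank lines between groups.
SAMPLE = """key1, owner1, owner2

key2, ownerA, ownerB
key2, ownerB, ownerB

key3, ownerJ, ownerK
key3, ownerJ, ownerK
key3, ownerL, ownerM
"""


def summarize(csv_file):
    """Return (key, freq, sorted owners) tuples, most frequent key first."""
    stats = defaultdict(lambda: {'freq': 0, 'owners': set()})
    # skipinitialspace=True drops the blank that follows each comma.
    reader = csv.reader(csv_file, skipinitialspace=True)
    for row in reader:
        if not row:  # csv.reader yields an empty list for a blank line
            continue
        # Assumption: every column after the key is an owner.
        key, *owners = row
        stats[key]['freq'] += 1
        stats[key]['owners'].update(owners)
    # Highest frequency first.
    ranked = sorted(stats.items(),
                    key=lambda item: item[1]['freq'],
                    reverse=True)
    # A set has no defined order, so sort the owners for stable output.
    return [(key, data['freq'], sorted(data['owners']))
            for key, data in ranked]


rows = summarize(io.StringIO(SAMPLE))
for key, freq, owners in rows:
    print(', '.join([key, str(freq), *owners]))
```

With the sample data this lists key3 first, then key2, then key1. To read a real file, pass the object returned by open('input.csv', newline="") to summarize instead of the StringIO.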
Answered By: Barmar