Counting consecutive occurrences and their respective quantities in a list (Python)

Question:

As I said in the title, I want to count the consecutive occurrences and their respective quantities in a list.

For example:

['a','b','a','a'] should output [('a',1),('b',1),('a',2)] (or an equivalent format)

['a','a','b','b','b','d'] should output [('a', 2), ('b', 1),('d',1)]

I need this because I’m counting the number of consecutive occurrences in a specific column of a time series, but this problem is equivalent.

This is what I did:

list_to_summarize = ['a','a','b','b','b','d']

def summary_of_list(list_to_summarize):
    list_values = []
    list_quantities = []
    
    c = 0
    for index,value in enumerate(list_to_summarize):
        # base case
        if not list_values:
            list_values.append(value)
            c += 1
            continue

        # middle cases
        if index < len(list_to_summarize)-1:

            #if the last value is the same as the current value we add one to the counter
            if (list_values[-1] == value):
                c += 1

            #if the last value is different from the current value we add the last value to the list and reset the counter
            elif list_values[-1] != value:
                list_values.append(value)
                list_quantities.append(c)
                c = 0

        # Final Cases
        # if the value is the same as the last one but it is the last one we add one to the counter and we add the value and the counter to the lists
        if (index == len(list_to_summarize)-1):
            if list_values[-1] == value:
                c += 1
                list_quantities.append(c)
                list_values.append(value)
            else:
                list_quantities.append(1)
                list_values.append(value)
    return list(zip(list_values, list_quantities))

I’m close enough because on this example:

list_to_summarize = ['a','a','b','b','b','d']
summary_of_list(list_to_summarize)

outputs

[('a', 2), ('b', 1)]

Although this solution could be completed, I’m pretty sure this can be done in a more Pythonic manner. Thanks in advance.

Asked By: Román

||

Answers:

You can use itertools.groupby:

from itertools import groupby

def summary_of_list(lst):
    return [(k, sum(1 for _ in g)) for k, g in groupby(lst)]

print(summary_of_list(['a','b','a','a'])) # [('a', 1), ('b', 1), ('a', 2)]
print(summary_of_list(['a','a','b','b','b','d'])) # [('a', 2), ('b', 3), ('d', 1)]

(I believe your expected output [('a',2), ('b',1), ('d',1)] for ['a','a','b','b','b','d'] had a typo.)
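For comparison, here is a minimal sketch of the same run-length counting done with a plain loop, in case it helps to see roughly what groupby is doing for consecutive equal items. The name summary_of_list_loop is hypothetical and used only for illustration; it is not part of itertools.

```python
def summary_of_list_loop(lst):
    """Count consecutive runs with a plain loop (hypothetical helper name)."""
    runs = []  # one [value, count] pair per run of consecutive equal items
    for value in lst:
        if runs and runs[-1][0] == value:
            # Same value as the current run: extend that run.
            runs[-1][1] += 1
        else:
            # First item, or the value changed: start a new run.
            runs.append([value, 1])
    # Convert the mutable pairs to tuples to match the requested format.
    return [(value, count) for value, count in runs]

print(summary_of_list_loop(['a', 'b', 'a', 'a']))            # [('a', 1), ('b', 1), ('a', 2)]
print(summary_of_list_loop(['a', 'a', 'b', 'b', 'b', 'd']))  # [('a', 2), ('b', 3), ('d', 1)]
```

Keeping the value and its count together in one pair avoids having to treat the first and last elements as special cases, which appears to be where the two separate lists in the question drift out of sync.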

Answered By: j1-lee