Counting elements with tolerance

Question:

I have a long list of values (here below a shortened version) that I need to count:

ed = [ 0.52309  ,  3.1443  , 16.5789  , 24.0643  ,  9.70981 ,  1.71983 ,
       16.3453  , 14.1901  , 22.0353  ,  1.71983 , 15.0469  , 13.98    ,
       11.4753  , 32.7859  ,  9.7098  ,  6.36272 ,  3.2058  ,  1.46917 ,
        6.36271 , 11.5869  ,  1.72052 ,  6.32043 ,  1.72052 ,  1.72052 ,
        5.37679 ,  3.15279 ,  9.70979 ,  1.72052 ,  3.44035 ,  2.15729 ,
       12.0049  ]

and that I count with:

from collections import Counter

cnt = Counter(ed)
edlist = [list(i) for i in cnt.items()]

The list I obtain contains some nearly identical values:

[[1.72052, 60], [1.71983, 34], [6.36271, 16], [9.7098, 14],[9.70979, 5], [0.52309, 3], [9.70981, 3]]

I would like to add these together within a given tolerance. For example:

9.7098 has 14 counts
9.70981 has 3 counts
9.70979 has 5 counts

I would like to add all of them to the item with the highest count, and I am not sure whether there is a function that does this within some absolute or relative error. What I would like to obtain is

[[1.72052, 60], [1.71983, 34], [6.36271, 16], [9.7098, 22], [0.52309, 3]]

I have read the questions about grouping and clustering, but I do not know how to apply them. I need to count them with some given tolerance while keeping track of how many times each one has been found.

Asked By: saimon


Answers:

You can cluster the counts according to their key using itertools.groupby. Because groupby only groups consecutive items, you have to sort the list first.

Then, sum the counts of each group and add it to the final list:

from itertools import groupby

l = [[1.72052, 60], [1.71983, 34], [6.36271, 16], [9.7098, 14], [9.70979, 5], [0.52309, 3], [9.70981, 3]]
l.sort(key=lambda x: x[0])

tolerance = 0.001

res = []
for key, group in groupby(l, lambda x: int(x[0]*(1/tolerance))):
    # for example: key = 9709, group = [[9.70979, 5], [9.7098, 14], [9.70981, 3]]
    group = list(group)
    res.append([max(group, key=lambda x: x[1])[0], sum(x[1] for x in group)])

print(res)

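Most of the work is done by the lambdas: groupby buckets on the value (the first element of each pair), max picks the representative by its count (the second element), and sum adds up the counts of the group.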
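One caveat of bucketing with int(x/tolerance) is that two values on either side of a bucket boundary land in different groups even when they are closer than the tolerance (for instance 0.9999 and 1.0001 with tolerance = 0.001). A possible variation, sketched here, merges sorted neighbours whose gap is within the tolerance instead. The names merge_close and _collapse are hypothetical helpers, not library functions, and the sketch assumes that chaining is acceptable: a long run of closely spaced values can end up in one group that is wider than the tolerance.

```python
def _collapse(group):
    # Representative = value with the highest count; total = sum of counts.
    best_value = max(group, key=lambda vc: vc[1])[0]
    return [best_value, sum(count for _, count in group)]


def merge_close(pairs, tolerance):
    """Merge [value, count] pairs whose sorted neighbours differ by <= tolerance."""
    merged = []
    group = []
    for value, count in sorted(pairs):
        # Start a new group when the gap to the previous value is too large.
        if group and value - group[-1][0] > tolerance:
            merged.append(_collapse(group))
            group = []
        group.append((value, count))
    if group:
        merged.append(_collapse(group))
    return merged


pairs = [[1.72052, 60], [1.71983, 34], [6.36271, 16], [9.7098, 14],
         [9.70979, 5], [0.52309, 3], [9.70981, 3]]
result = merge_close(pairs, tolerance=1e-4)
print(result)
```

With tolerance=1e-4 the three values around 9.7098 are merged, while 1.71983 and 1.72052 (about 0.0007 apart) stay separate. The result is ordered by value rather than by count.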


Alternatively, you could cluster the data itself (not the counts) and the count is the size of each group:

from itertools import groupby

l = [0.52309, 3.1443, 16.5789, 24.0643, 9.70981, ...]  # "..." stands for the rest of the data; replace it before running
l.sort()

tolerance = 0.001

res = []
for key, group in groupby(l, lambda x: int(x*(1/tolerance))):
    res.append([key*tolerance, len(list(group))])

print(res)

In this case the most frequent value of each group is not tracked, so the reported key is simply the number truncated to a multiple of the tolerance.
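Since the question also mentions a relative error, here is a minimal sketch of the same idea on the raw data using math.isclose, which accepts both rel_tol and abs_tol. The function name count_with_tolerance and the short data list are illustrative assumptions (the list is a handful of values picked from the question, not the full data set). Each sorted value is compared with the previous one, and Counter is used inside each group to report the most frequent value as the representative.

```python
import math
from collections import Counter


def count_with_tolerance(values, rel_tol=0.0, abs_tol=0.0):
    """Group sorted values that are close to their predecessor and count each group."""
    groups = []
    for v in sorted(values):
        # Compare with the last value of the current group.
        if groups and math.isclose(v, groups[-1][-1], rel_tol=rel_tol, abs_tol=abs_tol):
            groups[-1].append(v)
        else:
            groups.append([v])

    result = []
    for g in groups:
        # Most frequent exact value in the group becomes the representative.
        representative, _ = Counter(g).most_common(1)[0]
        result.append([representative, len(g)])
    return result


# Illustrative sample only, taken from values that appear in the question.
data = [9.70981, 1.71983, 9.7098, 9.7098, 1.72052, 9.70979, 1.72052]
counts = count_with_tolerance(data, rel_tol=1e-5)
print(counts)
```

Because neighbours are compared pairwise, the same chaining caveat applies as with any gap-based grouping: a group can span more than the tolerance if its members are closely spaced.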

Answered By: Tomerikoo