How to assign numeric labels to all elements in a list/series/array based on numbers from a different list?

Question:

I have two lists that contain two series of numbers, such as:

A = [1.0, 2.9, 3.4, 4.2, 5.5....100.3]
B = [1.1, 1.2, 1.3, 2.5, 3.0, 3.1, 5.2]

I would like to create another list of labels based on which interval from list A each element of list B falls in. Something like this:

C = [group_1, group_1, group_1, group_1, group_2, group_2, group_3]

That is, 1.1, 1.2, 1.3 and 2.5 all fall in the interval 1.0 – 2.9 from list A, hence group_1; 3.0 and 3.1 both fall in the interval 2.9 – 3.4, hence group_2; and 5.2 falls in the interval 4.2 – 5.5, hence group_3, etc.

It doesn’t matter which interval from list A a number from list B falls in; the point is to group/label all elements in list B in a consecutive manner.

The original data is large, so it would be impossible to manually assign labels/groups to the elements in list B. Any help is appreciated.

Asked By: Jack Arkmount

||

Answers:

So, assuming A is sorted, you can use binary search, which already comes with the Python standard library in the (rather clunky) module bisect:

>>> import bisect
>>> A = [1.0, 2.9, 3.4, 4.2, 5.5]
>>> B = [1.1, 1.2, 1.3, 2.5, 3.0, 3.1, 5.2]
>>> [bisect.bisect_left(A, b) for b in B]
[1, 1, 1, 1, 2, 2, 4]

This takes O(len(B) * log(len(A))) time: one binary search over A for each element of B.

Note: take care to read the documentation on how bisect_left and bisect_right behave when a value in B is equal to a value in A, and on what they return for values that fall outside the range of A.
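
To make that note concrete, here is a small sketch (my own illustration, not part of the answer above) of how the two functions differ on ties and what they return for values outside A. The helper name label and the choice to return None for out-of-range values are assumptions for the example, not something bisect provides:

```python
import bisect

A = [1.0, 2.9, 3.4, 4.2, 5.5]

# A value equal to an element of A: bisect_left returns the index of that
# element, bisect_right returns the index just past it.
print(bisect.bisect_left(A, 2.9))   # 1
print(bisect.bisect_right(A, 2.9))  # 2

# Values outside the range of A give 0 (below A[0]) or len(A) (above A[-1]).
print(bisect.bisect_left(A, 0.5))   # 0
print(bisect.bisect_left(A, 7.0))   # 5


def label(a, b):
    """Hypothetical helper: 1-based interval number for b, treating each
    interval as half-open [a[k-1], a[k]); None if b lies outside a."""
    k = bisect.bisect_right(a, b)
    return k if 0 < k < len(a) else None


print([label(A, b) for b in [0.5, 1.0, 2.9, 5.2, 5.5, 7.0]])
# [None, 1, 2, 4, None, None]
```

Whether a boundary value such as 2.9 should belong to the interval on its left or on its right depends on your data, so pick bisect_left or bisect_right accordingly.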

Answered By: juanpa.arrivillaga

You can try this O(n) solution (assuming both lists are sorted and every number in B falls in one of the intervals in A):

A = [1.0, 2.9, 3.4, 4.2, 5.5, 100.3]
B = [1.1, 1.2, 1.3, 2.5, 3.0, 3.1, 5.2]

grp = 0
i1, i2 = iter(A), iter(B)
a, b = next(i1), next(i2)

out = []
while True:
    try:
        if a < b:
            a = next(i1)
            grp += 1
        else:
            out.append(grp)
            b = next(i2)
    except StopIteration:
        break

print(out)

Prints:

[1, 1, 1, 1, 2, 2, 4]
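
The question asks for consecutive labels (group_1, group_2, group_3), while the output above holds the raw interval numbers (1, 2, 4). One possible way to renumber them is sketched below. This is my reading of what the asker wants, and the names to_consecutive and seen are made up for illustration:

```python
def to_consecutive(indices):
    """Map each distinct index to 1, 2, 3, ... in order of first appearance."""
    seen = {}
    # setdefault stores the next free number the first time an index shows up
    # and returns the already stored number on every later occurrence.
    return [seen.setdefault(i, len(seen) + 1) for i in indices]


out = [1, 1, 1, 1, 2, 2, 4]  # interval numbers as printed above
print([f"group_{g}" for g in to_consecutive(out)])
# ['group_1', 'group_1', 'group_1', 'group_1', 'group_2', 'group_2', 'group_3']
```

This is one extra pass over the result, so it does not change the overall complexity.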
Answered By: Andrej Kesely

I think itertools.groupby with a tiny mutable "key function" would fit nicely (especially if requirements may change, or if you need to use this pattern elsewhere):

import itertools

class ThresholdIndexer:
    """Callable that returns the index of the last threshold <= arg.

    Preconditions:
      - thresholds is sorted and not empty.
      - For all calls, `thresholds[0] <= call[i].arg <= thresholds[-1]`.
      - For all calls, `call[i - 1].arg <= call[i].arg`.
    """

    def __init__(self, thresholds):
        self.thresholds = thresholds
        self.i = 0

    def __call__(self, arg):
        while not (self.thresholds[self.i] <= arg <= self.thresholds[self.i + 1]):
            self.i += 1
        return self.i

A = [1.0, 2.9, 3.4, 4.2, 5.5, 100.3]
B = [1.1, 1.2, 1.3, 2.5, 3.0, 3.1, 5.2]

for group_key, group_items in itertools.groupby(B, key=ThresholdIndexer(A)):
    print(f'{group_key}: {", ".join(str(i) for i in group_items)}')

"""Output:
0: 1.1, 1.2, 1.3, 2.5
1: 3.0, 3.1
3: 5.2
"""

This approach is O(NA + NB).

You can remove these preconditions by binary-searching for the correct index in __call__, rather than assuming some later index will "definitely" be correct. However, the complexity would bump up to O(NB × log NA).
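
A rough sketch of that binary-search variant, using bisect from the standard library. The function name threshold_index is hypothetical, and clamping out-of-range values to the first or last interval is my assumption about a reasonable behavior, not something stated above:

```python
import bisect
import functools
import itertools


def threshold_index(thresholds, arg):
    """Index i of the interval [thresholds[i], thresholds[i + 1]) containing arg.

    Only requires thresholds to be sorted with at least two entries.
    Assumption: out-of-range args are clamped to the first or last interval.
    """
    i = bisect.bisect_right(thresholds, arg) - 1
    return min(max(i, 0), len(thresholds) - 2)


A = [1.0, 2.9, 3.4, 4.2, 5.5, 100.3]
B = [1.1, 1.2, 1.3, 2.5, 3.0, 3.1, 5.2]

key = functools.partial(threshold_index, A)
groups = [(k, list(items)) for k, items in itertools.groupby(B, key=key)]
for k, items in groups:
    print(f'{k}: {", ".join(str(x) for x in items)}')
```

For this data it prints the same three groups (0, 1 and 3) as the class-based version. Two differences to keep in mind: a value exactly equal to a threshold lands in the interval that starts at that threshold, and groupby still only merges adjacent items, so an unsorted B would yield several separate groups with the same key.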

Answered By: Brian Rodriguez

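You can do it in O(len(A) + len(B)) with this code (assuming both lists are sorted and every element of B falls inside some interval of A):

A = [1.0, 2.9, 3.4, 4.2, 5.5, 100.3]
B = [1.1, 1.2, 1.3, 2.5, 3.0, 3.1, 5.2]

C = [0] * len(B)
i, j = 0, 0

while i < len(B):
    # <= on the left so a B value equal to A[j] still matches an interval
    if A[j] <= B[i] < A[j + 1]:
        C[i] = j  # B[i] falls in the interval [A[j], A[j + 1])
        i += 1
    else:
        j += 1    # move on to the next interval

print(C)  # [0, 0, 0, 0, 1, 1, 3]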
Answered By: Mr.Ziri

Try this:

import numpy as np


A = [1.0, 2.9, 3.4, 4.2, 5.5, 100.3]
B = [1.1, 1.2, 1.3, 2.5, 3.0, 3.1, 5.2]
A_arr = np.array(A)
B_arr = np.array(B)
# One vectorized call for all of B; .tolist() gives plain Python ints.
C = np.searchsorted(A_arr, B_arr).tolist()
print(C)

Prints:

[1, 1, 1, 1, 2, 2, 4]
Answered By: ziying35