Taking an equal number of elements from two arrays, such that the taken values have as few duplicates as possible

Question:

Suppose we have two arrays of size N, with values in the range [0, N-1]. For example:

a = np.array([0, 1, 2, 0])
b = np.array([2, 0, 3, 3])

I need to produce a new array c which contains exactly N/2 elements from a and N/2 elements from b, i.e. the values must be taken evenly from both parent arrays.

(For odd lengths, this would be (N-1)/2 and (N+1)/2; you can ignore the odd case, it is not important.)

Taking an equal number of elements from two arrays is pretty trivial, but there is an additional constraint: c should have as many unique numbers as possible / as few duplicates as possible.

For example, a solution to a and b above is:

c = np.array([b[0], a[1], b[2], a[3]])
>>> c
array([2, 1, 3, 0])

Note that the position/order is preserved: each element of a and b that we took to form c stays at its original position. If element i of c is from a, then c[i] == a[i], and likewise for b.
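A quick way to express this constraint as a minimal sketch, using the arrays above (n_from_a is a hypothetical name, not from the original question):

# Every c[i] must come from the same index of one of the parents.
assert all(c[i] == a[i] or c[i] == b[i] for i in range(len(c)))

# Count the entries taken from a (unambiguous only when a[i] != b[i]).
n_from_a = sum(int(c[i] == a[i]) for i in range(len(c)))
assert n_from_a == len(c) // 2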


A straightforward solution for this is simply a sort of path traversal, easy enough to implement recursively:

def traverse(i, a, b, path, n_a, n_b, best, best_path):
    # Both quotas exhausted: score the completed path by its unique count.
    if n_a == 0 and n_b == 0:
        score = len(set(path))
        return (score, path.copy()) if score > best else (best, best_path)

    # Branch 1: take a[i] if there is still quota left for a.
    if n_a > 0:
        path.append(a[i])
        best, best_path = traverse(i + 1, a, b, path, n_a - 1, n_b, best, best_path)
        path.pop()

    # Branch 2: take b[i] if there is still quota left for b.
    if n_b > 0:
        path.append(b[i])
        best, best_path = traverse(i + 1, a, b, path, n_a, n_b - 1, best, best_path)
        path.pop()

    return best, best_path

Here n_a and n_b are how many values we will take from a and b respectively; they are 2 and 2 here since we want to take 4 items evenly.

>>> score, best_path = traverse(0, a, b, [], 2, 2, 0, None)
>>> score, best_path
(4, [2, 1, 3, 0])

Is there a way to implement the above in a more vectorized/efficient manner, possibly through numpy?

Asked By: Mercury


Answers:

Test this heavily.

import numpy as np
from numpy.random import default_rng


rand = default_rng(seed=1)
n = 16
a = rand.integers(low=0, high=n, size=n)
b = rand.integers(low=0, high=n, size=n)
uniques = np.setxor1d(a, b)
print(a)
print(b)
print(uniques)


def limited_uniques(arr: np.ndarray) -> np.ndarray:
    # Mark up to n//2 positions whose values are still unique across
    # both parents (the global `uniques`).
    choose = np.zeros(shape=n, dtype=bool)
    _, idx, _ = np.intersect1d(arr, uniques, return_indices=True)
    idx = idx[:n//2]
    choose[idx] = True
    # If fewer than n//2 positions were marked, pad the selection with
    # the earliest not-yet-chosen positions until the quota is met.
    n_missing = n//2 - len(idx)
    counts = choose.cumsum()
    diffs = np.arange(n) - counts
    at = np.searchsorted(diffs, n_missing)
    choose[:at] = True
    return arr[choose]


a_half = limited_uniques(a)
uniques = np.union1d(uniques, np.setdiff1d(a, a_half))

interleaved = np.empty_like(a)
interleaved[0::2] = a_half
interleaved[1::2] = limited_uniques(b)
print(interleaved)
[ 7  8 12 15  0  2 13 15  3  4 13  6  4 13  4  6]
[10  8  1  0 13 12 13  8 13  5  7 12  1  4  1  7]
[ 1  2  3  5  6 10 15]
[ 7 10  8  8 12  1 15  0  0 13  2 12  3  5  6  4]
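As a sanity check (not part of the original answer), one can compare the greedy result's unique count with the exhaustive search from the question, assuming the question's traverse() is in scope; for n = 16 it explores only comb(16, 8) = 12870 balanced paths, so it is still fast:

# Compare the greedy score against the brute-force optimum.
greedy_score = len(np.unique(interleaved))
best_score, _ = traverse(0, a, b, [], n // 2, n // 2, 0, None)
print(greedy_score, best_score)  # the greedy pass may fall short of the optimum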
Answered By: Reinderien

The algorithm is slow mainly because it runs in exponential time. There is no straightforward way to vectorize it using only Numpy because of the recursion. Even if it were possible, the huge number of combinations would make most Numpy implementations inefficient (due to the large Numpy arrays to compute). Additionally, there is AFAIK no vectorized operation to count the number of unique values across many rows efficiently (the usual way is np.unique, which is not efficient in this case and cannot be used without a loop). As a result, there are two possible strategies to speed this up (a quick illustration of the combinatorial blow-up follows the list):

  • trying to find an algorithm with a reasonable complexity (e.g. <= O(n^4));
  • using compilation methods, micro-optimizations and tricks to write a faster brute-force implementation.
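To see why the brute force explodes, note that the number of balanced paths is C(n, n/2), which grows roughly like 2^n. A quick illustration (not part of the answer's code):

import math

# Number of ways to choose which n//2 positions come from `a`
# (the remaining positions come from `b`):
for n in (8, 16, 32):
    print(n, math.comb(n, n // 2))
# 8 70
# 16 12870
# 32 601080390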

Since finding a correct sub-exponential algorithm turns out not to be easy, I chose the second approach (though the first would be the better one).

The idea is to:

  • remove the recursion by generating all possible solutions with a loop iterating over integers (each integer encodes one candidate path);
  • write a fast way to count unique items of an array;
  • use the Numba JIT compiler to optimize the code, which is only efficient once compiled.

Here is the final code:

import numpy as np
import numba as nb

# Naive way to count unique items.
# This is a slow fallback implementation.
@nb.njit
def naive_count_unique(arr):
    count = 0
    for i in range(len(arr)):
        val = arr[i]
        found = False
        for j in range(i):
            if arr[j] == val:
                found = True
                break
        if not found:
            count += 1
    return count

# Optimized way to count unique items on small arrays.
# Count items 2 by 2.
# Fast on small arrays.
@nb.njit
def optim_count_unique(arr):
    count = 0
    for i in range(0, len(arr), 2):
        if arr[i] == arr[i+1]:
            tmp = 1
            for j in range(i):
                if arr[j] == arr[i]: tmp = 0
            count += tmp
        else:
            val1, val2 = arr[i], arr[i+1]
            tmp1, tmp2 = 1, 1
            for j in range(i):
                val = arr[j]
                if val == val1: tmp1 = 0
                if val == val2: tmp2 = 0
            count += tmp1 + tmp2
    return count

@nb.njit
def count_unique(arr):
    if len(arr) % 2 == 0:
        return optim_count_unique(arr)
    else:
        # Odd case: not optimized yet
        return naive_count_unique(arr)

# Count the number of bits in a 32-bit integer
# See https://stackoverflow.com/questions/71097470/msb-lsb-popcount-in-numba
@nb.njit('int_(uint32)', inline='always')
def popcount(v):
    v = v - ((v >> 1) & 0x55555555)
    v = (v & 0x33333333) + ((v >> 2) & 0x33333333)
    c = np.uint32((v + (v >> 4) & 0xF0F0F0F) * 0x1010101) >> 24
    return c

# Count the number of bits in a 64-bit integer
@nb.njit(inline='always')
def bit_count(n):
    if n < (1 << 30):
        return popcount(np.uint32(n))
    else:
        return popcount(np.uint32(n)) + popcount(np.uint32(n >> 32))

# Mutate `out` so as not to create an expensive new temporary array
@nb.njit
def int_to_path(n, out, a, b):
    for i in range(len(out)):
        out[i] = a[i] if ((n >> i) & 1) else b[i]

@nb.njit(['(int32[:], int32[:], int64, int64)', '(int64[:], int64[:], int64, int64)'])
def traverse_fast(a, b, n_a, n_b):
    # This assertion is needed because the paths are encoded on 64 bits.
    # This should not be a problem in practice since the number of solutions
    # would be impractically huge to test with this algorithm anyway.
    assert n_a + n_b < 62

    max_iter = 1 << (n_a + n_b)
    path = np.empty(n_a + n_b, dtype=a.dtype)
    score, best_score, best_i = 0, 0, 0

    # Iterate over all bit patterns (a superset of the valid solutions)
    for i in range(max_iter):
        # Keep only the valid solutions: a set bit at position j means a[j]
        # is taken, so exactly n_a bits must be set
        if bit_count(i) != n_a:
            continue

        # Analyse the score of the solution
        int_to_path(i, path, a, b)
        score = count_unique(path)

        # Keep it if it is better than the previous best
        if score > best_score:
            best_score = score
            best_i = i

    int_to_path(best_i, path, a, b)
    return best_score, path
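A minimal usage sketch with the arrays from the question (the dtype must match one of the declared signatures):

a = np.array([0, 1, 2, 0], dtype=np.int64)
b = np.array([2, 0, 3, 3], dtype=np.int64)
score, path = traverse_fast(a, b, 2, 2)
print(score, path)  # score is 4; the path itself may differ from the recursive version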

This implementation is about 30 times faster on arrays of size 8 on my machine. One could use several cores to speed it up even further. However, I think it is better to focus on finding a sub-exponential algorithm so as to avoid wasting more computing resources. Note that the returned path differs from the one found by the initial function, but the score is the same on random arrays. Still, this faster version can help others test their implementations on larger arrays without waiting for a long time.

Answered By: Jérôme Richard