How to find intersect indexes and values in Python?

Question:

I am trying to convert code from MATLAB to Python.
I have this code in MATLAB:

[value, iA, iB] = intersect(netA{i},netB{j});

I am looking for Python code that finds the values common to both A and B, as well as the index vectors ia and ib (for each common element, its first index in A and its first index in B).

I tried different solutions, but I got index vectors of different lengths. I also tried numpy.in1d and numpy.intersect1d, but they did not return the same values as MATLAB.
What I tried:

def FindoverlapIndx(self,a, b):
    bool_a = np.in1d(a, b)
    ind_a = np.arange(len(a))
    ind_a = ind_a[bool_a]
    ind_b = np.array([np.argwhere(b == a[x]) for x in ind_a]).flatten()
    return ind_a, ind_b

 IS=np.arange(IDs[i].shape[0])[np.in1d(IDs[i], R_IDs[j])]
 IR = np.arange(R_IDs[j].shape[0])[np.in1d(R_IDs[j],IDs[i])]

I got index vectors of different lengths, but both must have the same length, as with MATLAB's intersect.

Asked By: MAK

Answers:

MATLAB’s intersect(a, b) returns:

  • common values of a and b, sorted
  • the first position of each of them in a
  • the first position of each of them in b

NumPy’s intersect1d does only the first part. So I read its source and modified it to return indices as well.

import numpy as np
def intersect_mtlb(a, b):
    a1, ia = np.unique(a, return_index=True)
    b1, ib = np.unique(b, return_index=True)
    aux = np.concatenate((a1, b1))
    aux.sort()
    c = aux[:-1][aux[1:] == aux[:-1]]
    return c, ia[np.isin(a1, c)], ib[np.isin(b1, c)]

a = np.array([7, 1, 7, 7, 4])
b = np.array([7, 0, 4, 4, 0])
c, ia, ib = intersect_mtlb(a, b)
print(c, ia, ib)

This prints [4 7] [4 0] [2 0], which is consistent with the output on the MATLAB documentation page, since I used the same example as they did. The only difference is that indices are 0-based in Python and 1-based in MATLAB.

Explanation: the function takes the unique elements from each array, concatenates them, and sorts the result, which gives [0 1 4 4 7 7]. Each number appears at most twice here; when it is repeated, that means it was in both arrays. This is what aux[1:] == aux[:-1] selects for.

The array ia contains the first index of each element of a1 in the original array a. Filtering it by isin(a1, c) keeps only the indices of the elements that are also in c. The same is done for ib.
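To make the explanation above concrete, here is a step-by-step trace of the same logic on the example arrays. It is only an illustrative sketch: the names dup, mask_a and mask_b are introduced here for readability and are not part of the function above.

```python
import numpy as np

a = np.array([7, 1, 7, 7, 4])
b = np.array([7, 0, 4, 4, 0])

# Sorted unique values plus the index of each value's first occurrence.
a1, ia = np.unique(a, return_index=True)   # a1 = [1 4 7], ia = [1 4 0]
b1, ib = np.unique(b, return_index=True)   # b1 = [0 4 7], ib = [1 2 0]

# Concatenate and sort: a value present in both arrays shows up twice in a row.
aux = np.sort(np.concatenate((a1, b1)))    # [0 1 4 4 7 7]
dup = aux[1:] == aux[:-1]                  # [False False True False True]
c = aux[:-1][dup]                          # [4 7]

# Keep only the first-occurrence indices of values that are common.
mask_a = np.isin(a1, c)                    # [False True True]
mask_b = np.isin(b1, c)                    # [False True True]
print(c, ia[mask_a], ib[mask_b])           # [4 7] [4 0] [2 0]
```

Because a1, b1 and c are all sorted, the filtered index arrays line up with c element by element, which is why both index vectors always have the same length as c.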

EDIT:
Since NumPy version 1.15.0, intersect1d also returns the index arrays (the second and third parts) if you pass return_indices=True:

x = np.array([1, 1, 2, 3, 4])
y = np.array([2, 1, 4, 6])
xy, x_ind, y_ind = np.intersect1d(x, y, return_indices=True)

This gives xy = array([1, 2, 4]), x_ind = array([0, 2, 4]) and y_ind = array([1, 0, 2]).
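As a quick sanity check, the built-in option can be run on the MATLAB documentation example used earlier. This is a minimal sketch assuming NumPy 1.15.0 or later; the names ia_matlab and ib_matlab are hypothetical and only illustrate the shift to MATLAB's 1-based indexing.

```python
import numpy as np

a = np.array([7, 1, 7, 7, 4])
b = np.array([7, 0, 4, 4, 0])

# c holds the sorted common values; ia and ib hold first-occurrence indices.
c, ia, ib = np.intersect1d(a, b, return_indices=True)

# Indexing the inputs with the returned indices recovers the common values.
assert np.array_equal(a[ia], c)
assert np.array_equal(b[ib], c)

# Add 1 to compare directly with MATLAB's 1-based output.
ia_matlab = ia + 1
ib_matlab = ib + 1
print(c, ia_matlab, ib_matlab)  # [4 7] [5 1] [3 1]
```

Both index arrays have the same length as c, which is the property the question asks for.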

Answered By: user6655984