How to find intersect indexes and values in Python?
Question:
I try to convert code from Matlab to python
I have code in Matlab:
[value, iA, iB] = intersect(netA{i},netB{j});
I am looking for code in python that find the values common to both A and B, as well as the index vectors ia and ib (for each common element, its first index in A and its first index in B).
I try to use different solution, but I received vectors with different length. tried to use numpy.in1d/intersect1d , that returns bad not the same value.
Thing I try to do :
def FindoverlapIndx(self,a, b):
bool_a = np.in1d(a, b)
ind_a = np.arange(len(a))
ind_a = ind_a[bool_a]
ind_b = np.array([np.argwhere(b == a[x]) for x in ind_a]).flatten()
return ind_a, ind_b
IS=np.arange(IDs[i].shape[0])[np.in1d(IDs[i], R_IDs[j])]
IR = np.arange(R_IDs[j].shape[0])[np.in1d(R_IDs[j],IDs[i])]
I received indexes with different lengths. But both must be of the same length as in Matlab’s intersect
.
Answers:
MATLAB’s intersect(a, b)
returns:
- common values of
a
and b
, sorted
- the first position of each of them in
a
- the first position of each of them in
b
NumPy’s intersect1d
does only the first part. So I read its source and modified it to return indices as well.
import numpy as np
def intersect_mtlb(a, b):
a1, ia = np.unique(a, return_index=True)
b1, ib = np.unique(b, return_index=True)
aux = np.concatenate((a1, b1))
aux.sort()
c = aux[:-1][aux[1:] == aux[:-1]]
return c, ia[np.isin(a1, c)], ib[np.isin(b1, c)]
a = np.array([7, 1, 7, 7, 4]);
b = np.array([7, 0, 4, 4, 0]);
c, ia, ib = intersect_mtlb(a, b)
print(c, ia, ib)
This prints [4 7] [4 0] [2 0]
which is consistent with the output on MATLAB documentation page, as I used the same example as they did. Of course, indices are 0-based in Python unlike MATLAB.
Explanation: the function takes unique elements from each array, puts them together, and concatenates: the result is [0 1 4 4 7 7]
. Each number appears at most twice here; when it’s repeated, that means it was in both arrays. This is what aux[1:] == aux[:-1]
selects for.
The array ia
contains the first index of each element of a1
in the original array a
. Filtering it by isin(a1, c)
leaves only the indices that were in c
. Same is done for ib
.
EDIT:
Since version 1.15.0, intersect1d does the second and third part if you pass return_indices=True
:
x = np.array([1, 1, 2, 3, 4])
y = np.array([2, 1, 4, 6])
xy, x_ind, y_ind = np.intersect1d(x, y, return_indices=True)
Where you get xy = array([1, 2, 4])
, x_ind = array([0, 2, 4])
and y_ind = array([1, 0, 2])
I try to convert code from Matlab to python
I have code in Matlab:
[value, iA, iB] = intersect(netA{i},netB{j});
I am looking for code in python that find the values common to both A and B, as well as the index vectors ia and ib (for each common element, its first index in A and its first index in B).
I try to use different solution, but I received vectors with different length. tried to use numpy.in1d/intersect1d , that returns bad not the same value.
Thing I try to do :
def FindoverlapIndx(self,a, b):
bool_a = np.in1d(a, b)
ind_a = np.arange(len(a))
ind_a = ind_a[bool_a]
ind_b = np.array([np.argwhere(b == a[x]) for x in ind_a]).flatten()
return ind_a, ind_b
IS=np.arange(IDs[i].shape[0])[np.in1d(IDs[i], R_IDs[j])]
IR = np.arange(R_IDs[j].shape[0])[np.in1d(R_IDs[j],IDs[i])]
I received indexes with different lengths. But both must be of the same length as in Matlab’s intersect
.
MATLAB’s intersect(a, b)
returns:
- common values of
a
andb
, sorted - the first position of each of them in
a
- the first position of each of them in
b
NumPy’s intersect1d
does only the first part. So I read its source and modified it to return indices as well.
import numpy as np
def intersect_mtlb(a, b):
a1, ia = np.unique(a, return_index=True)
b1, ib = np.unique(b, return_index=True)
aux = np.concatenate((a1, b1))
aux.sort()
c = aux[:-1][aux[1:] == aux[:-1]]
return c, ia[np.isin(a1, c)], ib[np.isin(b1, c)]
a = np.array([7, 1, 7, 7, 4]);
b = np.array([7, 0, 4, 4, 0]);
c, ia, ib = intersect_mtlb(a, b)
print(c, ia, ib)
This prints [4 7] [4 0] [2 0]
which is consistent with the output on MATLAB documentation page, as I used the same example as they did. Of course, indices are 0-based in Python unlike MATLAB.
Explanation: the function takes unique elements from each array, puts them together, and concatenates: the result is [0 1 4 4 7 7]
. Each number appears at most twice here; when it’s repeated, that means it was in both arrays. This is what aux[1:] == aux[:-1]
selects for.
The array ia
contains the first index of each element of a1
in the original array a
. Filtering it by isin(a1, c)
leaves only the indices that were in c
. Same is done for ib
.
EDIT:
Since version 1.15.0, intersect1d does the second and third part if you pass return_indices=True
:
x = np.array([1, 1, 2, 3, 4])
y = np.array([2, 1, 4, 6])
xy, x_ind, y_ind = np.intersect1d(x, y, return_indices=True)
Where you get xy = array([1, 2, 4])
, x_ind = array([0, 2, 4])
and y_ind = array([1, 0, 2])