How to check whether tensor values are in a different tensor in PyTorch?

Question:

I have 2 tensors of unequal size

a = torch.tensor([[1,2], [2,3],[3,4]])
b = torch.tensor([[4,5],[2,3]])

I want a boolean array indicating whether each row of a exists in b, without iterating. Something like

a in b

and the result should be

[False, True, False]

as only the value of a[1] is in b

Asked By: Adam Williams


Answers:

I think it’s impossible without using at least some type of iteration. The most succinct way I can manage is a list comprehension:

[True if i in b else False for i in a]

This checks which rows of a are in b and gives [False, True, False]. It can also be reversed to check which rows of b are in a, giving [False, True].

Answered By: AverageHomosapien

This should work:

result = []
for i in a:
    try: # to avoid error for the case of empty tensors
        result.append(max(i.numpy()[1] == b.T.numpy()[1,i.numpy()[0] == b.T.numpy()[0,:]]))
    except:
        result.append(False)
result
Answered By: yuri

If you need to compare all subtensors across the first dimension of a, use in:

>>> [i in b for i in a]
[False, True, False]
Answered By: iacob

Neither of the solutions that use tensor in tensor works in all cases for the OP. If a row matches the other tensor in at least one position, even though no complete row is equal, the operation still returns True for that row, potentially leading to hours of debugging. For example:

torch.tensor([2,5]) in torch.tensor([2,10]) # returns True
torch.tensor([5,2]) in torch.tensor([5,10]) # returns True
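
A minimal sketch of why this happens, assuming (as the results above suggest) that the in operator on tensors amounts to an elementwise comparison followed by any(). The names row, other, loose and strict are hypothetical and only used for illustration:

```python
import torch

# Hypothetical example data: `row` is NOT a complete row of `other`,
# but its first element matches other[0, 0].
row = torch.tensor([2, 5])
other = torch.tensor([[2, 10], [7, 8]])

# Loose check: True as soon as ANY single element matches after broadcasting.
loose = bool((row == other).any())

# Strict check: True only if ALL elements of some row of `other` match.
strict = bool((row == other).all(dim=1).any())

# The `in` operator agrees with the loose check here.
in_result = row in other

print(loose, strict, in_result)
```

The strict variant reduces over the last dimension first, so a partial match in a single position no longer counts.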

A solution for the above is to force the equality check in every dimension and then combine the results with a Boolean AND. Note that the following 2 methods may not be very efficient, because tensors are rather slow to iterate over and compare, so converting to numpy may be needed for large data:

[any(torch.all(i == b, dim=1)) for i in a] # OR
[any((i[0] == b[:, 0]) & (i[1] == b[:, 1])) for i in a]

That being said, @yuri’s solution also seems to work for these edge cases, but it still seems to fail occasionally, and it is rather unreadable.
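
The per-row equality check can also be written without a Python loop by letting broadcasting compare every row of a against every row of b. This is only a sketch: rows_in is a hypothetical helper name, not a PyTorch function, and it assumes both tensors are 2-D with the same number of columns.

```python
import torch

def rows_in(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Return a boolean mask where mask[i] is True if a[i] equals some row of b.

    Hypothetical helper; assumes a has shape (n, d) and b has shape (m, d).
    """
    # a[:, None, :] has shape (n, 1, d); b[None, :, :] has shape (1, m, d).
    # Broadcasting yields an (n, m, d) tensor of elementwise comparisons.
    eq = a[:, None, :] == b[None, :, :]
    # A pair of rows matches only if every column matches; a[i] is "in" b
    # if it matches at least one row of b.
    return eq.all(dim=-1).any(dim=-1)

a = torch.tensor([[1, 2], [2, 3], [3, 4]])
b = torch.tensor([[4, 5], [2, 3]])

mask = rows_in(a, b)
print(mask.tolist())  # [False, True, False]
```

The intermediate comparison tensor holds n * m * d booleans, so for very large inputs it may be worth processing a in chunks.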

Answered By: Andrei Rusu

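I recently ran into this issue too, though my goal was to select the row sub-tensors that are not "in" the other tensor. My solution is to first convert the tensors to pandas DataFrames, then use .drop_duplicates(). More specifically, for OP’s problem (with tensor1 and tensor2 standing for a and b), one can do:

import numpy as np
import pandas as pd
import torch

tensor1 = torch.tensor([[1, 2], [2, 3], [3, 4]])
tensor2 = torch.tensor([[4, 5], [2, 3]])

tensor1_df = pd.DataFrame(tensor1.numpy())
tensor2_df = pd.DataFrame(tensor2.numpy())
# Stack tensor1's rows above tensor2's; keep='last' then drops every tensor1 row that reappears in tensor2.
# Caveat: a row repeated inside tensor1 is dropped as well, except for its last occurrence.
combined = pd.concat([tensor1_df, tensor2_df], ignore_index=True)
kept = combined.drop_duplicates(keep='last').index
# True where the tensor1 row survived, i.e. it is not in tensor2; negate with ~ for the "in" mask.
tensor1_notin_tensor2 = torch.from_numpy(np.isin(np.arange(len(tensor1_df)), kept))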
Answered By: user48867