How to check whether tensor values are in a different tensor in PyTorch?
Question:
I have 2 tensors of unequal size
a = torch.tensor([[1,2], [2,3],[3,4]])
b = torch.tensor([[4,5],[2,3]])
I want a boolean array indicating whether each row of a exists in b, without iterating. Something like
a in b
and the result should be
[False, True, False]
as only a[1] is in b.
Answers:
I think it’s impossible without using at least some type of iteration. The most succinct way I can manage is using a list comprehension:
[True if i in b else False for i in a]
This checks which rows of a are in b and gives [False, True, False]. It can also be reversed to check which rows of b are in a, which gives [False, True].
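This should work:
result = []
for i in a:
    try:  # max() raises ValueError when no row of b has i[0] in its first column
        # take the second-column values of the rows of b whose first column equals i[0], then compare them with i[1]
        result.append(max(i.numpy()[1] == b.T.numpy()[1, i.numpy()[0] == b.T.numpy()[0, :]]))
    except ValueError:
        result.append(False)
result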
If you need to compare all subtensors across the first dimension of a, use in:
>>> [i in b for i in a]
[False, True, False]
Neither of the solutions that use tensor in tensor works in all cases for the OP. If the tensors contain elements/tuples that match in at least one dimension, the aforementioned operation will return True for those elements, potentially leading to hours of debugging. For example:
torch.tensor([2,5]) in torch.tensor([2,10]) # returns True
torch.tensor([5,2]) in torch.tensor([5,10]) # returns True
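The reason, as far as I can tell, is that x in t for tensors behaves like (x == t).any(): the comparison is broadcast element-wise, and a single matching position is enough. A small sketch of that behaviour (the names x and t are only for illustration):

```python
import torch

x = torch.tensor([2, 5])
t = torch.tensor([2, 10])

# Element-wise comparison: only position 0 matches
elementwise = x == t

# `in` is satisfied as soon as any single position matches...
contains = x in t

# ...whereas a whole-row match requires every position to agree
whole_row_equal = torch.equal(x, t)

print(elementwise.tolist(), contains, whole_row_equal)
```

So a partial match such as [2, 5] against [2, 10] counts as "contained" even though the rows differ.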
A solution for the above is to check for equality in each dimension and then combine the results with a Boolean AND, so that a row only counts as a match when every one of its elements matches the same row of b. Note that the following two methods may not be very efficient, because tensors are rather slow for iterating and equality checking, so converting to NumPy may be needed for large data:
[bool(torch.all(i == b, dim=1).any()) for i in a] # OR
[any((i[0] == b[:, 0]) & (i[1] == b[:, 1])) for i in a]
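The per-row checks above can also be written without a Python loop by letting broadcasting compare every row of a with every row of b at once. This is only a sketch using the OP's tensors; the names pairwise and mask are my own, and the intermediate tensor has len(a) * len(b) * n_columns elements, so it may be too large for big inputs:

```python
import torch

a = torch.tensor([[1, 2], [2, 3], [3, 4]])
b = torch.tensor([[4, 5], [2, 3]])

# a[:, None, :] has shape (3, 1, 2) and b[None, :, :] has shape (1, 2, 2),
# so the comparison broadcasts to shape (3, 2, 2): every row of a vs every row of b.
pairwise = a[:, None, :] == b[None, :, :]

# A pair of rows matches only if all columns match;
# a row of a is "in" b if it matches at least one row of b.
mask = pairwise.all(dim=-1).any(dim=-1)

print(mask.tolist())  # [False, True, False]

# The partial-match edge case is handled: [2, 5] is not a row of [[2, 10]]
edge = (torch.tensor([[2, 5]])[:, None, :] == torch.tensor([[2, 10]])[None, :, :]).all(dim=-1).any(dim=-1)
print(edge.tolist())  # [False]
```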
That being said, @yuri’s solution also seems to work for these edge cases, but it still seems to fail occasionally, and it is rather unreadable.
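I recently also encountered this issue, though my goal is to select those row sub-tensors not "in" the other tensor. My solution is to first convert the tensors to pandas DataFrames, then use .drop_duplicates(). The duplicates have to be judged on the data columns only (subset=...), otherwise the added val column makes every row unique. More specifically, for the OP’s problem, one can do:
import pandas as pd
import torch
# tensor1 and tensor2 correspond to the OP's a and b (CPU tensors)
tensor1_df = pd.DataFrame(tensor1.numpy())
cols = list(tensor1_df.columns)  # the data columns, before 'val' is added
tensor1_df['val'] = False
tensor2_df = pd.DataFrame(tensor2.numpy())
tensor2_df['val'] = True
# for each distinct row keep only its last copy; its 'val' is True if the row occurs in tensor2
last = pd.concat([tensor1_df, tensor2_df], ignore_index=True).drop_duplicates(subset=cols, keep='last')
# look that flag up for every row of tensor1, in the original row order
tensor1_in_tensor2 = torch.from_numpy(tensor1_df[cols].merge(last, on=cols, how='left')['val'].to_numpy(dtype=bool))
tensor1_notin_tensor2 = tensor1[~tensor1_in_tensor2]  # the rows of tensor1 that are not in tensor2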
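If pandas is not otherwise needed, the same row-wise lookup can be sketched with a plain Python set of tuples. It still iterates in Python, but each lookup is a hash lookup rather than a scan over tensor2. The tensors below reuse the OP's values, and the name rows_in_tensor2 is made up for this example:

```python
import torch

tensor1 = torch.tensor([[1, 2], [2, 3], [3, 4]])
tensor2 = torch.tensor([[4, 5], [2, 3]])

# Turn each row of tensor2 into a hashable tuple and collect them in a set
rows_in_tensor2 = {tuple(row) for row in tensor2.tolist()}

# One set lookup per row of tensor1
mask = torch.tensor([tuple(row) in rows_in_tensor2 for row in tensor1.tolist()])

# Rows of tensor1 that do not appear in tensor2
tensor1_notin_tensor2 = tensor1[~mask]

print(mask.tolist())                   # [False, True, False]
print(tensor1_notin_tensor2.tolist())  # [[1, 2], [3, 4]]
```

This relies on exact equality, so it fits integer tensors like the OP's better than floating-point data.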