Most efficient way to check if a group is a subgroup of other group using any data structure on Python 3
Question:
If I can choose between list of objects, lists of integers and dictionaries (is there any other data structure option?), which is the most efficient data structure to discover if a group is a subgroup of other group, and how to implement?
I have a class with unique integers atributes, so I can “represent” the class object’s by the integer they contain, or the object (pointer) itself.
a.atributeEx = 1
b.atributeEx = 2
c.atributeEx = 3
listOfPointers = [a,b,c]
listAtributes = [1,2,3]
dictEx = ['1':1, '2':2, '3':3]
One option is to use issubset, as bellow:
listAtributes2 = [1,2]
set(listAtributes).issubset(listAtributes2)
However, using the issubset function with a list of atributes, my code would take months to years to finish to run, as this has to be performed billions of times. Usually, one list has from 1 to 4 elements while the other has from 200 to 2000 elements.
What is the best approach for this problem?
Answers:
Look at python sets. The set
datatype has all of the operations you need (e.g. issubset
) and is really fast. If you can streamline your calling code and let the sets do the work, you will see this perform well.
>>> x = set(range(1000))
>>> y = {8,37,29,983}
>>> y.issubset(x)
True
So I put this in a loop. It ran 100,000,000 times in 27 seconds.
I think that if you pre-compute your sets instead of building them for each comparison, you will find this to be a super-effective way of handling it.
Using set
methods should be fairly efficient. In particular, if you do x.issubset(y)
, only the size of x
matters since a set
uses hashing to check if it contains a given item.
Although, if you are instantiating a set for every comparison, then you are adding a big overhead to your computations.
One solution is to instantiate a new set
in your __init__
method only and to populate it in your __setattr__
method. To allow attribute deletion as well, you can define a __delattr__
method that removes elements from the set
. Finally, you can use the __contains__
method to execute the subset comparison when using the keyword in
.
class Container:
def __init__(self, **kwargs):
self._attr_set = set()
for k, v in kwargs.items():
setattr(self, k, v)
def __setattr__(self, key, value):
self._attr_set.add((key, value))
super().__setattr__(key, value)
def __delattr__(self, item):
for x in self._attr_set:
if x[0] == item:
self._attr_set.remove(x)
break
super().__delattr__(item)
def __contains__(self, item):
return item._attr_set.issubset(self._attr_set)
Example
x = Container(a=1, b=2, c=3)
y = Container(a=1, b=2)
print(y in x) # True
If I can choose between list of objects, lists of integers and dictionaries (is there any other data structure option?), which is the most efficient data structure to discover if a group is a subgroup of other group, and how to implement?
I have a class with unique integers atributes, so I can “represent” the class object’s by the integer they contain, or the object (pointer) itself.
a.atributeEx = 1
b.atributeEx = 2
c.atributeEx = 3
listOfPointers = [a,b,c]
listAtributes = [1,2,3]
dictEx = ['1':1, '2':2, '3':3]
One option is to use issubset, as bellow:
listAtributes2 = [1,2]
set(listAtributes).issubset(listAtributes2)
However, using the issubset function with a list of atributes, my code would take months to years to finish to run, as this has to be performed billions of times. Usually, one list has from 1 to 4 elements while the other has from 200 to 2000 elements.
What is the best approach for this problem?
Look at python sets. The set
datatype has all of the operations you need (e.g. issubset
) and is really fast. If you can streamline your calling code and let the sets do the work, you will see this perform well.
>>> x = set(range(1000))
>>> y = {8,37,29,983}
>>> y.issubset(x)
True
So I put this in a loop. It ran 100,000,000 times in 27 seconds.
I think that if you pre-compute your sets instead of building them for each comparison, you will find this to be a super-effective way of handling it.
Using set
methods should be fairly efficient. In particular, if you do x.issubset(y)
, only the size of x
matters since a set
uses hashing to check if it contains a given item.
Although, if you are instantiating a set for every comparison, then you are adding a big overhead to your computations.
One solution is to instantiate a new set
in your __init__
method only and to populate it in your __setattr__
method. To allow attribute deletion as well, you can define a __delattr__
method that removes elements from the set
. Finally, you can use the __contains__
method to execute the subset comparison when using the keyword in
.
class Container:
def __init__(self, **kwargs):
self._attr_set = set()
for k, v in kwargs.items():
setattr(self, k, v)
def __setattr__(self, key, value):
self._attr_set.add((key, value))
super().__setattr__(key, value)
def __delattr__(self, item):
for x in self._attr_set:
if x[0] == item:
self._attr_set.remove(x)
break
super().__delattr__(item)
def __contains__(self, item):
return item._attr_set.issubset(self._attr_set)
Example
x = Container(a=1, b=2, c=3)
y = Container(a=1, b=2)
print(y in x) # True