Most efficient way to check if a group is a subgroup of other group using any data structure on Python 3

Question:

If I can choose between list of objects, lists of integers and dictionaries (is there any other data structure option?), which is the most efficient data structure to discover if a group is a subgroup of other group, and how to implement?

I have a class with unique integers atributes, so I can “represent” the class object’s by the integer they contain, or the object (pointer) itself.

a.atributeEx = 1
b.atributeEx = 2
c.atributeEx = 3

listOfPointers = [a,b,c]
listAtributes = [1,2,3]
dictEx = ['1':1, '2':2, '3':3]

One option is to use issubset, as bellow:

listAtributes2 = [1,2]
set(listAtributes).issubset(listAtributes2)

However, using the issubset function with a list of atributes, my code would take months to years to finish to run, as this has to be performed billions of times. Usually, one list has from 1 to 4 elements while the other has from 200 to 2000 elements.

What is the best approach for this problem?

Asked By: Rodrigo

||

Answers:

Look at python sets. The set datatype has all of the operations you need (e.g. issubset) and is really fast. If you can streamline your calling code and let the sets do the work, you will see this perform well.

>>> x = set(range(1000))
>>> y = {8,37,29,983}
>>> y.issubset(x) 
True

So I put this in a loop. It ran 100,000,000 times in 27 seconds.

I think that if you pre-compute your sets instead of building them for each comparison, you will find this to be a super-effective way of handling it.

https://docs.python.org/3/tutorial/datastructures.html#sets

Answered By: gahooa

Using set methods should be fairly efficient. In particular, if you do x.issubset(y), only the size of x matters since a set uses hashing to check if it contains a given item.

Although, if you are instantiating a set for every comparison, then you are adding a big overhead to your computations.

One solution is to instantiate a new set in your __init__ method only and to populate it in your __setattr__ method. To allow attribute deletion as well, you can define a __delattr__ method that removes elements from the set. Finally, you can use the __contains__ method to execute the subset comparison when using the keyword in.

class Container:
    def __init__(self, **kwargs):
        self._attr_set = set()
        for k, v in kwargs.items():
            setattr(self, k, v)

    def __setattr__(self, key, value):
        self._attr_set.add((key, value))
        super().__setattr__(key, value)

    def __delattr__(self, item):
        for x in self._attr_set:
            if x[0] == item:
                self._attr_set.remove(x)
                break
        super().__delattr__(item)


    def __contains__(self, item):
        return item._attr_set.issubset(self._attr_set)

Example

x = Container(a=1, b=2, c=3)
y = Container(a=1, b=2)

print(y in x) # True
Answered By: Olivier Melançon