Remove duplicates by rule from a big list

Question:

I have a list of 10,000 elements of a custom class. The important point is that each of these objects has a graph as an attribute (with somewhere between 20 and 500 nodes).

Now I would like to remove all the duplicates in the list, where I assume two objects to be equal if and only if their graphs are isomorphic.

My code looks something like this:

import networkx as nx

def remove_duplicates(list_):

    filtered_list = list()

    while len(list_) > 0:
        A = list_.pop()
        filtered_list.append(A)

        # iterate over a copy so that removing from list_ doesn't skip elements
        for B in list_[:]:
            if A.num_of_nodes == B.num_of_nodes:
                if nx.is_isomorphic(A.graph, B.graph, node_match=node_check):
                    list_.remove(B)

    return filtered_list

However, the program stops progressing at a certain point. I checked the activity monitor: memory does not seem to be the issue, but the CPU might be.

Does anyone have a hint on how to solve this more efficiently or elegantly? For smaller samples, my code worked well.

Asked By: student7481


Answers:

This will work as soon as you find something in self.graph that is identical for all isomorphic graphs (a graph invariant), and put it in place of ??? (e.g. something like self.graph.some_isomorphic_characteristic):

import networkx as nx
from typing import List

class custom_object:
    def __init__(self, num_of_nodes, graph):
        self.num_of_nodes = num_of_nodes
        self.graph = graph
    def __eq__(self, other):
        return self.num_of_nodes == other.num_of_nodes and nx.is_isomorphic(self.graph, other.graph, node_match=node_check)
    def __ne__(self, other):
        return not self.__eq__(other)
    def __hash__(self):
        return hash((self.num_of_nodes, ???))

def remove_duplicates(list_: List[custom_object]) -> List[custom_object]:
    return list(set(list_))
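One concrete value that could fill in the ??? placeholder (my suggestion, not something stated in the answer) is networkx's Weisfeiler-Lehman graph hash: isomorphic graphs always get the same hash, so it is a valid `__hash__` ingredient, and hash collisions between non-isomorphic graphs only cost extra `__eq__` calls. A minimal sketch, assuming graph structure alone determines equality (no `node_match` callback):

```python
import networkx as nx

class CustomObject:
    """Sketch of a dedupe-friendly wrapper whose hash is a graph invariant."""
    def __init__(self, graph):
        self.graph = graph
        self.num_of_nodes = graph.number_of_nodes()
        # Weisfeiler-Lehman hash: equal for isomorphic graphs, cheap to compare
        self._wl = nx.weisfeiler_lehman_graph_hash(graph)

    def __eq__(self, other):
        return (self.num_of_nodes == other.num_of_nodes
                and nx.is_isomorphic(self.graph, other.graph))

    def __hash__(self):
        return hash((self.num_of_nodes, self._wl))

def remove_duplicates(objs):
    # set() buckets by hash first, so is_isomorphic only runs on hash collisions
    return list(set(objs))
```

With 10,000 elements this replaces most of the pairwise isomorphism tests with hash comparisons.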

Otherwise, you can use:

import networkx as nx
from typing import List

class custom_object:
    def __init__(self, num_of_nodes, graph):
        self.num_of_nodes = num_of_nodes
        self.graph = graph
    def __eq__(self, other):
        return self.num_of_nodes == other.num_of_nodes and nx.is_isomorphic(self.graph, other.graph, node_match=node_check)
    def __ne__(self, other):
        return not self.__eq__(other)

def remove_duplicates(list_: List[custom_object]) -> List[custom_object]:
    filtered_list = []
    for obj in list_:
        add = True
        for filtered_obj in filtered_list:
            if obj == filtered_obj:
                add = False
                break
        if add:
            filtered_list.append(obj)
    return filtered_list
            
Answered By: chc

Does this help? I think this method works, based on my understanding of your question:

from collections import Counter

def remove_duplicates(list_):

    # count occurrences, then remove each surplus copy
    for value, count in Counter(list_).items():
        if count > 1:
            for _ in range(count - 1):
                list_.remove(value)

    return list_

We use the Counter class from the collections module: it counts how many times each value appears in the list, and we then remove each value as many times as it appears beyond once. Note that this only deduplicates if the list elements are hashable and compare by value (for custom objects, that means defining __eq__ and __hash__ as in the first answer); otherwise Counter treats every object as distinct.
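A quick illustration of the Counter-based removal on plain strings (hashable values, so no custom methods are needed):

```python
from collections import Counter

values = ["a", "b", "a", "c", "b"]

# Counter(values) maps each distinct value to its count: a: 2, b: 2, c: 1.
# Every value with count > 1 is removed count - 1 times.
for value, count in Counter(values).items():
    for _ in range(count - 1):
        values.remove(value)   # removes the earliest remaining copy

print(values)  # ['a', 'c', 'b'] -- the last copy of each duplicate survives
```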

I hope this answers your question.

Answered By: ss3387