Getting distinct values from a list comprised of lists containing a comma-delimited string

Question:

Main list:

data = [
["629-2, text1, 12"],
["629-2, text2, 12"],
["407-3, text9, 6"],
["407-3, text4, 6"],
["000-5, text7, 0"],
["000-5, text6, 0"],
]

I want to get a list comprised of unique lists like so:

data_unique = [
["629-2, text1, 12"],
["407-3, text9, 6"],
["000-5, text6, 0"],
]

I’ve tried using numpy.unique, but I need to pare it down further: the result should contain only one list per unique numerical designator at the beginning of the string, i.e. 629-2…

I’ve also tried using chain from itertools like this:

from itertools import chain

def get_unique(data):
    return list(set(chain(*data)))

But that only got me as far as numpy.unique.

Thanks in advance.

Asked By: PersonPr7


Answers:

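# Convert each inner list to a tuple (lists are not hashable) and build a set.
# This drops only rows that are identical in full, and the order is not preserved.
data_set = set(tuple(x) for x in data)

# Convert the set back to a list of lists
data_unique = [list(x) for x in data_set]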
Answered By: a5zima
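
The set-based conversion above compares whole rows, so rows that share a designator but differ in their text are all kept. As a minimal sketch of deduplicating on the designator alone (the name first_per_designator is hypothetical, and it assumes the designator is the text before the first comma), one could track the keys already seen:

```python
def first_per_designator(rows):
    """Keep the first row seen for each designator, in original order."""
    seen = set()
    unique = []
    for row in rows:
        # Assumption: the designator is the text before the first comma, e.g. "629-2"
        key = row[0].split(",", 1)[0].strip()
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


data = [
    ["629-2, text1, 12"],
    ["629-2, text2, 12"],
    ["407-3, text9, 6"],
    ["407-3, text4, 6"],
    ["000-5, text7, 0"],
    ["000-5, text6, 0"],
]
print(first_per_designator(data))
# [['629-2, text1, 12'], ['407-3, text9, 6'], ['000-5, text7, 0']]
```

Unlike a set, this keeps the rows in the order they first appear in the input.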

I have used recursion to solve the problem!

def get_unique(lst):
    if not lst:
        return []
    if lst[0] in lst[1:]:
        return get_unique(lst[1:])
    else:
        return [lst[0]] + get_unique(lst[1:])

data = [
["629-2, text1, 12"],
["629-2, text2, 12"],
["407-3, text9, 6"],
["407-3, text4, 6"],
["000-5, text7, 0"],
["000-5, text6, 0"],
]
print(get_unique(data))

This keeps the last occurrence of each element in the list.

Answered By: Punit Choudhary
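
The recursive function above tests membership with whole inner lists, so for the sample data, where no two rows are identical, it returns all six rows. If the intent is to keep the last occurrence per designator, a rough sketch (last_per_designator is a hypothetical name, and the designator is assumed to be the text before the first comma) can let later rows overwrite earlier ones in a dict:

```python
def last_per_designator(rows):
    """Keep the last row seen for each designator."""
    latest = {}
    for row in rows:
        # Assumption: the designator is the text before the first comma
        key = row[0].split(",", 1)[0].strip()
        # A later row with the same key replaces the earlier one
        latest[key] = row
    # dicts preserve insertion order (Python 3.7+), so designators
    # come out in the order they were first encountered
    return list(latest.values())


data = [
    ["629-2, text1, 12"],
    ["629-2, text2, 12"],
    ["407-3, text9, 6"],
    ["407-3, text4, 6"],
    ["000-5, text7, 0"],
    ["000-5, text6, 0"],
]
print(last_per_designator(data))
# [['629-2, text2, 12'], ['407-3, text4, 6'], ['000-5, text6, 0']]
```

This also avoids the repeated slicing and the recursion depth limit that the recursive version would hit on long lists.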

Code

from itertools import groupby

def get_unique(data):
    def designated_version(item):
        return item[0].split(',')[0]

    return [list(v)[0]
            for _, v in groupby(sorted(data, key=designated_version),
                                designated_version)]

 

Test

print(get_unique(data))
# Output (ordered by designator, because the data is sorted before grouping)
[['000-5, text7, 0'], ['407-3, text9, 6'], ['629-2, text1, 12']]

Explanation

  • Sorts data by the designator; groupby only groups adjacent items, so the data must first be sorted by the same key
  • Uses groupby to group items by the numerical designator, i.e. designated_version(item), which returns item[0].split(',')[0]
  • The list comprehension keeps the first item of each group, i.e. list(v)[0]
Answered By: DarrylG
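
To make the sort-then-group steps described above easier to follow, here is a small sketch that materializes the intermediate groups before taking the first row of each. The names designator, groups and firsts are illustrative only and are not part of the answer.

```python
from itertools import groupby

data = [
    ["629-2, text1, 12"],
    ["629-2, text2, 12"],
    ["407-3, text9, 6"],
    ["407-3, text4, 6"],
    ["000-5, text7, 0"],
    ["000-5, text6, 0"],
]


def designator(item):
    # Text before the first comma of the row's single string
    return item[0].split(",")[0]


# sorted() is stable, so rows with the same key keep their relative order
ordered = sorted(data, key=designator)

# groupby yields (key, iterator) pairs for runs of adjacent equal keys
groups = {key: list(rows) for key, rows in groupby(ordered, key=designator)}
print(list(groups))
# ['000-5', '407-3', '629-2']

# First row of each group
firsts = [rows[0] for rows in groups.values()]
print(firsts)
# [['000-5, text7, 0'], ['407-3, text9, 6'], ['629-2, text1, 12']]
```

Because of the sort, the result is ordered by designator rather than by first appearance in the input.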