Python combinations without repetitions

Question:

I have a list of numbers and I want to generate combinations from it. Given this list:

import itertools

t = [2, 2, 2, 2, 4]
c = list(itertools.combinations(t, 4))

The result is:

(2, 2, 2, 2)
(2, 2, 2, 4)
(2, 2, 2, 4)
(2, 2, 2, 4)
(2, 2, 2, 4)

but I want to get:

(2, 2, 2, 2)
(2, 2, 2, 4)

Is it possible to eliminate the duplicates without building a new list and iterating over the first one?

Asked By: GoobyPrs


Answers:

As Donkey Kong pointed out, you can get the unique values in a list by converting the list to a set:

import itertools

t = [2, 2, 2, 2, 4]
c = list(itertools.combinations(t, 4))
unq = set(c)  # tuples are hashable, so duplicates collapse
print(unq)

And the result will be:

{(2, 2, 2, 4), (2, 2, 2, 2)}

If you want to use it as a list, you can convert it back:

result = list(unq)
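
One caveat with the set round trip is that a set does not keep the order in which the combinations were produced. As a minimal sketch (not part of the original answer), dict.fromkeys can drop duplicates while keeping first-seen order, assuming Python 3.7+ where dicts preserve insertion order:

```python
from itertools import combinations

t = [2, 2, 2, 2, 4]

# dict keys are unique and keep insertion order, so this removes
# duplicate tuples while preserving the order combinations() emits them
result = list(dict.fromkeys(combinations(t, 4)))
print(result)  # [(2, 2, 2, 2), (2, 2, 2, 4)]
```

Like set(), this still generates every positional combination before discarding the repeats.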

A cleaner, more concise alternative is:

t = [2,2,2,2,4]
c = set(itertools.combinations(t, 4))
Answered By: Randhawa

Technically, what you get are not actually duplicates; it’s simply how itertools.combinations works, as its documentation describes:

itertools.combinations(iterable, r)

Return r length subsequences of elements from the input iterable.

Combinations are emitted in lexicographic sort order. So, if the input iterable is sorted, the combination tuples will be produced in
sorted order.

Elements are treated as unique based on their position, not on their value. So if the input elements are unique, there will be no
repeat values in each combination.

DEMO:

>>> import itertools as it
>>> list(it.combinations([1,2,3,4,5], 4))
[(1, 2, 3, 4), (1, 2, 3, 5), (1, 2, 4, 5), (1, 3, 4, 5), (2, 3, 4, 5)]
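
To make the "unique based on their position" point concrete, here is a small sketch (my own illustration, not from the documentation) that combines the indices of the question's list and then maps them back to values. The names index_combos and value_combos are made up for this example:

```python
from itertools import combinations

t = [2, 2, 2, 2, 4]

# Combine positions rather than values: every index tuple is distinct
index_combos = list(combinations(range(len(t)), 4))

# Mapping each index tuple back to values reproduces combinations(t, 4),
# including the repeated (2, 2, 2, 4) tuples
value_combos = [tuple(t[i] for i in idx) for idx in index_combos]

print(index_combos)
print(value_combos == list(combinations(t, 4)))  # True
```

The five index tuples are all different, but four of them pick three of the 2s plus the 4, which is why (2, 2, 2, 4) shows up four times.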

So, just as posted in the previous answer, set() will give you the unique values you want:

>>> t = [2, 2, 2, 2, 4]
>>> set(it.combinations(t, 4))
{(2, 2, 2, 4), (2, 2, 2, 2)}
Answered By: Iron Fist

I know this is late but I want to add a point.

set(itertools.combinations(t, 4)) does a fine job in most cases, but it still generates every repeated combination internally, so it can be computationally heavy. This is especially true when there are only a few unique combinations.
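
As a rough illustration of that overhead (a sketch with a made-up input, assuming Python 3.8+ for math.comb), compare how many tuples combinations() generates with how many survive deduplication:

```python
from itertools import combinations
from math import comb

# Hypothetical input: twelve 2s and a single 4
t = [2] * 12 + [4]
r = 6

generated = comb(len(t), r)            # tuples combinations() will produce
unique = len(set(combinations(t, r)))  # tuples left after deduplication

print(generated)  # 1716
print(unique)     # 2
```

Here 1716 tuples are built and hashed only to keep 2 of them, and the gap widens quickly as the input grows.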

This one iterates only unique combinations:

from itertools import chain, repeat, count, islice
from collections import Counter


def repeat_chain(values, counts):
    return chain.from_iterable(map(repeat, values, counts))


def unique_combinations_from_value_counts(values, counts, r):
    n = len(counts)
    indices = list(islice(repeat_chain(count(), counts), r))
    if len(indices) < r:
        return
    while True:
        yield tuple(values[i] for i in indices)
        for i, j in zip(reversed(range(r)), repeat_chain(reversed(range(n)), reversed(counts))):
            if indices[i] != j:
                break
        else:
            return
        j = indices[i] + 1
        for i, j in zip(range(i, r), repeat_chain(count(j), counts[j:])):
            indices[i] = j


def unique_combinations(iterable, r):
    values, counts = zip(*Counter(iterable).items())
    return unique_combinations_from_value_counts(values, counts, r)

Usage:

>>> list(unique_combinations([2, 2, 2, 2, 4], 4)) # elements must be hashable
[(2, 2, 2, 2), (2, 2, 2, 4)]

# You can pass values and counts separately. For this usage, values don't need to be hashable
# Say you have ['a','b','b','c','c','c'], then since there is 1 of 'a', 2 of 'b', and 3 of 'c', you can do as follows:
>>> list(unique_combinations_from_value_counts(['a', 'b', 'c'], [1, 2, 3], 3))
[('a', 'b', 'b'), ('a', 'b', 'c'), ('a', 'c', 'c'), ('b', 'b', 'c'), ('b', 'c', 'c'), ('c', 'c', 'c')]

# unique_combinations() is a generator (and thus an iterator)
# so you can iterate it
>>> for comb in unique_combinations([2, 2, 2, 2, 4], 4):
...     print(sum(comb))
...
8   # 2+2+2+2
10  # 2+2+2+4

Note that itertools.combinations() is implemented in C, which means it is much faster than my Python script in most cases. This code beats the set(itertools.combinations()) method only when there are far more repeated combinations than unique ones.
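
For readers who find the index-based generator above hard to follow, the same idea can be sketched recursively: for each distinct value, decide how many copies to take, then recurse on the remaining values. This is a simplified illustration rather than a replacement for the code above; the name multiset_combinations is made up here, and elements are assumed to be hashable:

```python
from collections import Counter
from itertools import combinations


def multiset_combinations(iterable, r):
    """Yield each distinct r-length combination exactly once (recursive sketch)."""
    # (value, count) pairs in first-seen order
    items = list(Counter(iterable).items())

    def rec(start, remaining):
        if remaining == 0:
            yield ()
            return
        if start == len(items):
            return  # slots left to fill but no values left
        value, count = items[start]
        # Take k copies of the current value, from most to fewest
        for k in range(min(count, remaining), -1, -1):
            for tail in rec(start + 1, remaining - k):
                yield (value,) * k + tail

    return rec(0, r)


print(list(multiset_combinations([2, 2, 2, 2, 4], 4)))
# [(2, 2, 2, 2), (2, 2, 2, 4)]
```

Recursion depth is bounded by the number of distinct values, so this is fine for typical inputs, though it will likely be slower than the iterative version for large ones.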

Answered By: hahho

This can now be done using the package more-itertools which, as of version 8.7, has a function called distinct_combinations to achieve this.

>>> from itertools import combinations
>>> t = [2,2,2,2,4]
>>> set(combinations(t, 4))
{(2, 2, 2, 2), (2, 2, 2, 4)}

>>> from more_itertools import distinct_combinations
>>> t = [2,2,2,2,4]
>>> list(distinct_combinations(t,4))
[(2, 2, 2, 2), (2, 2, 2, 4)]

As far as I can tell from my very limited testing, performance is similar to the function written by @hahho.

Answered By: JJR4