How to access the current state of a list comprehension in its if condition?

Question:

I would like to turn the loop over the items in L in the following code:

L = [1,2,5,2,1,1,3,4]
L_unique = []
for item in L:
    if item not in L_unique: 
        L_unique.append(item)

into a list comprehension like:

L_unique = [ item for item in L if item not in ???self??? ]

Is this possible in Python? And if it is possible, how can it be done?

Asked By: Claudio


Answers:

What you're asking for is impractical with a list comprehension, because L_unique does not exist until the list comprehension completes. You can, however, use a set comprehension:

L = [1,2,5,2,1,1,3,4]
L_unique = {x for x in L}

This is flexible if you wish to apply some other function to x, but in this simple form, you’re better off with just:

L = [1,2,5,2,1,1,3,4]
L_unique = set(L)

If needed, a set can be converted back to a list.

L = [1,2,5,2,1,1,3,4]
L_unique = list(set(L))

Using a set may change the order of the elements compared to the original list.
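As a small illustration of the flexibility mentioned above, a set comprehension can transform each element while it deduplicates. The word list below is a made-up example; as with set(L), the result is unordered, so it is sorted only for display.

```python
# Hypothetical input: the same words with different capitalisation.
words = ["Apple", "apple", "BANANA", "banana", "Cherry"]

# Apply a function (str.lower) to each element while deduplicating.
unique_lower = {w.lower() for w in words}

print(sorted(unique_lower))  # ['apple', 'banana', 'cherry']
```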

Answered By: Chris

So everything has been changed 🙂

gc.get_objects(generation=None)

Returns a list of all objects tracked by the collector, excluding the
list returned. If generation is not None, return only the objects
tracked by the collector that are in that generation.

Changed in version 3.8: New generation parameter.

Raises an auditing event gc.get_objects with argument generation.
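The quoted function only sees objects that the collector tracks. A minimal, CPython-specific sketch (gc tracking is an implementation detail) showing that lists are tracked and that a newly created list shows up in gc.get_objects(), which is what the gc-based hacks on this page rely on:

```python
import gc

# Lists are containers, so CPython's cycle collector tracks them;
# atomic objects such as ints are never tracked.
print(gc.is_tracked([]))  # True
print(gc.is_tracked(0))   # False

# Keep references to the lists that already exist, so their ids cannot be
# reused by new objects while we compare.
before_lists = [o for o in gc.get_objects() if type(o) is list]
before_ids = {id(o) for o in before_lists}

new = [1, 2, 3]
fresh = [o for o in gc.get_objects() if type(o) is list and id(o) not in before_ids]
print(any(o is new for o in fresh))  # True: the new list is visible to gc
```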

Without using gc

A simple way:

L = [1,2,5,2,1,1,3,4]
L_unique = []

# The comprehension itself evaluates to a list of None;
# L_unique is filled as a side effect of append()
_ = [L_unique.append(i) for i in L if i not in L_unique]
L_unique

Output:

[1, 2, 5, 3, 4]
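A commonly used variant of the same trick keeps the side effect in the if condition instead of the output expression, so the comprehension's result is the deduplicated list itself, and membership is checked against a set (O(1) on average) rather than a list. It still relies on a side effect, so treat it as a trick rather than clean style; it also requires hashable items.

```python
L = [1, 2, 5, 2, 1, 1, 3, 4]
seen = set()

# seen.add(x) returns None, so "x in seen or seen.add(x)" is falsy exactly
# when x is new (and records x as a side effect); "not" turns that into True.
L_unique = [x for x in L if not (x in seen or seen.add(x))]

print(L_unique)  # [1, 2, 5, 3, 4]
```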

Or you can use this:

L = [1,2,5,2,1,1,3,4]
list(set(L))

Output:

[1, 2, 3, 4, 5]

Answered By: Shahab Rahnama

A list comprehension actually creates an anonymous function and then calls it (in CPython 3.12+, PEP 709 inlines comprehensions instead, but the list is still kept only on the stack). The list being built is not stored in the local variable dictionary but on the interpreter's value stack, so few Python-level operations can reach it. gc is a crazy but feasible option; a gc-based solution is given at the end:

>>> [locals().copy() for i in range(3)]
[{'.0': <range_iterator at 0x207eeaca730>, 'i': 0},    # does not contain the built list
 {'.0': <range_iterator at 0x207eeaca730>, 'i': 1},
 {'.0': <range_iterator at 0x207eeaca730>, 'i': 2}]
>>> from dis import dis
>>> dis('[i for i in iterable]')
  1           0 LOAD_CONST               0 (<code object <listcomp> at 0x00000211FEAFD000, file "<dis>", line 1>)
              2 LOAD_CONST               1 ('<listcomp>')
              4 MAKE_FUNCTION            0
              6 LOAD_NAME                0 (iterable)
              8 GET_ITER
             10 CALL_FUNCTION            1
             12 RETURN_VALUE

Disassembly of <code object <listcomp> at 0x00000211FEAFD000, file "<dis>", line 1>:
  1           0 BUILD_LIST               0    # build an empty list and push it onto the stack
              2 LOAD_FAST                0 (.0)
        >>    4 FOR_ITER                 4 (to 14)
              6 STORE_FAST               1 (i)
              8 LOAD_FAST                1 (i)
             10 LIST_APPEND              2     # get the built list through stack and index
             12 JUMP_ABSOLUTE            2 (to 4)
        >>   14 RETURN_VALUE
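To check the same thing without reading the full disassembly, the sketch below collects the opcode names of the compiled comprehension, walking nested code objects as well, so it works whether the comprehension is compiled into a separate code object (as in the listing above) or inlined, as newer CPython versions do. Exact opcodes vary between versions; the sketch only checks for BUILD_LIST and LIST_APPEND, the two opcodes highlighted in the listing.

```python
import dis
import types

def opnames(code):
    """Collect opcode names from a code object and any nested code objects."""
    names = {instr.opname for instr in dis.get_instructions(code)}
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            names |= opnames(const)  # e.g. the nested <listcomp> code object
    return names

code = compile("[i for i in iterable]", "<demo>", "eval")
ops = opnames(code)

# The list is created by BUILD_LIST and filled by LIST_APPEND on the stack;
# no opcode stores it under a local name while it is being built.
print("BUILD_LIST" in ops, "LIST_APPEND" in ops)  # True True
```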

For the example you provided, you can use list(dict.fromkeys(L)) to get the same result in Python 3.7+. I use dict instead of set here because dicts preserve insertion order (guaranteed since Python 3.7):

>>> list(dict.fromkeys(L))
[1, 2, 5, 3, 4]
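One caveat: dict.fromkeys (like set) needs hashable elements, whereas the loop in the question works with any items that support ==. A minimal sketch of an order-preserving helper that uses a set for hashable items and falls back to a linear scan for unhashable ones; the name unique_in_order is hypothetical.

```python
def unique_in_order(items):
    """Return items without duplicates, keeping first occurrences in order."""
    seen_hashable = set()
    seen_unhashable = []  # fallback for items such as lists (O(n) lookups)
    result = []
    for item in items:
        try:
            if item in seen_hashable:
                continue
            seen_hashable.add(item)
        except TypeError:  # raised by the set lookup for unhashable items
            if item in seen_unhashable:
                continue
            seen_unhashable.append(item)
        result.append(item)
    return result

print(unique_in_order([1, 2, 5, 2, 1, 1, 3, 4]))  # [1, 2, 5, 3, 4]
print(unique_in_order([[1], [2], [1], [3]]))      # [[1], [2], [3]]
```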

Based on @KellyBundy's idea, the method I have found is to use gc.get_objects, but it is very expensive (it returns more than 1000 objects) and I can't be sure it is reliable:

>>> import gc
>>> [item for item in L if item not in gc.get_objects(0)[-1]]
[1, 2, 5, 3, 4]

Making operations cheaper through caching:

>>> lst = None
>>> [item for item in L if item not in (lst := gc.get_objects(0)[-1] if lst is None else lst)]
[1, 2, 5, 3, 4]

Answered By: Mechanic Pig

It's possible. Here's a hack that does it, but I wouldn't use it in practice: it's nasty, it depends on implementation details that might change, and I believe it's not thread-safe either. It's just to demonstrate that it's possible.

You’re mostly right with your "somewhere must exist an object storing the current state of the comprehension" (although it doesn’t necessarily have to be a Python list object, Python could store the elements some other way and create the list object only afterwards).

We can find the new list object in the objects tracked by garbage collection. Collect the IDs of lists before the comprehension’s list is created, then look again and take the one that wasn’t there before.

Demo:

import gc

L = [1,2,5,2,1,1,3,4]

L_unique = [
    item

    # the hack to get self
    for ids in [{id(o) for o in gc.get_objects() if type(o) is list}]
    for self in [o for o in gc.get_objects() if type(o) is list and id(o) not in ids]

    for item in L
    if item not in self
]

print(L_unique)

Output (Try it online!):

[1, 2, 5, 3, 4]

It worked both in the linked site's Python 3.8 pre-release and in Python 3.10.2 elsewhere.

For an alternative with the exact style you asked, only replacing your ???self???, see Mechanic Pig’s updated answer.

Answered By: Kelly Bundy

If I had to remove duplicate elements from a list, and I wanted to preserve the order of the elements in the list, I would use the unique_everseen() function from the more_itertools library:

>>> from more_itertools import unique_everseen
>>> L = [1,2,5,2,1,1,3,4]
>>> list(unique_everseen(L))
[1, 2, 5, 3, 4]

This is similar to the set() approach that several others have suggested, but it is guaranteed to preserve order.
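If adding a dependency is not an option, a similar generator can be written with the standard library alone. This is a simplified sketch in the spirit of the unique_everseen recipe from the itertools documentation, not the more_itertools implementation (which, unlike this sketch, also accepts unhashable elements); the name unique_everseen_sketch is hypothetical, and key works like the key argument of sorted().

```python
from typing import Callable, Hashable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")

def unique_everseen_sketch(
    iterable: Iterable[T], key: Optional[Callable[[T], Hashable]] = None
) -> Iterator[T]:
    """Yield elements in order, skipping any whose key has been seen before."""
    seen = set()
    for element in iterable:
        k = element if key is None else key(element)
        if k not in seen:
            seen.add(k)
            yield element

L = [1, 2, 5, 2, 1, 1, 3, 4]
print(list(unique_everseen_sketch(L)))                          # [1, 2, 5, 3, 4]
print(list(unique_everseen_sketch("AaBbCcAb", key=str.lower)))  # ['A', 'B', 'C']
```

Being a generator, it also works lazily on iterators that can only be consumed once.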

Answered By: Kale Kundert