List comprehension, check if item is unique
Question:
I am trying to write a list comprehension statement that will only add an item if it’s not currently contained in the list. Is there a way to check the current items in the list that is currently being constructed? Here is a brief example:
Input
{
"Stefan" : ["running", "engineering", "dancing"],
"Bob" : ["dancing", "art", "theatre"],
"Julia" : ["running", "music", "art"]
}
Output
["running", "engineering", "dancing", "art", "theatre", "music"]
Code without using a list comprehension
output = []
for name, hobbies in input.items():
    for hobby in hobbies:
        if hobby not in output:
            output.append(hobby)
My Attempt
[hobby for name, hobbies in input.items() for hobby in hobbies if hobby not in ???]
Answers:
Use a set:
dict = {
"Stefan" : ["running", "engineering", "dancing"],
"Bob" : ["dancing", "art", "theatre"],
"Julia" : ["running", "music", "art"]
}
myset = set()
for _, value in dict.items():
    for item in value:
        myset.add(item)
print(myset)
How about this:
>>> set(dict['Bob'] + dict['Stefan'] + dict['Julia'])
set(['art', 'theatre', 'dancing', 'engineering', 'running', 'music'])
Or more nicely:
dict = {
"Stefan" : ["running", "engineering", "dancing"],
"Bob" : ["dancing", "art", "theatre"],
"Julia" : ["running", "music", "art"]
}
list_ = []
for y in dict.keys():
    list_ = list_ + dict[y]
list_ = set(list_)
>>> list_
set(['art', 'theatre', 'dancing', 'engineering', 'running', 'music'])
You can apply the list function to list_, as in list(list_), to get a list rather than a set.
You can use a set comprehension:
{hobby for name, hobbies in input.items() for hobby in hobbies}
As m.wasowski mentioned, we don’t use the name here, so we can iterate over input.values() instead:
{hobby for hobbies in input.values() for hobby in hobbies}
If you really need a list as the result, you can do this (but notice that usually you can work with sets without any problem):
list({hobby for hobbies in input.values() for hobby in hobbies})
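Because a set has no defined order, the list produced this way can come out in any order. If a reproducible order matters, one option is to sort the result. A minimal sketch, assuming the sample dictionary from the question is bound to the illustrative name hobbies_by_person (chosen here to avoid shadowing the built-in input):

```python
# Sample data from the question; `hobbies_by_person` is an illustrative name.
hobbies_by_person = {
    "Stefan": ["running", "engineering", "dancing"],
    "Bob": ["dancing", "art", "theatre"],
    "Julia": ["running", "music", "art"],
}

# The set comprehension removes duplicates; sorted() then returns a list
# in alphabetical order, so the result is the same on every run.
unique_sorted = sorted(
    {hobby for hobbies in hobbies_by_person.values() for hobby in hobbies}
)
print(unique_sorted)
```

Note that this gives alphabetical order, not the first-seen order shown in the question.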
As this answer suggests: you can use a uniqueness filter:
def f7(seq):
    seen = set()
    seen_add = seen.add
    return [x for x in seq if not (x in seen or seen_add(x))]
and call with:
>>> f7(hobby for name, hobbies in input.items() for hobby in hobbies)
['running', 'engineering', 'dancing', 'art', 'theatre', 'music']
I would implement the uniqueness filter as a separate function, following the design rule that “different things should be handled by different classes/methods/components/whatever”. It also means you can simply reuse the function wherever you need it.
Another advantage, as noted in the linked answer, is that the order of the items is preserved. Some applications need this.
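For readers who find the short-circuit trick in f7 hard to follow, the same filter can be sketched as a generator with an explicit loop. The names unique_in_order and hobbies_by_person are illustrative, not from the linked answer, and the resulting order assumes Python 3.7 or later, where dictionaries keep insertion order:

```python
def unique_in_order(items):
    """Yield each item the first time it appears, preserving order."""
    seen = set()
    for item in items:
        if item not in seen:
            seen.add(item)
            yield item


# Sample data from the question; `hobbies_by_person` is an illustrative name.
hobbies_by_person = {
    "Stefan": ["running", "engineering", "dancing"],
    "Bob": ["dancing", "art", "theatre"],
    "Julia": ["running", "music", "art"],
}

result = list(
    unique_in_order(
        hobby for hobbies in hobbies_by_person.values() for hobby in hobbies
    )
)
print(result)
```

The items must be hashable for the set lookup to work, which holds for the strings used here.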
Sets and dictionaries are your friends here:
from collections import OrderedDict
from itertools import chain # 'flattens' collection of iterables
data = {
"Stefan" : ["running", "engineering", "dancing"],
"Bob" : ["dancing", "art", "theatre"],
"Julia" : ["running", "music", "art"]
}
# using set is the easiest way, but sets are unordered:
print {hobby for hobby in chain.from_iterable(data.values())}
# output:
# set(['art', 'theatre', 'dancing', 'engineering', 'running', 'music'])
# or use OrderedDict if you care about ordering:
print OrderedDict(
    (hobby, None) for hobby in chain.from_iterable(data.values())
).keys()
# output:
# ['dancing', 'art', 'theatre', 'running', 'engineering', 'music']
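On Python 3.7 and later, a plain dict also preserves insertion order, so the OrderedDict idea can be sketched with dict.fromkeys. This assumes the built-in dict has not been shadowed by a variable named dict, as some snippets in this thread do:

```python
from itertools import chain

data = {
    "Stefan": ["running", "engineering", "dancing"],
    "Bob": ["dancing", "art", "theatre"],
    "Julia": ["running", "music", "art"],
}

# dict.fromkeys keeps the first occurrence of each key, in insertion order
# (guaranteed for plain dicts from Python 3.7 onward).
unique_hobbies = list(dict.fromkeys(chain.from_iterable(data.values())))
print(unique_hobbies)
```

Here the order follows the dictionary's own insertion order, so it starts with Stefan's hobbies.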
If you really want a list comprehension and nothing but a list comprehension, you can do:
>>> s = []
>>> [s.append(j) for i in d.values() for j in i if j not in s]
[None, None, None, None, None, None]
>>> s
['dancing', 'art', 'theatre', 'running', 'engineering', 'music']
Here, s is built up as a side effect and d is your original dictionary. The one advantage is that the order is preserved, unlike in most other answers here.
Note: This is a bad approach because it abuses the list comprehension purely for its side effect. Don’t make a habit of it; this answer only shows that it can be done with a list comprehension alone.
A list comprehension is not well-suited for this problem. I think a set comprehension would be better, but since that was already shown in another answer, I’ll show a way of solving this problem with a compact one-liner:
list(set(sum(hobbies_dict.values(), [])))
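One caveat with sum(..., []): each addition builds a new list, so the cost grows quadratically with the number of lists. For larger inputs, a sketch along the same lines can use itertools.chain.from_iterable instead (hobbies_dict is assumed to hold the sample data from the question):

```python
from itertools import chain

hobbies_dict = {
    "Stefan": ["running", "engineering", "dancing"],
    "Bob": ["dancing", "art", "theatre"],
    "Julia": ["running", "music", "art"],
}

# chain.from_iterable walks the lists one after another without
# building intermediate concatenated lists.
unique_hobbies = list(set(chain.from_iterable(hobbies_dict.values())))
print(unique_hobbies)  # order is arbitrary because it comes from a set
```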
Another interesting solution uses the bitwise or operator, which serves as the union operator for sets:
from operator import or_
from functools import reduce # Allowed, but unnecessary in Python 2.x
list(reduce(or_, map(set, hobbies_dict.values())))
Or (unintentional pun, I swear), instead of using the bitwise or operator, just use set.union and pass it the unpacked set-mapping of your values. No need to import or_ and reduce! This idea is inspired by Thijs van Dien’s answer.
list(set.union(*map(set, hobbies_dict.values())))
There’s another way of writing this that is a bit more descriptive of what you’re actually doing, and doesn’t require a nested (double for) comprehension:
output = set.union(*[set(hobbies) for hobbies in input_.values()])
This becomes even nicer when you represent the input in a more conceptually sound way, i.e. use a set for the hobbies of each person (since there shouldn’t be repetitions there either):
input_ = {
"Stefan" : {"running", "engineering", "dancing"},
"Bob" : {"dancing", "art", "theatre"},
"Julia" : {"running", "music", "art"}
}
output = set.union(*input_.values())
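One edge case worth knowing: set.union(*input_.values()) raises a TypeError when the dictionary is empty, because the unbound method then receives no arguments. A minimal sketch that also handles empty input calls union on an empty set instead; the function name all_hobbies is illustrative:

```python
def all_hobbies(hobbies_by_person):
    """Return the union of every person's hobbies; empty input gives an empty set."""
    # set().union() accepts zero or more iterables, so this works for an
    # empty dictionary and for values that are lists as well as sets.
    return set().union(*hobbies_by_person.values())


input_ = {
    "Stefan": {"running", "engineering", "dancing"},
    "Bob": {"dancing", "art", "theatre"},
    "Julia": {"running", "music", "art"},
}

print(all_hobbies(input_))
print(all_hobbies({}))
```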