Find tuple structure containing an unknown value inside a list
Question:
Say I have a list of tuples:
list = [(1,5), (1,7), (2,3)]
Is there a way in Python to write something like
if (1, *) in list: do things
where * means “I don’t care about this value”? That is, we are checking whether there is a tuple with 1 at the first position and any value at the second one.
As far as I know there are special mechanisms in other languages, but I just don’t know the name of this particular problem. So is there similar behavior in Python?
P.S.: I know that I can use list comprehensions here. I am just interested in this particular mechanism.
Answers:
You can use the any() function:
if any(t[0] == 1 for t in yourlist):
This efficiently tests and exits early if 1 is found in the first position of a tuple.
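To make the early exit visible, here is a minimal sketch that records which tuples any() actually inspects. The helper first_is and the seen lists are illustrative names, not part of the answer above.

```python
def first_is(value, pairs, seen):
    """Yield True/False per tuple, recording every tuple inspected in `seen`."""
    for t in pairs:
        seen.append(t)
        yield t[0] == value


yourlist = [(1, 5), (1, 7), (2, 3)]

seen = []
found = any(first_is(1, yourlist, seen))
# any() stops at the first True, so only the first tuple was inspected
print(found, seen)

seen_missing = []
missing = any(first_is(9, yourlist, seen_missing))
# no match: every tuple has to be inspected before any() can return False
print(missing, len(seen_missing))
```

The cost therefore depends on where the first match sits: a match near the front is cheap, while a miss scans the whole list.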
This can be done in Python using a list comprehension.
For example:
a = [(1, 2), (3, 4), (4, 5), (1, 4)]
[i for i in a if i[0] == 1]
This gives you:
[(1, 2), (1, 4)]
Tuples with differing numbers of elements can be handled as well:
>>> import operator
>>> mylist = [(1,2), (1,5), (4,5,8)]
>>> any(i==1 for i in map(operator.itemgetter(0), mylist))
True
Not all of my solution methods provided below will be necessarily efficient. My goal is to demonstrate every possible solution method I can think of – at the end of my answer I provide “benchmark” results to show why or why not you should use one certain method over another. I believe that is a good way of learning, and I will shamelessly encourage such learning in my answers.
Subset + hash sets
>>> a_list = [(1,5), (1,7), (2,3)]
>>>
>>> set([l[0] for l in a_list])
{1, 2}
>>>
>>> 1 in set([l[0] for l in a_list])
True
map() and anonymous functions
>>> a_list = [(1,5), (1,7), (2,3)]
>>>
>>> map(lambda x: x[0] == 1, a_list)
[True, True, False]
>>>
>>> True in set(map(lambda x: x[0] == 1, a_list))
True
filter() and anonymous functions
>>> a_list = [(1,5), (1,7), (2,3)]
>>>
>>> filter(lambda x: x[0] == 1, a_list)
[(1, 5), (1, 7)]
>>>
>>> len(filter(lambda x: x[0] == 1, a_list)) > 0 # non-empty list
True
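The transcripts above come from Python 2, where map() and filter() return lists. In Python 3 both return lazy iterators, so len(filter(...)) raises a TypeError. The following is a sketch of equivalents that should behave the same way in Python 3, assuming the same a_list; the variable names are only illustrative.

```python
a_list = [(1, 5), (1, 7), (2, 3)]

# map() yields booleans lazily; any() consumes them and stops at the first True
has_one_map = any(map(lambda x: x[0] == 1, a_list))

# filter() is lazy too; next() with a default fetches the first match or None
first_match = next(filter(lambda x: x[0] == 1, a_list), None)
has_one_filter = first_match is not None

# materialize the iterator explicitly when the full list of matches is needed
matches = list(filter(lambda x: x[0] == 1, a_list))

print(has_one_map, has_one_filter, matches)
```

The next() variant assumes the list never contains None itself, which holds for a list of tuples.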
MICROBENCHMARKS
Conditions
- 1000 items
- 100K repetitions
- 0-100 random range
- Python 2.7.10, IPython 2.3.0
Script
from pprint import pprint
from random import randint
from timeit import timeit
N_ITEMS = 1000
N_SIM = 1 * (10 ** 5) # 100K = 100000
a_list = [(randint(0, 100), randint(0, 100)) for _ in range(N_ITEMS)]
set_membership_list_comprehension_time = timeit(
    "1 in set([l[0] for l in a_list])",
    number=N_SIM,
    setup="from __main__ import a_list"
)
bool_membership_map_time = timeit(
    "True in set(map(lambda x: x[0] == 1, a_list))",
    number=N_SIM,
    setup="from __main__ import a_list"
)
nonzero_length_filter_time = timeit(
    "len(filter(lambda x: x[0] == 1, a_list)) > 0",
    number=N_SIM,
    setup="from __main__ import a_list"
)
any_list_comprehension_time = timeit(
    "any(t[0] == 1 for t in a_list)",
    number=N_SIM,
    setup="from __main__ import a_list"
)
results = {
    "any(t[0] == 1 for t in a_list)": any_list_comprehension_time,
    "len(filter(lambda x: x[0] == 1, a_list)) > 0": nonzero_length_filter_time,
    "True in set(map(lambda x: x[0] == 1, a_list))": bool_membership_map_time,
    "1 in set([l[0] for l in a_list])": set_membership_list_comprehension_time
}
pprint(
    sorted(results.items(), key=lambda x: x[1])
)
Results (in seconds)
[('any(t[0] == 1 for t in a_list)', 2.6685791015625), # winner - Martijn
('1 in set([l[0] for l in a_list])', 4.85234808921814),
('len(filter(lambda x: x[0] == 1, a_list)) > 0', 7.11224889755249),
('True in set(map(lambda x: x[0] == 1, a_list))', 10.343087911605835)]
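The script above was written for Python 2.7. For readers who want to repeat the comparison on Python 3, a possible port is sketched below. It is an assumption-laden sketch rather than the original benchmark: N_SIM is lowered to 1000 purely so the run finishes quickly, list() is wrapped around filter() because len() cannot be applied to an iterator, and the globals argument of timeit replaces the import from __main__. Timings vary by machine and interpreter, so no figures are given here.

```python
from random import randint
from timeit import timeit

N_ITEMS = 1000
N_SIM = 1000  # assumption: far fewer repetitions than the original 100K

a_list = [(randint(0, 100), randint(0, 100)) for _ in range(N_ITEMS)]

statements = [
    "any(t[0] == 1 for t in a_list)",
    "1 in set([l[0] for l in a_list])",
    # list() is needed in Python 3 because filter() returns an iterator
    "len(list(filter(lambda x: x[0] == 1, a_list))) > 0",
    "True in set(map(lambda x: x[0] == 1, a_list))",
]

# globals= lets the timed statements see a_list without importing from __main__
results = {
    stmt: timeit(stmt, number=N_SIM, globals={"a_list": a_list})
    for stmt in statements
}

# print the statements from fastest to slowest
for stmt, seconds in sorted(results.items(), key=lambda item: item[1]):
    print("{:<55} {:.4f}".format(stmt, seconds))
```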
Who’s got the last laugh now? … Martijn (at least I tried)
MORAL OF THE STORY: Don’t spend more than 10 minutes “proving” your inferior solution is faster and more efficient on a small test data set when another user’s answer is the de facto correct one.
A placeholder object like you’re asking for isn’t supported natively, but you can make something like that yourself:
class Any(object):
    def __eq__(self, other):
        return True

ANYTHING = Any()
lst = [(1,5), (1,7), (2,3)]
The __eq__ method defines how two objects test for equality. (See https://docs.python.org/3/reference/datamodel.html for details.) Here, ANYTHING will always test positive for equality with any object. (Unless that object also overrides __eq__ in a way that returns False.)
The in operator merely calls __eq__ for each element in your list. I.e. a in b does something like:
def contains(b, a):
    for elem in b:
        if elem == a:
            return True
    return False
This means that, if you say (1, ANYTHING) in lst, Python will first compare (1, ANYTHING) to the first element in lst. (Tuples, in turn, define __eq__ to return True if all of their elements’ __eq__ return True. I.e. (x, y) == (a, b) is equivalent to x==a and y==b, or x.__eq__(a) and y.__eq__(b).)
Hence, (1, ANYTHING) in lst will return True, while (3, ANYTHING) in lst will return False.
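Put together as a runnable sketch, the placeholder behaves as follows. Two details here are additions for illustration and not part of the answer's code: the __ne__ method, which Python 2 needs because it does not derive != from ==, and the extra three-element tuple (4, 5, 8) appended to lst to show what happens when lengths differ.

```python
class Any(object):
    """Placeholder that compares equal to every object."""

    def __eq__(self, other):
        return True

    def __ne__(self, other):
        # Python 2 does not derive != from ==, so define it explicitly
        return False


ANYTHING = Any()
lst = [(1, 5), (1, 7), (2, 3), (4, 5, 8)]  # (4, 5, 8) added for illustration

print((1, ANYTHING) in lst)       # True
print((3, ANYTHING) in lst)       # False
# tuples of different lengths never compare equal,
# so one placeholder stands for exactly one position
print((4, ANYTHING) in lst)       # False
# count() relies on == as well
print(lst.count((1, ANYTHING)))   # 2
```

Note that in Python 3 a class that defines __eq__ without __hash__ becomes unhashable, so this placeholder works for membership tests on lists and tuples but not inside sets or as a dictionary key.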
Also, note that I renamed your list to lst instead of list to prevent name clashes with the Python built-in list.
It sounds like you actually want filter(), not any():
tuple_list = [(1,5), (1,7), (2,3)]
for pair in filter(lambda pair: pair[0] == 1, tuple_list):
    print("Second value {pair[1]} found from {pair}".format(pair=pair))
...
Second value 5 found from (1, 5)
Second value 7 found from (1, 7)
The filter() function is convenient because you can pass a predicate function directly to it, which lets you filter on whichever element you choose. Using a lambda expression keeps the whole thing a one-liner.
Indexing is the simplest approach, but if you want syntax closer to your example, where the first value is assigned to a variable and the rest is ignored, you can use Python 3’s extended iterable unpacking.
In [3]: [a for a,*_ in l]
Out[3]: [1, 1, 2]
Or with the any logic:
In [4]: l = [(1,5), (1,7), (2,3)]
In [5]: any(a == 1 for a,*_ in l)
Out[5]: True
Or mimicking any without the function call:
In [23]: l = [(1,5), (1,7), (2,3)]
In [24]: g = (a for a,*_ in l)
In [25]: 1 in g
Out[25]: True
In [26]: list(g)
Out[26]: [1, 2]
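The last output is [1, 2] rather than [1, 1, 2] because the membership test consumes the generator only up to the first match. A short sketch of that behaviour, using the same list (the names found, rest, missing and leftover are illustrative):

```python
l = [(1, 5), (1, 7), (2, 3)]

g = (a for a, *_ in l)
found = 1 in g        # consumes only the first item, which already matches
rest = list(g)        # the first elements that were not consumed yet

g = (a for a, *_ in l)
missing = 9 in g      # no match, so the whole generator is consumed
leftover = list(g)    # nothing is left

print(found, rest, missing, leftover)
```

A generator can only be traversed once, so build a fresh one for each test if the same check has to be repeated.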