# Check if all elements in a list are identical

## Question:

I need a function which takes in a `list` and outputs `True` if all elements in the input list evaluate as equal to each other using the standard equality operator and `False` otherwise.

I feel it would be best to iterate through the list comparing adjacent elements and then `AND` all the resulting Boolean values. But I’m not sure what’s the most Pythonic way to do that.

Doubt this is the “most Pythonic”, but something like:

``````>>> falseList = [1,2,3,4]
>>> trueList = [1, 1, 1]
>>>
>>> def testList(list):
...   for item in list[1:]:
...     if item != list[0]:
...       return False
...   return True
...
>>> testList(falseList)
False
>>> testList(trueList)
True
``````

would do the trick.

``````from itertools import groupby

def all_equal(iterable):
    g = groupby(iterable)
    return next(g, True) and not next(g, False)
``````

or without `groupby`:

``````def all_equal(iterator):
    iterator = iter(iterator)
    try:
        first = next(iterator)
    except StopIteration:
        return True
    return all(first == x for x in iterator)
``````

There are a number of alternative one-liners you might consider:

1. Converting the input to a set and checking that it only has one or zero (in case the input is empty) items

``````def all_equal2(iterator):
    return len(set(iterator)) <= 1
``````
2. Comparing against the input list without the first item

``````def all_equal3(lst):
    return lst[:-1] == lst[1:]
``````
3. Counting how many times the first item appears in the list

``````def all_equal_ivo(lst):
    return not lst or lst.count(lst[0]) == len(lst)
``````
4. Comparing against a list of the first element repeated

``````def all_equal_6502(lst):
    return not lst or [lst[0]]*len(lst) == lst
``````

But they have some downsides, namely:

1. `all_equal` and `all_equal2` work with any iterable, but the others require a sequence input, typically a concrete container like a list or a tuple.
2. `all_equal` and `all_equal3` stop as soon as a difference is found (what is called "short circuit"), whereas all the alternatives require iterating over the entire list, even if you can tell that the answer is `False` just by looking at the first two elements.
3. In `all_equal2` the content must be hashable. A list of lists will raise a `TypeError` for example.
4. `all_equal2` (in the worst case) and `all_equal_6502` create a copy of the list, meaning you need to use double the memory.
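Downside 2 can be made visible with a small counting wrapper (the `CountingIter` helper below is purely illustrative): `all_equal` consumes only as many elements as it needs before returning `False`.

```python
from itertools import groupby

def all_equal(iterable):
    g = groupby(iterable)
    return next(g, True) and not next(g, False)

class CountingIter:
    """Illustrative wrapper that records how many items are pulled."""
    def __init__(self, seq):
        self._it = iter(seq)
        self.consumed = 0
    def __iter__(self):
        return self
    def __next__(self):
        value = next(self._it)  # raises StopIteration at the end
        self.consumed += 1
        return value

data = CountingIter([1, 2] + [1] * 10_000)
print(all_equal(data))   # False
print(data.consumed)     # 2 -- it stopped right after the first mismatch
```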

On Python 3.9, using `perfplot`, these approaches were timed across input sizes (lower `Runtime [s]` is better); the benchmark plot is not reproduced here.

You can convert the list to a set. A set cannot have duplicates. So if all the elements in the original list are identical, the set will have just one element.

``````if len(set(input_list)) == 1:
    # input_list has all identical elements.
``````
``````>>> a = [1, 2, 3, 4, 5, 6]
>>> z = [(a[x], a[x+1]) for x in range(0, len(a)-1)]
>>> z
[(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
# Replacing it with the test
>>> z = [(a[x] == a[x+1]) for x in range(0, len(a)-1)]
>>> z
[False, False, False, False, False]
>>> if False in z: print "All elements are not equal"
``````

This is a simple way of doing it:

result = mylist and all(mylist[0] == elem for elem in mylist)
``````

This is slightly more complicated, it incurs function call overhead, but the semantics are more clearly spelled out:

``````def all_identical(seq):
    if not seq:
        # empty list is False.
        return False
    first = seq[0]
    return all(first == elem for elem in seq)
``````

This is another option, faster than `len(set(x))==1` for long lists (it short-circuits):

``````def constantList(x):
    return x and [x[0]]*len(x) == x
``````

A solution faster than using `set()` that works on sequences (not on arbitrary iterables) is to simply count the first element. This assumes the list is non-empty (but that's trivial to check; decide for yourself what the outcome should be for an empty list).

x.count(x[0]) == len(x)
``````

some simple benchmarks:

>>> timeit.timeit('len(set(s1))<=1', 's1=[1]*5000', number=10000)
1.4383411407470703
>>> timeit.timeit('len(set(s1))<=1', 's1=[1]*4999+[2]', number=10000)
1.4765670299530029
>>> timeit.timeit('s1.count(s1[0])==len(s1)', 's1=[1]*5000', number=10000)
0.26274609565734863
>>> timeit.timeit('s1.count(s1[0])==len(s1)', 's1=[1]*4999+[2]', number=10000)
0.25654196739196777
``````
``````import itertools

def allTheSame(i):
    j = itertools.groupby(i)
    for k in j: break
    for k in j: return False
    return True
``````

Works in Python 2.4, which doesn’t have “all”.

I’d do:

``````not any((x[i] != x[i+1] for i in range(0, len(x)-1)))
``````

as `any` stops searching the iterable as soon as it finds a `True` condition.

[edit: This answer addresses the currently top-voted `itertools.groupby` answer (which is a good answer) later on.]

Without rewriting the program, the most asymptotically performant and most readable way is as follows:

all(x == myList[0] for x in myList)
``````

(Yes, this even works with the empty list! This is because this is one of the few cases where python has lazy semantics.)

This will fail at the earliest possible time, so it is asymptotically optimal (expected time is approximately O(#uniques) rather than O(N), but worst-case time still O(N)). This is assuming you have not seen the data before…

(If you care about performance but not that much about performance, you can just do the usual standard optimizations first, like hoisting the `myList` constant out of the loop and adding clunky logic for the edge case, though this is something the python compiler might eventually learn how to do and thus one should not do it unless absolutely necessary, as it destroys readability for minimal gain.)

If you care slightly more about performance, this is twice as fast as above but a bit more verbose:

``````def allEqual(iterable):
    iterator = iter(iterable)

    try:
        firstItem = next(iterator)
    except StopIteration:
        return True

    for x in iterator:
        if x != firstItem:
            return False
    return True
``````

If you care even more about performance (but not enough to rewrite your program), use the currently top-voted `itertools.groupby` answer, which is twice as fast as `allEqual` because it is probably optimized C code. (According to the docs, it should (similar to this answer) not have any memory overhead because the lazy generator is never evaluated into a list… which one might be worried about, but the pseudocode shows that the grouped ‘lists’ are actually lazy generators.)

sidenotes regarding performance, because the other answers are talking about it for some unknown reason:

… if you have seen the data before and are likely using a collection data structure of some sort, and you really care about performance, you can get `.isAllEqual()` for free O(1) by augmenting your structure with a `Counter` that is updated with every insert/delete/etc. operation and just checking if it’s of the form `{something:someCount}` i.e. `len(counter.keys())==1`; alternatively you can keep a Counter on the side in a separate variable. This is provably better than anything else up to constant factor. Perhaps you can also use python’s FFI with `ctypes` with your chosen method, and perhaps with a heuristic (like if it’s a sequence with getitem, then checking first element, last element, then elements in-order).
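The Counter-augmented structure described above can be sketched as follows (the class and method names are hypothetical, and this assumes hashable elements):

```python
from collections import Counter

class EqTrackingList:
    """Sketch: a list that keeps a Counter up to date on every
    mutation, so the all-equal query is O(1)."""

    def __init__(self):
        self._items = []
        self._counts = Counter()

    def append(self, x):
        self._items.append(x)
        self._counts[x] += 1

    def pop(self):
        x = self._items.pop()
        self._counts[x] -= 1
        if self._counts[x] == 0:
            del self._counts[x]
        return x

    def is_all_equal(self):
        # at most one distinct key means all elements are equal
        # (the empty list counts as all-equal here; adjust to taste)
        return len(self._counts) <= 1

xs = EqTrackingList()
for v in (7, 7, 7):
    xs.append(v)
print(xs.is_all_equal())  # True
xs.append(9)
print(xs.is_all_equal())  # False
xs.pop()
print(xs.is_all_equal())  # True
```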

Of course, there’s something to be said for readability.

You can do:

reduce(and_, (x == yourList[0] for x in yourList), True)
``````

It is fairly annoying that Python makes you import operators like `operator.and_`. As of Python 3, you will also need to import `reduce` from `functools`.

(You should not use this method because it does not stop early when it finds non-equal values; it continues examining the entire list. It is included here only for completeness.)

If you’re interested in something a little more readable (though of course not as efficient), you could try:

``````def compare_lists(list1, list2):
    if len(list1) != len(list2): # Weed out unequal length lists.
        return False
    for item in list1:
        if item not in list2:
            return False
    return True

a_list_1 = ['apple', 'orange', 'grape', 'pear']
a_list_2 = ['pear', 'orange', 'grape', 'apple']

b_list_1 = ['apple', 'orange', 'grape', 'pear']
b_list_2 = ['apple', 'orange', 'banana', 'pear']

c_list_1 = ['apple', 'orange', 'grape']
c_list_2 = ['grape', 'orange']

print(compare_lists(a_list_1, a_list_2)) # Returns True
print(compare_lists(b_list_1, b_list_2)) # Returns False
print(compare_lists(c_list_1, c_list_2)) # Returns False
``````
``````lambda lst: reduce(lambda a, b: (b, b == a[0] and a[1]), lst, (lst[0], True))[1]
``````

The next one short-circuits:

``````all(itertools.imap(lambda i:yourlist[i]==yourlist[i+1], xrange(len(yourlist)-1)))
``````

Convert your input into a `set`:

``````len(set(the_list)) <= 1
``````

Using `set` removes all duplicate elements. `<= 1` is so that it correctly returns `True` when the input is empty.

This requires that all the elements in your input are hashable. You’ll get a `TypeError` if you pass in a list of lists for example.
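A quick illustration of the hashability requirement:

```python
# Lists are unhashable, so the set-based check raises TypeError:
try:
    len(set([[1], [1]])) <= 1
except TypeError as exc:
    print(exc)  # unhashable type: 'list'

# Tuples are hashable, so the equivalent data works fine:
print(len(set([(1,), (1,)])) <= 1)  # True
```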

Regarding using `reduce()` with `lambda`: here is working code that I personally think is way nicer than some of the other answers.

reduce(lambda x, y: (x[0] and x[1] == y, y), [2, 2, 2], (True, 2))
``````

It returns a tuple whose first value is the boolean telling whether all items are the same.

For what it’s worth, this came up on the python-ideas mailing list recently. It turns out that there is an itertools recipe for doing this already:[1]

``````from itertools import groupby

def all_equal(iterable):
    "Returns True if all the elements are equal to each other"
    g = groupby(iterable)
    return next(g, True) and not next(g, False)
``````

Supposedly it performs very nicely and has a few nice properties.

1. Short-circuits: It will stop consuming items from the iterable as soon as it finds the first non-equal item.
2. Doesn’t require items to be hashable.
3. It is lazy and only requires O(1) additional memory to do the check.

[1] In other words, I can’t take the credit for coming up with the solution — nor can I take credit for even finding it.

Check if all elements are equal to the first, using NumPy's approximate comparison:

`np.allclose(array, array[0])`

You can use `map` and `lambda`:

``````lst = [1,1,1,1,1,1,1,1,1]

print(all(map(lambda x: x == lst[0], lst[1:])))
``````

There is also a pure Python recursive option:

``````def checkEqual(lst):
    if len(lst) == 2:
        return lst[0] == lst[1]
    else:
        return lst[0] == lst[1] and checkEqual(lst[1:])
``````

However, for some reason it is in some cases two orders of magnitude slower than the other options. Coming from a C mentality, I expected this to be faster, but it is not!

The other disadvantage is Python's recursion limit, which needs to be raised for long lists, for example with `sys.setrecursionlimit`.
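A minimal sketch of raising the recursion limit for a recursive check like the one above (the limit value is illustrative; each call strips one element, so roughly `len(lst)` stack frames are needed):

```python
import sys

def check_equal(lst):
    # recursive all-equal check; lists of length 0 or 1 count as equal
    if len(lst) <= 1:
        return True
    return lst[0] == lst[1] and check_equal(lst[1:])

big = [7] * 2500
# the default limit (~1000) would overflow on a list this long
sys.setrecursionlimit(max(sys.getrecursionlimit(), len(big) + 500))
print(check_equal(big))  # True
```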

Or use `diff` method of numpy:

``````import numpy as np
def allthesame(l):
    return np.all(np.diff(l) == 0)
``````

And to call:

``````print(allthesame([1,1,1]))
``````

Output:

``````True
``````

Or use the `unique` method of numpy:

``````import numpy as np
def allthesame(l):
    return np.unique(l).shape[0] <= 1
``````

And to call:

``````print(allthesame([1,1,1]))
``````

Output:

``````True
``````

You can use pandas' `.nunique()` to find the number of unique items in a list.

``````import pandas as pd

def identical_elements(lst):
    series = pd.Series(lst)
    if series.nunique() == 1: identical = True
    else: identical = False
    return identical

identical_elements(['a', 'a'])
Out: True

identical_elements(['a', 'b'])
Out: False
``````

## The simple solution is to apply `set` on the list

### If all elements are identical, `len` will be 1; otherwise it will be greater than 1

``````lst = [1,1,1,1,1,1,1,1,1]
len_lst = len(list(set(lst)))

print(len_lst)

1

lst = [1,2,1,1,1,1,1,1,1]
len_lst = len(list(set(lst)))
print(len_lst)

2
``````

Maybe I’m underestimating the problem? Check the length of unique values in the list.

``````lzt = [1,1,1,1,1,2]

if len(set(lzt)) > 1:
    uniform = False
elif len(set(lzt)) == 1:
    uniform = True
elif not lzt:
    raise ValueError("List empty, get wrecked")
``````

Here is code with a good amount of Pythonicity and, I think, a balance of simplicity and obviousness, which should also work in pretty old Python versions.

``````def all_eq(lst):
    for idx, itm in enumerate(lst):
        if not idx:   # == 0
            prev = itm
        if itm != prev:
            return False
        prev = itm
    return True
``````

This was a fun one to read through and think about. Thanks everyone!
I don’t think anything relying on a pure count will be reliable in all cases. `sum` could also work, but only for numbers, or on lengths (again reducing to a count scenario).

But I do like the simplicity, so this is what I came up with:

``````all(i==lst[c-1] for c, i in enumerate(lst))
``````

Alternatively, I do think this clever one by @kennytm would also work for all cases (and is probably the fastest, interestingly). So I concede that it’s probably better than mine:

[lst[0]]*len(lst) == lst
``````

A little bonus clever one I think would also work because set gets rid of duplicates (and clever is fun but not generally best practice for maintaining code). And I think the one by @kennytm would still be faster but really only relevant for large lists:

``````len(set(lst)) == 1
``````

But the simplicity and cleverness of Python is one of my favorite things about the language. Thinking about it a little more: if you have to modify the list in any way, as I actually do because I’m comparing addresses (removing leading/trailing spaces and lower-casing to eliminate possible inconsistencies), mine is more suited for the job. So "better" is subjective, as I alluded to by putting that word in quotes! But you could also clean up the list beforehand.

Best and good luck!

There was a nice Twitter thread on the various ways to implement an all_equal() function.

Given a list input, the best submission was:

``````t.count(t[0]) == len(t)
``````

## Other Approaches

Here are the results from the thread:

1. Have groupby() compare adjacent entries. This has an early-out for a mismatch, does not use extra memory, and it runs at C speed.

``````g = itertools.groupby(s)
next(g, True) and not next(g, False)
``````
2. Compare two slices offset from one another by one position. This uses extra memory but runs at C speed.

``````s[1:] == s[:-1]
``````
3. Iterator version of slice comparison. It runs at C speed and does not use extra memory; however, the eq calls are expensive.

``````all(map(operator.eq, s, itertools.islice(s, 1, None)))
``````
4. Compare the lowest and highest values. This runs at C speed, doesn’t use extra memory, but does cost two inequality tests per datum.

``````min(s) == max(s)  # s must be non-empty
``````
5. Build a set. This runs at C speed and uses little extra memory but requires hashability and does not have an early-out.

len(set(t)) == 1
``````
6. At great cost, this handles NaNs and other objects with exotic equality relations.

``````all(itertools.starmap(eq, itertools.product(s, repeat=2)))
``````
7. Pull out the first element and compare all the others to it, stopping at the first mismatch. Only disadvantage is that this doesn’t run at C speed.

``````it = iter(s)
a = next(it, None)
return all(a == b for b in it)
``````
8. Just count the first element. This is fast, simple, elegant. It runs at C speed, requires no additional memory, uses only equality tests, and makes only a single pass over the data.

``````t.count(t[0]) == len(t)
``````
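As a sanity check, several of the approaches from the thread can be run side by side on some edge cases to confirm they agree (the `min`/`max` and all-pairs variants are skipped because they require non-empty or orderable input):

```python
import itertools
import operator

def approaches(s):
    """Yield the verdict of several of the approaches above on sequence s."""
    g = itertools.groupby(s)
    yield bool(next(g, True) and not next(g, False))               # 1. groupby
    yield s[1:] == s[:-1]                                          # 2. slice comparison
    yield all(map(operator.eq, s, itertools.islice(s, 1, None)))   # 3. islice pairs
    yield len(set(s)) <= 1                                         # 5. set (hashable only)
    yield (not s) or s.count(s[0]) == len(s)                       # 8. count first element

for s in ([], [1], [1, 1, 1], [1, 2], [1, 1, 2]):
    assert len(set(approaches(s))) == 1, s  # all approaches agree
```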

I ended up with this one-liner

``````from itertools import starmap, pairwise
from operator import eq

all(starmap(eq, pairwise(x)))
``````

More versions using `itertools.groupby` that I find clearer than the original (more about that below):

``````def all_equal(iterable):
    g = groupby(iterable)
    return not any(g) or not any(g)

def all_equal(iterable):
    g = groupby(iterable)
    next(g, None)
    return not next(g, False)

def all_equal(iterable):
    g = groupby(iterable)
    return not next(g, False) or not next(g, False)
``````

Here’s the original from the Itertools Recipes again:

``````def all_equal(iterable):
    g = groupby(iterable)
    return next(g, True) and not next(g, False)
``````

Note that the `next(g, True)` is always true (it’s either a non-empty `tuple` or `True`). That means its value doesn’t matter. It’s executed purely for advancing the `groupby` iterator. But including it in the `return` expression leads the reader into thinking that its value gets used there. Since it doesn’t, I find that misleading and unnecessarily complicated. My second version above treats the `next(g, True)` as what it’s actually used for, as a statement whose value we don’t care about.

My third version goes a different direction and does use the value of the first `next(g, False)`. If there isn’t even a first group at all (i.e., if the given iterable is "empty"), then that solution returns the result right away and doesn’t even check whether there’s a second group.

My first solution is basically the same as my third, just using `any`. Both solutions read as "All elements are equal iff … there is no first group or there is no second group."
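To make the reading concrete, here is what `groupby` actually yields in the relevant cases: one `(key, group)` pair per run of equal adjacent elements, so "all equal" is exactly "at most one group".

```python
from itertools import groupby

print([key for key, _ in groupby([1, 1, 1])])  # [1]
print([key for key, _ in groupby([1, 2, 2])])  # [1, 2]
print([key for key, _ in groupby([])])         # []
```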

Benchmark results (although speed is really not my point here, clarity is, and in practice if there are many equal values, most of the time might be spent by the `groupby` itself, reducing the impact of these differences here):

``````Python 3.10.4 on my Windows laptop:

iterable = ()
914 ns   914 ns   916 ns  use_first_any
917 ns   925 ns   925 ns  use_first_next
1074 ns  1075 ns  1075 ns  next_as_statement
1081 ns  1083 ns  1084 ns  original

iterable = (1,)
1290 ns  1290 ns  1291 ns  next_as_statement
1303 ns  1307 ns  1307 ns  use_first_next
1306 ns  1307 ns  1309 ns  use_first_any
1318 ns  1319 ns  1320 ns  original

iterable = (1, 2)
1463 ns  1464 ns  1467 ns  use_first_any
1463 ns  1463 ns  1467 ns  next_as_statement
1477 ns  1479 ns  1481 ns  use_first_next
1487 ns  1489 ns  1492 ns  original
``````
``````Python 3.10.4 on a Debian Google Compute Engine instance:

iterable = ()
234 ns   234 ns   234 ns  use_first_any
234 ns   235 ns   235 ns  use_first_next
264 ns   264 ns   264 ns  next_as_statement
265 ns   265 ns   265 ns  original

iterable = (1,)
308 ns   308 ns   308 ns  next_as_statement
315 ns   315 ns   315 ns  original
316 ns   316 ns   317 ns  use_first_any
317 ns   317 ns   317 ns  use_first_next

iterable = (1, 2)
361 ns   361 ns   361 ns  next_as_statement
367 ns   367 ns   367 ns  original
384 ns   385 ns   385 ns  use_first_next
386 ns   387 ns   387 ns  use_first_any
``````

Benchmark code:

``````from timeit import timeit
from random import shuffle
from bisect import insort
from itertools import groupby

def original(iterable):
    g = groupby(iterable)
    return next(g, True) and not next(g, False)

def use_first_any(iterable):
    g = groupby(iterable)
    return not any(g) or not any(g)

def next_as_statement(iterable):
    g = groupby(iterable)
    next(g, None)
    return not next(g, False)

def use_first_next(iterable):
    g = groupby(iterable)
    return not next(g, False) or not next(g, False)

funcs = [original, use_first_any, next_as_statement, use_first_next]

for iterable in (), (1,), (1, 2):
    print(f'{iterable = }')
    times = {func: [] for func in funcs}
    for _ in range(1000):
        shuffle(funcs)
        for func in funcs:
            number = 1000
            t = timeit(lambda: func(iterable), number=number) / number
            insort(times[func], t)
    for func in sorted(funcs, key=times.get):
        print(*('%4d ns ' % round(t * 1e9) for t in times[func][:3]), func.__name__)
    print()
``````

I suggest a simple pythonic solution:

``````from typing import Iterable

def all_equal_in_iterable(iterable: Iterable):
    iterable = list(iterable)
    if not iterable:
        return False
    return all(item == iterable[0] for item in iterable)
``````