Number of values in a list greater than a certain number
Question:
I have a list of numbers and I want to count how many of them meet a certain criterion. I can use a list comprehension (or a list comprehension in a function), but I am wondering if someone has a shorter way.
# list of numbers
j = [4, 5, 6, 7, 1, 3, 7, 5]
# list comprehension of values of j > 5
x = [i for i in j if i > 5]
# value of x
len(x)
# or function version
def length_of_list(list_of_numbers, number):
    # compare each item i (not the global list j) against number
    x = [i for i in list_of_numbers if i > number]
    return len(x)
length_of_list(j, 5)
Is there an even more condensed version?
Answers:
You could do something like this:
>>> j = [4, 5, 6, 7, 1, 3, 7, 5]
>>> sum(i > 5 for i in j)
3
It might initially seem strange to add True to True this way, but I don’t think it’s unpythonic; after all, bool is a subclass of int in all versions since 2.3:
>>> issubclass(bool, int)
True
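To make the bool-as-int point concrete, here is a minimal sketch. The helper name count_greater is hypothetical and not part of the original answer; it only wraps the same sum() expression so the threshold can vary.

```python
# Hypothetical helper (the name is an assumption, not from the original answer).
def count_greater(values, threshold):
    """Count the items in values that are strictly greater than threshold."""
    # Each comparison yields a bool; sum() adds them as 1 (True) or 0 (False).
    return sum(v > threshold for v in values)

j = [4, 5, 6, 7, 1, 3, 7, 5]
print(True + True)          # 2
print(count_greater(j, 5))  # 3
```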
If you are already using NumPy, you can save a few keystrokes, but I don’t think it gets much faster or more compact than senderle’s answer.
import numpy as np
j = np.array(j)
sum(j > 5)
You can create a smaller intermediate result like this:
>>> j = [4, 5, 6, 7, 1, 3, 7, 5]
>>> len([1 for i in j if i > 5])
3
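If the intermediate list itself is a concern, a generator expression avoids building one at all. A minimal sketch using the same list (a variation on the answer, not a replacement for it):

```python
j = [4, 5, 6, 7, 1, 3, 7, 5]

# Yields a 1 for every match and sums them lazily; no list is created.
count = sum(1 for i in j if i > 5)
print(count)  # 3
```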
A (somewhat) different way (in Python 3, reduce must be imported from functools):
from functools import reduce
reduce(lambda acc, x: acc + (1 if x > 5 else 0), j, 0)
If you are using NumPy (as in ludaavic’s answer), for large arrays you’ll probably want to use NumPy’s sum function rather than Python’s builtin sum for a significant speedup, e.g., a >1000x speedup for 10-million-element arrays on my laptop:
>>> import numpy as np
>>> ten_million = 10 * 1000 * 1000
>>> x, y = (np.random.randn(ten_million) for _ in range(2))
>>> %timeit sum(x > y)  # time Python builtin sum function
1 loops, best of 3: 24.3 s per loop
>>> # wow, that was really slow!
>>> %timeit (x > y).sum()  # time NumPy sum method
10 loops, best of 3: 18.7 ms per loop
>>> %timeit np.sum(x > y) # time NumPy sum function
10 loops, best of 3: 18.8 ms per loop
(The above uses IPython’s %timeit “magic” for timing.)
A different way of counting, using the bisect module:
>>> from bisect import bisect
>>> j = [4, 5, 6, 7, 1, 3, 7, 5]
>>> j.sort()
>>> b = 5
>>> index = bisect(j, b)  # index of the first value greater than b
>>> print(len(j) - index)
3
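Note that the bisect approach sorts j in place, and sorting costs O(n log n), so it mostly pays off when the same list is queried for several thresholds. A sketch under that assumption, using a hypothetical helper count_greater_sorted that works on a sorted copy:

```python
from bisect import bisect_right

# Hypothetical helper (the name is an assumption, not from the original answer).
def count_greater_sorted(sorted_values, threshold):
    """Count items strictly greater than threshold in an already sorted list."""
    # bisect_right returns the index just past any items equal to threshold,
    # so everything from that index onward is strictly greater.
    return len(sorted_values) - bisect_right(sorted_values, threshold)

j = [4, 5, 6, 7, 1, 3, 7, 5]
sorted_j = sorted(j)  # sort once, leaving j itself unchanged

for threshold in (3, 5, 7):
    print(threshold, count_greater_sorted(sorted_j, threshold))
```

Each lookup after the initial sort is O(log n), compared with O(n) for the sum() versions.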
I’ll add a map and filter version because why not.
sum(map(lambda x:x>5, j))
sum(1 for _ in filter(lambda x:x>5, j))
You can do it like this using a function:
l = [34, 56, 78, 2, 3, 5, 6, 8, 45, 6]
print("The list : " + str(l))
def count_greater30(l):
    count = 0
    for i in l:
        if i > 30:
            count = count + 1
    return count
print("Count greater than 30 is : " + str(count_greater30(l)))
This is a little longer, but it is a detailed solution for beginners:
from functools import reduce
from statistics import mean
two_dim_array = [[1, 5, 7, 3, 2], [2, 4, 1, 6, 8]]
# convert two dimensional array to one dimensional array
one_dim_array = reduce(list.__add__, two_dim_array)
arithmetic_mean = mean(one_dim_array)
exceeding_count = sum(i > arithmetic_mean for i in one_dim_array)
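For longer nested lists, reduce(list.__add__, ...) builds a new list at every step. One possible alternative, sketched here with the same data, is to flatten with itertools.chain.from_iterable; this is a substitution for illustration, not the answer's own method.

```python
from itertools import chain
from statistics import mean

two_dim_array = [[1, 5, 7, 3, 2], [2, 4, 1, 6, 8]]

# Flatten the nested list in a single pass.
one_dim_array = list(chain.from_iterable(two_dim_array))
arithmetic_mean = mean(one_dim_array)

# Count the values strictly above the mean.
exceeding_count = sum(i > arithmetic_mean for i in one_dim_array)
print(arithmetic_mean, exceeding_count)
```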