Most efficient way to filter prime numbers from a list of random numbers in Python
Question:
I have a list filled with random numbers and I want to return the prime numbers from this list. So I created these functions:
def is_prime(number):
for i in range(2, int(sqrt(number)) + 1):
if number % i == 0:
return False
return number > 1
And
def filter_primes(general_list):
return set(filter(is_prime, general_list))
But I want to improve performance, so how can I achieve this?
Answers:
How about this? I think It’s a little better:
def filter_primes(general_list):
return filter(is_prime, set(general_list))
This way we don’t call is_prime()
for same number multiple times.
-
The Sieve of Eratosthenes is more efficient than Trial Division, the method you are using.
-
Your trial division loop can be made more efficient, taking about half the time. Two is the only even prime number, so treat two as a special case and only deal with odd numbers thereafter, which will halve the work.
My Python is non-existent, but this pseudocode should make things clear:
def isPrime(num)
// Low numbers.
if (num <= 1)
return false
end if
// Even numbers
if (num % 2 == 0)
return (num == 2) // 2 is the only even prime.
end if
// Odd numbers
for (i = 3 to sqrt(num) + 1 step 2)
if (num % i == 0)
return false
end if
end for
// If we reach here, num is prime.
return true;
end def
That step 2
in the for
loop is what halves the work. Having earlier eliminated all even numbers you only need to test with odd trial divisors: 3, 5, 7, …
Sieve of Eratosthenes, taking about 0.17 seconds for primes under 10 million on PyPy 3.5 on my device:
from array import array
from math import isqrt
def primes(upper):
numbers = array('B', [1]) * (upper + 1)
for i in range(2, isqrt(upper) + 1):
if numbers[i]:
low_multiple = i * i
numbers[low_multiple:upper + 1:i] = array('B', [0]) * ((upper - low_multiple) // i + 1)
return {i for i, x in enumerate(numbers) if i >= 2 and x}
and the filter function:
filter_primes = primes(10_000_000).intersection
3 rounds of the the Miller-Rabin test ( https://en.wikipedia.org/wiki/Miller%2dRabin_primality_test ) using bases 2, 7, and 61, is known to accurately detect all primes <= 32 bit, i.e., anything that fits into a python int
.
This is much, much faster than trial division or sieving if the numbers can be large.
If the numbers cannot be large (i.e., < 10,000,000 as you suggest in comments), then you may want to precompute the set of all primes < 10,000,000, but there are over 600,000 of those.
def primes_list(num_list):
divs = [2,3,5,7]
primes = [x for x in set(num_list) if 0 not in {1 if ((x%i != 0) | (x in divs)) & (x > 0) else 0 for i in divs}]
return primes
For this function, it takes a list, num_list
, as a parameter. divs
is a predefined, or rather hard coded, list of prime numbers less than 10 excluding 1. Then we use list comprehension to filter num_list
for prime numbers as the variable primes
.
This is one more flavour of code to find the prime no of the range. The simple and easy way.
def find_prime(n):
if n <=1:
return False
else:
for i in range(2, n):
if n % i == 0:
return False
return True
n = 10
x = filter(find_prime, range(n)) #you can give random no list too
print(list(x))
def is_prime(n):
if n>1:
for i in range(2,int(n**0.5)+1):
if n%i==0:
return False
return True
else: return False
print([x for x in general_list if is_prime(x)])
would you try this… There is no need for the filter at all and you could simply apply set() to the general_list if there are duplicate elements in the list to optimize more.
I have a list filled with random numbers and I want to return the prime numbers from this list. So I created these functions:
def is_prime(number):
for i in range(2, int(sqrt(number)) + 1):
if number % i == 0:
return False
return number > 1
And
def filter_primes(general_list):
return set(filter(is_prime, general_list))
But I want to improve performance, so how can I achieve this?
How about this? I think It’s a little better:
def filter_primes(general_list):
return filter(is_prime, set(general_list))
This way we don’t call is_prime()
for same number multiple times.
-
The Sieve of Eratosthenes is more efficient than Trial Division, the method you are using.
-
Your trial division loop can be made more efficient, taking about half the time. Two is the only even prime number, so treat two as a special case and only deal with odd numbers thereafter, which will halve the work.
My Python is non-existent, but this pseudocode should make things clear:
def isPrime(num)
// Low numbers.
if (num <= 1)
return false
end if
// Even numbers
if (num % 2 == 0)
return (num == 2) // 2 is the only even prime.
end if
// Odd numbers
for (i = 3 to sqrt(num) + 1 step 2)
if (num % i == 0)
return false
end if
end for
// If we reach here, num is prime.
return true;
end def
That step 2
in the for
loop is what halves the work. Having earlier eliminated all even numbers you only need to test with odd trial divisors: 3, 5, 7, …
Sieve of Eratosthenes, taking about 0.17 seconds for primes under 10 million on PyPy 3.5 on my device:
from array import array
from math import isqrt
def primes(upper):
numbers = array('B', [1]) * (upper + 1)
for i in range(2, isqrt(upper) + 1):
if numbers[i]:
low_multiple = i * i
numbers[low_multiple:upper + 1:i] = array('B', [0]) * ((upper - low_multiple) // i + 1)
return {i for i, x in enumerate(numbers) if i >= 2 and x}
and the filter function:
filter_primes = primes(10_000_000).intersection
3 rounds of the the Miller-Rabin test ( https://en.wikipedia.org/wiki/Miller%2dRabin_primality_test ) using bases 2, 7, and 61, is known to accurately detect all primes <= 32 bit, i.e., anything that fits into a python int
.
This is much, much faster than trial division or sieving if the numbers can be large.
If the numbers cannot be large (i.e., < 10,000,000 as you suggest in comments), then you may want to precompute the set of all primes < 10,000,000, but there are over 600,000 of those.
def primes_list(num_list):
divs = [2,3,5,7]
primes = [x for x in set(num_list) if 0 not in {1 if ((x%i != 0) | (x in divs)) & (x > 0) else 0 for i in divs}]
return primes
For this function, it takes a list, num_list
, as a parameter. divs
is a predefined, or rather hard coded, list of prime numbers less than 10 excluding 1. Then we use list comprehension to filter num_list
for prime numbers as the variable primes
.
This is one more flavour of code to find the prime no of the range. The simple and easy way.
def find_prime(n):
if n <=1:
return False
else:
for i in range(2, n):
if n % i == 0:
return False
return True
n = 10
x = filter(find_prime, range(n)) #you can give random no list too
print(list(x))
def is_prime(n):
if n>1:
for i in range(2,int(n**0.5)+1):
if n%i==0:
return False
return True
else: return False
print([x for x in general_list if is_prime(x)])
would you try this… There is no need for the filter at all and you could simply apply set() to the general_list if there are duplicate elements in the list to optimize more.