Factorizing a number in Python

Question:

Here’s my code:

def factorize(n):
    sieve = [True] * (n + 1)

    for x in range(2, int(len(sieve) ** 0.5) + 1):
        if sieve[x]: 
            for i in range(x + x, len(sieve), x):
                sieve[i] = False

    lowerPrimes = i for i in range(2, len(sieve)) if sieve[i]] and (n % i == 0)]
    return lowerPrimes

factorize(n) returns all prime factors of the given value n. As you can see, it first makes an Eratosthenes sieve for n and then uses a list comprehension to return all values in the sieve that are factors of n. It works relatively fine for this purpose, however, I want it to return a list so that if you multiply every item in it, the result is n. Do you get my point?

For example, factorize(99020) returns [2, 5, 4951], but I’d like it to return [2, 2, 5, 4951], as 2*2*5*4951 = 99020.

I know my approach is not even close, but could you help me to make it so?

Asked By: Juan José Castro

||

Answers:

I’m not familiar enough to tell if this question should be deleted or whatever, but I’ll help out anyhow.

You have the sieve part correct, I think.
The key is to use a while loop to keep on dividing out a valid prime factor more than once.

factors = []
sieve[0] = sieve[1] = False # So I don't have to worry about trying to skip over these two
for testFactIndex, isPrime in enumerate(sieve):
    if isPrime:
        while n%testFactIndex == 0:
            n = n/testFactIndex
            factors.append(testFactIndex)
return factors
Answered By: SimonT

The Sieve of Eratosthenes helps you find prime numbers below a certain limit. It’s not really going to help you with finding the factors of a particular number.

If you want to do that, the simplest approach that I can see is something like this:

def factors(n):
    while n > 1:
        for i in range(2, n + 1):
            if n % i == 0:
                n //= i
                yield i
                break

for factor in factors(360):
    print factor

This basically finds the smallest factor of n (which is guaranteed to be prime), divides n by that number and repeats the process until n is equal to 1.

The output is:

2
2
2
3
3
5

They multiply out to the original number:

>>> from operator import mul
>>> reduce(mul, factors(360))
360
Answered By: Blender

The code in the answer by Blender is very nice, but that algorithm is lacking in one very important respect: it tests way too much. E.g. trying to factorize n=392798360393, which is a prime number, it will try to divide it by all the numbers below it (including itself). This will take a lot of time.

Is it really necessary? If n == a*b and a < b, having found a we don’t really need to test divide n by b. We know that it too divides n, because n/a == b implies n/b == a. So we only need to test while the potential factor a is smaller than (or equal to) the potential factor b. That is to say, until the square root of the number is reached.

Also, instead of starting over from 2 for each reduced n it’s OK to start from the previous value of i:

from math import sqrt
def factors(n):    # (cf. https://stackoverflow.com/a/15703327/849891)
    j = 2
    while n > 1:
        for i in range(j, int(sqrt(n+0.05)) + 1):
            if n % i == 0:
                n //= i ; j = i
                yield i
                break
        else:
            if n > 1:
                yield n; break

Indeed, factoring 9000009 by this code takes 0.08 seconds on Ideone, instead of 0.59 seconds.

This is guaranteed to produce only primes (because we divide out each factor found, and we try the candidates in non-decreasing order). The overhead of generating primes first and then testing only by the primes (a.o.t. testing by all numbers, as done here above) will be worth it if we are factoring several numbers at once; for factoring just one number it might not be worth it, depending on the speed of your prime generation.


But then, what really should be done when factoring several numbers at once, is to create the smallest factor sieve first where we mark each number in a given range by its smallest (prime) factor (from which it was generated by the sieve) instead of just by True or False as in the sieve of Eratosthenes. This smallest factor sieve is then used for efficient factorization of each of the numbers given, in the same range, by successive division by their factors, from the smallest up, which are efficiently found by a direct look-up in the sieve instead of testing anew:

def sfs_factorize(nums):
    top = max(nums)
    sv = smallest_factor_sieve(top)
    for k in nums:
        fs = [] ; n = k
        while n > 1:
            f = sv[n]    # no testing
            n //= f
            fs.append(f)
        print( k, list(fs))
Answered By: Will Ness

My one-liner with python 3.8 (assignment expressions used)

f = lambda n: (p:=[next(i for i in range(2, n+1) if n % i == 0)] if n>1 else [])+(f(n//p[0]) if p else [])

video with details – Factorizing number in Python

Answered By: Evg

Your first prime-sieving part is totally alright. Only second factors-finding part should be corrected.

In second part you should divide by prime factor not just single time, but multiple times till it is still divisible. This will ensure that you take into account powers of all prime factors.

This second part is called trial division by prime numbers. As it is well known it is enough to do trial division only by divisors below Sqrt(N). Remaining value of N will be automatically also prime.

One more very important speed improvement to your algorithm is that it is enough to make sieve size Sqrt(N), and not N. Because we need only primes that are less than Sqrt(N). This is because after trial division remaining N is automatically becomes prime.

I also found one more improvement for your sieving part, you should replace for i in range(x + x, ... with for i in range(x * x, .... Because you can start sieving not from x + x but from x * x, according to classical algorithm of Sieve of Erathosthenes. This will further improve speed. Starting from x + x is alright, but x * x will also give correct results with better performance.

One more obvious improvement is when you want to factorize multiple numbers, then you can reuse and extend sieve multiple times, this will make code faster. When reusing sieve it is good to keep resulting prime numbers in a separate list, so that on second Trial Division stage you don’t iterate whole sieve but only prime numbers, this again will give speedup. I didn’t do these two improvements in code below, but if you want I can do, just tell.

Full corrected final version is:

Try it online!

def factorize(n):
    sieve = [True] * int(n ** 0.5 + 2)
    
    for x in range(2, int(len(sieve) ** 0.5 + 2)):
        if not sieve[x]: 
            continue
        for i in range(x * x, len(sieve), x):
            sieve[i] = False

    factors = []
    for i in range(2, len(sieve)):
        if i * i > n:
            break
        if not sieve[i]:
            continue
        while n % i == 0:
            factors.append(i)
            n //= i
    if n > 1:
        factors.append(n)
    return factors

print(factorize(99020)) # [2, 2, 5, 4951]

Input:

99020

Output:

[2, 2, 5, 4951]

If you need to factorize just few numbers it might be even faster to do Trial Division Factorization without intermediate sieving stage. This also makes code more simple and shorter.

Full code for doing factorization without sieving stage is below.

In this code I did one obvious speed improvement by trying only odd divisors, this will double total algorithm speed. Of course for doing this you need to factor out all powers of 2 before, which I did too.

Try it online!

def factorize(n):
    factors = []
    while (n & 1) == 0:
        factors.append(2)
        n >>= 1
    for d in range(3, 1 << 60, 2):
        if d * d > n:
            break
        while n % d == 0:
            factors.append(d)
            n //= d
    if n > 1:
        factors.append(n)
    return factors

print(factorize(99020)) # [2, 2, 5, 4951]
Answered By: Arty