Is there any way to optimise this program further?

Question

I’m writing a program to calculate a basic expression involving factorials, which is being tested on an online server. The expression is (1/n!) * (1! + 2! + 3! + … + n!). The program runs fine and gets the job done, but for test cases where n > 1000 it runs slowly and causes the server to time out. Is there any way I can optimise this program to make it run quicker for the larger test cases? The result needs to be truncated to 6 decimal places, hence the math.floor() call.

import math

def fact(n):
    return math.factorial(n)

def going(n):
    return math.floor(((sum(fact(i) for i in range(1, n+1)))/ fact(n) * 1000000))/ 1000000

Asked By: Firas Attieh

Answer 1

It’ll be fun to look at the code’s execution time at different levels of optimization 🙂
The first case is the unoptimized code (run on inputs 1000, 3000, …, 19000, then 20000 down to 2000):

from functools import wraps
import time
import math


def timeit(func):
    @wraps(func)
    def timeit_wrapper(*args, **kwargs):
        start_time = time.perf_counter()
        result = func(*args, **kwargs)
        end_time = time.perf_counter()
        total_time = end_time - start_time
        print(f'Function {func.__name__}{args} {kwargs} Took {total_time:.4f} seconds')
        return result
    return timeit_wrapper

def my_fact(n):
    return math.factorial(n)

@timeit
def going(n):
    return math.floor(((sum(my_fact(i) for i in range(1, n+1)))/ my_fact(n) * 1000000))/ 1000000

def main():
    for i in range(1000, 20000, 2000):
        going(i)
    for i in range(20000, 1000, -2000):
        going(i)

if __name__ == '__main__':
    main()

Execution-

Function going(1000,) {} Took 0.0193 seconds
Function going(3000,) {} Took 0.3540 seconds
Function going(5000,) {} Took 1.5023 seconds
Function going(7000,) {} Took 3.8886 seconds
Function going(9000,) {} Took 7.7359 seconds
Function going(11000,) {} Took 13.7363 seconds
Function going(13000,) {} Took 21.8165 seconds
Function going(15000,) {} Took 32.3038 seconds
Function going(17000,) {} Took 44.8729 seconds
Function going(19000,) {} Took 61.6663 seconds
Function going(20000,) {} Took 70.6783 seconds
Function going(18000,) {} Took 52.4495 seconds
Function going(16000,) {} Took 37.3437 seconds
Function going(14000,) {} Took 25.7586 seconds
Function going(12000,) {} Took 17.7381 seconds
Function going(10000,) {} Took 10.2902 seconds
Function going(8000,) {} Took 5.6387 seconds
Function going(6000,) {} Took 2.4552 seconds
Function going(4000,) {} Took 0.7788 seconds
Function going(2000,) {} Took 0.1096 seconds

Given that the code calls the factorial function A LOT, it’s probably best to optimize its execution time.
An easy and quick optimization is to cache the factorial function (the results are saved and reused in later calls). This works quite well here because going is called repeatedly, so any factorial computed in an earlier call is reused and reruns are much faster, as in this code:

from functools import wraps
import functools
import time
import math


def timeit(func):
    @wraps(func)
    def timeit_wrapper(*args, **kwargs):
        start_time = time.perf_counter()
        result = func(*args, **kwargs)
        end_time = time.perf_counter()
        total_time = end_time - start_time
        print(f'Function {func.__name__}{args} {kwargs} Took {total_time:.4f} seconds')
        return result
    return timeit_wrapper

@functools.cache
def my_fact(n):
    return math.factorial(n)

@timeit
def going(n):
    return math.floor(((sum(my_fact(i) for i in range(1, n+1)))/ my_fact(n) * 1000000))/ 1000000

def main():
    for i in range(1000, 20000, 2000):
        going(i)
    for i in range(20000, 1000, -2000):
        going(i)

if __name__ == '__main__':
    main()

Execution-

Function going(1000,) {} Took 0.0208 seconds
Function going(3000,) {} Took 0.3680 seconds
Function going(5000,) {} Took 1.1571 seconds
Function going(7000,) {} Took 2.3572 seconds
Function going(9000,) {} Took 4.1111 seconds
Function going(11000,) {} Took 5.9934 seconds
Function going(13000,) {} Took 8.4503 seconds
Function going(15000,) {} Took 10.4227 seconds
Function going(17000,) {} Took 13.3260 seconds
Function going(19000,) {} Took 16.8218 seconds
Function going(20000,) {} Took 9.8875 seconds
Function going(18000,) {} Took 0.1035 seconds
Function going(16000,) {} Took 0.0705 seconds
Function going(14000,) {} Took 0.0464 seconds
Function going(12000,) {} Took 0.0266 seconds
Function going(10000,) {} Took 0.0171 seconds
Function going(8000,) {} Took 0.0109 seconds
Function going(6000,) {} Took 0.0061 seconds
Function going(4000,) {} Took 0.0030 seconds
Function going(2000,) {} Took 0.0009 seconds

The previous implementation is pretty good: after one run on a high input, later runs are fast because they hit the cache, but the first runs are still slow.
A better optimization is a custom factorial function that fills the cache as it goes and, given a new input, continues from the highest factorial already cached instead of starting the calculation from scratch, as in this code:

from functools import wraps
import time
import math


def timeit(func):
    @wraps(func)
    def timeit_wrapper(*args, **kwargs):
        start_time = time.perf_counter()
        result = func(*args, **kwargs)
        end_time = time.perf_counter()
        total_time = end_time - start_time
        print(f'Function {func.__name__}{args} {kwargs} Took {total_time:.4f} seconds')
        return result
    return timeit_wrapper


fact_cache = {1: 1}
maximal_factorial = 1

def my_fact(n):
    global fact_cache
    global maximal_factorial

    if n in fact_cache:
        return fact_cache[n]

    result_index = maximal_factorial
    result = fact_cache[maximal_factorial]

    while result_index < n:
        result_index += 1
        result *= result_index
        fact_cache[result_index] = result

    maximal_factorial = n
    return result

@timeit
def going(n):
    return math.floor(((sum(my_fact(i) for i in range(1, n+1)))/ my_fact(n) * 1000000))/ 1000000

def main():
    for i in range(1000, 20000, 2000):
        going(i)
    for i in range(20000, 1000, -2000):
        going(i)

if __name__ == '__main__':
    main()

Execution-

Function going(1000,) {} Took 0.0014 seconds
Function going(3000,) {} Took 0.0103 seconds
Function going(5000,) {} Took 0.0168 seconds
Function going(7000,) {} Took 0.0263 seconds
Function going(9000,) {} Took 0.0357 seconds
Function going(11000,) {} Took 0.0479 seconds
Function going(13000,) {} Took 0.0656 seconds
Function going(15000,) {} Took 0.0999 seconds
Function going(17000,) {} Took 0.1365 seconds
Function going(19000,) {} Took 0.1823 seconds
Function going(20000,) {} Took 0.1955 seconds
Function going(18000,) {} Took 0.1035 seconds
Function going(16000,) {} Took 0.0661 seconds
Function going(14000,) {} Took 0.0418 seconds
Function going(12000,) {} Took 0.0264 seconds
Function going(10000,) {} Took 0.0172 seconds
Function going(8000,) {} Took 0.0119 seconds
Function going(6000,) {} Took 0.0067 seconds
Function going(4000,) {} Took 0.0030 seconds
Function going(2000,) {} Took 0.0013 seconds
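
The same idea of building each factorial from the previous one can also be sketched without a global cache, using a running product in a single pass. This is only a sketch and not part of the timed code above: the name going_single_pass is hypothetical, and it assumes the six-decimal truncation may be done with integer floor division, so that only the small final quotient is converted to a float.

```python
from itertools import accumulate
from operator import mul


def going_single_pass(n):
    """Return (1! + 2! + ... + n!) / n! truncated to 6 decimal places."""
    total = 0
    fact = 1
    # accumulate(range(1, n + 1), mul) yields 1!, 2!, ..., n! in order,
    # each one obtained from the previous with a single multiplication.
    for fact in accumulate(range(1, n + 1), mul):
        total += fact
    # Integer floor division truncates exactly; only the quotient becomes a float.
    return (total * 1_000_000 // fact) / 1_000_000
```

It still multiplies big integers, so it is likely slower than a float-based approach, but the truncation is exact.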

It may be overkill in your case, but it’s fun to optimize!

Hope this helps 🙂

Answered By: Guy

Answer 2

The computation can be optimized to avoid both the factorial function and large numbers, with the following improvements:

O(n) time complexity and O(1) space
large savings in space and processing time
an improvement over the caching approach (see Guy’s answer)

Code

def faster_going(n, term = 1):
    # Using Walrus operator available in Python 3.8+
    return 1 + sum(term for i in range(n, 1, -1) if (term:=term/i))

Explanation

The summation we are computing is:

(1! + 2! + ... + (n-1)! + n!)/n!

Taking the terms from right to left, we have:

term[n]   = last term           = n!/n!     = 1
term[n-1] = next-to-last term   = (n-1)!/n! = term[n]/n
term[n-2] = third-to-last term  = (n-2)!/n! = term[n-1]/(n-1)
...
term[1]   = first term          = 1!/n!     = term[2]/2

We just add term[1] + … + term[n] together to get the result without having to compute big integers.

Thus, we have a recurrence relation:

term[n] = 1
…
term[k-1] = term[k]/k

Function faster_going just adds up the terms from n down to 1 using this recurrence.
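
For readers who find the walrus one-liner dense, the recurrence can be spelled out as an explicit loop. This is a minimal sketch of the same idea, not the code that was timed; the name faster_going_loop is hypothetical.

```python
def faster_going_loop(n):
    """Sum term[n] + term[n-1] + ... + term[1] using term[k-1] = term[k] / k."""
    term = 1.0   # term[n] = n!/n! = 1
    total = 1.0
    for k in range(n, 1, -1):
        term /= k        # term[k-1] = term[k] / k
        total += term
    return total


# For n = 4 this adds 1 + 1/4 + 1/12 + 1/24, i.e. (1! + 2! + 3! + 4!) / 4! = 33/24.
print(faster_going_loop(4))
```

Only floats of modest size are ever held, which is why no big-integer arithmetic is needed.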

Performance Comparison

Comparison with Guy’s caching approach (see Answer 1)

Summary:

This approach is about 39X faster than Guy’s approach for n = 20,000
Function going(20000,) {} Took 0.4924 seconds
Function faster_going(20000,) {} Took 0.0127 seconds

Test Code

from functools import wraps
import time
import math


def timeit(func):
    @wraps(func)
    def timeit_wrapper(*args, **kwargs):
        start_time = time.perf_counter()
        result = func(*args, **kwargs)
        end_time = time.perf_counter()
        total_time = end_time - start_time
        print(f'Function {func.__name__}{args} {kwargs} Took {total_time:.4f} seconds')
        return result
    return timeit_wrapper


def my_fact(n):
    global fact_cache
    global maximal_factorial

    if n in fact_cache:
        return fact_cache[n]

    result_index = maximal_factorial
    result = fact_cache[maximal_factorial]

    while result_index < n:
        result_index += 1
        result *= result_index
        fact_cache[result_index] = result

    maximal_factorial = n
    return result

@timeit
def going(n):
    return math.floor(((sum(my_fact(i) for i in range(1, n+1)))/ my_fact(n) * 1000000))/ 1000000

@timeit
def faster_going(n, term = 1):
    return 1 + sum(term for i in range(n, 1, -1) if (term:=term/i))
    
def main():
    # Declare the cache variables global so the resets below reach my_fact
    global fact_cache, maximal_factorial

    for i in range(1000, 20000, 2000):
        # Reset cache for each run, so timing is independent
        fact_cache = {1: 1}
        maximal_factorial = 1

        going(i)
        faster_going(i)

    for i in range(20000, 1000, -2000):
        # Reset cache for each run, so timing is independent
        fact_cache = {1: 1}
        maximal_factorial = 1

        going(i)
        faster_going(i)

if __name__ == '__main__':
    main()

Output

Function going(1000,) {} Took 0.0025 seconds
Function faster_going(1000,) {} Took 0.0005 seconds
Function going(3000,) {} Took 0.0159 seconds
Function faster_going(3000,) {} Took 0.0022 seconds
Function going(5000,) {} Took 0.0244 seconds
Function faster_going(5000,) {} Took 0.0030 seconds
Function going(7000,) {} Took 0.0430 seconds
Function faster_going(7000,) {} Took 0.0043 seconds
Function going(9000,) {} Took 0.0558 seconds
Function faster_going(9000,) {} Took 0.0056 seconds
Function going(11000,) {} Took 0.0914 seconds
Function faster_going(11000,) {} Took 0.0070 seconds
Function going(13000,) {} Took 0.1426 seconds
Function faster_going(13000,) {} Took 0.0084 seconds
Function going(15000,) {} Took 0.1989 seconds
Function faster_going(15000,) {} Took 0.0144 seconds
Function going(17000,) {} Took 0.2982 seconds
Function faster_going(17000,) {} Took 0.0116 seconds
Function going(19000,) {} Took 0.4696 seconds
Function faster_going(19000,) {} Took 0.0120 seconds
Function going(20000,) {} Took 0.4924 seconds
Function faster_going(20000,) {} Took 0.0127 seconds
Function going(18000,) {} Took 0.3013 seconds
Function faster_going(18000,) {} Took 0.0118 seconds
Function going(16000,) {} Took 0.1817 seconds
Function faster_going(16000,) {} Took 0.0099 seconds
Function going(14000,) {} Took 0.1093 seconds
Function faster_going(14000,) {} Took 0.0088 seconds
Function going(12000,) {} Took 0.0724 seconds
Function faster_going(12000,) {} Took 0.0072 seconds
Function going(10000,) {} Took 0.0482 seconds
Function faster_going(10000,) {} Took 0.0060 seconds
Function going(8000,) {} Took 0.0332 seconds
Function faster_going(8000,) {} Took 0.0048 seconds
Function going(6000,) {} Took 0.0208 seconds
Function faster_going(6000,) {} Took 0.0043 seconds
Function going(4000,) {} Took 0.0118 seconds
Function faster_going(4000,) {} Took 0.0024 seconds
Function going(2000,) {} Took 0.0044 seconds
Function faster_going(2000,) {} Took 0.0012 seconds

Answered By: DarrylG

Answer 3

First note that you can rewrite your sum:

(1! + 2! + ... + (n-1)! + n!) / n! = 1!/n! + 2!/n! + ... + (n-1)!/n! + n!/n!

= 1/(2*3*...*n) + 1/(3*...*n) + ... + 1/n + 1

We can simply build up the product in the denominator and add its inverse to the total on the fly; there’s no need to store anything. So the calculation can be done in O(n) time and O(1) space:

def better(n):
    res = 1
    prod = 1
    for k in range(n, 1, -1):
        prod *= k
        res += 1/prod
    return res

and that’s all.

There is still some room for improvement: as n grows, the integer product becomes very large and each multiplication takes longer, since Python integers have arbitrary precision. We can simply use a float instead. Once the float product overflows to inf, 1/prod is just 0.0, so the remaining terms add nothing:

def better_with_float_prod(n):
    res = 1
    prod = 1.0  # prod will be a float now
    for k in range(n, 1, -1):
        prod *= k
        res += 1/prod
    return res

Some timings (with very small n, there is little difference between the integer and float versions):

n = 10

# %timeit going(n)
# %timeit better(n)
# 3.93 µs ± 42.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# 1.76 µs ± 62.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

n = 1000

# 21.9 ms ± 262 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# 524 µs ± 6.74 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)



n=2000

# %timeit going(n)
# %timeit better(n)
# %timeit better_with_float_prod(n)

# 158 ms ± 1.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# 1.7 ms ± 4.05 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# 310 µs ± 5.98 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


n = 5000
# %timeit going(n)
# %timeit better(n)
# %timeit better_with_float_prod(n)

# 2.12 s ± 6.51 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# 9.77 ms ± 14.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# 765 µs ± 1.08 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

And with n = 100_000:

%timeit better_with_float_prod(100000)
# 15.5 ms ± 307 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

So, shorter and much faster!
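
Since the question asks for the result truncated to 6 decimal places, the float version could be combined with the truncation along these lines. This is a hedged sketch: the name going_truncated and the early exit are my additions, not part of the timings above. It assumes that once a term no longer changes the float sum, the remaining (even smaller) terms can be ignored, and that float rounding does not push the value across a 6-decimal boundary.

```python
import math


def going_truncated(n, decimals=6):
    """Float version of the sum, stopped early and truncated to `decimals` places."""
    res = 1.0
    prod = 1.0
    for k in range(n, 1, -1):
        prod *= k
        term = 1.0 / prod
        if res + term == res:  # term is too small to change the float sum
            break
        res += term
    scale = 10 ** decimals
    return math.floor(res * scale) / scale


print(going_truncated(1000))  # 1.001001
```

With the early exit, the loop runs only a handful of iterations for large n, because the terms shrink by a factor of roughly n at each step.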

Answered By: Thierry Lathuille
