An implementation of the factorial with the prime swing algorithm in Sage. For a pure Python implementation you will need additionally a function prime_range(a, b) which returns the prime numbers in the range [a, b).

def factorialPS(n):  

    small_swing = [
        1,1,1,3,3,15,5,35,35, 315, 63, 693, 231, 3003, 429, 6435, 6435, 
        109395,12155,230945,46189,969969,88179,2028117,676039,16900975, 
        1300075,35102025,5014575,145422675,9694845,300540195,300540195 ]

    def swing(n):
        if n < 33: return small_swing[n]
        sqrtn = n.sqrt()
        factors = prime_range(n // 2 + 1, n + 1)

        for prime in prime_range(3, sqrtn + 1):
            p, q = 1, n

            while True:
                q //= prime
                if q == 0: break
                if q & 1 == 1:
                    p *= prime

            if p > 1: factors.append(p)

        R = prime_range(sqrtn + 1, n // 3 + 1)
        factors += filter(lambda x: (n // x) & 1 == 1, R)
        
        return mul(factors)  

    def odd_factorial(n):
        if n < 2: return 1
        return (odd_factorial(n//2)**2)*swing(n)

    if n == 0: return 1
    if n < 0 : return 0
    if n < 20: return mul(range(2, n + 1))

    N, bits = n, n            
    while N != 0:
        bits -= N & 1
        N >>= 1

    return odd_factorial(n)*2**bits
 

The current (0.7.3) SymPy implementation of the factorial function uses the prime swing function and is similar to the above function. However comparing the SymPy implementation (dismantled of the framework) with the above implementation in a Sage worksheet shows:

timeit("sympy_factorial(1000000)", number=10)
timeit("factorialPS(1000000)", number=10)
10 loops, best of 3: 2.02 s per loop
10 loops, best of 3: 1.31 s per loop

So what is the main difference? The computation of the product

L_product = R_product = 1
for prime in sieve.primerange(n//2 + 1, n + 1):
    L_product *= prime
for prime in primes:
    R_product *= prime
return L_product*R_product

in the SymPy version is replaced here by a single mul(factors). The reason why this is more efficient is that an iterated product (a product of the form for p in P: r *= p) is much less efficient than a product of the form mul(list), provided mul is implemented carefully, for example as a recursive product balancing the factors.

As already said, the implementation and the benchmark shown here were executed in the Sage environment. The same implementation executed with the SymPy functions does not show the speed-up observed with Sage. It remains unclear what the reason is. A standalone implementation of the algorithm in pure Python can be found here. Executed in the Sage environment it gives the following timings:

timeit("factorialPy(1000000)", number=10)
10 loops, best of 3: 1.75 s per loop

The idea of the swing factorial can also be implemented without prime-factorization (by computing the swing numbers by recursion). This leads to a highly sophisticated (albeit short) algorithm which is shown below. Benchmarks show that this is also the fastest algorithm known to compute the factorials without prime-factorization (leaving parallelization apart).

But the most surprising feature of this algorithm is that it makes heavy use of division — something which does not come to ones mind contemplating the factorial in a naive way.

def factorialS(n):
    
    smallOddSwing = [
        1,1,1,3,3,15,5,35,35, 315, 63, 693, 231, 3003, 429, 6435, 6435,
        109395,12155,230945,46189,969969,88179,2028117,676039,16900975,
        1300075,35102025,5014575,145422675,9694845,300540195,300540195 ]
    smallOddFactorial = [
        1,1,1,3,3,15,45,315, 315, 2835, 14175, 155925, 467775, 6081075,
        42567525, 638512875, 638512875 ]
    
    def product(m, len):
        if len == 1: return m
        if len == 2: return m * (m - 2)
        hlen = len >> 1
        return product(m - hlen * 2, len - hlen) * product(m, hlen)
  
    def oddFactorial(n):
        sqrOddFact = Integer(1)
        if n < 17:
            oddFact = smallOddFactorial[n]
            sqrOddFact = smallOddFactorial[n//2]
        else:
            (sqrOddFact, oldOddFact) = oddFactorial(n//2)
            if n < 33: 
                oddSwing = smallOddSwing[n] 
            else:
                len = (n - 1) // 4
                if (n % 4) != 2: len += 1
                high = n - ((n + 1) & 1)
                oddSwing = product(high, len) // oldOddFact
            oddFact = (sqrOddFact**2) * oddSwing
        return (oddFact, sqrOddFact)

    if n == 0: return 1
    if n <  0: return 0
    
    N, bits = n, n
    while N != 0:
        bits -= N & 1
        N >>= 1

    F = oddFactorial(n)
    return F[0]*2**bits

In the same computing environment as above (Sage worksheet) we see this time:

timeit("factorialS(1000000)", number=10)
10 loops, best of 3: 2.31 s per loop

Compare this to SymPy's prime-based implementation in 2.02 s.


back Back to the Homepage of Factorial Algorithms.