Conclusions and
Recommendations (2004)
- There are more factorial algorithms than you learned at school  
-- or at university - because your professors do not know them either, at least 
before they have visited this site. ;-))  
- Do not use the 'naive factorial'! This ubiquitous dull algorithm. 
Though immensely popular in Computer Science to demonstrate the notion of a 
recursive function, it is by no means a sane way to compute n!. 
- If you work with large factorials, use the 
Stirling approximation. 
This formula is perhaps all you need in real life. But here we are concerned 
with the exact computation of n!. 
- What factorial algorithm should we choose for daily use with 
moderate values of n? Or for inclusion in a software library with potentially 
large values of n? As a starting point, let us look at the performance ranking 
for n = 2048007 of the top 10 algorithms relative to the 'PrimeSchoenhage' algorithm.
Performance-Profile for n=2048007!
| Algorithm | Rank | 
| PrimeSwingLuschny | 0,9 | 
| PrimeVardi | 0,9 | 
| PrimeSchoenhage | 1,0 | 
| PrimeSwingList | 1,0 | 
| PrimeSwingCache | 1,7 | 
| SplitRecursive | 1,8 | 
|  |  | 
| PrimeLeenstra | 2,1 | 
| PrimeBorwein | 2,4 | 
| Hyper | 2,7 | 
| SquareDifferenceProd | 2,7 | 
|  |  | 
The smaller the rank the better,
values p<=1 indicate excellent,
values p<=2 indicate good performance. 
- Only algorithms with a ranking < 2 qualify for further consideration 
as candidates for a software implementation.  Nevertheless the other ones 
are certainly worth an algorithmic study and analysis for their own sake!
- All algorithms can be divided into two categories: those 
which are based on the prime factorization of n! and those which are not. Those 
which are - we will call them the 'prime algorithms' - are much more complex 
than the others, which we will call the 'simple algorithms'. The prime algorithms 
need the support of a prime sieve (a good implementation of the sieve of Eratosthenes 
will do). Further they are more demanding on resources, since they need quite 
a bit of memory for allocating the arrays of prime factors. 
- Let's look first at the performance of the 'prime algorithms'. 
These are the fastest known algorithms to compute n!. Thus they should be chosen 
for inclusion in a software library where potentially large values of n have 
to be computed. Three algorithms score approximately equal: 'PrimeSwingLuschny', 
'PrimeVardi' and 'PrimeSchoenhage'. Although my personal preference is 
for 'PrimeSwingLuschny' (because of its simplicity and theoretical background), 
I propose to test all three of them in your computing environment and then choose the 
most appropriate one. 
- Let us consider now the category of 'simple' algorithms, 
those which do not make use of prime factorization. Here the situation is quite 
clear: only the algorithm 'SplitRecursive' achieves good results. So this algorithm 
is clearly the choice to be taken. (Added 
in 2010: This is no longer true as 
stated. In any case compare with the new implementation of the simple Swing 
algorithm which is superior on many systems.) If only moderate values of n (say n < 1000000) 
are to be computed and a small time penalty (a factor < 2) is tolerable this 
algorithm might also be considered as an alternative to the more complex prime 
algorithms in the indicated range. 
- Let's add a final word of caution. All the above considerations 
are valid only if the computations are based on an arithmetical library which 
provides an asymptotical fast multiplication routine like the Karatsuba multiplication 
or even better a fft-based multiplication like the Schoenhage multiplication. 
Such arithmetical libraries are freely available on the internet, look out for 
them. If, however, you use a dull library which has implemented only the 'school 
algorithm' for multiplication then the efficiency of all algorithms breaks down 
dramatically and also their relative ranking might turn out completely different! 
Regrettably the Java class 'BigInteger' included in the Java standard distribution 
is one of those dull libraries.  >:-(  If you compute with Java you 
had better take the Apfloat library of Mikko Tommila. 
- In my computing environment the algorithm 'PrimeSwing(Luschny)' 
was the fastest in the category 'prime algorithms' and the 'Split(Recursive)' 
algorithm had no rival in the category 'simple algorithms'. Thus the jury declares:
Tempora mutantur and conclusions and recommendations also... (Postscript 2008)
- With the arrival of parallel computing on the desktop some 
of the above comments are to be revisited. The algorithms discussed were adapted 
to take advantage of the multiple core processor units which became standard 
in the years 2007/2008. One of the nice observations is that both the PrimeSwing 
algorithm as well as the SplitRecursive algorithm can be easily and efficiently 
revised for parallel computing providing improved performance without introducing 
many of the complexities of today’s concurrent programming models. 
- Let us look at the performance of the top 9 algorithms as 
it stands now. The next table displays the computing time of n! for n = 10,000,000 
in seconds on a 2-core machine and the ranking relative to the 'PrimeSwing' 
algorithm. 
| N! where N = 10,000,000 |  | sec |  | ranking | 
| ParallelPrimeSwing |  | 57,9 |  | 0,91 | 
| ParallelPrimeSplit |  | 59,5 |  | 0,94 | 
| PrimeSwing |  | 63,5 |  | 1,00 | 
| PrimeSchoenhage |  | 66,5 |  | 1,05 | 
| PrimeVardi |  | 65,9 |  | 1,04 | 
| PrimeSwingList |  | 66,7 |  | 1,05 | 
| PrimeSplit |  | 64,1 |  | 1,01 | 
| ParallelSplit |  | 104,1 |  | 1,64 | 
| Split |  | 139,2 |  | 2,19 | 
- We see three new algorithms appearing in the list: 'ParallelPrimeSwing', 
'ParallelPrimeSplit' and 'ParallelSplit'.  And for the 
first time the author saw the factorial of 10 millions computed in less than 
one minute -- a personal computing record ;-) Thus the recommendation of today 
just adds the parallel attribute to the winners of 2004: 
Post-Postscript, 2010
There now are better implementations of the simple Swing algorithm which outperform
the Split algorithm on most systems. Look here.
[top of page (2004 recommendations)]