CLN/ginac: factoring announce, questions

Wed Jun 16 16:28:23 CEST 2004

Hello,
the following is concerned with the numtheory part of CLN. I have
appended a patch to numtheory/cl_IF.h file and three new files for
the same directory that would provide a factoring engine. The code
is a much improved version of what I sent to Richy Kreckel a few days
ago, and it explains itself. I have serious design questions on
the interface, though.

Shortly summed, you can factor any integer now within a minute whose
second largest prime factor is less than ~10^11. I am not an expert,
and I will not even think about implementing one of the more exotic
algorithms (squfof, elliptic curves, all kind of sieves). I did this
because I need ginac with a divisors() function.

The people familiar with ginac/CLN design are asked to please answer
QUESTION 1.
The function cl_factor_rec() is the recursive main engine and will be
called by factor() that is the interface to it. What should factor()
return to the outside? We have
- two lists (prime factors, exponents)
- a list of pairs (p1,e1) (p2,e2) ... pari does return data like this
- a simple list (p1, e1, p2, e2, p3, ...)

QUESTION 2.
How should the pairs/lists be implemented? We have
- vectors provided by CLN (GV/SV where's the difference?)
- STL containers
An alternative would be to implement factor() outside CLN, ie, within
ginac using lst, but I think there is interest having it in CLN. I just
ask because I do not think the STL should be duplicated by GV/SV which,
supposedly, was started when gcc did not have a working STL. I am no
expert there, either, but I think CLN's memory scheme can be adapted to
STL by simply(?) using a tailored allocator object.

QUESTION 3. 
Should divisors() go into CLN or ginac?

Many thanks for your time reading through to here. Please help---I am
just starting C++ again after 8 years of hiatus, and design thinking
is a bit unfamiliar at this time. Anyway, here is the beef:

Sincerely,
ralf

--- cl_IF.h.orig	Wed Jun 16 16:16:08 2004
+++ cl_IF.h	Tue Jun 15 19:50:14 2004
@@ -3,6 +3,7 @@
 #ifndef _CL_IF_H
 #define _CL_IF_H
 
+#include <vector>
 #include "cln/number.h"
 #include "cln/integer.h"
 
@@ -49,6 +50,15 @@
 // Returns false if n is definitely composite, and then sets factor = some
 // nontrivial factor or 0.
 extern cl_boolean cl_miller_rabin_test (const cl_I& n, int count, cl_I* factor);
+
+// Pollard rho algorithm
+extern cl_I cl_pollard_rho (const cl_I& n, long iters);
+
+// Pollard p-1 algorithm
+extern cl_I cl_pollard_pmone (const cl_I& n);
+
+// Core recursive factoring routine, needs container
+extern int cl_factor_rec (const cl_I& n, std::vector<cl_I>& v);
 
 }  // namespace cln
 
--- cl_IF_factor_rec.cc.orig	Wed Jun 16 16:18:44 2004
+++ cl_IF_factor_rec.cc	Tue Jun 15 19:10:12 2004
@@ -0,0 +1,109 @@
+// cl_factor_rec().
+
+// General includes.
+#include "cl_sysdep.h"
+
+// Specification.
+#include "cl_IF.h"
+
+
+// Implementation.
+
+#include "cln/integer.h"
+
+namespace cln {
+
+int cl_factor_rec(const cl_I& n, std::vector<cl_I>& v)
+// recursive buildup of prime factors vector, part of code stolen from 
+// isprobprime.cc.
+// prime factors of n are collected in v, 0 returned if factorization complete.
+{
+	// With a Miller-Rabin count = 50 the final error probability is
+	// 4^-50 < 10^-30.
+	int count = 50;
+	// Trial division (rules out 87% of all numbers quickly).
+	const uintL l = integer_length(n);
+	if (l <= 32) 
+  {
+		uint32 nn = cl_I_to_UL(n);
+		if (nn <= cl_small_prime_table_limit) 
+    {
+			// Table lookup.
+			uintL i = cl_small_prime_table_search(nn);
+			if (i < cl_small_prime_table_size
+			    && ((unsigned int) cl_small_prime_table[i] == nn
+			        || nn == 2))
+		  { v.push_back (n); return 0; }   // end leaf of recursion
+		}
+		if ((nn % 2) == 0)
+      { v.push_back (2); return cl_factor_rec (n>>1, v); }
+    if ((nn = cl_trialdivision(nn,1,cl_small_prime_table_limit)) >0)
+			{ v.push_back (nn); return cl_factor_rec (exquo (n,nn), v); }
+
+    // For small n, only few Miller-Rabin tests are needed.
+		if (nn < 2000U) count = 1; // {2}
+		else if (nn < 1300000U) count = 2; // {2,3}
+		else if (nn < 25000000U) count = 3; // {2,3,5}
+		else if (nn < 3200000000U) count = 4; // {2,3,5,7}
+	} 
+  else if (l <= 64) 
+  {
+		uint32 nhi = cl_I_to_UL(ldb(n,cl_byte(32,32)));
+		uint32 nlo = cl_I_to_UL(ldb(n,cl_byte(32,0)));
+		if ((nlo % 2) == 0)
+      { v.push_back (2); return cl_factor_rec (n>>1, v); }
+    uint32 t;
+    if ((t = cl_trialdivision (nhi,nlo,1,cl_small_prime_table_limit)) >0)
+			{ v.push_back (t); return cl_factor_rec (exquo (n,t), v); }
+	} 
+  else 
+  {
+		if (evenp (n))
+      { v.push_back (2); return cl_factor_rec (n>>1, v); }
+    uint32 t;
+    if ((t = cl_trialdivision (n,1,cl_small_prime_table_limit)) >0)
+			{ v.push_back (t); return cl_factor_rec (exquo (n,t), v); }
+	}
+  
+  // Handle perfect powers.
+  int m;
+  cl_I r;
+  if (rootp (n, m, &r))
+  {
+    std::vector<cl_I> vv;
+    int ret = cl_factor_rec (r, vv);
+    for (int l=0; l<m; ++l)
+      copy (vv.begin(), vv.end(), v.begin());
+    return ret;
+  }
+
+  // After factoring out small factors: is remaining number prime?
+	if (cl_miller_rabin_test (n, count, NULL))
+  {
+    v.push_back (n);
+    return 0;
+  }   // end leaf of recursion
+
+  //  30k iterations of Pollard rho finds most 8-digit factors.
+  // 150k iterations of Pollard rho finds most 10-digit factors.
+  cl_I f;
+  f = cl_pollard_rho (n, 150000L);
+  if (f != 1 && f != n)
+    { v.push_back (f); return cl_factor_rec (exquo (n,f), v); }
+
+  // Pollard p-1 using cl_small_prime_table. Does not often find a factor
+  // but does not cost much, either. If it finds one, it may have
+  // up to 18-20 digits, occasionally.
+  f = cl_pollard_pmone (n);
+  if (f != 1 && f != n)
+    { v.push_back (f); return cl_factor_rec (exquo (n,f), v); }
+
+  // Last step (for now): let Pollard rho run 'forever'.
+  // If you don't want this, return 1 without calling rho() again.
+  v.push_back (cl_pollard_rho (n, 0));
+  return 0; // end leaf of recursion
+}
+
+
+}  // namespace cln
+


--- cl_IF_pollard_rho.cc.orig	Wed Jun 16 16:21:23 2004
+++ cl_IF_pollard_rho.cc	Tue Jun 15 17:04:20 2004
@@ -0,0 +1,42 @@
+// cl_pollard_rho().
+
+// General includes.
+#include "cl_sysdep.h"
+
+// Specification.
+#include "cl_IF.h"
+
+
+// Implementation.
+
+#include "cln/integer.h"
+
+namespace cln {
+
+cl_I cl_pollard_rho (const cl_I& n, long iters)
+// If N is composite, returns a factor of N, for iters>0 not necessarily <>N.
+// Adaptation of an implementation by Richard Crandall (giantint).
+// This gets slow when factors become larger than 10^10. iters=0 runs until 
+// factor is found. Using modinteger slows function down by 10 percent.
+{
+  cl_I g=1, t1=3, t2=3;
+
+  for (long l=0; !iters || l<iters; ++l)
+  {
+    if (l%100 == 0) // removing this line seems to make no big difference
+    {
+      if ((g = gcd (g, n)) != 1)
+          break;
+    }
+    t1 = mod (square (t1) + 2, n);
+    t2 = mod (square (t2) + 2, n);
+    t2 = mod (square (t2) + 2, n);
+    g  = mod (g * abs (t2-t1), n);
+  }
+  g = gcd (g, n);
+  return g;
+}
+
+}  // namespace cln
+
+

--- cl_IF_pollard_pmone.cc.orig	Wed Jun 16 16:22:45 2004
+++ cl_IF_pollard_pmone.cc	Tue Jun 15 17:48:57 2004
@@ -0,0 +1,48 @@
+// cl_pollard_pmone().
+
+// General includes.
+#include "cl_sysdep.h"
+
+// Specification.
+#include "cl_IF.h"
+
+
+// Implementation.
+
+#include "cln/modinteger.h"
+
+namespace cln {
+
+cl_I cl_pollard_pmone (const cl_I& n)
+// Pollard p-1 using cl_small_prime_table. Does not often find a factor
+// but does not cost much, either. If it finds one, it may have
+// up to 18-20 digits, occasionally.
+// Returns a factor of N, not necessarily <>n.
+// Adaptation of an implementation by Richard Crandall (giantint).
+{
+  cl_modint_ring R = find_modint_ring (n);
+  cl_MI g(R,cl_I(1)), t1(R,cl_I(3));
+  cl_I t;
+
+  for(long l=0; l<cl_small_prime_table_size; ++l)
+  {
+    long cnt = long((8.0*log(2.0))/log(1.0*cl_small_prime_table[l]));
+    if (cnt < 2) 
+      cnt = 1;
+    for (long k=0; k<cnt; ++k)
+      t1 = expt_pos (t1, cl_small_prime_table[l]);
+
+    g = g * (t1-1);
+    if (l%100==0)
+    {
+      if ((t = gcd (R->retract(g), n)) != 1)
+        break;
+      g = R->canonhom (t);
+    }
+  }
+  t = gcd (R->retract(g), n);
+  return t;
+}
+
+}  // namespace cln
+