%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% Version 12, 1999-09-07, Bruno
%%% -----------------------------
%%% * Added new references to the Karatsuba papers, taken from the
%%%   excellent paper CECM-98-118 (98_118-Borwein-Bradley-Crandall.dvi).
%%% * In acm.bst, function format.title, removed the change.case that
%%%   converted the names of Euler, Riemann et al. to lowercase.
%%%
%%% Version 11, 1998-12-20, Bruno
%%% -----------------------------
%%% * Added references to the Karatsuba papers.
%%%
%%% Version 10, 1998-03-10, Thomas & Bruno
%%% --------------------------------------
%%% * Corrected the formula for a(n) for zeta(3).
%%% * Cosmetic fixes in the bibliography.
%%%
%%% Version 9, 1998-03-06, Thomas & Bruno
%%% -------------------------------------
%%% * Write \frac{1}{|x|} instead of \frac{1}{x}.
%%%
%%% Version 8, 1998-01-16b, Thomas
%%% ------------------------------
%%% * Three references for LiDIA instead of just one.
%%%
%%% Version 7, 1998-01-16a, Bruno
%%% -----------------------------
%%% * Address: prefix F for France
%%% * Abstract: mention zeta(3)
%%% * Minor corrections to the O() estimates
%%%
%%% Version 6, 1998-01-14, Thomas
%%% -----------------------------
%%% * Changed my address.
%%% * Introduced the result for Euler's constant + continued fraction.
%%%
%%% Version 5, 1997-12-11, Thomas
%%% -----------------------------
%%% * Raised the number of steps for Euler's constant from
%%%   x = ceiling(N log(2)/4)^2 to x = ceiling((N+2) log(2)/4)^2
%%%   (mail from Sofroniou)
%%% * Added a comment regarding checkpointing for Euler's constant
%%%   and Gamma(x) (mail from Sofroniou)
%%% * Improved the section on the speed of Maple versus Pari
%%%   (mail from Laurent Bernardi).
%%%
%%% Version 4, 1997-01-09, Thomas
%%% -----------------------------
%%% * Added the complexity statement for sinh, cosh.
%%% * Added the versions of the tested CAS.
%%%
%%% Version 3, 1997-01-07, Bruno
%%% ----------------------------
%%% * My company's name is written with four capital letters.
%%% * As far as I know, Apery is written with an accent.
%%% * The error message reports n2-n1>0, not n1-n2>0.
%%% * N -> \(N\) (twice)
%%% * Removed blank lines after display formulas where the paragraph
%%%   continues. Has the effect of a \noindent.
%%% * In the section arctan(x) for rational x: "another way" -> "the fastest way"
%%% * "[87]" -> "\cite{87}"
%%% * We do not need to cite Cohen-Villegas-Zagier after all.
%%% * Visually improved the "Note:" at the end of the section on the
%%%   Gamma function.
%%% * Visually improved the formula for the hypergeometric functions.
%%%   Other formulas likewise.
%%% * Figure 2, first column right-aligned.
%%% * "out performs" -> "outperforms"
%%% * "the streng" -> "the strength"
%%% * Mentioned the parallelisability in the abstract.
%%% * Hacked the BibTeX style so that not every second author is
%%%   shortened to his initials and single authors are not robbed
%%%   of their first names.
%%%
%%% Version 2, 1997-01-06, Thomas
%%% -----------------------------
%%% * Changed sections are commented out with %%%. All changes are
%%%   to be understood as suggestions. The reason is given below in
%%%   each case. If you agree with a change, you can delete all
%%%   %%% lines.
%%% * Removed the LyX defines. Reason: the ISSAC acmconf.sty
%%%   allows no foreign macros. Compiling with LaTeX works.
%%% * Rewrote the keyboard umlauts (ä,ü,ö,ß) in LaTeX.
%%%   Reason: so that the submission works as a single file.
%%% * Removed the fontenc and psfig usepackage commands.
%%%   Reason: acmconf.sty determines the fonts, and there are no pictures.
%%% * Created the bibliography with BibTeX (binsplit.bib) and acm.bst.
%%%   Reason: required by ISSAC '97.
%%% * Put long formulas into an eqnarray environment.
%%%   Reason: acmconf.sty runs in twocolumn mode. Long formulas
%%%   messed up the output.
%%% * Changed the order in the description of the elementary
%%%   functions, so that first the rational and then the real
%%%   version is described. Reason: uniformity.
%%% * Tested sinh via binary splitting against sinh via exp
%%%   computation. They are about equally good on my Pentium,
%%%   but cosh gives me some jitter: cosh is now and then slower,
%%%   both at small and at large precision. Filled this into the
%%%   sinh, cosh section.
%%% * Removed the artanh section. Reason: in the introduction to
%%%   the elementary functions I pointed out that the computation
%%%   of other functions (such as artanh) can be reduced to the
%%%   ones mentioned here, or implemented analogously. I think we
%%%   do not need to spell out everything explicitly.
%%% * Removed your thanks to me.
%%% * Removed the section on convergence acceleration.
%%%   Reason: that goes into your MathComp paper.
%%% * Inserted a new formula for pi. Reason: simpler, more
%%%   efficient, and consistent with the cited reference.
%%% * Adapted the computation of Apery's constant.
%%%   Reason: the formula given here was computed with a
%%%   transformed series. If one does not know this, it looks
%%%   confusing. No efficiency gain.
%%% * Removed the description of the first version of Euler's
%%%   constant. Reason: it is covered in every respect
%%%   (proof, efficiency) by the second version.
%%% * Inserted sections on checkpointing and parallelisation.
%%%   I gave an example of their effectiveness for Apery's
%%%   constant. Reason: this way we can also submit the
%%%   paper to PASCO '97.
%%% * Inserted an example C++ implementation for abpq series.
%%%   Reason: shows how simple it is once you have the formulas ;-)
%%% * Inserted an example implementation for abpqcd series.
%%% * Inserted the Computational results and Conclusions sections.
%%% * Inserted the names of the constants (C, G, ...) into the
%%%   corresponding sections. Reason: these names are used in the
%%%   tables in the Computational results section.
%%% * Inserted a reference to LiDIA. Reason: it is mentioned in
%%%   Computational results.
%%%
%%% Version 1, 1996-11-30, Bruno
%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% Bruno Haible, Thomas Papanikolaou. %%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\documentstyle{acmconf}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%\catcode`@=11 % @ is now an ordinary letter
%\def\@Re{\qopname@{Re}} \def\re#1{{\@Re #1}}
%\def\@Im{\qopname@{Im}} \def\im#1{{\@Im #1}}
%\catcode`@=12 % @ is now again a special character

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\catcode`@=11 % @ is now an ordinary letter
\def\re{\mathop{\operator@font Re}\nolimits}
\def\im{\mathop{\operator@font Im}\nolimits}
\def\artanh{\mathop{\operator@font artanh}\nolimits}
\catcode`@=12 % @ is now again a special character

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\title{Fast multiprecision evaluation of series of rational numbers}

{Bruno Haible} & \hspace*{2cm} & {Thomas Papanikolaou}\\
{\normalsize ILOG} && {\normalsize Laboratoire A2X}\\
{\normalsize 9, rue de Verdun} && {\normalsize 351, cours de la Lib\'eration}\\
{\normalsize F -- 94253 Gentilly Cedex} && {\normalsize F -- 33405 Talence Cedex}\\
{\normalsize {\tt haible@ilog.fr}} && {\normalsize {\tt papanik@math.u-bordeaux.fr}}\\
We describe two techniques for fast multiple-precision evaluation of linearly
convergent series, including power series and Ramanujan series. The computation
time for \(N\) bits is \( O((\log N)^{2}M(N)) \), where \( M(N) \) is the time
needed to multiply two \(N\)-bit numbers. Applications include fast algorithms
for elementary functions, \(\pi\), hypergeometric functions at rational points,
$\zeta(3)$, and Euler's, Catalan's and Ap{\'e}ry's constants. The algorithms are
suitable for parallel computation.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Introduction}

Multiple-precision evaluation of real numbers has been efficiently possible
since Sch\"onhage and Strassen \cite{71} showed that the bit complexity of
the multiplication of two \(N\)-bit numbers is
\( M(N)=O(N\:\log N\:\log\log N) \).
This is not only a theoretical result; a C++ implementation \cite{96a} can
exploit this already for \( N=40000 \) bits. Algorithms for computing
elementary functions (exp, log, sin, cos, tan, asin, acos, atan, sinh, cosh,
tanh, arsinh, arcosh, artanh) have appeared in \cite{76b}, and a remarkable
algorithm for \( \pi \) was found by Brent and Salamin \cite{76c}.

However, all these algorithms suffer from the fact that calculated results
are not reusable, since the computation is done using real arithmetic (using
exact rational arithmetic would be extremely inefficient). Therefore functions
or constants have to be recomputed from scratch every time higher precision
is required.

In this note, we present algorithms for fast computation of sums of the form
\[S=\sum _{n=0}^{\infty }R(n)F(0)\cdots F(n)\]
where \( R(n) \) and \( F(n) \) are rational functions in \( n \) with rational
coefficients, provided that this sum is linearly convergent, i.e. that the
\( n \)-th term is \( O(c^{-n}) \) with \( c>1 \). Examples include elementary
and hypergeometric functions at rational points in the {\em interior} of the
circle of convergence, as well as \( \pi \) and Euler's, Catalan's and
Ap{\'e}ry's constants.

The presented algorithms are {\em easy to implement} and {\em extremely
efficient}, since they take advantage of pure integer arithmetic. The
calculated results are {\em exact}, making {\em checkpointing} and
{\em reuse} of computations possible. Finally,
the computation of our algorithms {\em can be easily parallelised}.

After the present paper was published, we were informed that the results of
section 2 had already been published by E.~Karatsuba in \cite{91,91b,93,95c}.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Evaluation of linearly convergent series}

The technique presented here applies to all linearly convergent sums of the
form
\[ S=\sum ^{\infty }_{n=0}
\frac{a(n)}{b(n)}\frac{p(0)\cdots p(n)}{q(0)\cdots q(n)}\]
where \( a(n) \), \( b(n) \), \( p(n) \), \( q(n) \) are integers with
\( O(\log n) \) bits. The most often used case is that \( a(n) \), \( b(n) \),
\( p(n) \), \( q(n) \) are polynomials in \( n \) with integer coefficients.

Given two index bounds \( n_{1} \) and \( n_{2} \), consider the partial sum
\[
S=\sum _{n_{1}\leq n<n_{2}}
\frac{a(n)}{b(n)}\frac{p(n_{1})\cdots p(n)}{q(n_{1})\cdots q(n)}\]
It is not computed directly. Instead, we compute the integers
\( P={p(n_{1})}\cdots {p(n_{2}-1)} \), \( Q={q(n_{1})}\cdots {q(n_{2}-1)} \),
\( B={b(n_{1})}\cdots {b(n_{2}-1)} \) and \( T=BQS \). If \( n_{2}-n_{1}<5 \),
these are computed directly. If \( n_{2}-n_{1}\geq 5 \), they are computed
using {\em binary splitting}: Choose an index \( n_{m} \) between
\( n_{1} \) and \( n_{2} \), compute the components \( P_{l} \), \( Q_{l} \),
\( B_{l} \), \( T_{l} \) belonging to the interval \( n_{1}\leq n<n_{m} \),
compute the components \( P_{r} \), \( Q_{r} \), \( B_{r} \), \( T_{r} \)
belonging to the interval \( n_{m}\leq n<n_{2} \), and set
\( P=P_{l}P_{r} \), \( Q=Q_{l}Q_{r} \), \( B=B_{l}B_{r} \) and
\( T=B_{r}Q_{r}T_{l}+B_{l}P_{l}T_{r} \).

Finally, this algorithm is applied to \( n_{1}=0 \) and
\( n_{2}=n_{\max }=O(N) \), and a final floating-point division
\( S=\frac{T}{BQ} \) is performed.

The bit complexity of computing \( S \) with \( N \) bits of precision is
\( O((\log N)^{2}M(N)) \).
Since we have assumed the series to be linearly convergent, the \( n \)-th
term is \( O(c^{-n}) \) with \( c>1 \). Hence choosing
\( n_{\max }=N\frac{\log 2}{\log c}+O(1) \) will ensure that the round-off
error is \( <2^{-N} \). By our assumption that \( a(n) \), \( b(n) \),
\( p(n) \), \( q(n) \) are integers with \( O(\log n) \) bits, the integers
\( P \), \( Q \), \( B \), \( T \) belonging to the interval
\( n_{1}\leq n<n_{2} \) all have \( O((n_{2}-n_{1})\log n_{2}) \) bits.

The algorithm's recursion depth is \( d=\frac{\log n_{\max }}{\log 2}+O(1) \).
At recursion depth \( k \) (\( 1\leq k\leq d \)), integers having
\( O(\frac{n_{\max }}{2^{k}}\log n_{\max }) \) bits each are multiplied. Thus,
the entire computation time \( t \) is
\begin{eqnarray*}
t &=& \sum ^{d}_{k=1}
2^{k-1}O\left( M\left( \frac{n_{\max }}{2^{k}}\log n_{\max }\right)\right)\\
&=& \sum ^{d}_{k=1} O\left( M\left( n_{\max }\log n_{\max }\right) \right)\\
&=& O(\log n_{\max }M(n_{\max }\log n_{\max }))
\end{eqnarray*}
Because of \( n_{\max }=O( \frac{N}{\log c}) \) and
\begin{eqnarray*}
M\left(\frac{N}{\log c} \log \frac{N}{\log c}\right)
&=& O\left(\frac{1}{\log c} N\, (\log N)^{2}\, \log \log N\right)\\
&=& O\left(\frac{1}{\log c} \log N\, M(N)\right)
\end{eqnarray*}
we obtain
\[ t = O\left(\frac{1}{\log c} (\log N)^{2}M(N)\right) \]
Considering \(c\) as constant, this is the desired result.
\item [Checkpointing/Parallelising:]~

A checkpoint can easily be made by storing the (integer) values of
\( n_1 \), \( n_2 \), \( P \), \( Q \), \( B \) and \( T \).
Similarly, if \( m \) processors are available, then the interval
\( [0,n_{\max }] \) can be divided into \( m \) pieces of length
\( l = \lfloor n_{\max }/m \rfloor \). After each processor \( i \) has
computed the sum of its interval \( [il,(i+1)l] \), the partial sums are
combined into the final result using the rules described above.

For the special case \( a(n)=b(n)=1 \), the binary splitting algorithm has
already been documented in \cite{76a}, section 6, and \cite{87}, section 10.2.3.

Explicit computation of \( P \), \( Q \), \( B \), \( T \) is only required
as a recursion base, for \( n_{2}-n_{1}<2 \), but avoiding recursion for
\( n_{2}-n_{1}<5 \) gains a few percent of execution speed.

The binary splitting algorithm is asymptotically faster than step-by-step
evaluation of the sum -- which has bit complexity \( O(N^{2}) \) -- because
it pushes as much multiplication work as possible into the region where
multiplication becomes efficient. If the multiplication were implemented
as an \( M(N)=O(N^{2}) \) algorithm, the binary splitting algorithm would
provide no speedup over step-by-step evaluation.
\item [Implementation:]~

In the following we present a simplified C++ implementation of
the above algorithm\footnote{A complete implementation can be found in
CLN \cite{96a}. The implementation of the binary splitting method will
also be available in {\sf LiDIA-1.4}.}. The initialisation is done by a
structure {\tt abpq\_series} containing arrays {\tt a}, {\tt b}, {\tt p}
and {\tt q} of multiprecision integers ({\tt bigint}s). The values of
the arrays at the index \( n \) correspond to the values of the functions
\( a \), \( b \), \( p \) and \( q \) at the integer point \( n \).
The (partial) results of the algorithm are stored in an
{\tt abpq\_series\_result} structure.
// abpq_series is initialised by the user
struct abpq_series { bigint *a, *b, *p, *q; };

// abpq_series_result holds the partial results
struct abpq_series_result { bigint P, Q, B, T; };

// binary splitting summation for abpq_series
void sum_abpq(abpq_series_result & r,
              int n1, int n2,
              const abpq_series & arg)
{
  // check the length of the summation interval
  if (n2 - n1 <= 0)
    error_handler("summation device",
                  "sum_abpq:: n2-n1 should be > 0.");
  switch (n2 - n1)
  {
  case 1: // the result at the point n1
    r.P = arg.p[n1];
    r.Q = arg.q[n1];
    r.B = arg.b[n1];
    r.T = arg.a[n1] * arg.p[n1];
    break;
  // cases 2, 3, 4 left out for simplicity
  default: // the general case
  {
    // the left and the right partial sum
    abpq_series_result L, R;
    // find the middle of the interval
    int nm = (n1 + n2) / 2;
    sum_abpq(L, n1, nm, arg);
    sum_abpq(R, nm, n2, arg);
    // combine the partial results
    r.P = L.P * R.P;
    r.Q = L.Q * R.Q;
    r.B = L.B * R.B;
    r.T = R.B * R.Q * L.T + L.B * L.P * R.T;
    break;
  }
  }
}
Note that the multiprecision integers could be replaced here by integer
polynomials, or by any other ring providing the operators \( = \) (assignment),
\( + \) (addition) and \( * \) (multiplication). For example, one could regard
a bivariate polynomial over the integers as a series over the second variable,
with polynomials over the first variable as its coefficients. This would result
in an accelerated algorithm for summing bivariate (and thus multivariate)
series.
\subsection{Example: The factorial}

This is the most classical example of the binary splitting algorithm and was
probably known long before \cite{87}.

Computation of the factorial is best done using the binary splitting algorithm,
combined with a reduction of the even factors to odd factors and
multiplication by a power of 2, according to the formula
\[
n!=2^{n-\sigma _{2}(n)}\cdot \prod _{k\geq 1}
\left( \prod _{\frac{n}{2^{k}}<2m+1\leq \frac{n}{2^{k-1}}}(2m+1)\right) ^{k}\]
where the products
\[
P(n_{1},n_{2})=\prod _{n_{1}<m\leq n_{2}}(2m+1)\]
are evaluated according to the binary splitting algorithm:
\( P(n_{1},n_{2})=P(n_{1},n_{m})P(n_{m},n_{2}) \) with
\( n_{m}=\left\lfloor \frac{n_{1}+n_{2}}{2}\right\rfloor \)
if \( n_{2}-n_{1}\geq 5 \).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Example: Elementary functions at rational points}

The binary splitting algorithm can be applied to the fast computation of the
elementary functions at rational points \( x=\frac{u}{v} \), simply
by using the power series. We present how this can be done for
\( \exp (x) \), \( \ln (x) \), \( \sin (x) \), \( \cos (x) \),
\( \arctan (x) \), \( \sinh (x) \) and \( \cosh (x) \). The calculation
of other elementary functions is similar (or can be reduced to the
calculation of these functions).

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{\( \exp (x) \) for rational \( x \)}

This is a direct application of the above algorithm with \( a(n)=1 \),
\( b(n)=1 \), \( p(0)=q(0)=1 \), and \( p(n)=u \), \( q(n)=nv \) for \( n>0 \).
Because the series converges better than linearly -- \( \exp (x) \) is an
entire function -- we have \( n_{\max }=O(\frac{N}{\log N + \log \frac{1}{|x|}}) \),
hence the bit complexity is
\[ O\left(\frac{(\log N)^2}{\log N + \log \frac{1}{|x|}} M(N)\right) \]
Considering \(x\) as constant, this is \( O(\log N\: M(N)) \).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{\( \exp (x) \) for real \( x \)}

This can be computed using the addition theorem for exp, by a trick due to
Brent \cite{76a} (see also \cite{87}, section 10.2, exercise 8). Write
\[
x=x_{0}+\sum _{k=0}^{\infty }\frac{u_{k}}{v_{k}}\]
with \( x_{0} \) integer, \( v_{k}=2^{2^{k}} \) and
\( |u_{k}|<2^{2^{k-1}} \), and compute
\[ \exp (x)=
\exp (x_{0})\cdot \prod _{k\geq 0}\exp \left( \frac{u_{k}}{v_{k}}\right) \]

This algorithm has bit complexity
\[ O\left(\sum\limits_{k=0}^{O(\log N)} \frac{(\log N)^2}{\log N + 2^k} M(N)\right)
= O((\log N)^{2}M(N)) \]
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{\( \ln (x) \) for rational \( x \)}

For rational \( x \) with \( |x-1|<1 \), the binary splitting algorithm can
also be applied directly to the power series for \( \ln (x) \). Write
\( x-1=\frac{u}{v} \) and compute the series with \( a(n)=1 \), \( b(n)=n+1 \),
\( q(n)=v \), \( p(0)=u \), and \( p(n)=-u \) for \( n>0 \).

This algorithm has bit complexity \( O((\log N)^{2}M(N)) \).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{\( \ln (x) \) for real \( x \)}

This can be computed using the ``inverse'' Brent trick:

Start with \( y:=0 \).

As long as \( x\neq 1 \) within the current precision, choose \( k \)
maximal with \( |x-1|<2^{-k} \). Put \( z=2^{-2k}\left[ 2^{2k}(x-1)\right] \),
i.e. let \( z \) contain the first \( k \) significant bits of \( x-1 \).
\( z \) is a good approximation for \( \ln (x) \). Set \( y:=y+z \) and
\( x:=x\cdot \exp (-z) \).

Since \( x\cdot \exp (y) \) is an invariant of the algorithm, the final
\( y \) is the desired value \( \ln (x) \).

This algorithm has bit complexity
\[ O\left(\sum\limits_{k=0}^{O(\log N)} \frac{(\log N)^2}{\log N + 2^k} M(N)\right)
= O((\log N)^{2}M(N)) \]
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{\( \sin (x) \), \( \cos (x) \) for rational \( x \)}

These are direct applications of the binary splitting algorithm: For
\( \sin (x) \), put \( a(n)=1 \), \( b(n)=1 \), \( p(0)=u \),
\( q(0)=v \), and \( p(n)=-u^{2} \), \( q(n)=(2n)(2n+1)v^{2} \) for
\( n>0 \). For \( \cos (x) \), put \( a(n)=1 \), \( b(n)=1 \),
\( p(0)=1 \), \( q(0)=1 \), and \( p(n)=-u^{2} \), \( q(n)=(2n-1)(2n)v^{2} \)
for \( n>0 \). Of course, when both \( \sin (x) \) and \( \cos (x) \) are
needed, one should only compute
\( \sin (x) \) this way, and then set
\( \cos (x)=\pm \sqrt{1-\sin (x)^{2}} \). This yields a speedup of at
least 20\%.

The bit complexity of these algorithms is \( O(\log N\: M(N)) \).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{\( \sin (x) \), \( \cos (x) \) for real \( x \)}

To compute \( \cos (x)+i\sin (x)=\exp (ix) \) for real \( x \), again the
addition theorems and Brent's trick
can be used. The resulting algorithm has bit complexity
\( O((\log N)^{2}M(N)) \).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{\( \arctan (x) \) for rational \( x \)}

For rational \( x \) with \( |x|<1 \), the fastest way to compute
\( \arctan (x) \) with bit complexity \( O((\log N)^{2}M(N)) \) is
to apply the binary splitting algorithm directly to the power series
for \( \arctan (x) \). Put \( a(n)=1 \), \( b(n)=2n+1 \), \( q(n)=1 \),
\( p(0)=x \) and \( p(n)=-x^{2} \) for \( n>0 \).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{\( \arctan (x) \) for real \( x \)}

This again can be computed using the ``inverse'' Brent trick:

Start out with \( z:=\frac{1}{\sqrt{1+x^{2}}}+i\frac{x}{\sqrt{1+x^{2}}} \)
and \( \varphi :=0 \). During the algorithm \( z \) will be a complex number
with \( |z|=1 \) and \( \re (z)>0 \).

As long as \( \im (z)\neq 0 \) within the current precision, choose \( k \)
maximal with \( |\im (z)|<2^{-k} \).
Put \( \alpha =2^{-2k}\left[ 2^{2k}\im (z)\right] \), i.e. let \( \alpha \)
contain the first \( k \) significant bits of \( \im (z) \). \( \alpha \)
is a good approximation for \( \arcsin (\im (z)) \). Set
\( \varphi :=\varphi +\alpha \) and \( z:=z\cdot \exp (-i\alpha ) \).

Since \( z\cdot \exp (i\varphi ) \) is an invariant of the algorithm, the
final \( \varphi \) is the desired
value \( \arcsin \frac{x}{\sqrt{1+x^{2}}} \).

This algorithm has bit complexity
\[ O\left(\sum\limits_{k=0}^{O(\log N)} \frac{(\log N)^2}{\log N + 2^k} M(N)\right)
= O((\log N)^{2}M(N)) \]
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{\( \sinh (x) \), \( \cosh (x) \) for rational and real \( x \)}

These can be computed by algorithms similar to those for \( \sin (x) \) and
\( \cos (x) \) above, with the same asymptotic bit complexity. The
standard computation, using \( \exp (x) \) and its reciprocal (calculated
by Newton's method), also results in the same complexity and works equally
well in practice.

The bit complexity of these algorithms is \( O(\log N\: M(N)) \) for rational
\( x \) and \( O((\log N)^{2}M(N)) \) for real \( x \).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Example: Hypergeometric functions at rational points}

The binary splitting algorithm is well suited for the evaluation of a
hypergeometric series
\[
F\left( \begin{array}{ccc}
a_{1}, & \ldots , & a_{r}\\
b_{1}, & \ldots , & b_{s}
\end{array}
\big| x\right) =\sum ^{\infty }_{n=0}
\frac{a_{1}^{\overline{n}}\cdots
a_{r}^{\overline{n}}}{b_{1}^{\overline{n}}\cdots b_{s}^{\overline{n}}}x^{n}\]
with rational coefficients \( a_{1} \), ..., \( a_{r} \), \( b_{1} \),
..., \( b_{s} \) at a rational point \( x \) in the interior of the circle of
convergence. Just put \( a(n)=1 \), \( b(n)=1 \), \( p(0)=q(0)=1 \), and
\( \frac{p(n)}{q(n)}=\frac{(a_{1}+n-1)\cdots
(a_{r}+n-1)x}{(b_{1}+n-1)\cdots (b_{s}+n-1)} \) for \( n>0 \). The evaluation
can thus be done with
bit complexity \( O((\log N)^{2}M(N)) \) for
\( r=s \) and \( O(\log N\: M(N)) \) for \( r<s \).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Example: \( \pi \)}

The Ramanujan series for \( \pi \)
\[
\frac{1}{\pi }=\frac{12}{C^{3/2}}\sum^{\infty }_{n=0}
\frac{(-1)^n(6n)!(A+nB)}{(3n)!n!^{3}C^{3n}}\]
with \( A=13591409 \), \( B=545140134 \), \( C=640320 \), found by the
Chudnovskys\footnote{A special case of \cite{87}, formula (5.5.18),
with \( N=163 \).} and used by the {\sf LiDIA} \cite{95,97,97b} and the Pari
\cite{95b} systems to compute \( \pi \), is usually written as an algorithm
of bit complexity \( O(N^{2}) \). It is, however, possible to apply
binary splitting to the sum. Put \( a(n)=A+nB \), \( b(n)=1 \),
\( p(0)=1 \), \( q(0)=1 \), and \( p(n)=-(6n-5)(2n-1)(6n-1) \),
\( q(n)=n^{3}C^3/24 \) for \( n>0 \). This reduces the complexity to
\( O((\log N)^{2}M(N)) \). Although this is theoretically slower than
Brent-Salamin's quadratically convergent iteration, which has a bit
complexity of \( O(\log N\: M(N)) \), in practice the binary-split
Ramanujan sum is three times faster than Brent-Salamin, at least in the
range from \( N=1000 \) bits to \( N=1000000 \) bits.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Example: Catalan's constant \( G \)}

A linearly convergent sum for Catalan's constant
\[
G:=\sum ^{\infty }_{n=0}\frac{(-1)^{n}}{(2n+1)^{2}}\]
is given in \cite{87}, p. 386:
\[
G = \frac{3}{8}\sum ^{\infty }_{n=0}\frac{1}{{2n \choose n} (2n+1)^{2}}
+\frac{\pi }{8}\log (2+\sqrt{3})
\]
The series is summed using binary splitting, putting \( a(n)=1 \),
\( b(n)=2n+1 \), \( p(0)=1 \), \( q(0)=1 \), and
\( p(n)=n \), \( q(n)=2(2n+1) \) for \( n>0 \). Thus
\( G \) can be computed with bit complexity \( O((\log N)^{2}M(N)) \).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Example: The Gamma function at rational points}

For evaluating \( \Gamma (s) \) for rational \( s \), we first reduce \( s \)
to the range \( 1\leq s\leq 2 \) by the formula \( \Gamma (s+1)=s\Gamma (s) \).
To compute \( \Gamma (s) \) with a precision of \( N \) bits, choose a
positive integer \( x \) with \( xe^{-x}<2^{-N} \). Partial integration
yields
\begin{eqnarray*}
\Gamma (s)&=& \int ^{\infty }_{0}e^{-t}t^{s-1}dt\\
&=& x^{s}e^{-x}\:\sum ^{\infty }_{n=0}
\frac{x^{n}}{s(s+1)\cdots (s+n)}
+\int^{\infty }_{x}e^{-t}t^{s-1}dt
\end{eqnarray*}
The last integral is \( <xe^{-x}<2^{-N} \). The series is evaluated as a
hypergeometric function (see above); the number of terms to be summed up is
\( O(N) \), since \( x=O(N) \). Thus the entire computation can be done with
bit complexity \( O((\log N)^{2}M(N)) \).

This result is already mentioned in \cite{76b}.

E.~Karatsuba \cite{91} extends this result to \( \Gamma (s) \) for algebraic
\( s \).

For \( \Gamma (s) \) no checkpointing is possible, because of the
dependency on \( x \) in the binary splitting.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Example: The Riemann zeta value \( \zeta (3) \) \label{zeta}}

Recently, Doron Zeilberger's method of ``creative telescoping'' has been
applied to Riemann's zeta function at \( s=3 \) (see \cite{96c}), whose value
is also known as {\em Ap{\'e}ry's constant}:
\[
\zeta (3)=
\frac{1}{2}\sum ^{\infty }_{n=1}
\frac{(-1)^{n-1}(205n^{2}-160n+32)}{n^{5}{2n \choose n}^{5}}
\]
This sum consists of three hypergeometric series. Binary splitting can also be
applied directly, by putting \( a(n)=205n^{2}+250n+77 \), \( b(n)=1 \),
\( p(0)=1 \), \( p(n)=-n^{5} \) for \( n>0 \), and \( q(n)=32(2n+1)^{5} \).
Thus the bit complexity of computing \( \zeta (3) \) is
\( O((\log N)^{2}M(N)) \).
Using this, the authors were able to establish a new record in the
calculation of \( \zeta (3) \) by computing 1,000,000 decimals \cite{96d}.
The computation took 8 hours on a Hewlett Packard 9000/712 machine. After
distributing it on a cluster of 4 HP 9000/712 machines, the same computation
required only 2.5 hours. The extra half hour was needed for reading the
partial results from disk and for recombining them. Again, we used binary
splitting for the recombination: the 4 partial results produced 2 results,
which were combined into the final 1,000,000-decimal value of \( \zeta (3) \).

This example shows the importance of checkpointing. Even if a machine crashes
during the calculation, the results of the other machines are still usable.
Additionally, being able to parallelise the computation reduced the computing
time considerably.
733 \section{Evaluation of linearly convergent series of sums}
735 The technique presented in the previous section also applies to all linearly
736 convergent sums of the form
739 U=\sum ^{\infty }_{n=0}\frac{a(n)}{b(n)}\left( \frac{c(0)}{d(0)}+\cdots
740 +\frac{c(n)}{d(n)}\right) \frac{p(0)\cdots p(n)}{q(0)\cdots q(n)}\]
741 where \( a(n) \), \( b(n) \), \( c(n) \), \( d(n) \), \( p(n) \),
742 \( q(n) \) are integers with \( O(\log n) \) bits. The most often
743 used case is again that \( a(n) \), \( b(n) \), \( c(n) \), \( d(n) \),
744 \( p(n) \), \( q(n) \) are polynomials in \( n \) with
745 integer coefficients.
751 Given two index bounds \( n_{1} \)and \( n_{2} \), consider the partial sums
753 S=\sum _{n_{1}\leq n<n_{2}}\frac{a(n)}{b(n)}
754 \frac{p(n_{1})\cdots p(n)}{q(n_{1})\cdots q(n)}\]
757 U=\sum _{n_{1}\leq n<n_{2}}\frac{a(n)}{b(n)}
758 \left( \frac{c(n_{1})}{d(n_{1})}+\cdots +\frac{c(n)}{d(n)}\right)
759 \frac{p(n_{1})\cdots p(n)}{q(n_{1})\cdots q(n)}\]
762 As above, we compute the integers \( P={p(n_{1})}\cdots {p(n_{2}-1)} \),
763 \( Q={q(n_{1})}\cdots {q(n_{2}-1)} \), \( B={b(n_{1})}\cdots {b(n_{2}-1)} \),
764 \( T=BQS \), \( D={d(n_{1})}\cdots {d(n_{2}-1)} \),
765 \( C=D\left( \frac{c(n_{1})}{d(n_{1})}+\cdots +
766 \frac{c(n_{2}-1)}{d(n_{2}-1)}\right) \) and \( V=DBQU \).
767 If \( n_{2}-n_{1}<4 \), these
768 are computed directly. If \( n_{2}-n_{1}\geq 4 \), they are computed using
769 {\em binary splitting}: Choose an index \( n_{m} \) in the middle of
770 \( n_{1} \)and \( n_{2} \), compute the components \( P_{l} \), \( Q_{l} \),
771 \( B_{l} \), \( T_{l} \), \( D_{l} \), \( C_{l} \), \( V_{l} \) belonging
772 to the interval \( n_{1}\leq n<n_{m} \), compute the components \( P_{r} \),
773 \( Q_{r} \), \( B_{r} \), \( T_{r} \), \( D_{r} \), \( C_{r} \),
774 \( V_{r} \) belonging to the interval \( n_{m}\leq n<n_{2} \), and set
775 \( P=P_{l}P_{r} \), \( Q=Q_{l}Q_{r} \), \( B=B_{l}B_{r} \),
776 \( T=B_{r}Q_{r}T_{l}+B_{l}P_{l}T_{r} \), \( D=D_{l}D_{r} \),
777 \( C=C_{l}D_{r}+C_{r}D_{l} \) and
778 \( V=D_{r}B_{r}Q_{r}V_{l}+D_{r}C_{l}B_{l}P_{l}T_{r}+D_{l}B_{l}P_{l}V_{r} \).
780 Finally, this algorithm is applied to \( n_{1}=0 \) and
781 \( n_{2}=n_{\max}=O(N) \), and final floating-point divisions
782 \( S=\frac{T}{BQ} \) and \( U=\frac{V}{DBQ} \) are performed.
The bit complexity of computing \( S \) and \( U \) with \( N \) bits of
precision is \( O((\log N)^{2}M(N)) \).
By our assumption that \( a(n) \), \( b(n) \), \( c(n) \), \( d(n) \),
\( p(n) \), \( q(n) \) are integers with \( O(\log n) \) bits,
the integers \( P \), \( Q \), \( B \), \( T \), \( D \), \( C \),
\( V \) belonging to the interval \( n_{1}\leq n<n_{2} \) all have
\( O((n_{2}-n_{1})\log n_{2}) \) bits. The rest of the proof is as in the
previous section.

\item [Checkpointing/Parallelising:]~
A checkpoint can easily be taken by storing the (integer) values of
\( n_1 \), \( n_2 \), \( P \), \( Q \), \( B \), \( T \) and additionally
\( D \), \( C \), \( V \). Similarly, if \( m \) processors are available,
the interval \( [0,n_{\max}] \) can be divided into \( m \) pieces of
length \( l = \lfloor n_{\max}/m \rfloor \). After each processor \( i \) has
computed the sum of its interval \( [il,(i+1)l] \), the partial sums are
combined into the final result using the rules described above.
\item [Implementation:]~

The C++ implementation of the above algorithm is very similar
to the previous one. The initialisation is now done by a structure
{\tt abpqcd\_series} containing arrays {\tt a}, {\tt b}, {\tt p},
{\tt q}, {\tt c} and {\tt d} of multiprecision integers. The values of
the arrays at index \( n \) correspond to the values of the functions
\( a \), \( b \), \( p \), \( q \), \( c \) and \( d \) at the integer point
\( n \). The (partial) results of the algorithm are stored in the
{\tt abpqcd\_series\_result} structure, which now contains three new elements
({\tt C}, {\tt D} and {\tt V}).
\begin{verbatim}
// abpqcd_series is initialised by user
struct abpqcd_series { bigint *a, *b, *p, *q, *c, *d; };

// abpqcd_series_result holds the partial results
struct abpqcd_series_result { bigint P, Q, B, T, C, D, V; };

void sum_abpqcd(abpqcd_series_result & r, int n1, int n2,
                const abpqcd_series & arg)
{
  switch (n2 - n1)
    {
    case 0:
      error_handler("summation device",
                    "sum_abpqcd:: n2-n1 should be > 0.");
    case 1: // the result at the point n1
      r.P = arg.p[n1]; r.Q = arg.q[n1]; r.B = arg.b[n1];
      r.T = arg.a[n1] * arg.p[n1];
      r.D = arg.d[n1]; r.C = arg.c[n1];
      r.V = arg.a[n1] * arg.c[n1] * arg.p[n1];
      break;
    // cases 2, 3, 4 left out for simplicity
    default: // general case
      {
      // the left and the right partial sum
      abpqcd_series_result L, R;
      // find the middle of the interval
      int nm = (n1 + n2) / 2;
      sum_abpqcd(L, n1, nm, arg);
      sum_abpqcd(R, nm, n2, arg);
      // combine the partial results
      r.P = L.P * R.P; r.Q = L.Q * R.Q; r.B = L.B * R.B;
      bigint tmp = L.B * L.P * R.T;
      r.T = R.B * R.Q * L.T + tmp;
      r.D = L.D * R.D;
      r.C = L.C * R.D + R.C * L.D;
      r.V = R.D * (R.B * R.Q * L.V + L.C * tmp)
            + L.D * L.B * L.P * R.V;
      }
    }
}
\end{verbatim}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Example: Euler's constant \( C \) \label{eulergamma}}

Let \( f(x)=\sum ^{\infty }_{n=0}\frac{x^{n}}{n!^{2}} \) and
\( g(x)=\sum ^{\infty }_{n=0}H_{n}\frac{x^{n}}{n!^{2}} \), where
\( H_{n} \) denotes the \( n \)-th harmonic number. Then for
\( x\rightarrow \infty \),
\( \frac{g(x)}{f(x)}=\frac{1}{2}\log x+C+O\left( e^{-4\sqrt{x}}\right) \).
The Laplace method for asymptotic evaluation of exponentially growing
sums and integrals yields
\[ f(x)=
e^{2\sqrt{x}}x^{-\frac{1}{4}}\frac{1}{2\sqrt{\pi }}(1+O(x^{-\frac{1}{4}}))\]
and
\[ g(x)=e^{2\sqrt{x}}x^{-\frac{1}{4}}\frac{1}{2\sqrt{\pi }}
\left(\frac{1}{2}\log x+C+O(\log x\cdot x^{-\frac{1}{4}})\right)\]
On the other hand, \( h(x):=\frac{g(x)}{f(x)} \) satisfies the
differential equation
\[ xf(x)\cdot h''(x)+(2xf'(x)+f(x))\cdot h'(x)=f'(x)\]
hence
\[ h(x)=\frac{1}{2}\log x+C+c_{2}
\int ^{\infty }_{x}\frac{1}{tf(t)^{2}}dt=\frac{1}{2}\log x+C+O(e^{-4\sqrt{x}})\]
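The integration step can be made explicit (a sketch we add for completeness):
multiplying the differential equation by \( f(x) \) turns the left-hand side
into an exact derivative,
\[ \frac{d}{dx}\left( xf(x)^{2}h'(x)\right)
=xf(x)^{2}h''(x)+\left( 2xf'(x)+f(x)\right) f(x)h'(x)
=f(x)f'(x)=\frac{d}{dx}\,\frac{f(x)^{2}}{2},\]
hence \( xf(x)^{2}h'(x)=\frac{f(x)^{2}}{2}-c_{2} \) for some constant
\( c_{2} \), i.e.\ \( h'(x)=\frac{1}{2x}-\frac{c_{2}}{xf(x)^{2}} \), and one
more integration yields the stated representation of \( h(x) \).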
To compute \( C \) with a precision of \( N \) bits, set
\[ x=\left\lceil (N+2)\: \frac{\log 2}{4}\right\rceil ^{2} \]
and evaluate the series for \( g(x) \) and \( f(x) \) simultaneously,
using the binary-splitting algorithm,
with \( a(n)=1 \), \( b(n)=1 \), \( c(n)=1 \), \( d(n)=n+1 \),
\( p(n)=x \), \( q(n)=(n+1)^{2} \). Let \( \alpha =3.591121477\ldots \)
be the solution of the equation \( -\alpha \log \alpha +\alpha +1=0 \). Then
\( \alpha \sqrt{x}-\frac{1}{4\log \alpha }\log \sqrt{x}+O(1) \)
terms of the series suffice for the relative error to be bounded
by \( 2^{-N} \).
The bit complexity of this algorithm is \( O((\log N)^{2}M(N)) \).

This algorithm was first mentioned in \cite{80}. It is by far
the fastest known algorithm for computing Euler's constant.

For Euler's constant no checkpointing is possible, because of
the dependency on \( x \) in the binary splitting.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Computational results}

In this section we present some computational results of our CLN and
{\sf LiDIA} implementations of the algorithms presented in this note. We use
the official version (1.3) and an experimental version (1.4a) of {\sf LiDIA}.
We have taken advantage of {\sf LiDIA}'s ability to replace its kernel
(multiprecision arithmetic and memory management) \cite{95,97,97b}, so in both
cases we were able to use CLN's fast integer arithmetic routines.
The table in Figure \ref{Fig1} shows the running times for the calculation of
\( \exp(1) \), \( \log(2) \), \( \pi \), \( C \), \( G \) and \( \zeta(3) \)
to a precision of 100, 1000, 10000 and 100000 decimal digits. The timings are
given in seconds and denote the {\em real} time needed, i.e.\ system plus
user time. The computation was done on an Intel Pentium with 133 MHz and
32 MB of RAM.
\begin{figure}
\begin{center}
\begin{tabular}{|l|l|l|l|l|l|l|}
\hline
D &\( \exp(1) \)&\( \log(2) \)&\( \pi \)&\( C \) &\( G \)&\( \zeta(3) \)\\
\hline
\( 10^2 \) &0.0005 & 0.0020 &0.0014 & 0.0309 &0.0179 & 0.0027 \\
\( 10^3 \) &0.0069 & 0.0474 &0.0141 & 0.8110 &0.3580 & 0.0696 \\
\( 10^4 \) &0.2566 & 1.9100 &0.6750 & 33.190 &13.370 & 2.5600 \\
\( 10^5 \) &5.5549 & 45.640 &17.430 & 784.93 &340.33 & 72.970 \\
\hline
\end{tabular}
\end{center}
\caption{{\sf LiDIA-1.4a} timings of computation of constants using
binary-splitting}\label{Fig1}
\end{figure}
The second table (Figure \ref{Fig2}) summarizes the performance of
\( \exp(x) \) in various Computer Algebra systems\footnote{We do not list
the timings of {\sf LiDIA-1.4a} since these are comparable to those of CLN.}.
For a fair comparison of the algorithms, both argument and precision are
chosen in such a way that system-specific optimizations (BCD arithmetic
in Maple, FFT multiplication in CLN, special exact argument handling in
{\sf LiDIA}) do not apply. We use \( x = -\sqrt{2} \) and precision
\( 10^{i/3} \), with \( i \) running from \( 4 \) to \( 15 \).
\begin{figure}
\begin{center}
\begin{tabular}{|r|l|l|l|l|}
\hline
D & Maple & Pari & {\sf LiDIA-1.3} & CLN \\
\hline
21 & 0.00090 & 0.00047 & 0.00191 & 0.00075 \\
46 & 0.00250 & 0.00065 & 0.00239 & 0.00109 \\
100 & 0.01000 & 0.00160 & 0.00389 & 0.00239 \\
215 & 0.03100 & 0.00530 & 0.00750 & 0.00690 \\
464 & 0.11000 & 0.02500 & 0.02050 & 0.02991 \\
1000 & 0.4000 & 0.2940 & 0.0704 & 0.0861 \\
2154 & 1.7190 & 0.8980 & 0.2990 & 0.2527 \\
4641 & 8.121 & 5.941 & 1.510 & 0.906 \\
10000 & 39.340 & 39.776 & 7.360 & 4.059 \\
21544 & 172.499 & 280.207 & 39.900 & 15.010 \\
46415 & 868.841 & 1972.184 & 129.000 & 39.848 \\
100000 & 4873.829 & 21369.197 & 437.000 & 106.990 \\
\hline
\end{tabular}
\end{center}
\caption{Timings of computation of \( \exp(-\sqrt{2}) \)}\label{Fig2}
\end{figure}
MapleV R3 is the slowest system in this comparison. This is probably due to
the BCD arithmetic it uses. However, Maple seems to have an asymptotically
better algorithm for \( \exp(x) \) for numbers with more than 10000 decimals.
In this range it outperforms Pari-1.39.03, which is the fastest system in the
0--200 decimals range.
The comparison indicating the strength of binary splitting is between
{\sf LiDIA-1.3} and CLN itself. Since both share the same kernel, the only
difference is that {\sf LiDIA-1.3} uses Brent's \( O(\sqrt{n}\,M(n)) \)
method for \( \exp(x) \), whereas CLN switches from Brent's method to a
binary-splitting version for large numbers.

As expected, in the range of 1000--100000 decimals CLN outperforms
{\sf LiDIA-1.3} by far. The fact that {\sf LiDIA-1.3} is faster
in the range of 200--1000 decimals (also for some trigonometric functions)
is probably due to a better optimized \( O(\sqrt{n}\,M(n)) \) method.
\subsection{Distributed computing of \( \zeta (3) \)}

Using the method described in Section \ref{zeta}, the authors were the first
to compute 1,000,000 decimals of \( \zeta (3) \) \cite{96d}.
The computation took 8 hours on a Hewlett-Packard 9000/712 machine. After
distributing it over a cluster of four HP 9000/712 machines, the same
computation required only 2.5 hours, half an hour of which was needed for
reading the partial results from disk and recombining them. Again, we used
binary splitting for the recombination: the four partial results produced two
intermediate results, which were combined into the final 1,000,000-decimal
value of \( \zeta (3) \).

This example shows the importance of checkpointing. Even if a machine crashes
during the calculation, the results of the other machines remain usable.
Additionally, being able to parallelise the computation reduced the total
computing time from 8 hours to 2.5 hours.
\subsection{Euler's constant \( C \)}

We have implemented a version of Brent's and McMillan's algorithm \cite{80}
and a version accelerated by binary splitting as shown in
Section \ref{eulergamma}.

The computation of \( C \) was done twice on a SPARC-Ultra machine
with 167 MHz and 256 MB of RAM. The first computation, using the
non-accelerated version, required 160 hours. The result of this computation
was then verified by the binary-splitting version in (only) 14 hours.

The first 475006 partial quotients of the continued fraction of \( C \)
were computed on an Intel Pentium with 133 MHz and 32 MB of RAM in 3 hours,
using a program by H. te Riele based on \cite{96e}, which was translated to
{\sf LiDIA} for efficiency reasons. Computing the 475006th
convergent produced the following improved theorem:

\centerline{If \( C \) is a rational number, \( C=p/q \), then \( |q| > 10^{244663} \).}

Details of this computation (including statistics on the partial
quotients) can be found in \cite{98}.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Conclusions}

Although powerful, the binary-splitting method has not been widely used.
In particular, no information existed about the practical applicability of
this method.

In this note we presented a generic binary-splitting summation device for
evaluating two types of linearly convergent series. From this we derived
simple and computationally efficient algorithms for the evaluation of
elementary functions and constants. These algorithms work with {\em exact}
objects, making them suitable for use within Computer Algebra systems.

We have shown that the practical performance of our algorithms is
superior to that of current system implementations. In addition to existing
methods, our algorithms offer the possibility of checkpointing and
parallelising. These features can be useful for huge calculations, such as
those done in analytic number theory research.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section*{Acknowledgements}

The authors would like to thank J\"org Arndt for pointing us to
chapter 10 in \cite{87}. We would also like to thank Richard P. Brent for
his comments and Hermann te Riele for providing us with his program for the
continued fraction computation of Euler's constant.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\bibliography{binsplit}
\bibliographystyle{acm}