[GiNaC-list] A bad idea: GiNaC as the symbolic manipulation engine in Sage

Mon Aug 11 17:09:30 CEST 2008

On Mon, 11 Aug 2008 18:27:54 +0400
Alexei Sheplyakov <varg at theor.jinr.ru> wrote:

> Hello!
> 
> On Mon, Aug 11, 2008 at 10:30:59AM +0200, Burcin Erocal wrote:
> 
> > While GiNaC seems to be the best option for Sage,
> 
> GiNaC is a special purpose symbolic manipulation library. The purpose of
> GiNaC is to do "simple" manipulations with "large" expressions using a common
> programming language (as opposed to ill designed, unstable ones as offered
> by most of CASes). So, I don't think GiNaC is a good choice for Sage
> (which attempts to be a general purpose _interactive_ CAS).

I suggested using it in Sage for the symbolic manipulation engine only.
I n your first two sentences, you seem to point out that GiNaC is
good for this purpose.

Sage relies on many other libraries to be "general purpose." It uses
libSingular for multivariate polynomials, flint for univariate
polynomials over integers, and so on. Are you denying that GiNaC is the
best "symbol manipulation engine" I can find?

Your statements also seem to suggest that there is something wrong with
interactive use. GiNaC also provides support for this via ginsh,
pyginac, etc. 

> > I think all these problems can be resolved with some effort
> 
> Don't take me wrong, but I think it's better to put this effort elsewhere.
> I.e. most of "issues" you've mentioned are design decisions.

I am disappointed.

> > and even the speed of basic arithmetic in GiNaC can be improved with
> > modifying it's data structures.
> 
> No matter what data representation you choose, there are always some expensive
> operations. Sure, one can optimize data structures so that some particular
> operation is fast (at the expence of slowing down other operations). So, to
> write a good code one need to be aware of the complexity of operations and use
> expensive operations consciously.

OK, so I chose a bad example for a benchmark, and this was pointed out
before.

When I said modifying data structures, I meant that you can keep the
arguments of an operator in a different structure, instead of a list.
This would probably only make sense on very large expressions, and at
this point, I don't know what the penalty for smaller expressions would
be.

As for different ways of storing the arguments of an operator, we can
look at methods for storing and calculating with polynomials. One
method is to use a hash table to keep the monomials ,Magma, a
commercial system that has the fastest polynomial arithmetic, uses this
method. Another is to use GeoBuckets which is explained here:

 http://portal.acm.org/citation.cfm?id=275394

Singular, which happens to be the fastest free software library for
multivariate polynomial arithmetic uses this. You can also use binary
heaps as suggested here:

http://portal.acm.org/citation.cfm?id=1086847

Michael Monagan and Roman Pierce use this method in the new blazingly
fast multivariate polynomial libray they wrote for maple:

http://www.cecm.sfu.ca/~mmonagan/papers/sdmp19.pdf

<snip>
> 
> > - cln duplicates functionality which is provided by different libraries
> > already distributed with Sage (mpfr, mpir, flint)
> 
> The libraries duplicating CLN functionality [1] typically have several
> serious drawbacks:
>  a. The syntax is really awkward (to put it very mildly). For example,
>     instead of a simple 
> 		z = x+a*y;
> 		one need to write several lines of unreadable code.
>  b. The user is responsible for memory management. This is absolutely
>     inacceptable.
> 
> None of these libraries gives any considerable performance gain, so why
> bother?
>  
> [1] As a matter of fact, CLN existed *long* before the libraries you've
> mentioned were developed.
> 
> > at a considerable speed penalty.
> 
> Could you please give any benchmark backing up this claim?
> 
> > - Creating symbolic functions at runtime is not supported.
> 
> That's a design decision.
> 
> > - IIRC, GiNaC documentation suggests the printing order for the
> > variables is stable between different sessions, regardless of
> > creation order.
> 
> Could you please point out which document (and where) says that?

I might be remembering this wrong. I just put it down as an item from
my notes while writing the wrapper. I don't think writing a pretty
printer which sorts the terms would be hard.

> The order of expressions (in particular, symbols) is not stable. This is
> a price for a fast comparison of expressions. 
> 
> Anyway, since GiNaC is non-interactive, the order of terms doesn't actually
> matter: typically the output of a program is a small expression (quite often
> it is a single number), and the big intermediate expressions are not going
> to be examined by human (except for debugging). 
> 
> > - It takes ages to build
> > 	The packages above took ~25 minutes to build on my desktop machine
> 
> Just for the record: building GiNaC and CLN takes ~30 minutes on my old
> Duron 800 box with 640 Mb of RAM. I wonder what kind of machine do you use
> as a desktop. That said, patches reducing the build time are wellcome (if
> they don't reduce the performance and don't make the code more complicated).
<snip>

I got the above timing on this machine:

$ cat /proc/cpuinfo  | grep name
model name      : Intel(R) Pentium(R) D CPU 3.40GHz

Cheers,

Burcin