[GiNaC-list] [PATCH] fix bug/feature causing non-deterministic output

Sun May 25 00:30:54 CEST 2014

Hi,

Thanks for your comments, I reply inline below.

On 24 May 2014 10:11, Alexei Sheplyakov wrote:
> Non-deterministic term order is OK: GiNaC is designed for processing expressions
> consisting of (many) millions of terms (expressions of this size are very common
> in high energy physics). The intermediate expressions (which typically are
> the most sizable) are not going to be printed other than for debugging purposes.
> Also the output of GiNaC is mostly processed by other programs rather than
> a human users.

I happen to know what kind of expressions one can get in high energy
physics quite well. Determinism has nothing to do with humans reading
output, but with reproducibility of results. Reproducibility is an
important property of any computational program, even Monte-Carlo
simulations are deterministic. Being able to compare two 10G files of
expressions by simple diff or md5sum/etc is rather convenient.

> This is just plain wrong. The hash seed should be computed from the type
> information of the *current* object rather than from the type of the object
> which initiated the very first call to calchash(). The idea is that object
> having a different type are typically different so they should have different
> hash values (even if their operands are the same).

You are right that it isn't doing what was intended and any class
which doesn't overload calchash will use the same seed, which is a
mistake. It is of course arguable whether different classes must use
different seed, because the only drawback of a hash collision for
different types is one extra typeid call in basic::compare. So
ensuring lack of collisions for different types might be not really
that important, but it is also easy so why not do it.

> In theory one can use a lookup table in order to not recompute the hash seed for the same type
> twice (GiNaC uses a different approach - make the hash function fast enough
> so caching gives no advantages).

If that approach gave the same term order every run of a program
everyone would be happy.

> In general any patch which makes the hash function substantially slower for
> nothing good at all will be rejected.

I'm attaching a revised version of a patch, it adds virtual functions
to every derived class of basic, which effectively cache the class
name hashes in local static variables. It is rather small (if
noticeable) price to pay for greater convenience for users.

I kept old hash in basic::gethashseed() for any class that does not
overload it, though it is best not to have such classes. If someone
derives from a basic descendant, but doesn't overload gethashseed()
they will share the hash_seed which will result in an extra call to
typeid in basic::compare (doubtfully noticeable). make_hash_seed is
moved to utils.h

> If someone needs a predictable output,
> please write the corresponding print_context.

Yeah, it always puzzled me why GiNaC doesn't have a sorted_print
function but suggests users to write their own.

Best,
Valery
-------------- next part --------------
A non-text attachment was scrubbed...
Name: master-Make-hash-values-deterministic-rev2.patch
Type: text/x-patch
Size: 25688 bytes
Desc: not available
URL: <http://www.cebix.net/pipermail/ginac-list/attachments/20140525/30978e56/attachment-0001.patch>