[GiNaC-list] Term ordering and compiling C++ code

Mon May 24 20:34:55 CEST 2010

Doug wrote:
        kreckel wrote: 

                ...and second, sharing is currently not pursued
                aggressively! Rather, it is exploited whenever it is
                trivial to do so. (The product rule of differentiation
                is an example where exactly the same terms pop up again
                and again so exploiting sharing comes at no cost.)

        I think the benefit would still be large, even if coverage isn't
        100%.  See below. 

I fully agree with this, at least when it is done to a level that does
not affect overall performance, possibly there is room for improvement
in this direction.

                Well, I think that if the size of generated code is so
                prohibitively large and compiler CSE doesn't help you
                may be better off writing your own algorithm collecting
                similar terms in an associative array. You could then
                artificially introduce temporaries, in order to help the
                compiler. This would boil down to a more aggressive
                variant of GiNaC's subexpression sharing. What do you
                think?

        This is exactly what I have done.  As mentioned, I will post
        that solution here in the next few days.  This works well, but
        is very slow, as it takes something that is sub-linear in the
        size of the expression tree and makes it O(n^2).  

        If GiNaC expression tree nodes had a unique identifier, this
        approach could achieve sub-linear performance.  Specifically, it
        would be linear in the size of the unique subexpressions, not
        the size of the actual expression.  (If you have a lot of
        sharing, your expression can be large, but the number of unique
        expressions can be small.)

I fully agree with that, and I think this should be possible, probably
simple, and won't hurt anybody.

        Even if GiNaC doesn't identify every single instance of a shared
        sub-expression, the benefits could still be large.  In fact, I
        tried an experiment to test this.  I treated ex::gethash() as a
        unique identifier, even though it isn't.  The code I generated
        was wrong, of course, but it was very fast to generate it, and
        very fast to compile and execute.  Thus, whatever number of
        unique expressions were not actually identified as such was
        small enough that this approach would still be quite useful.

I don't know how much would impact in performance the increase of
expression sharing. I'm not aware of the copy on write implementation,
but I think that restricting the effect of copy to the non-shareable
parts, can be probably the single piece needed to increase performance.
The implementation should be straightforward, and the computational
overhead, probably is not that big. I think this performance can be there,
without hurting anybody just providing a mechanism to runtime activate
it. Memory savings could be Huge!!!.

May be it is enough to adopt a programing style to construct expresions
that would increase sharing without further changes within GiNaC,
or any other mechanism is available to increase sharing??.

        Also, FYI, when using the hash value as a unique identifier, I
        only got a few collisions out of thousands of sub-expressions.
        That is, I only found a few expressions with the same hash even
        though they were different expressions.  

Is this correct behavior?. Shouldn't hash values be unique?.

        Thanks,
        Doug

We are looking forward to your code. I would be nice I you could
describe your algorithm, when you finish, in a little more detail.

Are you exporting a single expression or a set of them?.

Thank you,

Javier