From tkluck at infty.nl  Mon Aug 13 01:19:22 2012
From: tkluck at infty.nl (Timo Kluck)
Date: Mon, 13 Aug 2012 01:19:22 +0200
Subject: [GiNaC-devel] Add method ex::symbols for obtaining all symbols
	appearing in an expression
In-Reply-To: <50283963.6256b40a.1fd5.ffffb000@mx.google.com>
References: <50283963.6256b40a.1fd5.ffffb000@mx.google.com>
Message-ID: <CAGK+T_=fRZmUEE=YN0Ugdq8XZtZH7K6tPTrFPB5i9Kg-hJQO6A@mail.gmail.com>

Hi,

You may know the CAS Sage [1], which uses GiNaC. It wraps GiNaC
expressions in Python, and it adds some extra methods of its own.

One of those methods is variables(), which returns a list of symbols
appearing in the expression. It does so by recursively walking down
the expression tree. At one point, that was a bottleneck in my
algorithm, and I realized that it would be very easy to cache those
variables at construction time from the subexpressions.

I think the natural place to implement such a thing would be in GiNaC
itself, so that everyone (not only sage users) may benefit. Is this
planned and/or would you accept such a patch?

Best regards,
Timo Kluck

[1] www.sagemath.org


From alexei.sheplyakov at gmail.com  Mon Aug 13 08:55:05 2012
From: alexei.sheplyakov at gmail.com (Alexei Sheplyakov)
Date: Mon, 13 Aug 2012 09:55:05 +0300
Subject: [GiNaC-devel] Add method ex::symbols for obtaining all symbols
 appearing in an expression
In-Reply-To: <CAGK+T_=fRZmUEE=YN0Ugdq8XZtZH7K6tPTrFPB5i9Kg-hJQO6A@mail.gmail.com>
References: <50283963.6256b40a.1fd5.ffffb000@mx.google.com>
 <CAGK+T_=fRZmUEE=YN0Ugdq8XZtZH7K6tPTrFPB5i9Kg-hJQO6A@mail.gmail.com>
Message-ID: <20120813065505.GA10841@vargsbox.jinr.ru>

Hello,

On Mon, Aug 13, 2012 at 01:19:22AM +0200, Timo Kluck wrote:

> One of those methods is variables(), which returns a list of symbols
> appearing in the expression. It does so by recursively walking down
> the expression tree. At one point, that was a bottleneck in my
> algorithm, and I realized that it would be very easy to cache those
> variables at construction time from the subexpressions.
> 
> I think the natural place to implement such a thing would be in GiNaC
> itself, so that everyone (not only sage users) may benefit.

I don't think it's a good idea. This adds a non-negligible overhead
to every eval() and expression construction, even if the actual calculation
never calls variables() (or symbols(), or whatever it is named).

Besides, I don't think caching really helps. Consider the following code:

symbol x, y;
ex e = x + y;
ex g = x - y;
ex e_plus_g = e + g;

One would expect e_plus_g.symbols() to return just x, because y cancels
in the sum. That means symbols() needs to scan the whole e_plus_g expression
(to find out that it doesn't contain y), which makes the cache kind of useless.

Best regards,
	Alexei


From kreckel at ginac.de  Mon Aug 13 09:02:19 2012
From: kreckel at ginac.de (Richard B. Kreckel)
Date: Mon, 13 Aug 2012 09:02:19 +0200
Subject: [GiNaC-devel] Add method ex::symbols for obtaining all symbols
 appearing in an expression
In-Reply-To: <CAGK+T_=fRZmUEE=YN0Ugdq8XZtZH7K6tPTrFPB5i9Kg-hJQO6A@mail.gmail.com>
References: <50283963.6256b40a.1fd5.ffffb000@mx.google.com>
 <CAGK+T_=fRZmUEE=YN0Ugdq8XZtZH7K6tPTrFPB5i9Kg-hJQO6A@mail.gmail.com>
Message-ID: <5028A67B.60007@ginac.de>

Hi,

On 08/13/2012 01:19 AM, Timo Kluck wrote:
> You may know the CAS Sage [1], which uses GiNaC. It wraps GiNaC
> expressions in Python, and it adds some extra methods of its own.
>
> One of those methods is variables(), which returns a list of symbols
> appearing in the expression. It does so by recursively walking down
> the expression tree. At one point, that was a bottleneck in my
> algorithm, and I realized that it would be very easy to cache those
> variables at construction time from the subexpressions.
>
> I think the natural place to implement such a thing would be in GiNaC
> itself, so that everyone (not only sage users) may benefit. Is this
> planned and/or would you accept such a patch?

Wouldn't it be better to go one step further and cache those variables 
when the variables() function is first used? This avoids quite a lot of 
overhead in space and time! And then, variables() wouldn't have to be a 
member function, right?

Cheers
   -richy.
-- 
Richard B. Kreckel
<http://www.ginac.de/~kreckel/>


From tkluck at infty.nl  Mon Aug 13 10:31:56 2012
From: tkluck at infty.nl (Timo Kluck)
Date: Mon, 13 Aug 2012 10:31:56 +0200
Subject: [GiNaC-devel] Fwd: Add method ex::symbols for obtaining all symbols
 appearing in an expression
In-Reply-To: <5028b107.2562b40a.5afc.413d@mx.google.com>
References: <50283963.6256b40a.1fd5.ffffb000@mx.google.com>
 <CAGK+T_=fRZmUEE=YN0Ugdq8XZtZH7K6tPTrFPB5i9Kg-hJQO6A@mail.gmail.com>
 <5028A67B.60007@ginac.de> <5028b107.2562b40a.5afc.413d@mx.google.com>
Message-ID: <CAGK+T_mWWNs71s=FQU1oFYAsCpDqSqg6qzcEVv9p22DC3mdckg@mail.gmail.com>

On Mon, Aug 13, 2012 at 9:02, Richard B. Kreckel <kreckel at ginac.de> wrote:
> Wouldn't it be better to go one step further and cache those variables when
> the variables() function is first used? This avoids quite a lot of overhead in
> space and time! And then, variables() wouldn't have to be a member
> function, right?

That is probably best, especially in the light of the (x+y) - y
example that Alexei mentioned.

I could easily do that in Sage, but I think it wouldn't help in my
case because I'm constructing so many expressions.

However, I'm thinking it may make sense to do that in GiNaC, because
from what I've seen it has some sort of a copy-on-write mechanism for
subexpressions? As in:

f = sin(x+y)
g = f * cos(x-y)
h = f * tan(z)

will make g and h share the instance (and therefore the cache!) for f?
Please correct me if I've misunderstood that. But if it does, this is
abstracted away in Sage, so caching the variables() function in Sage
would have much less benefit than caching it in GiNaC.
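
To make the question concrete, here is a minimal sketch of a cache that
lives in a shared node and is filled on first use. It is plain C++ with
hypothetical names (Node, symbol(), op(), symbols(), g_walks); it is not
GiNaC's ex/basic implementation and only assumes that g and h hold the same
instance of f.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy node, NOT GiNaC's basic/ex classes.
struct Node {
    std::string name;                                   // symbol or operator name
    std::vector<std::shared_ptr<const Node> > children; // empty for a symbol
    // Lazily filled cache; null until symbols() is first called on this node.
    mutable std::unique_ptr<std::set<std::string> > cache;
};
typedef std::shared_ptr<const Node> NodePtr;

int g_walks = 0;  // counts nodes whose symbol set had to be computed

NodePtr symbol(const std::string& name) {
    std::shared_ptr<Node> n = std::make_shared<Node>();
    n->name = name;
    return n;
}

NodePtr op(const std::string& name, std::vector<NodePtr> children) {
    std::shared_ptr<Node> n = std::make_shared<Node>();
    n->name = name;
    n->children = std::move(children);
    return n;
}

// Returns the symbols below a node, computing them only on first use.
const std::set<std::string>& symbols(const NodePtr& node) {
    if (node->cache) return *node->cache;  // cache hit: no traversal
    ++g_walks;
    std::unique_ptr<std::set<std::string> > result(new std::set<std::string>());
    if (node->children.empty()) {
        result->insert(node->name);
    } else {
        for (std::size_t i = 0; i < node->children.size(); ++i) {
            const std::set<std::string>& sub = symbols(node->children[i]);
            result->insert(sub.begin(), sub.end());
        }
    }
    node->cache = std::move(result);
    return *node->cache;
}

// Builds f, g, h as above and checks that f's subtree is walked only once.
void check_sharing() {
    NodePtr x = symbol("x"), y = symbol("y"), z = symbol("z");
    NodePtr f = op("sin", {op("+", {x, y})});
    NodePtr g = op("*", {f, op("cos", {op("-", {x, y})})});
    NodePtr h = op("*", {f, op("tan", {z})});
    g_walks = 0;
    symbols(g);
    const int walks_for_g = g_walks;
    symbols(h);  // f and everything below it is served from f's cache
    assert(g_walks - walks_for_g == 3);  // only h, tan and z are new
    assert(symbols(h).count("z") == 1);
}
```

The benefit in this sketch depends entirely on f being one shared object;
two structurally equal but separately built copies of f would each be walked.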

I'm not sure whether this will actually give me any speed benefit in
practice, because even though I'm using very similar subexpressions
over and over again, I'm not sure if I could construct them as the
same object without writing really awkward code. I'm interested in
your opinions, though.

So my next question is: would you accept a patch implementing a
recursive, cached version of symbols() for Sage to call directly, so
that the responsibility for optimizing and caching it moves to the
level of GiNaC?

I would normally think that it would be a method of ex (and basic).
Where would you say the cache lives if it were a global function?

Timo


From kreckel at ginac.de  Tue Aug 14 09:54:17 2012
From: kreckel at ginac.de (Richard B. Kreckel)
Date: Tue, 14 Aug 2012 09:54:17 +0200
Subject: [GiNaC-devel] Fwd: Add method ex::symbols for obtaining all
 symbols appearing in an expression
In-Reply-To: <CAGK+T_mWWNs71s=FQU1oFYAsCpDqSqg6qzcEVv9p22DC3mdckg@mail.gmail.com>
References: <50283963.6256b40a.1fd5.ffffb000@mx.google.com>
 <CAGK+T_=fRZmUEE=YN0Ugdq8XZtZH7K6tPTrFPB5i9Kg-hJQO6A@mail.gmail.com>
 <5028A67B.60007@ginac.de> <5028b107.2562b40a.5afc.413d@mx.google.com>
 <CAGK+T_mWWNs71s=FQU1oFYAsCpDqSqg6qzcEVv9p22DC3mdckg@mail.gmail.com>
Message-ID: <502A0429.60301@ginac.de>

On 08/13/2012 10:31 AM, Timo Kluck wrote:
> On Mon, Aug 13, 2012 at 9:02, Richard B. Kreckel <kreckel at ginac.de> wrote:
>> Wouldn't it be better to go one step further and cache those variables when
>> the variables() function is first used? This avoids quite a lot of overhead in
>> space and time! And then, variables() wouldn't have to be a member
>> function, right?
>
> That is probably best, especially in the light of the (x+y) - y
> example that Alexei mentioned.
>
> I could easily do that in Sage, but I think it wouldn't help in my
> case because I'm constructing so many expressions.
>
> However, I'm thinking it may make sense to do that in GiNaC, because
> from what I've seen it has some sort of a copy-on-write mechanism for
> subexpressions? As in:
>
> f = sin(x+y)
> g = f * cos(x-y)
> h = f * tan(z)
>
> will make g and h share the instance (and therefore the cache!) for f?

Yes, that is right.

> Please correct me if I've misunderstood that. But if it does, this is
> abstracted away in Sage, so caching the variables() function in Sage
> would have much less benefit than caching it in GiNaC.
>
> I'm not sure whether this will actually give me any speed benefit in
> practice, because even though I'm using very similar subexpressions
> over and over again, I'm not sure if I could construct them as the
> same object without writing really awkward code. I'm interested in
> your opinions, though.

Hmm, note that even if you cannot construct them as one object they 
might end up as one object after a while (cf. the private ex::share(ex) 
method). Whether that helps or doesn't help in your case depends on the 
actual use pattern.

There remains Alexei's worry that this is likely to substantially slow 
down all applications.

Optimization is full of surprises. In any case, you should experiment 
with different strategies and carefully observe their impact!

> So my next question is: would you accept a patch implementing a
> recursive, cached version of symbols() for Sage to call directly, so
> that the responsibility for optimizing and caching it moves to the
> level of GiNaC?

Well, any patch would have to go into Pynac, in order to be of advantage 
in Sage. Pynac is a fork of GiNaC. (In the past, we've repeatedly fixed 
Sage bugs in GiNaC and suggested patches for Pynac and vice-versa, but 
that doesn't mean that changes in GiNaC propagate automatically into 
Sage. At least I'm not aware of that.)

> I would normally think that it would be a method of ex (and basic).
> Where would you say the cache lives if it were a global function?

It should be possible to equip that global function with a static 
hashmap ex -> symbols. Of course, that opens a memory leak because the 
map is never cleaned up. So, as an alternative, one could maintain that 
map only while it can really speed up things: store it in an object and 
delete it when that object is destructed.
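
A rough sketch of that second variant, under stated assumptions: Node,
make_node() and SymbolCache are hypothetical names, the code is plain C++
rather than GiNaC, and the map is keyed on node identity (a pointer) where a
real implementation would presumably key on ex with its own hash and
comparison.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <set>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical toy node, NOT GiNaC's ex class.
struct Node {
    std::string name;
    std::vector<std::shared_ptr<const Node> > children;  // empty for a symbol
};
typedef std::shared_ptr<const Node> NodePtr;

NodePtr make_node(std::string name,
                  std::vector<NodePtr> children = std::vector<NodePtr>()) {
    return std::make_shared<Node>(Node{std::move(name), std::move(children)});
}

// Cache object owning the map node -> symbols. All entries are released when
// the object is destroyed, so nothing outlives the computation that uses it.
class SymbolCache {
public:
    const std::set<std::string>& symbols(const NodePtr& node) {
        Table::iterator hit = table_.find(node.get());
        if (hit != table_.end()) return hit->second.second;
        std::set<std::string> result;
        if (node->children.empty()) {
            result.insert(node->name);
        } else {
            for (std::size_t i = 0; i < node->children.size(); ++i) {
                const std::set<std::string>& sub = symbols(node->children[i]);
                result.insert(sub.begin(), sub.end());
            }
        }
        // Keep the node alive alongside its entry so the raw-pointer key
        // cannot be reused by a different node while the cache exists.
        std::pair<Table::iterator, bool> inserted = table_.emplace(
            node.get(), std::make_pair(node, std::move(result)));
        return inserted.first->second.second;
    }
    std::size_t size() const { return table_.size(); }

private:
    typedef std::unordered_map<
        const Node*, std::pair<NodePtr, std::set<std::string> > > Table;
    Table table_;
};

// The cache exists only for the duration of the enclosing scope.
void check_scoped_cache() {
    NodePtr x = make_node("x"), y = make_node("y");
    NodePtr sum = make_node("+", {x, y});
    {
        SymbolCache cache;
        assert(cache.symbols(sum).size() == 2);
        assert(cache.size() == 3);  // entries for x, y and the sum
    }   // cache and all its entries are destroyed here
    assert(sum.use_count() == 1);   // the cache no longer holds a reference
}
```

Holding a reference to each cached node delays its destruction until the
cache itself goes away, which is the price of using pointer identity as key.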

Cheers
   -richy.
-- 
Richard B. Kreckel
<http://www.ginac.de/~kreckel/>


From tkluck at infty.nl  Tue Aug 14 11:00:03 2012
From: tkluck at infty.nl (Timo Kluck)
Date: Tue, 14 Aug 2012 11:00:03 +0200
Subject: [GiNaC-devel] Fwd: Add method ex::symbols for obtaining all
 symbols appearing in an expression
In-Reply-To: <502A0429.60301@ginac.de>
References: <50283963.6256b40a.1fd5.ffffb000@mx.google.com>
 <CAGK+T_=fRZmUEE=YN0Ugdq8XZtZH7K6tPTrFPB5i9Kg-hJQO6A@mail.gmail.com>
 <5028A67B.60007@ginac.de> <5028b107.2562b40a.5afc.413d@mx.google.com>
 <CAGK+T_mWWNs71s=FQU1oFYAsCpDqSqg6qzcEVv9p22DC3mdckg@mail.gmail.com>
 <502A0429.60301@ginac.de>
Message-ID: <CAGK+T_nstTfk-YBn=GVs2fwvOtwob6hsJKKjoD+rePd96Ro4_w@mail.gmail.com>

2012/8/14 Richard B. Kreckel <kreckel at ginac.de>:
> There remains Alexei's worry that this is likely to substantially slow down
> all applications.
I think his worry related to my initial suggestion for doing this at
construction time. I think that just adding a member function
implementing its own cache shouldn't add overhead to applications not
using it. (The only difference would be the constructor having to
initialize the cache to NULL).
>
> Well, any patch would have to go into Pynac, in order to be of advantage in
> Sage. Pynac is a fork of GiNaC. (In the past, we've repeatedly fixed Sage
> bugs in GiNaC and suggested patches for Pynac and vice-versa, but that
> doesn't mean that changes in GiNaC propagate automatically into Sage. At
> least I'm not aware of that.)
Thanks for pointing that out; I was under the impression that Pynac
was a Python wrapper for GiNaC. I might just have to look at Pynac
then. Do you happen to know what the reason was for forking?

>> I would normally think that it would be a method of ex (and basic).
>> Where would you say the cache lives if it were a global function?
> It should be possible to equip that global function with a static hashmap ex
> -> symbols. Of course, that opens a memory leak because the map is never
> cleaned up. So, as an alternative, one could maintain that map only while it
> can really speed up things: store it in an object and delete it when that
> object is destructed.
There would be no memory leak if the cache were to live in the object.


From kreckel at ginac.de  Wed Aug 22 00:51:39 2012
From: kreckel at ginac.de (Richard B. Kreckel)
Date: Wed, 22 Aug 2012 00:51:39 +0200
Subject: [GiNaC-devel] Fwd: Add method ex::symbols for obtaining all
 symbols appearing in an expression
In-Reply-To: <CAGK+T_nstTfk-YBn=GVs2fwvOtwob6hsJKKjoD+rePd96Ro4_w@mail.gmail.com>
References: <50283963.6256b40a.1fd5.ffffb000@mx.google.com>
 <CAGK+T_=fRZmUEE=YN0Ugdq8XZtZH7K6tPTrFPB5i9Kg-hJQO6A@mail.gmail.com>
 <5028A67B.60007@ginac.de> <5028b107.2562b40a.5afc.413d@mx.google.com>
 <CAGK+T_mWWNs71s=FQU1oFYAsCpDqSqg6qzcEVv9p22DC3mdckg@mail.gmail.com>
 <502A0429.60301@ginac.de>
 <CAGK+T_nstTfk-YBn=GVs2fwvOtwob6hsJKKjoD+rePd96Ro4_w@mail.gmail.com>
Message-ID: <503410FB.2050202@ginac.de>

Hi!

On 08/14/2012 11:00 AM, Timo Kluck wrote:
> 2012/8/14 Richard B. Kreckel<kreckel at ginac.de>:
>> There remains Alexei's worry that this is likely to substantially slow down
>> all applications.
> I think his worry related to my initial suggestion for doing this at
> construction time. I think that just adding a member function
> implementing its own cache shouldn't add overhead to applications not
> using it. (The only difference would be the constructor having to
> initialize the cache to NULL).

That is correct.

How would you manage that list of symbols? One may have to take care 
about when to destroy it, in order not to break refcounting and create a 
memory leak, I suppose.

> Do you happen to know what the reason was for forking?

Sage already had its own number system and didn't want to bring another 
one (CLN). Also, there wasn't much interest in some of the special 
algebra for high energy physics. No worries.

> There would be no memory leak if the cache were to live in the object.

That's correct, too.

Still, you should carefully benchmark your proposal to see if it really 
helps. If it does, and if a .symbols() member function is really a bottleneck in some 
applications, then, well, why not add it to GiNaC?

And, please, discuss this with the Pynac developers, too, for the 
obvious reasons.

Cheers
   -richy.
-- 
Richard B. Kreckel
<http://www.ginac.de/~kreckel/>