weakrefs and avoiding bad gc cycles

vila v.ladeuil+lp at free.fr
Wed Jun 1 07:01:02 UTC 2011


>>>>> Martin Pool <mbp at canonical.com> writes:


    > The issue comes up in
    > <https://code.launchpad.net/~vila/bzr/config-lock-remote/+merge/62418>
    > of avoiding reference cycles between objects, and also for example
    > <https://code.launchpad.net/~gz/bzr/cleanup_testcases_by_collection_613247/+merge/32711>.

These are two different cases IMHO, they are both related to gc cycles
but their respective scopes are totally different. I'll focus on
clarifying the former here.

<snip/>

    >  - Although python can gc objects held in reference cycles, it
    >  does not seem to always do this very well, and avoiding the
    >  cycles can allow objects to be released much faster.  So the
    >  cycle is a problem even when all the objects ought to have the
    >  same lifetime.

Without presuming how good or bad python is, I think that avoiding
cycles when it's obvious we're creating one doesn't hurt.

    > I'd like to know how bad people think the second one is.  In that
    > review, Vincent seems to imply that we should always preemptively
    > avoid cycles.

No. I never said nor implied *always*, this implies a huge amount of
work that I don't intend to even start with and certainly not ask anyone
else to start either.

    > If we have to do something approaching manual memory management in
    > Python that's pretty disappointing,

Indeed.

Without telling my personal story regarding memory management, let's
just say that I had my share of malloc/free bugs and that I welcomed
using gc-based languages. The (almost only) warning I got with gc is:
beware of cycles. So I kept the habit of avoiding them when possible be
it by design or by avoiding the obvious ones (unfortunately only a few
are obvious :-/).

<snip/>

    > The specific thing Vincent has here is that the branch
    > configuration is guarded on disk by the branch's lock.

Yes, that's the assumption I started with.

    > (I suppose you could make a refactoring where they both share a
    > lock object which does not have upward links.)

I thought about that but avoid it for several reasons:

1) It would require the caller to know about what object handles the
   locking (I vaguely remember old bzr formats using a different lock
   object than the current formats and didn't look at what foreign
   formats does or will do or can do, etc). This also means more
   parameters to pass at init time, more analysis for tree and repo,
   etc. I was too lazy for that.

2) I could have work around the former by just requiring methods for
   lock_write() and unlock(), but doh, that won't address the cycle only
   hide it a bit better ;) (and still more parameters etc)

So, without thinking further[1], I just used a weakref which to me is just
what Wikipedia says about it:

     a weak reference is a reference that does not protect the
     referenced object from collection by a garbage collector.

The config can't (and doesn't need to) exist without its branch so the
branch will always exist when needed.

End of the story, perfect match for a weakref (thought I....).

I had no idea this simple application of "practicality beats purity"
will trigger such reactions and I'm kind of sad about that.

<snip/>

    > 3- Manually use weakrefs, as Vincent's code does.  This is a bit
    > different from how we're used what you could call "semantic"
    > weakrefs in the past, where the code will use an object if it's
    > present but doesn't care if it's retained.  In this setup, the
    > object _must_ be present, and it really should be a real
    > reference, but we're using a weakref as a workaround for a Python
    > bug.

I don't see that as a bug. I just think that creating a cycle when there
are alternatives is just far from ideal (too say the least and stay
polite).

Now for this specific case there are alternatives:

1) Forget about the locking entirely. After all, the branch will be
   locked right ? So the config is already protected.

2) Create the cycle, let python deal with it and let people come back to
   more important stuff.

3) Consider that in the final implementation the cycle can be broken
   when unlocking the branch by deleting the config attribute (and get
   rid of the branch reference at this point).

In this proposal (and the related ones), I focused on implementing only
what I needed to make reviews easier. Such discussions tell me that this
doesn't work. That is far more concerning that using weakrefs or not to
me.

I hope this clarifies my point of view on the issue,

        Vincent

[1]: Ok, I *thought* further and looked at where weakref were used in
     the bzr code base. I came across SmartClientHTTPMedium, asserted
     Smart == spiv and thought: "I have a previous usage, I'm fine
     reusing it for the exact same reasons". Case closed, next.

     Time passed, reviews went on. 
     
     Then spiv commented 'using weakrefs to avoid cycles is a new idiom'...
     
     Wtf ? I fired qblame bzrlib/transport/http/__init__.py
     and... right, this has been introduced in 2008, plenty of time to
     trigger bugs or whatever, and who did that ?
     
     Err, *who* did that ?
     
     Damn it :) Ok, <bad faith> whatever </bad faith>), my point is that I
     was in *good* faith when *reusing* it :-D
     
     This doesn't mean I'm against discussing it *now* (quite the contrary),
     just that I wasn't playing with a new toy when I used it, I *really*
     thought it was just business as usual.




More information about the bazaar mailing list