weakrefs and avoiding bad gc cycles

Martin Pool mbp at canonical.com
Wed Jun 1 03:48:08 UTC 2011


On 1 June 2011 13:32, Robert Collins <robertc at robertcollins.net> wrote:

>>  - Although python can gc objects held in reference cycles, it does
>> not seem to always do this very well, and avoiding the cycles can
>> allow objects to be released much faster.  So the cycle is a problem
>> even when all the objects ought to have the same lifetime.
>
> I'm not aware of any python implementation with significant issues
> here. fepy, jython and pypy all have solid gc implementations. CPython
> does full gc (IIRC) based on a combination of bytecodes run + object
> allocations.

That's what I'm wondering about.
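To make the concern concrete, here is a small sketch (assuming CPython's
gc module semantics; the Node class is just illustrative): a pure
reference cycle is invisible to reference counting and only goes away
when the cycle collector actually runs.

```python
# Sketch, assuming CPython semantics: objects in a reference cycle are
# not freed by reference counting; they wait for the cycle collector,
# whose passes are triggered by allocation/deallocation thresholds.
import gc

print(gc.get_threshold())   # e.g. (700, 10, 10)

class Node(object):
    pass

a = Node()
b = Node()
a.other = b
b.other = a       # cycle: a -> b -> a
del a, b          # both still alive; refcounts never reach zero

collected = gc.collect()    # force a full pass
print(collected)            # >= 2: the Nodes were reclaimed only now
```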

> So nothing will leave cycles around indefinitely with *one* exception - __del__.

Right, which we pretty much ban: all of them give a warning that you
should have explicitly closed the object, except, strangely,
SmartMedium's (which may be a bug).

Possibly we should even get rid of them in non-debug mode.
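The pattern can be sketched like this (the class name is illustrative,
not bzrlib's real API): __del__ does no cleanup itself, it only warns
that close() was never called.  Note also that in the CPython of this
era a __del__ inside a reference cycle prevented the cycle from being
collected at all, which is another reason to keep __del__ trivial.

```python
# Hedged sketch of the "__del__ only warns" pattern; LockedResource is
# an illustrative name, not a real bzrlib class.
import warnings

class LockedResource(object):
    def __init__(self):
        self._closed = False

    def close(self):
        # Real cleanup happens here, explicitly, never in __del__.
        self._closed = True

    def __del__(self):
        if not self._closed:
            warnings.warn("LockedResource was not explicitly closed")

r = LockedResource()
r.close()
del r   # closed properly, so no warning is emitted
```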

> That said, for some things even a fraction of a second will matter.
> One such case is when you have a file open in a directory that is
>> going to be deleted - that file -must- be closed first, or the directory
>> cleanup will fail on Windows. Other cases exist around threads and
> sockets. AFAIK all such cases have OS resources involved for them to
> matter.

Right, we also have a rule that anything that uses external resources
must be explicitly closed: relying on gc will both confuse the
collector and leave the resource open for an indefinite time, which
tends to cause failures on Windows.

>> Possible approaches:
>> 1 - Have only what you could call "conceptually downwards" links
>> between objects, so that cycles don't occur: for instance, let the
>> Branch know about its configuration but not vice versa.
>>
>> Sometimes thinking about this constraint actually gives a cleaner
>> factoring with fewer dependencies between objects.  However, sometimes
>> it is difficult to make this change.  The specific thing Vincent has
>> here is that the branch configuration is guarded on disk by the
>> branch's lock.  (I suppose you could make a refactoring where they
>> both share a lock object which does not have upward links.)
>
> This doesn't guarantee freeing order on non-refcount implementations of
> Python, so it's insufficient if a free is necessary before some other
> action takes place.

Perhaps I was a bit unclear.  I didn't mean that they should hold a
reference to the lock so that a __del__ method could release the lock,
because that would clearly not give deterministic behaviour.  Rather,
the BranchConfig can just hold a reference to the branch's lock object,
so that the BranchConfig can delegate its locking to that physical
lock, i.e.

  class BranchConfig:
    def unlock(self):
      self.branch_lock.unlock()

Vincent's branch instead does:

    def unlock(self):
      self.branch_ref().unlock()
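For contrast, a runnable sketch of both shapes side by side (all class
names are illustrative, not bzrlib's real API): delegating to a shared
lock object avoids the cycle with a plain downward reference, while the
weakref version keeps a link to the Branch but makes it weak, so the
config has to cope with the referent disappearing.

```python
import weakref

class PhysicalLock(object):
    def __init__(self):
        self.held = False
    def lock(self):
        self.held = True
    def unlock(self):
        self.held = False

# Shape 1: share the lock object; no reference back to the Branch at all.
class SharedLockConfig(object):
    def __init__(self, branch_lock):
        self.branch_lock = branch_lock
    def unlock(self):
        self.branch_lock.unlock()

# Shape 2 (the weakref approach): keep a weak link to the Branch itself.
class WeakRefConfig(object):
    def __init__(self, branch):
        self.branch_ref = weakref.ref(branch)
    def unlock(self):
        branch = self.branch_ref()   # None if the Branch was collected
        if branch is not None:
            branch.unlock()

class Branch(object):
    def __init__(self):
        self._lock = PhysicalLock()
    def unlock(self):
        self._lock.unlock()

b = Branch()
b._lock.lock()
SharedLockConfig(b._lock).unlock()
print(b._lock.held)   # False

b._lock.lock()
WeakRefConfig(b).unlock()
print(b._lock.held)   # False
```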


>> From what I know so far, the rules I would try to follow are:
>>
>>  * think about object lifetime; don't hold things unnecessarily
>>  * avoid class relationships that have reference cycles (both for the
>> sake of gc performance and general cleanliness)
>>  * for objects that hold large resources (especially external
>> resources) think about having a way to explicitly release them; and
>> think about deleting in-memory references when you do so
>>  * don't complicate the code to work around python bugs unless you
>> have actual evidence the complication improves things
>
> Broadly +1.
>
> I think a pithy statement of the issue (ignoring __del__ which this
> isn't about AFAICT) is:
>  - Python offers no guarantee on either timeliness or ordering of
> freeing of resources.
>  - So if you need to guarantee free/close/whatever before some other
> operation takes place then we have to manually arrange for that to
> take place.

> And some applications of it for us are:
>  - to avoid memory spikes we have to manually manage the objects which
> hold file texts / compressed groups etc.
>  - to avoid test suite problems on windows we need to coordinate
> server thread closing of files
>  - to avoid open file handle issues during disk operations we need to
> manually close files

Right; I think we already have the latter two under control; the main
question now is what, if any, systematic proactive thing we should do
about the first.
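For that first point, the manual management might look like this sketch
(function name and workload are made up): drop the reference to the
large text as soon as it has been consumed, instead of letting it live
until the frame exits.

```python
# Hedged sketch: release a large buffer as soon as it is consumed, so
# refcounting reclaims it immediately rather than at frame exit.
import hashlib

def digest_of(path):
    with open(path, "rb") as f:
        text = f.read()       # potentially a very large file text
    digest = hashlib.sha1(text).hexdigest()
    del text                  # drop the big buffer right here
    return digest
```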

> I think using weakrefs is diametrically opposite to what is needed -
> not to mention likely slower.

Thanks,
Martin



More information about the bazaar mailing list