weakrefs and avoiding bad gc cycles
Martin Pool
mbp at canonical.com
Wed Jun 1 03:48:08 UTC 2011
On 1 June 2011 13:32, Robert Collins <robertc at robertcollins.net> wrote:
>> - Although python can gc objects held in reference cycles, it does
>> not seem to always do this very well, and avoiding the cycles can
>> allow objects to be released much faster. So the cycle is a problem
>> even when all the objects ought to have the same lifetime.
>
> I'm not aware of any python implementation with significant issues
> here. fepy, Jython and PyPy all have solid gc implementations. CPython
> does full gc (IIRC) based on a combination of bytecodes run + object
> allocations.
That's what I'm wondering about.
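For what it's worth, CPython's cycle collector is actually driven by
allocation counts rather than bytecodes: generation 0 is collected
when net allocations cross a threshold, and the older generations
less often. A quick sketch of how to inspect that:

import gc

# CPython collects generation 0 when (allocations - deallocations)
# crosses the first threshold; generations 1 and 2 collect less often.
print(gc.get_threshold())   # typically (700, 10, 10)
print(gc.get_count())       # current counts per generation

# A full collection can also be forced; the return value is the
# number of unreachable objects the collector found.
print(gc.collect())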
> So nothing will leave cycles around indefinitely with *one* exception - __del__.
Right, and __del__ is something we pretty much ban: all of ours give
a warning that you should have explicitly closed the object, except,
strangely, SmartMedium's (which may be a bug).
Possibly we should even get rid of them in non-debug mode.
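To illustrate why the ban matters: on the CPython we run on, an
object with a __del__ method that ends up in an unreachable cycle is
never freed, because the collector cannot pick a safe finalization
order; it parks the whole cycle in gc.garbage instead. A minimal
sketch, with invented names:

import gc

class Unclosed(object):
    def __del__(self):
        # Stand-in for our "you should have explicitly closed this"
        # warning; note it never actually runs for the cycle below.
        print("warning: deleted without explicit close()")

u = Unclosed()
u.cycle = u      # a reference cycle through the instance
del u

gc.collect()
print(gc.garbage)   # the instance sits here, uncollected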
> That said, for some things even a fraction of a second will matter.
> One such case is when you have a file open in a directory that is
> going to be deleted - that file -must- be closed first, or the
> directory cleanup will fail on Windows. Other cases exist around threads and
> sockets. AFAIK all such cases have OS resources involved for them to
> matter.
Right, we also have a rule that anything that uses external resources
must be explicitly closed: relying on gc would both confuse the
collector and leave the resource open for an indefinite time,
which tends to cause failures on Windows.
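The shape of that rule, as a sketch (the class is invented, not real
bzrlib API), is to pair the OS resource with an explicit close() that
both releases the handle and drops the in-memory reference:

class LogFile(object):
    """Illustrative holder of an OS resource, not a real bzrlib class."""

    def __init__(self, path):
        self._file = open(path, "rb")

    def close(self):
        # Release the handle deterministically rather than waiting for
        # gc, and drop the reference so nothing keeps the file alive
        # (on Windows an open handle blocks deleting the directory).
        if self._file is not None:
            self._file.close()
            self._file = None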
>> Possible approaches:
>> 1 - Have only what you could call "conceptually downwards" links
>> between objects, so that cycles don't occur: for instance, let the
>> Branch know about its configuration but not vice versa.
>>
>> Sometimes thinking about this constraint actually gives a cleaner
>> factoring with fewer dependencies between objects. However, sometimes
>> it is difficult to make this change. The specific thing Vincent has
>> here is that the branch configuration is guarded on disk by the
>> branch's lock. (I suppose you could make a refactoring where they
>> both share a lock object which does not have upward links.)
>
> This doesn't guarantee free-order on non-refcount implementations of
> Python. So it's insufficient if a free is necessary before some other
> action takes place.
Perhaps I was a bit unclear. I didn't mean that the BranchConfig
should hold a reference to the lock so that a __del__ method could
release the lock, because that would clearly not give deterministic
behaviour. Rather, the BranchConfig can just hold a reference to the
branch's lock object and delegate its locking to that physical lock,
i.e.
class BranchConfig:
    def unlock(self):
        self.branch_lock.unlock()
Vincent's branch instead does:
    def unlock(self):
        self.branch_ref().unlock()
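Spelled out a little more fully (a sketch with invented names; the
real classes are more involved), the weak upward link lets the config
delegate locking without forming a strong cycle:

import weakref

class Branch(object):
    def __init__(self):
        self.config = BranchConfig(self)   # strong, downward link

    def unlock(self):
        # Stand-in for releasing the physical on-disk lock.
        print("physical lock released")

class BranchConfig(object):
    def __init__(self, branch):
        # Weak, upward link: no Branch <-> BranchConfig strong cycle.
        self.branch_ref = weakref.ref(branch)

    def unlock(self):
        branch = self.branch_ref()
        if branch is not None:
            branch.unlock()

b = Branch()
b.config.unlock()   # delegates through the weakref to the branch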
>> From what I know so far, the rules I would try to follow are:
>>
>> * think about object lifetime; don't hold things unnecessarily
>> * avoid class relationships that have reference cycles (both for the
>> sake of gc performance and general cleanliness)
>> * for objects that hold large resources (especially external
>> resources) think about having a way to explicitly release them; and
>> think about deleting in-memory references when you do so
>> * don't complicate the code to work around python bugs unless you
>> have actual evidence the complication improves things
>
> Broadly +1.
>
> I think a pithy statement of the issue (ignoring __del__ which this
> isn't about AFAICT) is:
> - Python offers no guarantee on either timeliness or ordering of
> freeing of resources.
> - So if you need to guarantee a free/close/whatever before some other
> operation takes place, then you have to manually arrange for that to
> happen.
> And some applications of it for us are:
> - to avoid memory spikes we have to manually manage the objects which
> hold file texts / compressed groups etc.
> - to avoid test suite problems on windows we need to coordinate
> server thread closing of files
> - to avoid open file handle issues during disk operations we need to
> manually close files
Right; I think we already have the latter two under control. The main
question now is what systematic, proactive thing, if any, we should
do about the first.
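For that first case, the manual management amounts to making sure
only one large buffer is alive at a time rather than trusting the
collector to notice them. Roughly, with invented names:

def iter_texts(keys, fetch_text):
    # fetch_text stands in for whatever loads a file text or
    # compressed group. Yielding one text at a time, and dropping our
    # reference before fetching the next, keeps the peak at one
    # buffer rather than len(keys) buffers.
    for key in keys:
        text = fetch_text(key)
        yield text
        del text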
> I think using weakrefs is diametrically opposite to what is needed -
> not to mention likely slower.
Thanks,
Martin