[MERGE] Transport.readv() closes the file handle
Vincent Ladeuil
v.ladeuil+lp at free.fr
Mon Oct 27 08:43:19 GMT 2008
>>>>> "john" == John Arbash Meinel <john at arbash-meinel.com> writes:
john> The default implementation of Transport.readv(), and
john> the one used by LocalTransport, is to do a regular
john> ".get()" and then call .seek() and .read() on the
john> resulting file object.
john> This works fine, but it turns out that it leaves the
john> file handle open on Windows.
It leaves it open everywhere ;)
john> Probably the biggest reason is that most of the callers
john> of .readv() actually iterate in lock-step with some
john> other iterator (which names the return values).
I challenge the 'probably', I seem to remember that callers call
the iterator to completion, but _seek_and_read lacks a close
anyway.
john> Because of this, the readv() generator never runs to
john> completion, and thus the file handle doesn't get closed
john> and garbage collected right away.
I can't find any place where that file handle was closed before
your patch so your description is a bit misleading no ?
john> The attached patch changes it so that we know when we
john> are done reading data from the file, and close it.
john> I ran into this because in my "pack retry" code the
john> test was failing on Windows because the .pack file
john> handle was still open, even though I know we were done
john> reading that file.
Since you have a failing test, can you try doing the close in the
generator before a final return, to get a better understanding of
the issue ? Or are you already sure that the generator never runs
to completion ?
john> _seek_and_read() is the default readv() handler. It
john> should actually be safe for all transports we have
john> implemented, because the .close() will be on objects
john> where it is safe. The only danger would be something
john> like HTTP where the returned object could have been the
john> socket we were reading from, but a long time ago we
john> switched to using a cStringIO wrapper because gzip
john> needed a .tell()/.seek() though our tuned_gzip does
john> not. And even so HTTP has a custom readv()
john> implementation anyway.
Sure, no seek on sockets there, the http readv implementation
doesn't requires it, no need to worry.
john> So that only leaves the WebDav plugin, but I'm guessing
john> it inherits from the HTTP code, and not the raw
john> Transport.
You're right. Even more, early in the webdav plugin history, the
reviewer pushed me to merge the plugin main urllib implementation
into bzr core, leaving only the write-part in the plugin ;-)
john> I don't know of any other active Transport
john> implementations (at one point I had an RsyncTransport,
john> IIRC).
Any transport implementation using _seek_and_read will suffer
from the same potential problem and will need your fix anyway.
john> John
john> =:->
john> # Bazaar merge directive format 2 (Bazaar 0.90)
john> # revision_id: john at arbash-meinel.com-20081025003007-5xam89uv2d1b2pdb
john> # target_branch: http://bazaar-vcs.org/bzr/bzr.dev
john> # testament_sha1: edd1edf80ba9de3872e0714f3086d2858dddd8fa
john> # timestamp: 2008-10-24 19:30:18 -0500
john> # source_branch: http://bzr.arbash-meinel.com/branches/bzr/1.9-\
john> # dev/readv_close
john> # base_revision_id: pqm at pqm.ubuntu.com-20081024113829-9geq0uavium22ho6
john> #
john> # Begin patch
john> === modified file 'bzrlib/transport/__init__.py'
john> --- bzrlib/transport/__init__.py 2008-10-06 06:34:36 +0000
john> +++ bzrlib/transport/__init__.py 2008-10-25 00:30:07 +0000
john> @@ -661,6 +661,7 @@
john> """
john> # We are going to iterate multiple times, we need a list
john> offsets = list(offsets)
john> + remaining = len(offsets)
john> sorted_offsets = sorted(offsets)
john> # turn the list of offsets into a stack
john> @@ -688,6 +689,12 @@
john> # Now that we've read some data, see if we can yield anything back
john> while cur_offset_and_size in data_map:
john> this_data = data_map.pop(cur_offset_and_size)
john> + remaining -= 1
john> + if remaining <= 0:
john> + # Close the file handle as we are done yielding data.
I think we need a more strong comment here explaining the issue
if anybody encounters it in the future.
Also, you're relying on the callers consuming all the data, if
one of them ever consume only part of it, your fix is not
triggered.
I don't know of any case for the nominal uses, but if an error
occurs during the readv, the file will remain open anyway.
An alternative may be to keep track of the file handle and close
it whenever another readv (or get ?) is done on the transport.
But this has drawbacks too.
john> + close = getattr(fp, 'close', None)
john> + if close is not None:
StringIO *has* a close() attribute, I think you're overly
cautious here and may hide a potential problem if _seek_and_read
is used with a file handle *without* a one (do you have one class
in mind ?).
Overall, I'd more convinced by this patch if you were more
aggressive: explicitly close before the final generator return,
warn about potential problems in the comment if the readv is not
totally consumed.
But since this fix is needed and better than what we have, I let
you judge whether it's worth the effort.
BB:tweak
Vincent
More information about the bazaar
mailing list