[MERGE] Transport.readv() closes the file handle

Vincent Ladeuil v.ladeuil+lp at free.fr
Mon Oct 27 08:43:19 GMT 2008


>>>>> "john" == John Arbash Meinel <john at arbash-meinel.com> writes:

    john> The default implementation of Transport.readv(), and
    john> the one used by LocalTransport, is to do a regular
    john> ".get()" and then call .seek() and .read() on the
    john> resulting file object.

    john> This works fine, but it turns out that it leaves the
    john> file handle open on Windows.

It leaves it open everywhere ;)

    john> Probably the biggest reason is that most of the callers
    john> of .readv() actually iterate in lock-step with some
    john> other iterator (which names the return values).

I challenge the 'probably', I seem to remember that callers call
the iterator to completion, but _seek_and_read lacks a close
anyway.

    john> Because of this, the readv() generator never runs to
    john> completion, and thus the file handle doesn't get closed
    john> and garbage collected right away.

I can't find any place where that file handle was closed before
your patch so your description is a bit misleading no ?

    john> The attached patch changes it so that we know when we
    john> are done reading data from the file, and close it.

    john> I ran into this because in my "pack retry" code the
    john> test was failing on Windows because the .pack file
    john> handle was still open, even though I know we were done
    john> reading that file.

Since you have a failing test, can you try doing the close in the
generator before a final return, to get a better understanding of
the issue ? Or are you already sure that the generator never runs
to completion ?

    john> _seek_and_read() is the default readv() handler. It
    john> should actually be safe for all transports we have
    john> implemented, because the .close() will be on objects
    john> where it is safe. The only danger would be something
    john> like HTTP where the returned object could have been the
    john> socket we were reading from, but a long time ago we
    john> switched to using a cStringIO wrapper because gzip
    john> needed a .tell()/.seek() though our tuned_gzip does
    john> not. And even so HTTP has a custom readv()
    john> implementation anyway.

Sure, no seek on sockets there, the http readv implementation
doesn't requires it, no need to worry.

    john> So that only leaves the WebDav plugin, but I'm guessing
    john> it inherits from the HTTP code, and not the raw
    john> Transport.

You're right. Even more, early in the webdav plugin history, the
reviewer pushed me to merge the plugin main urllib implementation
into bzr core, leaving only the write-part in the plugin ;-)

    john> I don't know of any other active Transport
    john> implementations (at one point I had an RsyncTransport,
    john> IIRC).

Any transport implementation using _seek_and_read will suffer
from the same potential problem and will need your fix anyway.

    john> John
    john> =:->

    john> # Bazaar merge directive format 2 (Bazaar 0.90)
    john> # revision_id: john at arbash-meinel.com-20081025003007-5xam89uv2d1b2pdb
    john> # target_branch: http://bazaar-vcs.org/bzr/bzr.dev
    john> # testament_sha1: edd1edf80ba9de3872e0714f3086d2858dddd8fa
    john> # timestamp: 2008-10-24 19:30:18 -0500
    john> # source_branch: http://bzr.arbash-meinel.com/branches/bzr/1.9-\
    john> #   dev/readv_close
    john> # base_revision_id: pqm at pqm.ubuntu.com-20081024113829-9geq0uavium22ho6
    john> # 
    john> # Begin patch
    john> === modified file 'bzrlib/transport/__init__.py'
    john> --- bzrlib/transport/__init__.py	2008-10-06 06:34:36 +0000
    john> +++ bzrlib/transport/__init__.py	2008-10-25 00:30:07 +0000
    john> @@ -661,6 +661,7 @@
    john>          """
    john>          # We are going to iterate multiple times, we need a list
    john>          offsets = list(offsets)
    john> +        remaining = len(offsets)
    john>          sorted_offsets = sorted(offsets)
 
    john>          # turn the list of offsets into a stack
    john> @@ -688,6 +689,12 @@
    john>              # Now that we've read some data, see if we can yield anything back
    john>              while cur_offset_and_size in data_map:
    john>                  this_data = data_map.pop(cur_offset_and_size)
    john> +                remaining -= 1
    john> +                if remaining <= 0:
    john> +                    # Close the file handle as we are done yielding data.

I think we need a more strong comment here explaining the issue
if anybody encounters it in the future.

Also, you're relying on the callers consuming all the data, if
one of them ever consume only part of it, your fix is not
triggered.

I don't know of any case for the nominal uses, but if an error
occurs during the readv, the file will remain open anyway.

An alternative may be to keep track of the file handle and close
it whenever another readv (or get ?) is done on the transport.

But this has drawbacks too.

    john> +                    close = getattr(fp, 'close', None)
    john> +                    if close is not None:

StringIO *has* a close() attribute, I think you're overly
cautious here and may hide a potential problem if _seek_and_read
is used with a file handle *without* a one (do you have one class
in mind ?).

Overall, I'd more convinced by this patch if you were more
aggressive: explicitly close before the final generator return,
warn about potential problems in the comment if the readv is not
totally consumed.

But since this fix is needed and better than what we have, I let
you judge whether it's worth the effort.

BB:tweak

        Vincent



More information about the bazaar mailing list