[MERGE] [BUG #165061] Force http.readv() to a maximum number of range requests.

John Arbash Meinel john at arbash-meinel.com
Thu Nov 29 16:07:35 GMT 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Vincent Ladeuil wrote:
>>>>>> "john" == John Arbash Meinel <john at arbash-meinel.com> writes:
> 
>     john> Vincent Ladeuil wrote:
>     >>>>>>> "robert" == Robert Collins <robertc at robertcollins.net> writes:
>     >> 
>     robert> Robert Collins has voted resubmit.
>     robert> Status is now: Resubmit
>     robert> Comment:
>     robert> I think it's better to make many requests than to ask for more
>     robert> data than the fudge factor requires.
>     >> 
>     >> Agreed.
>     >> 
>     robert> That is:
>     robert> http._readv
>     robert> for group in xrange(0, len(offsets), max_ranges):
>     robert>     for result in self._do_readv(
>     robert>             offsets[group:group + max_ranges]):
>     robert>         yield result
>     >> 
>     >> Except two details invalidate this simple solution:
>     >> 
>     >> - out-of-order offsets (but jam pointed me to the sftp code to
>     >> handle that, with a risk of buffering too much; the consensus is
>     >> that packs should not suffer from that),
>     >> 
>     >> - transient error handling (this bug showed us that the whole
>     >> file can and will be downloaded, which should be avoided when
>     >> possible).
>     >> 
>     >> So yes, your proposal is the right direction and I'll try to
>     >> implement it in one or two steps:
>     >> 
>     >> - reuse the sftp_readv with slight adaptations*,
>     >> 
>     >> - rewrite the http response handling to deliver data as soon as
>     >> it arrives. This is needed and will massively reduce memory
>     >> consumption.
>     >> 
>     >> I don't have a lot of free time these days, so I'll look at
>     >> implementing the first step quickly.
>     >> 
>     >> Vincent
>     >> 
>     >> *: needs adaptation since after aggregation the big chunks are
>     >> split again to accommodate sftp limits.
> 
>     john> I think you could get rid of that in sftp even. It was just a workaround for an
>     john> earlier version of paramiko which assumed that your requests would always fit.
>     john> So 1.5.2 would need it, but 1.6+ doesn't (IIRC).
> 
> Thanks for confirming that (I was a bit surprised when looking at
> both sftp.py* and paramiko), I just committed an adapted version
> that takes care of out of order offsets and transient errors.
> 
> No tests yet and no http response rewriting, so memory
> consumption may be worse in some cases which were failing before
> the patch.
> 
> On the other hand, it also fixes bug #172071, so I'll continue on
> that one.

I think you need to check your bugs. I'm pretty sure this is a typo:
https://bugs.launchpad.net/bzr/+bug/172071

Especially since the bug is written in Spanish.

You probably meant:
https://bugs.edge.launchpad.net/bzr/+bug/172701
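
On the readv change itself, I imagine the sort-and-buffer loop ends up
looking something like this. That's only a sketch of the idea, not what
you committed; the helper names (_max_get_ranges, _get_ranges) here are
made up:

    def _readv(self, relpath, offsets):
        # offsets: (start, length) tuples, in the caller's order.
        offsets = list(offsets)
        sorted_offsets = sorted(offsets)
        max_ranges = self._max_get_ranges  # assumed cap per GET
        buffered = {}   # start -> data that arrived early
        cur = 0         # next offset the caller expects back
        for i in xrange(0, len(sorted_offsets), max_ranges):
            group = sorted_offsets[i:i + max_ranges]
            # _get_ranges is assumed to do one GET with a Range header
            # covering 'group' and yield (start, data) for each range.
            for start, data in self._get_ranges(relpath, group):
                buffered[start] = data
                # Hand data back in the caller's original order.
                while cur < len(offsets) and offsets[cur][0] in buffered:
                    wanted = offsets[cur][0]
                    yield wanted, buffered.pop(wanted)
                    cur += 1

The buffering there is only bounded by how far out of order the requested
offsets are, which is why packs (mostly forward reads) shouldn't hurt.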

> 
>     Vincent
> 
> *: You may want to clean up the code and the comments then?
> 

*I* want to do no such thing. :) But I'm happy to approve a patch.

This may not be the best location for the comment, but:
    # The sftp spec says that implementations SHOULD allow reads
    # to be at least 32K. paramiko.readv() does an async request
    # for the chunks. So we need to keep it within a single request
    # size for paramiko <= 1.6.1. paramiko 1.6.2 will probably chop
    # up the request itself, rather than us having to worry about it
    _max_request_size = 32768

Which is right at the beginning of SFTPTransport.

Certainly we could add that as step (8) in the _sftp_readv() comment.
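
FWIW, the re-splitting Vincent's footnote mentions amounts to something
like this (just an illustration of the idea, not the actual _sftp_readv
code; the helper name is made up):

    def _split_for_paramiko(coalesced, max_size=32768):
        # Chop any coalesced range bigger than the 32K limit back into
        # pieces that fit one sftp request for paramiko <= 1.6.1.
        # 'coalesced' is assumed to be (start, length) tuples.
        requests = []
        for start, length in coalesced:
            while length > max_size:
                requests.append((start, max_size))
                start += max_size
                length -= max_size
            if length > 0:
                requests.append((start, length))
        return requests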

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHTuPGJdeBCYSNAAMRAkVoAKDIQ2ySBjA5u7094BCsWwZPHuGTyQCglNsj
afrf49C86l6mIO+5PW1IMyg=
=ra6J
-----END PGP SIGNATURE-----


