[MERGE] [BUG #165061] Force http.readv() to a maximum number of range requests.

Vincent Ladeuil v.ladeuil+lp at free.fr
Thu Nov 29 15:58:06 GMT 2007


>>>>> "john" == John Arbash Meinel <john at arbash-meinel.com> writes:

    john> Vincent Ladeuil wrote:
    >>>>>>> "robert" == Robert Collins <robertc at robertcollins.net> writes:
    >> 
    robert> Robert Collins has voted resubmit.
    robert> Status is now: Resubmit
    robert> Comment:
    robert> I think it's better to make many requests than to ask for more
    robert> data than the fudge factor requires.
    >> 
    >> Agreed.
    >> 
    robert> That is:
    robert> http._readv
    robert> for start in xrange(0, len(offsets), max_ranges):
    robert>     for result in self._do_readv(offsets[start:start + max_ranges]):
    robert>         yield result
    >> 
    >> Except two details invalidate this simple solution:
    >> 
    >> - out-of-order offsets (but jam pointed me to sftp for handling
    >> that; it risks buffering too much, but the consensus is 
    >> that packs should not suffer from that), 
    >> 
    >> - transient errors handling (this bug showed us that the whole
    >> file can and will be downloaded, which should be avoided when
    >> possible).
    >> 
    >> So yes, your proposal is the right direction and I'll try to
    >> implement it in one or two steps:
    >> 
    >> - reuse the sftp_readv with slight adaptations*,
    >> 
    >> - rewrite the http response handling to deliver data as soon as
    >> it arrives. This is needed and will massively reduce memory
    >> consumption.
    >> 
    >> I don't have a lot of free time these days, so I'll look at
    >> implementing the first step quickly.
    >> 
    >> Vincent
    >> 
    >> *: needs adaptation since after aggregation the big chunks are
    >> split again to accommodate sftp limits.

    john> I think you could get rid of that in sftp even. It was just a workaround for an
    john> earlier version of paramiko which assumed that your requests would always fit.
    john> so 1.5.2 would need it, but 1.6+ doesn't (IIRC).

Thanks for confirming that (I was a bit surprised when looking at
both sftp.py* and paramiko). I just committed an adapted version
that takes care of out-of-order offsets and transient errors.
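The sftp-style handling of out-of-order offsets mentioned above amounts to: fetch the ranges sorted by offset, and buffer any data that arrives before the caller asks for it. A minimal sketch, where `fetch_sorted` is a hypothetical stand-in for the transport's sorted fetch loop (this illustrates the buffering risk discussed earlier, since early-arriving ranges sit in `pending` until requested):

```python
def readv_buffered(fetch_sorted, offsets):
    """Yield (offset, data) pairs in the caller's requested order,
    while the transport delivers them sorted by ascending offset.

    fetch_sorted is a hypothetical callable taking the sorted
    (offset, length) pairs and yielding (offset, data) results.
    """
    pending = {}  # data fetched before the caller asked for it
    source = fetch_sorted(sorted(offsets))
    for offset, length in offsets:
        # Pull results off the wire until the wanted one shows up,
        # parking early arrivals in the buffer.
        while offset not in pending:
            got_offset, data = next(source)
            pending[got_offset] = data
        yield offset, pending.pop(offset)
```

Popping from `pending` as soon as a range is delivered keeps the buffer bounded by how far ahead of the caller the sorted stream runs.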

No tests yet and no http response rewriting, so memory
consumption may be worse in some cases that were failing before
the patch.

On the other hand, it also fixes bug #172071, so I'll continue on
that one.

    Vincent

*: You may want to clean up the code and the comments, then?
