[MERGE] [BUG #165061] Force http.readv() to a maximum number of range requests.

John Arbash Meinel john at arbash-meinel.com
Tue Nov 27 12:44:19 GMT 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Vincent Ladeuil wrote:
>>>>>> "john" == John Arbash Meinel <john at arbash-meinel.com> writes:
> 
>     john> In: https://bugs.launchpad.net/bzr/+bug/165061
>     john> Robert noticed that when working with his pack branch, it actually ended
>     john> up downloading the same data multiple times.
>     john> It happened because with the first pull, it tried to split up the request
>     john> into to many sections, which triggered the 'issue 1 range for all readv()'
>     john> code path. And I believe there is a small pack issue about requesting a
>     john> piece at the beginning of the file, as well as all the signatures at the
>     john> end of the file.
> 
>     john> Anyway, this patch should make Bazaar friendlier when trying to download a
>     john> large subset of a pack repository. If we were downloading the whole thing,
>     john> the ranges would collapse down into a single range.
> 
>     john> Robert noticed that he was getting a collapse down into 671 ranges. This
>     john> code should force our requests to always be less than 200 ranges.
> 
> As long as _max_readv_combine get the right value. And when it
> get the right value, it already collapse to *2* ranges.
> 
>     john> This code also has the advantage that it should only
>     john> collapse nearby ranges. So we don't end up putting the
>     john> request at the beginning of the file into the same
>     john> request at the end of the file. Instead we keep
>     john> grouping the close-together changes.
> 
> Already the case.
> 
>     john> It is inefficient in that it keeps retrying the
>     john> collapse function with larger fudge factors, but this
>     john> code path should generally be triggered rarely anyway.
> 
> But unfortunately will loop endlessly.
> 
>     Vincent
> 

Actually, I'm wondering if we want to just get rid of the "_max_readv_combined"
parameter.

It was originally there because I felt we might want to limit our requests.
(Request 128k, start processing it, request the next, etc.)

It might make a little bit of sense on the local transport, since latency is
pretty low. We don't really do async IO, other than the time the disk spends
spinning in-between requests.

For http, what we want is to have sub-ranges that we can parse apart while the
data is streaming in. While we could make a really fancy response parser, that
can figure out "this is the first part of the next readv request, it is ready
to be returned". I'm guessing we will actually parse each sub-range separately,
and then break that up. So having 1 big range means we buffer all of that
before we start parsing.

I changed the implementation slightly, and incorporated your patch.

I think your simple fix is definitely correct and should be incorporated.

My concern is that it doesn't really fix the bug. It works when you download
tip, because it will try to download 99.9% of everything from the file.

But what happens if you try to download revision 1000, or something like that.
Where the data is going to be broken up a bit more.

Or in the case of my repositories where I have plugins in the same repo as my
bzr data. The autopack code pays little attention to ancestry of revisions. So
it will happily insert bzrtools revno 100 in between bzr.dev 1000 and 1001. We
certainly need to improve the packer to do more locality of reference stuff.

Thanks for catching the endless looping bug.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHTBEiJdeBCYSNAAMRArqCAKDD1aedtBdBN2OU7nFmGQrTYwSNBgCfYJIs
NYZJmEA4Yyn/6H2Mp+lLloE=
=3pk2
-----END PGP SIGNATURE-----
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: extra_range_collapse_165061.patch
Url: https://lists.ubuntu.com/archives/bazaar/attachments/20071127/dce8fc07/attachment.diff 


More information about the bazaar mailing list