[MERGE] Updated sftp_readv

John Arbash Meinel john at arbash-meinel.com
Thu Dec 20 21:24:22 GMT 2007



Vincent Ladeuil wrote:
>>>>>> "john" == John Arbash Meinel <john at arbash-meinel.com> writes:
> 

...

> 
>     john> start_block = start_offset = None
>     john> end_block = end_offset = None
>     john> bytes_so_far = 0
>     john> for block_idx, block in enumerate(buffer):
>     john>   next_bytes_so_far = bytes_so_far + len(block)
>     john>   if start_block is None:
>     john>     if next_bytes_so_far > start:
>     john>       start_block = block_idx
>     john>       start_offset = start - bytes_so_far
>     john>   if end_block is None:
>     john>     if next_bytes_so_far > end:
>     john>       end_block = block_idx
>     john>       end_offset = end - bytes_so_far
>     john>       break # We know we are done
>     john>   bytes_so_far = next_bytes_so_far
> 
>     john> if end_block == start_block:
>     john>   data = buffer[start_block][start_offset:end_offset]
>     john> else:
>     john>   data = ''.join([buffer[start_block][start_offset:]]
>     john>                  + buffer[start_block+1:end_block]
>     john>                  + [buffer[end_block][:end_offset]])
> 
> 
>     john> Which I think is correct, but it certainly doesn't fall
>     john> under the "simple" definition.
> 
> Eeeek, sure.
> 
> But why are you trying to do that ?
> 
> Because your coalesced offsets are so big that you don't want to
> totally buffer them ?

Yes. But not just that.

> 
> Why not make them smaller then ?

Because then we also lose the ability to combine ranges.

Specifically, we have two possible breakpoints:

1) The length of contiguous data (the extent of a coalesced range).
2) The 32KB limit for a single SFTP request.

So we might get:
|------|-------|--------|
  r1     r2      r3
|-----------------------|
 coalesced
|------------|----------|
  32KB         32KB

If we changed the coalesce function so that it could only produce ranges of
<32KB, then we would end up with:

a) Files longer than 32KB would be unrequestable (ok, we could work around
this one).
b) In the above scenario, we would have to make 3 requests, one each for r1,
r2 and r3. If we pretend that each of them is 20KB, then coalesced they fit
into two 32KB requests (3*20 = 60KB), but no two of them fit into a single
32KB coalesced range (2*20 = 40KB > 32KB). See the sketch below.
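
To make that concrete, here is a rough sketch (not the code from the patch) of
splitting already-coalesced offsets into per-request chunks after coalescing,
rather than capping the coalescer itself. The 32768 constant and the
(start, length) tuples are just illustrative assumptions:

  MAX_REQUEST = 32768  # illustrative per-request limit

  def split_coalesced(coalesced):
      """Yield (start, length) requests no larger than MAX_REQUEST."""
      for start, length in coalesced:
          done = 0
          while done < length:
              this_len = min(MAX_REQUEST, length - done)
              yield (start + done, this_len)
              done += this_len

  # Three 20KB ranges coalesced into one 60KB range become two requests
  # (32KB + 28KB) instead of three separate 20KB requests.
  requests = list(split_coalesced([(0, 60 * 1024)]))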


> 
> I think the biggest offset a readv can be required to yield can't
> be bigger than a full-text revision for a given file; users
> should have machines configured to handle that (they versioned
> the file in the first place, didn't they?).

Yes. We have to buffer the 10MB for the file we are requesting, as the readv()
API doesn't give us a way to do anything else. (There are still issues if
someone tries to add a 4GB ISO to their repo, but we are a long way from
supporting that.)
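
To be explicit about what that contract forces: readv() hands back fully
assembled (offset, data) pairs, so each requested range has to exist in memory
as a single string before it can be yielded. A minimal local-file sketch of
that contract (not the SFTP implementation):

  def readv(fp, offsets):
      """Yield (offset, data) for each requested (offset, length) pair."""
      for start, length in offsets:
          fp.seek(start)
          data = fp.read(length)  # the whole range is buffered right here
          yield (start, data)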

> 
> So I'll go the same way as for http: limit the size of the
> coalesced offset; as long as you buffer the requests, that should
> not make any difference in terms of latency.
> 
> In fact doing:
> 
>             cur_coalesced.ranges = new_ranges
> 
> is nothing more than doing that after the fact.
> 
> Or did I miss something ?
> 
>    Vincent
> 

No, we have already placed requests for all the data. The "new_ranges" is just
saying "I've processed this sub-range, stop trying to process it in future passes".

John
=:->


