partial reads of pack indices

Robert Collins robertc at robertcollins.net
Fri Aug 24 06:37:29 BST 2007


I just spent some time with Martin brainstorming how to do partial reads of pack indices.

We've agreed that a reasonable first approach is to:
 - add a parameter to transport.readv that tunes its behaviour for
reading indices.
   On the phone we talked about it just reading more data, but I think
perhaps it should also:
    - sort the readv hunks
    - combine adjacent hunks
   as these will aid performance, and either the index or readv will
need to do them. (Does this make sense, Martin?)
 - secondly, the index will bisect the file using this API; it will
parse all the data it receives and add it to its existing parsed-record
cache.
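
A minimal Python sketch of the tuned readv behaviour, assuming requests arrive as (offset, length) pairs: hunks are sorted, widened to a minimum read size, and adjacent or overlapping hunks are merged. `tune_readv_offsets` and its `min_read` parameter are hypothetical, not bzrlib's actual `transport.readv` signature, and centring the extra bytes on each request is an assumption chosen to suit bisection.

```python
from typing import List, Tuple

Hunk = Tuple[int, int]  # (offset, length)


def tune_readv_offsets(offsets: List[Hunk], min_read: int,
                       file_size: int) -> List[Hunk]:
    """Sort, widen and coalesce readv hunks for index reading.

    Hypothetical helper: a sketch of the proposed behaviour, not
    bzrlib's transport.readv API.
    """
    ranges = []
    for start, length in sorted(offsets):
        if length < min_read:
            # Assumption: centre the extra bytes on the request so a
            # bisection probe sees records on both sides of it.
            start = max(0, start - (min_read - length) // 2)
            length = min_read
        end = min(file_size, start + length)
        ranges.append((start, end))
    # Widening can move a start earlier, so sort again before merging.
    ranges.sort()
    merged: List[Tuple[int, int]] = []
    for start, end in ranges:
        if merged and start <= merged[-1][1]:
            # Adjacent or overlapping: extend the previous hunk.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    # Convert (start, end) pairs back to (offset, length).
    return [(start, end - start) for start, end in merged]
```

For example, requests for 10 bytes at offsets 0, 20 and 5000 with `min_read=100` collapse into two reads, one covering the start of the file and one around offset 5000. Doing this inside readv means callers never need to sort or coalesce on their own.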

Some analysis:
The combined effect of this will be that we don't request the same
region of an index twice, though there is a (hopefully rare) corner case:
if we ask for byte X and then for byte X+E, where E is the amount of
excess data the first readv returned, we read the bytes between them twice.
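
The bisection step can be sketched in Python under some simplifying assumptions: the index is a sorted file of `key value` lines, every read counts as one round trip, and each read fetches a fixed window around the midpoint of the still-unsearched range. `CountingTransport` and `BisectingIndex` are hypothetical names, not bzrlib classes. Every complete record in a window goes into the cache, and later lookups first narrow their search to the gap between cached neighbours, so already-parsed regions are not requested again.

```python
import bisect
from typing import Dict, List, Optional, Tuple

Record = Tuple[bytes, bytes, int, int]  # (key, value, start, end)


class CountingTransport:
    """Hypothetical stand-in for a transport: serves byte ranges of an
    in-memory file and counts each read as one round trip."""

    def __init__(self, data: bytes):
        self.data = data
        self.round_trips = 0

    def read(self, start: int, length: int) -> bytes:
        self.round_trips += 1
        return self.data[start:start + length]


class BisectingIndex:
    """Sketch: bisect a sorted index of b"key value\\n" lines, caching
    every complete record that each read returns."""

    def __init__(self, transport: CountingTransport, size: int, window: int):
        self.transport = transport
        self.size = size
        self.window = window  # bytes fetched per read, e.g. 2K or 64K
        self._keys: List[bytes] = []  # cached keys, kept sorted
        self._values: Dict[bytes, bytes] = {}
        self._spans: Dict[bytes, Tuple[int, int]] = {}  # key -> byte range

    def lookup(self, key: bytes) -> Optional[bytes]:
        while key not in self._values:
            # Search only the gap between cached neighbours; lo and hi
            # always fall on record boundaries.
            i = bisect.bisect_left(self._keys, key)
            lo = self._spans[self._keys[i - 1]][1] if i > 0 else 0
            hi = self._spans[self._keys[i]][0] if i < len(self._keys) else self.size
            if lo >= hi:
                return None  # neighbours are adjacent: key is absent
            self._read_window(lo, hi)
        return self._values[key]

    def _read_window(self, lo: int, hi: int) -> None:
        # Read a window centred on the midpoint, clipped to [lo, hi).
        start = max(lo, (lo + hi) // 2 - self.window // 2)
        end = min(hi, start + self.window)
        records = self._parse(start, end, lo)
        if not records:
            # Window fell inside one long record: read the whole gap.
            records = self._parse(lo, hi, lo)
        for key, value, rec_start, rec_end in records:
            if key not in self._values:
                bisect.insort(self._keys, key)
            self._values[key] = value
            self._spans[key] = (rec_start, rec_end)

    def _parse(self, start: int, end: int, lo: int) -> List[Record]:
        data = self.transport.read(start, end - start)
        if start == lo:
            first = 0  # known record boundary
        else:
            first = data.find(b"\n") + 1  # skip the partial first line
            if first == 0:
                return []
        last = data.rfind(b"\n") + 1  # drop any partial last line
        if last <= first:
            return []
        records = []
        pos = start + first
        for line in data[first:last].split(b"\n")[:-1]:
            key, _, value = line.partition(b" ")
            records.append((key, value, pos, pos + len(line) + 1))
            pos += len(line) + 1
        return records
```

Repeating a lookup costs no round trips, and an absent key is detected once its cached neighbours are adjacent. Because windows are clipped to the unsearched gap, the only bytes this sketch can fetch twice are partial lines discarded at a window's edges.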
We will also do more round trips on indices. On local transports this
will probably be log2(revision count) round trips on the first call to
iter_entries, plus additional reads whenever the final bisection step is
more than 2K (about 20 records) away from the prior one.

On remote transports it will also be log2(revision count) round trips
for the first call to iter_entries, but the locality-of-reference window
will be 64K (about 640 records), so we can hope that we won't drown in
round trips.
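
As a back-of-the-envelope check, both window figures imply records of roughly 100 bytes (2048 / 20 and 65536 / 640). The hypothetical helper `estimate_round_trips` below turns the estimates into arithmetic, under the simplifying assumption that each later lookup outside the window costs exactly one extra read; it counts reads for record positions looked up in order.

```python
import math
from typing import Sequence


def estimate_round_trips(rev_count: int, lookups: Sequence[int],
                         records_per_window: int) -> int:
    """Rough round-trip count for lookups at the given record positions.

    Simplified model (an assumption): the first lookup bisects the whole
    index in about log2(rev_count) reads, and each later lookup costs one
    extra read only if it lands more than records_per_window records away
    from the previous lookup.
    """
    if not lookups:
        return 0
    trips = max(1, math.ceil(math.log2(rev_count)))
    for prev, cur in zip(lookups, lookups[1:]):
        if abs(cur - prev) > records_per_window:
            trips += 1
    return trips


# Local (2K, ~20 records) versus remote (64K, ~640 records) windows.
positions = [100, 110, 150, 5000]
local = estimate_round_trips(10000, positions, records_per_window=20)
remote = estimate_round_trips(10000, positions, records_per_window=640)
```

With 10,000 revisions and those four lookups the model gives 16 round trips locally and 15 remotely: both pay 14 reads for the first bisection, but only the local window is too small to cover the jump from record 110 to 150.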

I expect to have this up for review Monday.

-Rob

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.