Optimizing tree building

Fri Jun 8 01:51:56 BST 2007

On Thu, 2007-06-07 at 20:10 -0400, Aaron Bentley wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Robert Collins wrote:
> > On Thu, 2007-06-07 at 19:01 -0400, Aaron Bentley wrote:
> >>
> >> I really don't know where this will lead to.  But even if we don't
> >> move
> >> to a blob format for storing revision data, the index problem has to
> >> be
> >> solved.
> > 
> > Because of ghost filling the index is not in strict topological order.
> > So the best we can do is walk backwards from the end in increasing read
> > sizes. That is doable if I recall the results last time I looked.
> 
> For the local-disk case, this probably works.  But for the network case,
> it seems likely to make things slower, and I wouldn't want to be the
> cause of that.

As a data point, launchpad has a 3MB revisions knit index. But most
merges will only need the last 64KB. I think as long as we do not start
too small, we'll win overall because of the reduced data transmission.
The breakeven point is clearly where you spend more time in round trips
than save in reduced downloads. I think doing a 64KB read size would be
fine for SFTP/FTP/HTTP. For the smart server the indexes should never be
read directly anhow - we should be doing something that sends points of
the graph around rather than all the nodes; this allows avoiding large
transfers of data in both directions for long graph-runs, while still
being efficient in small cases. 

> > I think that is a reasonable short term thing to do; I dont think we
> > want to tweak the knit disk format though, not if we think we can get a
> > container based repository up and running.
> 
> I figure if we do get a container-based repository, we'll still need
> indexes, no?  That makes me think it might be more productive to work on
> a new way of handling inventories than to optimize the current one.

We'll still need indexes. What of, and how many, and what the keys are
are still open questions. Changes to inventories - their content and
representation - will also impact what indexing is needed, but I think
the indexing layer should be implementable separately. One important
thing is whether indexes contain graph data or not. Specifically, in
knits the index and the graph are punned. This is good in some respects
and bad in others; I hope we can address this during the design of
indexing.

Rob
-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20070608/c72f49d5/attachment.pgp