things-to-do-in-chk-repository

Robert Collins robertc at robertcollins.net
Tue Nov 11 06:18:49 GMT 2008


On Mon, 2008-11-10 at 20:42 -0500, Aaron Bentley wrote:
> Robert Collins wrote:
> > So here are some 'hot' topics in this branch:
> >  - write a RevisionTree.iter_changes(RevisionTree) optimiser
> >    that picks up on the type of the inventory to fast-path
> >    deltas using the CHKInventory guts. (poolie is looking at this)
> 
> For RevisionTree-to-RevisionTree, a richer API that included sha1sum
> would make a lot of sense.

Indeed; I think an interim step would be to call
tree.get_file_sha1sum(file_id, path), which should be fast on dirstate
and in the split inventories (because that entry will already be in the
cache from generating the iter_changes output). In fact, this alone may
be sufficient for all but the largest merges.
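To make the CHKInventory fast-path idea concrete, here is a minimal Python sketch under loudly stated assumptions. ToyCHKTree, node_key and iter_changes are hypothetical names, and the two-level layout (non-empty file_ids bucketed by first character into leaf pages, each page identified by a SHA-1 of its contents) is a simplification, not bzrlib's actual CHKInventory format. It shows why content-addressed pages make RevisionTree-to-RevisionTree comparison cheap: pages with equal keys are skipped without reading their entries, and because leaf entries carry the file sha1, each reported change comes with its sha1s attached.

```python
import hashlib
from typing import Dict, Iterator, Optional, Tuple

# Hypothetical leaf page: file_id -> content sha1 of that file.
Leaf = Dict[str, str]


def node_key(leaf: Leaf) -> str:
    """Content hash of a leaf page: equal keys imply equal page contents."""
    return hashlib.sha1(repr(sorted(leaf.items())).encode("utf-8")).hexdigest()


class ToyCHKTree:
    """Hypothetical two-level content-addressed map (NOT the real CHKInventory).

    File ids (assumed non-empty) are bucketed by their first character;
    each bucket is a leaf page identified by node_key().
    """

    def __init__(self, entries: Dict[str, str]):
        self.pages: Dict[str, Leaf] = {}
        for file_id, sha1 in entries.items():
            self.pages.setdefault(file_id[0], {})[file_id] = sha1
        self.page_keys = {prefix: node_key(page) for prefix, page in self.pages.items()}


def iter_changes(old: ToyCHKTree, new: ToyCHKTree) -> Iterator[Tuple[str, Optional[str], Optional[str]]]:
    """Yield (file_id, old_sha1, new_sha1) for each file_id that differs.

    A page present in only one tree compares unequal (None vs a key) and is scanned.
    """
    for prefix in sorted(old.page_keys.keys() | new.page_keys.keys()):
        if old.page_keys.get(prefix) == new.page_keys.get(prefix):
            continue  # identical page: nothing inside it can have changed
        old_page = old.pages.get(prefix, {})
        new_page = new.pages.get(prefix, {})
        for file_id in sorted(old_page.keys() | new_page.keys()):
            old_sha1, new_sha1 = old_page.get(file_id), new_page.get(file_id)
            if old_sha1 != new_sha1:
                yield file_id, old_sha1, new_sha1
```

For example, with old entries {"a1": "s1", "a2": "s2", "b1": "s3"} and new entries {"a1": "s1", "a2": "s9", "b1": "s3", "c1": "s4"}, the "b" page is skipped entirely and the sketch yields ("a2", "s2", "s9") and ("c1", None, "s4"). Real CHK maps are deeper tries, but the same rule (equal key, skip the subtree) is what keeps the work proportional to the changes rather than to the tree size.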

> >  - get 'st -r -2' to do inventory delta composition - that is to do
> >    wt.iter_changes(basis_tree) and RevisionTree(-1).iter_changes(
> >    RevisionTree(-2)), and combine the results. Combined with the 
> >    optimiser for (RT,RT) above this should lead to very fast diffs
> >    with deep history (both because we don't need to generate a full 
> >    inventory at any point, and because the repository can be optimised
>    too).
> > 
> > I think the delta composition is an important thing to work on, because
> > it will be difficult to tell if the design is successful until that is
> > working.
> 
> I think the kind of delta composition we need to do is dead simple:
> for WT -> BASIS, generate an inventory entry* for each modified file.
> For BASIS -> REVISION_TREE, generate an inventory entry for each modified
> file.  For file-ids in REVISION_TREE that are missing from WT, copy them
> from BASIS.  For file-ids in WT that are missing from REVISION_TREE,
> copy them from BASIS.
> 
> Then it should be trivial to generate iter_changes-style output from the
> WT and REVISION_TREE inventory entries.
> 
> * We don't need real inventory entries, but we'll want sha1sum so that
> we detect cases where REVISION_TREE and WT have the same content, but
> BASIS is different.

I think a real inventory entry is the simplest thing to do; while making
objects isn't the fastest thing around, we only make size(changes) of
them, so it's acceptable (compared to the current system!).
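Aaron's composition rule can be sketched in a few lines of Python. This is a hypothetical illustration, not bzrlib code: Entry stands in for a real inventory entry (only path and sha1), and each delta is assumed to be a dict mapping file_id to an (old_entry, new_entry) pair, with None meaning absent. The BASIS/REVISION_TREE delta is assumed to be oriented REVISION_TREE -> BASIS so that both deltas chain in the same direction; "copy from BASIS" then just reads the BASIS side already recorded in the other delta. The work is proportional to the number of changed file-ids, never to the size of the trees.

```python
from typing import Dict, Iterator, NamedTuple, Optional, Tuple


class Entry(NamedTuple):
    """Hypothetical stand-in for an inventory entry: just path and content sha1."""
    path: str
    sha1: str


# Hypothetical delta shape: file_id -> (old_entry, new_entry); None means absent.
Delta = Dict[str, Tuple[Optional[Entry], Optional[Entry]]]


def compose(rt_to_basis: Delta, basis_to_wt: Delta) -> Iterator[Tuple[str, Optional[Entry], Optional[Entry]]]:
    """Yield (file_id, rt_entry, wt_entry) for each file-id that differs
    between REVISION_TREE and WT, given REVISION_TREE -> BASIS and
    BASIS -> WT deltas."""
    for file_id in sorted(rt_to_basis.keys() | basis_to_wt.keys()):
        if file_id in rt_to_basis:
            rt_entry = rt_to_basis[file_id][0]
        else:
            # Unchanged between REVISION_TREE and BASIS: copy the BASIS entry,
            # which the BASIS -> WT delta holds as its old side.
            rt_entry = basis_to_wt[file_id][0]
        if file_id in basis_to_wt:
            wt_entry = basis_to_wt[file_id][1]
        else:
            # Unchanged between BASIS and WT: copy the BASIS entry from the
            # new side of the REVISION_TREE -> BASIS delta.
            wt_entry = rt_to_basis[file_id][1]
        # Comparing whole entries (path and sha1) drops files that BASIS
        # changed but that REVISION_TREE and WT hold in identical form.
        if rt_entry != wt_entry:
            yield file_id, rt_entry, wt_entry
```

For instance, if "a-id" was edited in BASIS and then reverted in WT, REVISION_TREE and WT carry the same sha1 and nothing is reported for it; that is exactly the case the sha1sum footnote above is about. A file edited only in BASIS ("b-id") or added only in WT ("c-id") is still reported.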

Thanks for analysing the logic in more detail.

-Rob
-- 

GPG key available at: <http://www.robertcollins.net/keys.txt>.


More information about the bazaar mailing list