fast delta generation in brisbane-core - advice/direction needed

Robert Collins robert.collins at canonical.com
Mon Mar 2 10:44:15 GMT 2009


On Mon, 2009-03-02 at 20:08 +1000, Ian Clatworthy wrote:
> 
> It doesn't seem like that's going to be possible yet?

There isn't any intent to store the deltas pre-calculated; the hash table
in brisbane-core is meant to enable calculating any delta more cheaply.

> Whatever delta-ing we're doing seems to be at the storage layer
> (as a space(/time?) optimisation) and lost by the time we
> extract the text?

Well, that's something we need to evaluate. Where is the time going? Is
the fast path for getting inventory deltas being used? What function
shows the most time? Note that TreeDelta is an old, essentially
deprecated API - you may well be calculating a bunch of unneeded stuff
to fully populate those objects vs using inventory._make_delta. Moving
layers around will reduce churn, I'm sure.
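
To make "what function shows the most time" concrete, here is a minimal
sketch using Python's standard cProfile and pstats modules. The helper
name profile_call is hypothetical, and the function you pass in stands
in for whatever delta-generation call is being measured; nothing here
is bzr API.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile; return (result, text report).

    Hypothetical helper for illustration only. The report lists the
    ten entries with the highest cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(10)
    return result, out.getvalue()


def slow_sum(n):
    # Stand-in workload; replace with the call under investigation.
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    value, report = profile_call(slow_sum, 100000)
    print(report)
```

Sorting by cumulative time tends to surface the layer that owns the
cost, which is the question when deciding whether a slower API is
being used instead of a fast path.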

It's worth noting that git calculates tree deltas fresh every time, as
far as I know, and has great performance. The reasons for this are
essentially that the changes between two trees are localised to the
directories that changed, and the size of each directory is small
compared to the whole tree. The CHK dict in the bbc formats is meant to
apply that lesson (in contrast to the flat-file dict our xml
inventories represent) and so allow O(change) work rather than
O(tree) work.
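
The O(change) idea can be sketched as below. This is an illustration
under stated assumptions, not the bbc implementation: it uses a
git-like, directory-shaped tree of content-hashed nodes rather than the
actual CHK map layout, and the names store_tree and iter_changes are
hypothetical. The point it shows is that two subtrees with the same
key cannot differ, so the comparison never descends into them.

```python
import hashlib


def store_tree(tree, store):
    """Store a tree and return its content-hash key.

    tree is bytes (a file) or a dict of name -> tree (a directory).
    store maps key -> bytes or key -> {name: child key}.
    """
    if isinstance(tree, bytes):
        key = "f:" + hashlib.sha1(tree).hexdigest()
        store[key] = tree
        return key
    entries = {name: store_tree(child, store) for name, child in tree.items()}
    serialized = "\n".join(f"{name}\0{entries[name]}" for name in sorted(entries))
    key = "d:" + hashlib.sha1(serialized.encode("utf-8")).hexdigest()
    store[key] = entries
    return key


def iter_changes(old_key, new_key, store, path=""):
    """Yield (path, old_key, new_key) for every file that differs."""
    if old_key == new_key:
        return  # same hash: nothing underneath can differ, so skip it
    old_is_dir = old_key is not None and old_key.startswith("d:")
    new_is_dir = new_key is not None and new_key.startswith("d:")
    if not old_is_dir and not new_is_dir:
        yield (path, old_key, new_key)  # file added, removed or modified
        return
    # A file replaced by a directory (or the reverse): report the file
    # side here, then walk the directory side below.
    if old_key is not None and not old_is_dir:
        yield (path, old_key, None)
    if new_key is not None and not new_is_dir:
        yield (path, None, new_key)
    old_entries = store[old_key] if old_is_dir else {}
    new_entries = store[new_key] if new_is_dir else {}
    for name in sorted(set(old_entries) | set(new_entries)):
        child_path = f"{path}/{name}" if path else name
        yield from iter_changes(
            old_entries.get(name), new_entries.get(name), store, child_path)


if __name__ == "__main__":
    store = {}
    old = store_tree({"src": {"a.py": b"1", "b.py": b"2"},
                      "doc": {"readme": b"hi"}}, store)
    new = store_tree({"src": {"a.py": b"3", "b.py": b"2"},
                      "doc": {"readme": b"hi", "new": b"x"}}, store)
    for change in iter_changes(old, new, store):
        print(change[0])
```

Only the root key of each tree is needed to start, and the work done
is bounded by the nodes on the paths to the changed entries rather
than by the size of the whole tree.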

> Reading through John's emails, we're certainly making great
> progress in terms of:
> 
> * faster text lookup
> * reduced storage size.
> 
> But I fear that delta generation will always be slow on large
> trees like OOo if the algorithm remains:
> 
> 1. get inventory

This is around 60 bytes in bbc.

> 2. get previous inventory

This is another 60 bytes.

> 3. calculate changes.

This should be proportional to the differences.

-Rob