[MERGE/RFC] partial delta generation

John Arbash Meinel john at arbash-meinel.com
Tue Feb 3 16:16:23 GMT 2009

Hash: SHA1

Ian Clatworthy wrote:
> For logging directories and multiple files, the key
> technical challenge is fast generation of partial deltas.
> A partial delta for a given set of files and directories
> includes those items, their parents and their children.
> Partial deltas are desirable/needed for two reasons:
> * an empty partial delta means a revision doesn't match
> * display to the user when the -v flag is given.
> There are at least 2 ways of generating a partial delta:
> 1. Create a full delta and filter it (as bzr.dev does now)
> 2. Create partial revision trees and compare those
> And at least two ways of building a partial revision tree:
> a. get the full inventory and filter it
> b. only deserialise the interesting bit of the inventory.
> Here are the times taken for the 3 algorithms on my laptop
> for "log --short -r-100..-1 ChangeLog" on the emacs-merges
> trunk:
> 1.  40.7s
> 2a. 24.6s
> 2b. 13.4s
> And for the last 500 mainline revisions, the times are:
> 1.  3m05.6s
> 2a. 1m48.9s
> 2b. 0m55.3s

My initial feeling is that this is a bit of a hack, and time would be
better spent finishing split inventory. Looking at the code changes, it
is actually rather invasive, as you have a lot of layers you have to go
through before you can get down to what you care about.

I *do* think we want to have an iter_changes() that can pre-filter, as
that can benefit all formats (workingtree comparison, and split-inventory).

I'm also mostly concerned about how you track what parent entries need
to be returned, etc.

All that said, you have implemented this, and it is potentially a win
*today* rather than a theoretical win tomorrow (requiring an upgrade in
the process).

I don't think we want objects that are called RevisionTree that are
partially filled. RT should be made lazy (as it is in split-inv) and it
should have the knowledge about what it does and doesn't contain, rather
than having someone else put only objects-of-interest into it, assuming
that a latter call to RT.iter_changes(other_partial_rt) will do the
right thing.

Again, you've done the work, so it is a sunk cost. It just doesn't feel
like something we want to continue to maintain as time goes forward. But
if the code is clear enough for now, I certainly wouldn't block it being



Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org


More information about the bazaar mailing list