Bazaar-NG vs. Mercurial -- speed comparison

Robert Collins robertc at robertcollins.net
Fri May 19 00:08:22 BST 2006


On Thu, 2006-05-18 at 16:43 -0500, John Arbash Meinel wrote:
> Bryan O'Sullivan wrote:
> > On 5/18/06, Aaron Bentley <aaron.bentley at utoronto.ca> wrote:
> > 
> >> Sure, but more data can also mean less reading at other points.  For
> >> example, if knits didn't have snapshots, they'd have less data and more
> >> reading would be required to construct an individual version.
> > 
> > Right. This is the tradeoff we make, too. For example, we have
> > snapshots, as already mentioned. But in addition, we store a diff as a
> > diff against the tip revision of a file, not against its parent
> > revision. This usually makes the diff bigger, but we don't have to
> > seek back to the parent in order to reconstruct that part of the file.
> > We trade off file size against seek probability there, which has
> > proven to be a win for performance. Diffing against the parent had a
> > noticeable penalty for us when Chris Mason measured it.
> 
> Because we are doing annotations, we probably cannot do it that way. We
> need to know what lines were introduced by this change versus its
> parent, not just what lines exist.
> You could probably do this after-the-fact by comparing the full-text of
> the child against its parent. But there are some complications when you
> have more than one parent (after a merge), and some of the parent lines
> came from an even older ancestor.
> 
> Interesting to know, though.

Well, I think svn actually has the best compromise here at the moment.
No 'snapshots', but they build a skip-list-like tree they call
skip-deltas. Martin and I calculated that they need an average delta
count of 2 to rebuild any text in the tree. This decouples the
compression data from the immediate revision graph, although you still
want to ensure that the revision diffed against is in the history, to
allow bulk copying optimisations. If you put the annotation data in the
delta too (there are a couple of ways to do this), then what you delta
against and what your revision graph is become orthogonal.

From a seek optimisation perspective, I think skip-deltas are very nice
because they will build hot spots in the file: version 0 is needed for
every text (you can define version 0 as "" to make it always local),
then you get a series of groups of versions growing in size: 1, 3, 7,
15, .... Each group is completely self-contained from the perspective of
needing other deltas, and this is semi-fractal: other than the path
into a group, the group is always self-contained at a power of 2.
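
To make the grouping concrete, here is a minimal sketch in Python. It
assumes the delta base for version N is found by clearing the lowest set
bit of N, which is how I understand the FSFS rule in Subversion's
skip-deltas notes; nothing above spells the rule out, so treat that as
an assumption. The names delta_base, delta_chain and group_root are made
up for illustration and are not svn or bzr APIs.

```python
from typing import List


def delta_base(version: int) -> int:
    """Return the version that `version` is stored as a delta against.

    Assumed rule: clear the lowest set bit of the version number.
    """
    if version <= 0:
        raise ValueError("version 0 is the empty base text and has no delta base")
    return version & (version - 1)


def delta_chain(version: int) -> List[int]:
    """List the versions visited when rebuilding `version`, ending at 0."""
    chain = []
    while version > 0:
        chain.append(version)
        version = delta_base(version)
    chain.append(0)  # version 0 (defined as "") is needed for every text
    return chain


def group_root(version: int) -> int:
    """Return the power of 2 that heads the group containing `version`."""
    if version <= 0:
        raise ValueError("version 0 belongs to no group")
    return 1 << (version.bit_length() - 1)


if __name__ == "__main__":
    for v in range(1, 16):
        print(v, delta_chain(v), "group", group_root(v))
```

Under that assumed rule the groups are {1}, {2, 3}, {4..7}, {8..15}, so
the running totals are 1, 3, 7, 15, and every chain stays inside its own
group until it reaches version 0.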

Rob



-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.