backward diffs in knits?

Aaron Bentley aaron.bentley at utoronto.ca
Tue Apr 10 14:29:35 BST 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Matthieu Moy wrote:
> I believe the deltas would be smaller in a mixed snapshot+delta
> situation.
> 
> Assuming you mostly add lines, 
> 
> * With forward deltas : each added line appears once in the delta
>   introducing it, and once in the snapshot appearing next.

And once in every subsequent snapshot until the line is deleted.

> * With backward deltas : each added line appears once in the snapshot,
>   and only the information that this line has to be removed appears in
>   the delta for A.

So let's say our interval for snapshots is 10, and we're looking at a
range of 15 revisions.

In 1, we have ABCDE
In 2, we have ABCDEFGH

All the other revisions are no-ops.

For a forward delta, this produces
A snapshot of "ABCDE" at 1
An insert of "FGH: at 2
A snapshot of "ABCDEFGH" at 11

For a backward delta, this produces
A snapshot of "ABCDEFGH" at 15
A snapshot of "ABCDEFGH" at 5
A deletion at 1

So in this case, there's no savings in content.

Stretch it out to 21 revisions, and the picture changes a bit:

forward:
A snapshot of "ABCDE" at 1
An insert of "FGH" at 2
A snapshot of "ABCDEFGH" at 11
A snapshot of "ABCDEFGH" at 21

backward:
A snapshot of "ABCDE" at 1
A snapshot of "ABCDEFGH" at 11
A snapshot of "ABCDEFGH" at 21

You can extend this example in either direction, and backwards deltas
retain their three-letter advantage (but the relative advantage decreases).

It seems like there's some win, but I'm not sure whether it's enough to
justify a "repack" command.  I guess more study is required, just like
multi-parent diffs.

> IIRC, CVS has the last revision as a fulltext snapshot, and the others
> as backward delta.

That matches what I've heard.  The fact that CVS does something is not a
positive recommendation to me.  If anything, it's negative.

CVS's approach does give you fast access to recent revisions, but
because there are no intervening snapshots, old revisions are more
expensive to retrieve. If a recent revision is damaged, you can lose
access to all prior history.  And of course, this approach isn't
append-only, which makes it more prone to corrupting old revisions.

So if Martin and Rob were proposing to emulate CVS *that* closely, I'd
be quite concerned.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGG5E/0F+nu1YWqI0RAr5lAJ9oHfUXyyUSPDJsCrIVW2ONYtpthACfTutD
QQuJad7+3Y1dOvY52WeKcKo=
=TljN
-----END PGP SIGNATURE-----



More information about the bazaar mailing list