Introduction to history deltas

Wed Dec 7 01:19:12 GMT 2005

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
| On Tue, 2005-12-06 at 08:58 -0500, Aaron Bentley wrote:
|>With bzr.dev, the crossover point is 196 revisions.  Downloading 197
|>revisions with history deltas would be 197 downloads, but with per-file
|>knits, it would be 196.  (That is, if you have revno 1276, and you're
|>updating to the current 1473.)

| One thing that occurs to me is that tracking a heavily merged into
| branch affects the crossover - it will accrue relatively few revisions
| itself, but each of those hauls in 20 or 30 or more revisions. This
| leads to only needing 6 or 7 commits on the 'mainline' before pulling
| from multiple knits is more efficient (in the bzr.dev case).

Oh, drat.  Forgot about merged revisions.  That changes the picture
quite a lot.

By my admittedly back-of-the-envelope calculations, the crossover point
for revno 1473 is 10 revisions away-- i.e. 1463.

$ bzr log -r -10..-1|grep "    merged:" |wc -l
21
(9 + 21 = 30)
$ bzr status -r -10|grep "  .*"|wc -l
28
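
To spell out the arithmetic, here is a rough sketch in Python. It assumes
history deltas cost one download per revision (mainline or merged) and
per-file knits cost one download per touched file. The function names are
made up for illustration and are not part of bzr.

```python
def history_delta_downloads(mainline_revisions: int, merged_revisions: int) -> int:
    """One download per revision, counting merged revisions too (assumption)."""
    return mainline_revisions + merged_revisions


def knit_downloads(touched_files: int) -> int:
    """One download per file touched in the range (assumption)."""
    return touched_files


def knits_are_cheaper(mainline_revisions: int, merged_revisions: int,
                      touched_files: int) -> bool:
    """True when per-file knits need strictly fewer downloads."""
    return knit_downloads(touched_files) < history_delta_downloads(
        mainline_revisions, merged_revisions)


# Figures from the two commands above: 9 + 21 revisions versus 28 files.
print(history_delta_downloads(9, 21), knit_downloads(28),
      knits_are_cheaper(9, 21, 28))
```

With those figures the knits come out ahead, 28 downloads to 30, which is
why the crossover sits about 10 revisions back once merges are counted.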

| I've got a very bad feeling about grouping the data by revision until
| someone does some hard statistics on the mean & std dev for the
| crossover point for different projects.

Heh, finally a common point between my day job (where we use standard
deviation) and bzr.

| And for remote access - such as
| determining the history of a single file - wheeeoo. That really bites
| for arch as every revision has to be downloaded.

(unless you trust log headers, which of course you can't)

| If our data is grouped
| by revision we have to download every touched revision. If our data is
| grouped by fileid we can pull down just one file.

I think this depends on how data is indexed.  You could certainly have a
file index that enabled you to download only the relevant sections of
the relevant revisions.  But you must be using a latency-beating
strategy for this to work properly.  The converse applies, too-- with a
latency-beating strategy, you can have per-revision indices that enable
you to download only the relevant sections of the relevant files.

In a sense, filesystems are one-dimensional: their only axis is file
path.  Revision control systems are two-dimensional-- they have
"revision" and "pathname" axes, and some operations traverse the
filename axis (e.g. export), while other operations traverse the
revision axis (e.g. annotate).
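
As a toy illustration of the two axes (not bzr's storage format; the class
and method names are hypothetical), texts can be keyed by (revno, path)
with one index per axis, so either traversal only touches the entries it
needs:

```python
from collections import defaultdict


class ToyStore:
    """Toy two-axis store: texts keyed by (revno, path), indexed both ways."""

    def __init__(self):
        self.texts = {}                       # (revno, path) -> text
        self.by_revision = defaultdict(list)  # revno -> paths changed there
        self.by_path = defaultdict(list)      # path -> revnos that changed it

    def add(self, revno, path, text):
        self.texts[(revno, path)] = text
        self.by_revision[revno].append(path)
        self.by_path[path].append(revno)

    def changed_in(self, revno):
        """Walk the pathname axis: every text recorded in one revision."""
        return {p: self.texts[(revno, p)] for p in self.by_revision[revno]}

    def history_of(self, path):
        """Walk the revision axis: every recorded text of one file, in order."""
        return [(r, self.texts[(r, path)]) for r in sorted(self.by_path[path])]


store = ToyStore()
store.add(1, "README", "hello")
store.add(1, "setup.py", "v1")
store.add(2, "README", "hello world")
print(store.changed_in(1))
print(store.history_of("README"))
```

Whichever way the data is grouped on disk, the index for the other axis is
what lets you fetch only the relevant pieces, provided the transport can
hide the latency of many small reads.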

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD4DBQFDljiP0F+nu1YWqI0RAoBkAJ9gI4z5DIThWf93I9mY5kdISGgoFQCY+eZ0
PKlYatMP0OXfHIFxi6AtNw==
=GhID
-----END PGP SIGNATURE-----