Introduction to history deltas
Aaron Bentley
aaron.bentley at utoronto.ca
Wed Dec 7 01:19:12 GMT 2005
Robert Collins wrote:
| On Tue, 2005-12-06 at 08:58 -0500, Aaron Bentley wrote:
|>With bzr.dev, the crossover point is 196 revisions. Downloading 197
|>revisions with history deltas would be 197 downloads, but with per-file
|>knits, it would be 196. (That is, if you have revno 1276, and you're
|>updating to the current 1473.)
| One thing that occurs to me is that tracking a heavily merged into
| branch affects the crossover - it will accrue relatively few revisions
| itself, but each of those hauls in 20 or 30 or more revisions. This
| leads to only needing 6 or 7 commits on the 'mainline' before pulling
| from multiple knits is more efficient (in the bzr.dev case).
Oh, drat. Forgot about merged revisions. That changes the picture
quite a lot.
By my admittedly back-of-the-envelope calculations, the crossover point
for revno 1473 is 10 revisions away-- i.e. 1463.
$ bzr log -r -10..-1|grep " merged:" |wc -l
21
(9 + 21 = 30)
$ bzr status -r -10|grep " .*"|wc -l
28
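The comparison above can be sketched as a small calculation. This is only a rough model, assuming one download per revision (mainline or merged) for history deltas and one download per touched file for per-file knits; the function name and parameters are hypothetical, and it ignores any extra downloads for inventories or revision metadata.

```python
def downloads_needed(mainline_revisions, merged_revisions, files_touched):
    """Return (history_delta_downloads, knit_downloads) under a simple model."""
    # Assumption: history deltas cost one download per revision,
    # counting merged revisions as well as mainline ones.
    history_delta_downloads = mainline_revisions + merged_revisions
    # Assumption: per-file knits cost one download per touched file.
    knit_downloads = files_touched
    return history_delta_downloads, knit_downloads


# Figures from the bzr.dev example above: 9 + 21 revisions, 28 files.
deltas, knits = downloads_needed(
    mainline_revisions=9, merged_revisions=21, files_touched=28
)
verdict = "knits cheaper" if knits < deltas else "deltas cheaper or equal"
print(deltas, knits, verdict)
```

With the numbers above, the knit count (28) is already below the revision count (30), which is why the crossover sits about 10 mainline revisions back.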
| I've got a very bad feeling about grouping the data by revision until
| someone does some hard statistics on the mean & std dev for the
| crossover point for different projects.
Heh, finally a common point between my day job (where we use standard
deviation) and bzr.
| And for remote access - such as
| determining the history of a single file - wheeeoo. That really bites
| for arch as every revision has to be downloaded.
(unless you trust log headers, which of course you can't)
| If our data is grouped
| by revision we have to download every touched revision. If our data is
| grouped by fileid we can pull down just one file.
I think this depends on how data is indexed. You could certainly have a
file index that enabled you to download only the relevant sections of
the relevant revisions. But you must be using a latency-beating
strategy for this to work properly. The converse applies, too-- with a
latency-beating strategy, you can have per-revision indices that enable
you to download only the relevant sections of the relevant files.
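A minimal sketch of such a file index, assuming data is stored one blob per revision and that each file's delta occupies a known byte range inside that blob. The layout, names, and sample data are all hypothetical and are not bzr's actual format.

```python
from collections import defaultdict


def build_file_index(revisions):
    """Map file_id -> list of (revision_id, offset, length) byte ranges.

    `revisions` is a list of (revision_id, sections), where each section
    is a (file_id, offset, length) tuple. This layout is an assumption.
    """
    index = defaultdict(list)
    for revision_id, sections in revisions:
        for file_id, offset, length in sections:
            index[file_id].append((revision_id, offset, length))
    return index


def ranges_for_file(index, file_id):
    """Return the byte ranges needed to reconstruct one file's history."""
    return list(index.get(file_id, []))


# Hypothetical data: three revisions, two files.
revisions = [
    ("rev-1", [("a.txt", 0, 120), ("b.txt", 120, 80)]),
    ("rev-2", [("b.txt", 0, 40)]),
    ("rev-3", [("a.txt", 0, 60)]),
]
index = build_file_index(revisions)
wanted = ranges_for_file(index, "a.txt")
print(wanted)
```

Each entry in `wanted` is still a separate range to fetch, so this only pays off if the ranges can be requested without a full round trip per range, which is where the latency-beating strategy comes in.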
In a sense, filesystems are one-dimensional: their only axis is file
path. Revision control systems are two-dimensional-- they have
"revision" and "pathname" axes, and some operations traverse the
filename axis (e.g. export), while other operations traverse the
revision axis (e.g. annotate).
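To make the two axes concrete, here is a toy sketch, assuming a store keyed by (revno, path). The store, its contents, and the function names are hypothetical; real storage would hold deltas rather than full texts.

```python
# Hypothetical two-dimensional store: (revno, path) -> file text.
store = {
    (1, "README"): "v1 readme",
    (1, "setup.py"): "v1 setup",
    (2, "README"): "v2 readme",
    (2, "setup.py"): "v1 setup",
}


def export(store, revno):
    """Traverse the pathname axis: every path at a single revision."""
    return {path: text for (r, path), text in store.items() if r == revno}


def file_history(store, path):
    """Traverse the revision axis: one path across all revisions.

    This is the walk an annotate-style operation would make.
    """
    return [(r, text) for (r, p), text in sorted(store.items()) if p == path]


print(export(store, 2))
print(file_history(store, "README"))
```

Grouping by revision makes `export` a single contiguous read and `file_history` a scattered one; grouping by file reverses that trade-off.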
Aaron