Warping minds with the phrase "changeset"

Aaron Bentley aaron.bentley at utoronto.ca
Mon Jan 30 14:36:27 GMT 2006


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Martin Pool wrote:
> On 29 Jan 2006, James Blackwell <jblack at merconline.com> wrote:
> One of them is, as you say, that a changeset is a description of changes
> to a whole tree, i.e. a set of patches, and that you can identify the
> changeset as a whole.  We have these, as do most other modern systems,
> and at least some use the term "changeset", at least informally.

What you are saying is true, from a certain point of view.  But claiming
we have changesets is the kind of thing Obi-wan Kenobi would say, and
most of the time, I think it's simpler to just say, "Darth Vader's your
dad, kid."

Yes, as long as we have two revisions in our repository, we can infer a
changeset between them.  But it is the revision snapshots that we store,
not the revision deltas.  By definition, snapshot-based storage can be
transformed into changeset-based storage and vice-versa, but these
approaches have different strengths and weaknesses.
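
To make the "infer a changeset" point concrete, here is a minimal
Python sketch.  It is an illustration only, not Bazaar-NG code: a
snapshot is assumed to be a plain dict mapping path to content, and
infer_changeset and apply_changeset are hypothetical names.

```python
def infer_changeset(old, new):
    """Derive a delta from two snapshots (dicts of path -> content)."""
    added = {p: new[p] for p in new if p not in old}
    removed = sorted(p for p in old if p not in new)
    modified = {p: new[p] for p in new if p in old and old[p] != new[p]}
    return {"added": added, "removed": removed, "modified": modified}


def apply_changeset(old, changeset):
    """Rebuild the newer snapshot from the older one plus the delta."""
    result = {p: c for p, c in old.items() if p not in changeset["removed"]}
    result.update(changeset["added"])
    result.update(changeset["modified"])
    return result


# Two stored snapshots; the changeset is computed, never stored.
rev1 = {"README": "hello\n", "setup.py": "v1\n"}
rev2 = {"README": "hello world\n", "main.py": "print()\n"}
delta = infer_changeset(rev1, rev2)
```

Applying delta to rev1 gives back rev2, which is the sense in which the
two storage models can be converted into each other.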

Snapshot-based storage is less fragile than changeset-based storage,
because revisions are independent.  This makes it cheap to validate any
given revision.  Because changesets each depend on the previous
changeset for their meaning, a pure changeset-based system must have no
ghosts and validation must work from the beginning of history onward.
The GNU Arch development line, for example, has many inconsistencies;
file permissions are changed from A to B, then from C to D, without ever
being changed from B to C.  Some of its changesets are also internally
inconsistent.  It is hard for a snapshot to be internally
inconsistent.
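
The difference in validation cost can be sketched in Python.  This is a
toy model under my own assumptions, not how Arch or Bazaar-NG validate
anything: a snapshot is checked against a SHA-1 digest, and a changeset
is reduced to a (before, after) pair for a single file's permissions.
The names snapshot_ok and replay are hypothetical.

```python
import hashlib


def snapshot_ok(content, expected_sha1):
    """A snapshot can be checked alone, against its own digest."""
    return hashlib.sha1(content).hexdigest() == expected_sha1


def replay(initial, deltas):
    """A delta chain must be replayed from the start of history."""
    state = initial
    for i, (before, after) in enumerate(deltas):
        if state != before:
            raise ValueError(
                f"delta {i} expects {before!r}, but state is {state!r}")
        state = after
    return state


# Permissions go A -> B, then C -> D, with no B -> C in between.
deltas = [("A", "B"), ("C", "D")]
try:
    replay("A", deltas)
    consistent = True
except ValueError:
    consistent = False
```

Here consistent ends up False, and the problem is only found by walking
the chain from the beginning.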

On the other hand, changeset-based storage does allow for more
flexibility in the type of changes stored.  Darcs token-replace patches
are a good example of the kind of thing Bazaar-NG cannot do easily.

James is portraying Bazaar-NG as a system with changeset-based storage,
and I think that description harms understanding.  A person who hears
that may ask me whether we have token-replace changesets.  To which I'll
reply "No, Bazaar-NG doesn't store changesets".  And then I'll be making
our community guy look like a liar.  Another person may decide that they
don't want to use a changeset-oriented system, because it's too fragile.
And people who try to understand the code based on James' explanation
will have a very hard time.

Let's look at an example:

> The rename problem is solved by keeping each commit together in
> something called a "changeset". Since changes are now kept together in
> a changeset, other things can be kept as well. The RCS can even record
> that a file was renamed or deleted.

This does not resemble the way we handle renames.  The truth of the
matter is that files have ids.  A curious user can list them with 'bzr
inventory --show-ids'.  In each commit, we record the id, name and
parent directory of each file.  We certainly can *infer* renames, but
that's not what we store.
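
Roughly, in Python, and with a made-up data layout rather than
Bazaar-NG's real inventory format: each revision maps a file id to its
name and parent directory id, and a rename is something you compute by
comparing two such maps.  Entry, infer_renames and the ids below are
all hypothetical.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    name: str
    parent_id: str  # file id of the containing directory


def infer_renames(old_inv, new_inv):
    """Find ids whose name or parent directory differs between revisions."""
    renames = {}
    for file_id, new in new_inv.items():
        old = old_inv.get(file_id)
        if old is not None and old != new:
            renames[file_id] = (old, new)
    return renames


# What is stored: one inventory per commit.  The directory "src" is
# renamed to "lib"; the file inside it keeps the same entry.
rev1 = {"src-id": Entry("src", "root-id"),
        "foo-id": Entry("foo.py", "src-id")}
rev2 = {"src-id": Entry("lib", "root-id"),
        "foo-id": Entry("foo.py", "src-id")}
renames = infer_renames(rev1, rev2)
```

Only "src-id" shows up in renames; nothing in either inventory says
"rename" anywhere.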

This is a good thing, because file-ids make file identity very easy to
establish.  Systems like Monotone store renames rather than file-ids.
But in order to do tree-wide merges, they still need to establish file
identity.  In order to do this, they must trace the rename history of
every file back to the base revision.  So file-ids make merging more
efficient.
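
A toy comparison in Python, assuming a rename-recording system keeps
one {old_path: new_path} dict per revision.  This is my simplification
for illustration, not Monotone's actual algorithm or data model, and
both function names are hypothetical.

```python
def base_path_via_renames(tip_path, rename_log):
    """Walk the recorded renames backwards, one revision at a time."""
    path = tip_path
    for renames in reversed(rename_log):
        inverse = {new: old for old, new in renames.items()}
        path = inverse.get(path, path)
    return path


def base_path_via_ids(tip_path, tip_inv, base_inv):
    """With file ids, identity is a lookup in the two inventories."""
    ids_by_path = {path: file_id for file_id, path in tip_inv.items()}
    return base_inv.get(ids_by_path[tip_path])


# Renames recorded from the base revision up to the tip.
rename_log = [{"a.txt": "b.txt"}, {"b.txt": "docs/b.txt"}]
# The same history expressed as inventories (file id -> path).
base_inv = {"file-1": "a.txt"}
tip_inv = {"file-1": "docs/b.txt"}
```

Both calls map "docs/b.txt" back to "a.txt", but the first one has to
visit every revision between the tip and the base.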

They're also more flexible.  In Monotone, if the file in THIS did not
exist in the base revision, it's not considered the same as a file in
OTHER with the same name and identical contents.  With file-ids, it is
possible for THIS and OTHER to introduce the "same" file.
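
As a sketch of that last case, assuming inventories are dicts mapping
file id to path and that both trees ended up with the same id for the
new file (added_in_both and the ids are hypothetical, not Bazaar-NG's
merge code):

```python
def added_in_both(base_inv, this_inv, other_inv):
    """File ids present in THIS and OTHER but absent from BASE."""
    return sorted((set(this_inv) & set(other_inv)) - set(base_inv))


base_inv = {"readme-id": "README"}
this_inv = {"readme-id": "README", "news-id": "NEWS"}
other_inv = {"readme-id": "README", "news-id": "docs/NEWS"}

shared = added_in_both(base_inv, this_inv, other_inv)
```

shared is ["news-id"], so a merge can treat the two additions as one
file even though neither exists in the base revision.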

So let's play to our strengths.  Yes, there is a perspective from which
we're storing changesets, but that perspective is a mathematical one, not
a natural one.  And all the other documentation that people encounter
will describe a system that stores data as snapshots.  Let's just keep
it simple.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFD3iRr0F+nu1YWqI0RAipdAJ4nCzYMWi15ihVVkERsdILt+dKUiQCfbu4o
i+Uu4UDsHYzpRDHwavfzkhw=
=Rwfn
-----END PGP SIGNATURE-----



