Other uses for tree versioning systems

Thu Jun 9 21:09:12 BST 2005

Aaron Bentley wrote:
> John A Meinel wrote:
> | Aaron Bentley wrote:
> |>Personally, I think it's usually a bad idea to have your control files
> |>be revision-controlled.
> |
> | Well, you need to have some way of getting back the metadata for
> | "revision-612". You certainly could store the meta-data into the
> | inventory XML, but that is a little bit more invasive.
> 
> My sense is that the extra metadata should be treated as data, not
> files.  I'm sure there's a way we can provide for extra data to be
> inserted in revisions without distorting things too seriously.
> 
> Or maybe it's versioned, but I think that will just lead to awkwardness.
> 

What if each <inventory><entry> tag had an optional child named
<meta>, so that each InventoryEntry object has a meta property? This could
probably be a Meta object. I would generally say that bzr shouldn't care
what is in it, as long as it reproduces it accurately between revisions.
(Meaning Meta could just contain the text string, or a cElementTree
object, or whatever, just as long as it got properly serialized with the
to_element/from_element commands).

Plugins/other items are allowed to interpret that property as
appropriate, since bzr just passes it along.
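
Roughly what I have in mind, as a minimal sketch. Everything here is
hypothetical: this Meta class, the stripped-down InventoryEntry stand-in,
and the file_id/name attributes are made up for illustration and are not
the actual bzrlib code. It just shows the core round-tripping an opaque
<meta> child without interpreting it.

```python
import xml.etree.ElementTree as ET


class Meta:
    """Opaque per-entry metadata (hypothetical); only the text is round-tripped."""

    def __init__(self, text=""):
        self.text = text

    def to_element(self):
        elem = ET.Element("meta")
        elem.text = self.text
        return elem

    @classmethod
    def from_element(cls, elem):
        return cls(elem.text or "")


class InventoryEntry:
    """Hypothetical, stripped-down stand-in for an inventory entry."""

    def __init__(self, file_id, name, meta=None):
        self.file_id = file_id
        self.name = name
        self.meta = meta  # optional; the core never looks inside it

    def to_element(self):
        elem = ET.Element("entry", {"file_id": self.file_id, "name": self.name})
        if self.meta is not None:
            elem.append(self.meta.to_element())
        return elem

    @classmethod
    def from_element(cls, elem):
        child = elem.find("meta")
        meta = Meta.from_element(child) if child is not None else None
        return cls(elem.get("file_id"), elem.get("name"), meta)


# Round trip: serialize an entry with metadata and read it back.
entry = InventoryEntry("id-1", "hello.c", Meta("mode=0755"))
xml_text = ET.tostring(entry.to_element(), encoding="unicode")
restored = InventoryEntry.from_element(ET.fromstring(xml_text))
```

A plugin would then parse restored.meta.text however it likes; entries
without a <meta> child simply get meta = None.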

> | I'm certainly not looking to have merge/etc work on the .bzrpermissions
> | file. I just want to have a version of the meta-data for each revision.
> 
> Sure.  I think this is one of the things Arch gets wrong.  Because its
> metadata is versioned, the metadata has to be formatted in very specific
> and unfortunate ways.  And even then, there are still issues when you
> merge it.

Sure. I can imagine that having diff always include 3 lines of local
info isn't very nice. All you really want is 1 entry / file, and to add
and remove as you like.

> 
> Better to have revisions contain both files and data, where data is
> updated according to its meaning.  Of course, the line between updating
> data and merging files is a fuzzy one.
> 
> |>The merge/changeset application code is already highly pluggable.  You
> |>just need a MyThreeWayPermissionMerge class that has an apply()
> |>function.  You stick that in the ChangesetEntry.metadata_change fields
> |>of affected entries.
> |>
> |
> |
> | That seems pretty nice. But when do you "stick that in"? What actually
> | builds up the changeset?
> 
> Technically, merge_core.make_merge_changeset().  This takes as its input
> a changeset produced by merge.generate_cset_optimized(), which in turn
> calls changeset.generate_changeset().
> 
> | Also, it seems that metadata_change should really be a list, since you
> | might have multiple plugins supplying different metadata. Though I guess
> | you have the "ApplySequence" handler, which might be just for that.
> 
> Yes, I thought it was useful to separate out that rarely-used
> responsibility to another object.  For instance, you can stick the
> ApplySequence inside a ReverseApply to invert the meaning.
> 
> | However, I think you have a bug, in that on line 1275 you have:
> |  return ApplySequence(old_meta, new_meta)
> | But your ApplySequence.__init__ function only takes 1 parameter, not 2.
> 
> Yes, that looks like a bug.  It's part of the changeset composition
> code, though, which will probably never be used with bzr.
> 
> | I'm also looking at the code, and I see some odd things.
> 
> Yes, there are odd things.  This code comes from BaZing, which was my
> aborted attempt to implement Martin's ideas by myself.  The fusion of
> bzr and BaZing is like one of those genetically engineered goats that
> produce spider silk in their milk.
> 
> | What I find weird is that
> | if the id points to a directory you return "MergeTree.tempdir".
> 
> The code was written on the assumption that I would have something like
> an Arch revision library to work with.  Then, it turned out there was no
> revision library.  Since bzr doesn't pay attention to metadata, just
> file type, I needed to return a directory, but any directory would do.
> 
> | Then in ChangesetGenerator.make_entry() you do a stat on both of the
> | returned paths, and use that to determine if there is a meta-data
> | change. But since you created the file without any concern to metadata
> | (in readonly_path) the metadata change is always going to be bogus.
> 
> Since bzr doesn't handle metadata at all, the bogus metadata changes are
> stripped out in generate_cset_optimized.

So how does a plugin hook into generate_cset_optimized?

> 
> | I'm thinking that we probably need a better way to generate the texts.
> | But I'm thinking that probably we shouldn't be dealing with paths, since
> | many times one side is going to be inside the revision library, and we
> | would rather not extract the full text of every file and put it
> | somewhere.
> 
> We do have to deal with paths, because paths are used to invoke diff,
> diff3, etc.  In generate_cset_optimized, we disable extraction of every
> unchanged file, so we're not extracting ludicrous numbers of files.  (As
> long as you append /@, that is.)

Are we actually invoking diff and diff3, or just using difflib? (Does the
code invoke them?) What if we provide the text of the data, rather than a
file to work on?

I know you can provide one file from stdin, but there is only 1 stdin,
and you need 2 file handles. So probably we need at least one file in
the filesystem (and for diff3 I guess we need at least 2). It just seems
odd that in places you are writing the file out to the filesystem just
to read it in again, and pass it to diff (bzrlib/changeset.py line 1447
.read()s the file that was just created).
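
For the plain two-way case, here is a minimal sketch of what "provide the
text" could look like using difflib from the standard library. The helper
name unified_diff_text is made up, and this is not what changeset.py
currently does. Note that difflib has no diff3 equivalent, so a three-way
merge would still need an external tool or separate code.

```python
import difflib


def unified_diff_text(old_text, new_text, old_label="old", new_label="new"):
    """Return a unified diff of two in-memory strings (hypothetical helper)."""
    # Keep line endings so the joined output is a well-formed diff.
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(
            old_lines, new_lines, fromfile=old_label, tofile=new_label
        )
    )


diff = unified_diff_text("a\nb\n", "a\nc\n")
```

Nothing touches the filesystem here, so there is no write-then-read step.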

> 
> But I agree with you-- this code should become better-integrated with
> bzr than it currently is.  Statting the data should be done using the
> MergeTree so that we can use the inventory metadata instead.
> 

Well, from what I can tell, checking the mode should be completely
removed from the code, to be replaced with some sort of meta-data hook.
I'm not sure where that hook should be defined. It would be nice if you
could pass it to the code, but it seems like it would probably have to
travel pretty far from a front-end/plugin down into the actual changeset
code.
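
One possible shape for such a hook, as a rough sketch only. The names
generate_changes and permission_hook, and the dict-based inventories, are
all invented for illustration; none of this is the existing changeset
API. The idea is that the front-end/plugin passes in a list of callables,
and the changeset code asks each one whether the metadata changed instead
of statting files itself.

```python
def permission_hook(old_entry, new_entry):
    """Example plugin hook: report a mode change, or None if unchanged."""
    old_mode = old_entry.get("mode")
    new_mode = new_entry.get("mode")
    if old_mode == new_mode:
        return None
    return ("mode", old_mode, new_mode)


def generate_changes(old_inv, new_inv, metadata_hooks=()):
    """Collect metadata changes per file id by asking each hook in turn.

    old_inv and new_inv are hypothetical {file_id: metadata dict} mappings.
    Only file ids present on both sides are compared.
    """
    changes = {}
    for file_id in sorted(set(old_inv) & set(new_inv)):
        found = []
        for hook in metadata_hooks:
            change = hook(old_inv[file_id], new_inv[file_id])
            if change is not None:
                found.append(change)
        if found:
            changes[file_id] = found
    return changes


old_inv = {"id-1": {"mode": 0o644}, "id-2": {"mode": 0o644}}
new_inv = {"id-1": {"mode": 0o755}, "id-2": {"mode": 0o644}}
changes = generate_changes(old_inv, new_inv, [permission_hook])
```

With no hooks registered, no metadata changes are reported at all, which
matches bzr ignoring metadata by default. Keeping a list of changes per
entry also covers the case of multiple plugins supplying different
metadata.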

> But for now, I'm working on
> 1. improving the robustness and correctness
> 2. getting it under test
> 3. using the stat cache?
> 

Seems a better place to focus, but since I don't know that part very
well, I'll play around with meta-data.

> | Anyway, I'm trying to wrap my head around your changeset & merge code.
> 
> Sorry it's not cleaner and clearer than it is.
> 

At least it exists. :)

> Aaron

John
=:->