[MERGE REVIEW] Revert destroys file contents produced by merge

Mon Feb 27 21:56:58 GMT 2006

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Martin Pool wrote:
> Each stanza is a multimap, but you can have a sequence of them in a file
> separated by blank lines.  The idea is that for an inventory you would
> have one stanza per entry, each of which containing a set of fields
> similar to the attributes within an xml node.
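
To make the stanza layout above concrete, here is a minimal sketch in Python. It assumes a "name: value" line syntax and ignores multi-line values and escaping; read_stanzas is a made-up name for illustration, not the rio API.

```python
def read_stanzas(lines):
    """Split lines into stanzas separated by blank lines.

    Each stanza is an ordered list of (name, value) pairs, so repeated
    names are preserved. The 'name: value' syntax is an assumption.
    """
    stanzas = []
    current = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line closes the current stanza, if there is one.
            if current:
                stanzas.append(current)
                current = []
            continue
        name, _, value = line.partition(": ")
        current.append((name, value))
    if current:
        stanzas.append(current)
    return stanzas


text = [
    "file_id: bzqr\n",
    "name: README\n",
    "\n",
    "file_id: abcd\n",
    "parent: p1\n",
    "parent: p2\n",
]
stanzas = read_stanzas(text)
```

Keeping each stanza as a list of pairs rather than a dict is what lets "parent" appear twice in the second stanza.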

Okay, so this is also similar to XML in that it doesn't guarantee a
particular number of lines per value.

That is one of the reasons why I think Boroncelli's technique for detecting
revision-ids by reading the weaved XML is a dirty hack.  I would be much
more comfortable with it if the format had been designed so that it could be
read directly from the weave.  I think splatfile is better suited to this
than rio, because its dictionary form has one line per key/value pair.  In
fact, I think you could build an API that supported annotated values,
and then be able to do history-based scalar merges.

You might represent inventory entries like this (the first line is to
assert that there is an entry whose file-id is bzqr):

bzqr.file_id bzqr
bzqr.type file
bzqr.name README
bzqr.text_sha1 24352edfsdg34
bzqr.text_size 3
bzqr.revision rev at lambdafoo

Hmm.  Perhaps I should reserve a path separator, like '.' or '/'.
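
As a rough sketch of how such lines could be read straight into nested dicts, assuming '.' is the reserved path separator and the first space splits key from value (parse_entries is a hypothetical helper, not the splatfile API):

```python
def parse_entries(lines, sep="."):
    """Group 'entry<sep>field value' lines into {entry: {field: value}}.

    Assumes one key/value pair per line, with the key ending at the
    first space and the entry id ending at the first separator.
    """
    entries = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        key, _, value = line.partition(" ")
        entry_id, _, field = key.partition(sep)
        entries.setdefault(entry_id, {})[field] = value
    return entries


sample = [
    "bzqr.file_id bzqr",
    "bzqr.type file",
    "bzqr.name README",
    "bzqr.text_size 3",
    "bzqr.revision rev at lambdafoo",
]
entries = parse_entries(sample)
```

An entry id that itself contained the separator would be split in the wrong place here, which is the reason for reserving the character.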

> I had originally thought of the rio field names as being similar to
> Python attribute names or to XML attribute names, both of which have to
> be ascii subject to some constraints.  That's just kind of design by
> analogy so there's no strong reason to keep it if having unicode there
> would be more useful.  We can escape it in the same way.  There's no
> sense having callers escaping the contents too.

I think this is definitely true of the values.  You've managed to plant
a seed of doubt about keys, though.

> For human readability and so that you can add new optional fields in the
> future I wanted to have fields explicitly labelled, at least briefly.
> In some cases like the ancestry cache or the stat cache perhaps the
> labels are not worth the time or space, but I suppose you can always
> just make them small.  Ordered multimaps are perhaps less natural in
> Python than regular dicts but it seems like a natural way to store
> repeated attributes such as parent ids.

I suppose I could have written it as a multimap of {"revision-id":
"foo", "parent-id": "bar", "parent-id": "baz"}.  It just seemed natural
to write it as {"foo": "bar", "foo":"baz"}, at the time.
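
Note that neither form survives as a plain Python dict literal, since a repeated key silently replaces the earlier one. A minimal sketch of the first form as an ordered list of pairs, with hypothetical helper names:

```python
# Ordered multimap kept as a list of (name, value) pairs.
pairs = [
    ("revision-id", "foo"),
    ("parent-id", "bar"),
    ("parent-id", "baz"),
]


def get_all(pairs, name):
    """Return every value stored under name, in file order."""
    return [value for key, value in pairs if key == name]


def to_multidict(pairs):
    """Collapse the pairs into {name: [values...]}, keeping value order."""
    result = {}
    for key, value in pairs:
        result.setdefault(key, []).append(value)
    return result


# A dict literal with a repeated key keeps only the last value.
collapsed = {"foo": "bar", "foo": "baz"}
```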

> I think allowing multiline values to span lines makes it much more
> readable for humans than escaping them.

But you're not wrapping at any particular width, are you?  If so, you
may have an argument, but most file viewers will auto-wrap.

>  For the field names it might be
> best to just escape special characters - or we could allow unicode but
> not newlines, colon etc.  Perhaps that would be too much of a restriction
> since they can turn up in file ids.

If we want to use file or revision ids as tags at all, we should support
any of the characters they may contain.  It's nice to be able to parse
the file straight into a dict; using 'file-id' as a tag means that we
need to externally specify which tag value to use as a dictionary key.

> We could also consider using a length-delimited format, which is
> possibly quite a bit faster to parse.

Hmm.  I don't have much experience with those, but I guess it could be good.
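
For reference, one way a length-delimited format could look is netstring-style framing, "<length>:<bytes>,". Nobody in this thread proposed that particular framing; it is only an assumption chosen to illustrate the idea, and the function names are made up.

```python
def encode_field(data: bytes) -> bytes:
    """Frame one field as b'<length>:<data>,'."""
    return str(len(data)).encode("ascii") + b":" + data + b","


def decode_fields(buf: bytes):
    """Decode a buffer of concatenated framed fields into a list of bytes."""
    fields = []
    pos = 0
    while pos < len(buf):
        colon = buf.index(b":", pos)
        length = int(buf[pos:colon])
        start = colon + 1
        end = start + length
        # The byte after the payload must be the terminator.
        if buf[end:end + 1] != b",":
            raise ValueError("missing field terminator")
        fields.append(buf[start:end])
        pos = end + 1
    return fields


values = [b"file-id", b"odd:id\nwith newline", b""]
encoded = b"".join(encode_field(v) for v in values)
decoded = decode_fields(encoded)
```

The parser never scans the payload, so newlines and colons in values need no escaping.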

> If we changed this would rio be suitable?

I think I'd be happy with rio if it had arbitrary keys, but I'm willing
to reconsider my desire for arbitrary keys.

>>Two headers might be:
>>
>>BZR merge-modified list format 1
>>BZR Splatfile format 1
>>
>>Combined might be:
>>BZR merge-modified list format 1 / BZR Splatfile format 1
> 
> 
> That makes sense - or maybe the primitive format first since it's
> conceptually checked first.

Oh, I was thinking that read_merge_modified would read the
merge-modified header, then pass the file object on to the splatfile
parser.  But this can be done any number of ways.
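
A minimal sketch of that layering, assuming the two headers sit on separate lines and using a stand-in one-pair-per-line body; read_splatfile here is hypothetical and only illustrates handing the same file object down:

```python
import io

MERGE_MODIFIED_HEADER = "BZR merge-modified list format 1"
SPLATFILE_HEADER = "BZR Splatfile format 1"


def read_splatfile(f):
    """Check the primitive-format header, then read 'key value' lines."""
    header = f.readline().rstrip("\n")
    if header != SPLATFILE_HEADER:
        raise ValueError("not a splatfile: %r" % header)
    result = {}
    for line in f:
        line = line.rstrip("\n")
        if line:
            key, _, value = line.partition(" ")
            result[key] = value
    return result


def read_merge_modified(f):
    """Consume the outer header, then pass the file object on."""
    header = f.readline().rstrip("\n")
    if header != MERGE_MODIFIED_HEADER:
        raise ValueError("not a merge-modified list: %r" % header)
    return read_splatfile(f)


data = io.StringIO(
    "BZR merge-modified list format 1\n"
    "BZR Splatfile format 1\n"
    "bzqr 24352edfsdg34\n"
)
modified = read_merge_modified(data)
```

With the headers in this order the outer format is checked first; putting the primitive format first would simply swap which function reads the first line.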

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFEA3Wq0F+nu1YWqI0RAkhJAJ9EiEkM6ng/JuyQ9hCCOPaPEyut3ACeILH+
H8aKty1TCfZya+GUBis9gao=
=GxRl
-----END PGP SIGNATURE-----