[MERGE REVIEW] Revert destroys file contents produced by merge

Martin Pool mbp at sourcefrog.net
Mon Feb 27 18:56:04 GMT 2006


On 26 Feb 2006, Aaron Bentley <aaron.bentley at utoronto.ca> wrote:
> Martin Pool wrote:
> Right.  I should have said 'replaced' rather than 'deleted'.  Living too
> close to the code, I guess.

Right, so at one level it is deleted; I just wanted to check what the
user-visible behaviour would be.

> | This looks pretty good; to me it seems to provide a "list of strings"
> | companion to the "list of multimaps" in rio.
> 
> Isn't rio a multimap, not a list of them?

Each stanza is a multimap, but you can have a sequence of them in a file,
separated by blank lines.  The idea is that for an inventory you would
have one stanza per entry, each containing a set of fields similar to
the attributes of an XML node.
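To make the "sequence of multimaps" idea concrete, here is a minimal reader sketch. It is not bzrlib's rio code and the function name is hypothetical. It assumes a simplified layout: each field is written as "name: value", a line starting with a tab continues the previous value, and a blank line ends a stanza.

```python
from typing import List, Tuple

# A stanza is an ordered multimap: a list of (field, value) pairs,
# so the same field name may appear more than once.
Stanza = List[Tuple[str, str]]


def parse_stanzas(text: str) -> List[Stanza]:
    """Split text into stanzas separated by blank lines (simplified layout)."""
    stanzas: List[Stanza] = []
    current: Stanza = []
    for line in text.splitlines():
        if line == "":
            # A blank line closes the current stanza, if one is open.
            if current:
                stanzas.append(current)
                current = []
        elif line.startswith("\t"):
            # Assumed continuation line: append to the previous value.
            if not current:
                raise ValueError("continuation line with no field")
            name, value = current[-1]
            current[-1] = (name, value + "\n" + line[1:])
        else:
            name, sep, value = line.partition(": ")
            if not sep:
                raise ValueError("malformed line: %r" % line)
            current.append((name, value))
    if current:
        stanzas.append(current)
    return stanzas
```

With this layout, an inventory file would simply be one stanza per entry, read back as a list of pair lists.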

> | I think both have a place
> | - splatfile more so when we don't expect new fields to be added within a
> | particular format or to need repeated fields.  Do you agree?
> 
> I guess I do see them more as competitors.  I think they represent
> different design decisions about where flexibility is needed.  I'm not
> tied to splatfile; we could make changes to rio instead, if you'd prefer
> that.

Well, if we talk about the differences perhaps we can work out something
that works everywhere.

> I should explain that I'm using splatfile to represent a dict in this
> case, which is clearly a subset of what rio can represent.  But the keys
> of my dict are illegal values in rio.  I could encode them as hex, but
> I'd prefer to be able to read the output files with the naked eye.

Right, being able to at least read them is pretty desirable.

I had originally thought of the rio field names as being similar to
Python attribute names or XML attribute names, both of which have to
be ASCII, subject to some constraints.  That's just design by analogy,
so there's no strong reason to keep it if allowing Unicode there would
be more useful.  We can escape field names in the same way as values.
There's no sense in making callers escape the contents too.

For human readability, and so that you can add new optional fields in
the future, I wanted fields to be explicitly labelled, even if only
briefly.  In some cases, like the ancestry cache or the stat cache,
perhaps the labels are not worth the time or space, but I suppose you
can always just make them short.  Ordered multimaps are perhaps less
natural in Python than regular dicts, but they seem a natural way to
store repeated attributes such as parent ids.
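A short sketch of why an ordered multimap suits repeated attributes like parent ids. The stanza contents and the `get_all` helper are hypothetical, chosen only for illustration; a plain list of pairs stands in for the multimap.

```python
from typing import List, Tuple

# Hypothetical stanza for a revision with two parents, kept as ordered pairs.
pairs: List[Tuple[str, str]] = [
    ("revision_id", "rev-3"),
    ("parent_id", "rev-1"),
    ("parent_id", "rev-2"),
]


def get_all(stanza: List[Tuple[str, str]], name: str) -> List[str]:
    """Return every value for `name`, in the order it was written."""
    return [value for key, value in stanza if key == name]


# Converting to a plain dict keeps only the last repeated field:
#   dict(pairs)["parent_id"] is "rev-2"
# whereas the multimap keeps both parents in order:
#   get_all(pairs, "parent_id") is ["rev-1", "rev-2"]
```

The order matters here because the first parent of a revision is usually treated differently from merged parents.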

I wouldn't want callers to have to escape things manually.

I think allowing multiline values to span several lines makes the file
much more readable for humans than escaping them.  For the field names
it might be best to just escape special characters, or we could allow
Unicode but not newlines, colons, etc.  Perhaps that would be too much
of a restriction, since those characters can turn up in file ids.
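One way to "just escape special characters" in field names while leaving other Unicode readable is sketched below. This is an assumption for illustration, not an agreed rio encoding: the escape sequences and the function names `escape_name` and `unescape_name` are hypothetical. Only the backslash, the colon that separates name from value, and line breaks are escaped.

```python
# Assumed escape table: backslash itself, the name/value colon,
# and line breaks. Everything else (including non-ASCII) passes through.
_ESCAPES = {"\\": "\\\\", ":": "\\c", "\n": "\\n", "\r": "\\r"}
# Reverse map from the character after the backslash to the original.
_UNESCAPES = {seq[1]: ch for ch, seq in _ESCAPES.items()}


def escape_name(name: str) -> str:
    """Escape only the characters that would break the line format."""
    return "".join(_ESCAPES.get(ch, ch) for ch in name)


def unescape_name(escaped: str) -> str:
    """Invert escape_name, rejecting unknown or truncated escapes."""
    out = []
    i = 0
    while i < len(escaped):
        ch = escaped[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        code = escaped[i + 1:i + 2]
        if code not in _UNESCAPES:
            raise ValueError("bad escape at offset %d" % i)
        out.append(_UNESCAPES[code])
        i += 2
    return "".join(out)
```

Unlike hex-encoding the whole key, this keeps most file ids readable by eye and only disturbs the few characters the format itself reserves.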

We could also consider using a length-delimited format, which is
possibly quite a bit faster to parse.
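A length-delimited layout can be parsed by reading a byte count and slicing, with no scanning for escapes or continuation lines, which is where the speed-up would come from. The sketch below uses a netstring-style encoding (`<length>:<bytes>,`) for each key and value; this is one possible choice, not a proposed bzr format, and the function names are hypothetical.

```python
from typing import List, Tuple


def encode_fields(pairs: List[Tuple[str, str]]) -> bytes:
    """Write each key and value as b'<len>:<utf-8 bytes>,'."""
    out = bytearray()
    for key, value in pairs:
        for text in (key, value):
            data = text.encode("utf-8")
            out += b"%d:" % len(data)  # length is counted in bytes
            out += data
            out += b","
    return bytes(out)


def decode_fields(buf: bytes) -> List[Tuple[str, str]]:
    """Read length-prefixed strings back and pair them up as (key, value)."""
    items: List[str] = []
    pos = 0
    while pos < len(buf):
        colon = buf.index(b":", pos)
        length = int(buf[pos:colon])
        if length < 0:
            raise ValueError("negative length at %d" % pos)
        start = colon + 1
        end = start + length
        if buf[end:end + 1] != b",":
            raise ValueError("missing terminator at %d" % end)
        items.append(buf[start:end].decode("utf-8"))
        pos = end + 1
    if len(items) % 2:
        raise ValueError("odd number of strings")
    return list(zip(items[0::2], items[1::2]))
```

The trade-off is readability: keys and values containing newlines or colons need no escaping at all, but the file is harder to read or edit by hand.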

If we changed this, would rio be suitable?

> I would rather have the physical format be explicit.  Much like XML has
> both an XML format version and a schema.  Not because I want to be able
> to vary both independently, but to ensure the splatfile parser can't
> return garbage.  Perhaps two headers, or a combined header, then?
> 
> Two headers might be:
> 
> BZR merge-modified list format 1
> BZR Splatfile format 1
> 
> Combined might be:
> BZR merge-modified list format 1 / BZR Splatfile format 1

That makes sense, though maybe with the primitive (physical) format
first, since it's conceptually checked first.
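A sketch of the two-header check with the physical format line first, so the parser refuses to interpret a file it cannot read before looking at the schema. The header strings are the ones quoted in this thread; the function name `read_headers` and its list-of-lines interface are hypothetical.

```python
from typing import List

# Physical (primitive) format header, checked before anything else.
PHYSICAL_HEADER = "BZR Splatfile format 1"


def read_headers(lines: List[str], expected_schema: str) -> List[str]:
    """Validate physical then schema header; return the remaining lines."""
    if len(lines) < 2:
        raise ValueError("file too short to contain both headers")
    if lines[0] != PHYSICAL_HEADER:
        # Unknown container format: refuse rather than return garbage.
        raise ValueError("unknown physical format: %r" % lines[0])
    if lines[1] != expected_schema:
        raise ValueError("unexpected schema: %r" % lines[1])
    return lines[2:]
```

Usage might look like `read_headers(lines, "BZR merge-modified list format 1")`, returning the body lines for the format-specific reader.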

-- 
Martin
