[MERGE][RFC] Add simple revision serializer based on RIO.
Jelmer Vernooij
jelmer at samba.org
Sat May 2 13:38:40 BST 2009
Aaron Bentley wrote:
> bb:comment Where is this tested?
There aren't any tests yet, other than the generic tests that were
already there. I wasn't quite sure whether this code had any chance of
making it in, hence also the [RFC].
> Also, isn't this a massive change to the behaviour of
> CHKSerializerSubtree?
Yeah. When it's accepted it should probably end up in whatever new
development format is more appropriate.
> Jelmer Vernooij wrote:
> > === modified file 'bzrlib/chk_serializer.py'
> > --- bzrlib/chk_serializer.py 2009-04-09 20:23:07 +0000
> > +++ bzrlib/chk_serializer.py 2009-04-15 01:37:51 +0000
> > @@ -16,14 +16,94 @@
>
> > """Serializer object for CHK based inventory storage."""
>
> > +from cStringIO import (
> > + StringIO,
> > + )
> > +
> > from bzrlib import (
> > + cache_utf8,
> > inventory,
> > + osutils,
> > + revision as _mod_revision,
> > + rio,
> > xml5,
> > xml6,
> > )
>
> > -
> > -class CHKSerializerSubtree(xml6.Serializer_v6):
> > +class RIORevisionSerializer1(object):
> > + """Simple line-based revision serializer.
> > +
> > + This writes simple key/value pairs for all elements on the
> revision,
> > + each on a separate line and using colons to separate key and
> value. Newlines
> > + are backslash-escaped where necessary.
>
> I think this docstring describes your original implementation, not RIO.
> RIO does not backslash-escape newlines.
Thanks, fixed.
>
> > +
> > + The commit message is always serialized last and can span
> multiple lines.
> > + """
>
> Although this is accurate, I don't see a reason to guarantee it by
> enshrining it in a docstring.
This was the same issue as above; it mattered for the original
implementation as that just used the tail of the revision string for the
commit message, without further parsing it.
> > +
> > + def write_revision(self, rev, f):
> > + decode_utf8 = cache_utf8.decode
> > + w = rio.RioWriter(f)
> > + s = rio.Stanza()
>
> This method would be a lot shorter if you used the kwargs argument of
> the Stanza constructor, e.g.
>
> s = rio.Stanza(revision_id=rev.revision_id, timestamp=rev.timestamp,
> inventory_sha1=rev.inventory_sha1, ...
The reason for doing it this way was, as John points out, an attempt to
make sure things that will often stay the same are grouped together for
better compression.
> > + revision_id = rev.revision_id
> > + if isinstance(revision_id, str):
> > + revision_id = decode_utf8(revision_id)
>
> Shouldn't we define write_revision as either requiring unicode or utf-8
> throughout?
I figured unicode revision ids were still allowed, but none of the tests
seem to actually use them. Fixed now to always require utf8 strings.
> > + s.add("revision-id", revision_id)
> > + s.add("timestamp", "%.3f" % rev.timestamp)
> > + if rev.parent_ids:
> > + s.add("parent-ids",
> > + " ".join([decode_utf8(p) for p in rev.parent_ids]))
>
> RIO is designed to allow the same key to be used multiple times, so I
> think it's more natural to handle it as follows:
>
> for parent_id in rev.parent_ids:
> s.add("parent-id", parent_id)
That makes sense. Fixed.
Cheers,
Jelmer
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rio-serializer.diff
Type: text/x-patch
Size: 10640 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20090502/b2009715/attachment-0001.bin
More information about the bazaar
mailing list