[MERGE][RFC] Add simple revision serializer based on RIO.

Sat May 2 13:38:40 BST 2009

Aaron Bentley wrote:
> bb:comment Where is this tested?
There aren't any tests yet, other than the generic tests that were
already there. I wasn't quite sure whether this code had any chance of
making it in, hence also the [RFC].
> Also, isn't this a massive change to the behaviour of
> CHKSerializerSubtree?
Yeah. When it's accepted it should probably end up in whatever new
development format is more appropriate.
> Jelmer Vernooij wrote:
> > === modified file 'bzrlib/chk_serializer.py'
> > --- bzrlib/chk_serializer.py    2009-04-09 20:23:07 +0000
> > +++ bzrlib/chk_serializer.py    2009-04-15 01:37:51 +0000
> > @@ -16,14 +16,94 @@
>
> >  """Serializer object for CHK based inventory storage."""
>
> > +from cStringIO import (
> > +    StringIO,
> > +    )
> > +
> >  from bzrlib import (
> > +    cache_utf8,
> >      inventory,
> > +    osutils,
> > +    revision as _mod_revision,
> > +    rio,
> >      xml5,
> >      xml6,
> >      )
>
> > -
> > -class CHKSerializerSubtree(xml6.Serializer_v6):
> > +class RIORevisionSerializer1(object):
> > +    """Simple line-based revision serializer.
> > +    
> > +    This writes simple key/value pairs for all elements on the
> revision,
> > +    each on a separate line and using colons to separate key and
> value. Newlines
> > +    are backslash-escaped where necessary.
>
> I think this docstring describes your original implementation, not RIO.
>  RIO does not backslash-escape newlines.
Thanks, fixed.
>
> > +
> > +    The commit message is always serialized last and can span
> multiple lines.
> > +    """
>
> Although this is accurate, I don't see a reason to guarantee it by
> enshrining it in a docstring.
This was the same issue as above; it mattered for the original
implementation as that just used the tail of the revision string for the
commit message, without further parsing it.
> > +
> > +    def write_revision(self, rev, f):
> > +        decode_utf8 = cache_utf8.decode
> > +        w = rio.RioWriter(f)
> > +        s = rio.Stanza()
>
> This method would be a lot shorter if you used the kwargs argument of
> the Stanza constructor, e.g.
>
> s = rio.Stanza(revision_id=rev.revision_id, timestamp=rev.timestamp,
>                inventory_sha1=rev.inventory_sha1, ...
The reason for doing it this way was, as John points out, an attempt to
make sure things that will often stay the same are grouped together for
better compression.
> > +        revision_id = rev.revision_id
> > +        if isinstance(revision_id, str):
> > +            revision_id = decode_utf8(revision_id)
>
> Shouldn't we define write_revision as either requiring unicode or utf-8
> throughout?
I figured unicode revision ids were still allowed, but none of the tests
seem to actually use them. Fixed now to always require utf8 strings.
> > +        s.add("revision-id", revision_id)
> > +        s.add("timestamp", "%.3f" % rev.timestamp)
> > +        if rev.parent_ids:
> > +            s.add("parent-ids",
> > +                " ".join([decode_utf8(p) for p in rev.parent_ids]))
>
> RIO is designed to allow the same key to be used multiple times, so I
> think it's more natural to handle it as follows:
>
> for parent_id in rev.parent_ids:
>     s.add("parent-id", parent_id)
That makes sense. Fixed.

Cheers,

Jelmer
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rio-serializer.diff
Type: text/x-patch
Size: 10640 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20090502/b2009715/attachment-0001.bin