[MERGE][RFC] Add simple revision serializer based on RIO.

John Arbash Meinel john at arbash-meinel.com
Mon Apr 27 22:15:50 BST 2009


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1



...

>> +
>> +    def write_revision(self, rev, f):
>> +        decode_utf8 = cache_utf8.decode
>> +        w = rio.RioWriter(f)
>> +        s = rio.Stanza()
> 
> This method would be a lot shorter if you used the kwargs argument of
> the Stanza constructor, e.g.
> 
> s = rio.Stanza(revision_id=rev.revision_id, timestamp=rev.timestamp,
>                inventory_sha1=rev.inventory_sha1, ...
> 

I think he was trying to order the values in the stanza. If you order
them appropriately, you can get pretty decent compression between texts.
(We currently get about 2:1 delta compression in --dev6 for revisions on
top of the 2:1 zlib compression).

Clustering things that always change versus things that often stay the
same helps improve the delta compression. Timestamp and revision_id
being things that change pretty much always, author and timezone being
things that change less frequently (often the same person in the same
timezone will commit several revisions in a row), etc.


> 
>> +        revision_id = rev.revision_id
>> +        if isinstance(revision_id, str):
>> +            revision_id = decode_utf8(revision_id)
> 
> Shouldn't we define write_revision as either requiring unicode or utf-8
> throughout?

So 'revision_id' as an object in memory should generally be a utf-8
string. Same for Revision.revision_id, and InventoryEntry.revision.
Stanza should be working purely in Unicode, though. So my guess is this
should be universally translated. Though it could be that we have tests
that are creating Revision.revision_id directly, or possible not
encoding the result from an xml decode.

> 
>> +        s.add("revision-id", revision_id)
>> +        s.add("timestamp", "%.3f" % rev.timestamp)
>> +        if rev.parent_ids:
>> +            s.add("parent-ids", 
>> +                " ".join([decode_utf8(p) for p in rev.parent_ids]))
> 
> RIO is designed to allow the same key to be used multiple times, so I
> think it's more natural to handle it as follows:
> 
> for parent_id in rev.parent_ids:
>     s.add("parent-id", parent_id)

I think I agree, though I think you likely need to still decode_utf8() here.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkn2IIYACgkQJdeBCYSNAAP+hwCfQBSfZ7kY5R9cEto3mfgWQwNb
5LkAoJQDLbTGzS9M24uQMXTM+Ccv35ov
=XP52
-----END PGP SIGNATURE-----



More information about the bazaar mailing list