[MERGE] BEncode Revision Serializer

Aaron Bentley aaron at aaronbentley.com
Mon Jun 1 20:42:57 BST 2009


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Jelmer Vernooij wrote:
> The attached patch uses bencode for revision serialization. It passes
> the testsuite but is still quite slow, about twice as slow as XML on my
> machine. I hope to put up a patch that uses Alexanders' Pyrex
> implementation next.

bb:tweak

> +class BEncodeRevisionSerializer1(object):
> +    """Simple revision serializer based around bencode. 
> +    
> +    It tries to group entries together that are less likely
> +    to change often, to make it easier to do compression.
> +    """

Is this accurate?  I don't see how it would do this.

> +
> +    def write_revision_to_string(self, rev):
> +        encode_utf8 = cache_utf8.encode
> +        ret = {
> +            "revision-id": rev.revision_id,
> +            "timestamp": "%.3f" % rev.timestamp,
> +            "parent-ids": rev.parent_ids,

I thought revision-ids and parent-ids were stored as unicode in
revisions.  They're not?

> +            "inventory-sha1": rev.inventory_sha1,
> +            "committer": encode_utf8(rev.committer),
> +            "message": encode_utf8(rev.message),
> +            }
> +        revprops = {}
> +        for key, value in rev.properties.iteritems():
> +            revprops[key] = encode_utf8(value)
> +        ret["properties"] = revprops
> +        if rev.timezone is not None:
> +            ret["timezone"] = str(rev.timezone)

Why stringify timezone instead of storing it as an int?

> +_working_revision_bencode1 = 'd9:committer54:Canonical.com Patch Queue Manager <pqm at pqm.ubuntu.com>14:inventory-sha140:4a2c7fb50e077699242cf6eb16a61779c7b680a77:message35:(Jelmer) Move dpush to InterBranch.10:parent-idsl50:pqm at pqm.ubuntu.com-20090514104039-kggemn7lrretzpvc48:jelmer at samba.org-20090510012654-jp9ufxquekaokbeoe10:propertiesd11:branch-nick6:+trunke11:revision-id50:pqm at pqm.ubuntu.com-20090514113250-jntkkpminfn3e0tz9:timestamp14:1242300770.8448:timezone4:3600e'

Please wrap long lines.

> +class TestBEncodeSerializer1(TestCase):
> +    """Test BEncode serialization"""
> +
> +    def test_unpack_revision(self):
> +        """Test unpacking a revision"""
> +        inp = StringIO()
> +        rev = chk_bencode_serializer.read_revision_from_string(
> +                _working_revision_bencode1)
> +        eq = self.assertEqual
> +        eq(rev.committer,
> +           "Canonical.com Patch Queue Manager <pqm at pqm.ubuntu.com>")
> +        eq(rev.inventory_sha1,
> +           "4a2c7fb50e077699242cf6eb16a61779c7b680a7")
> +        eq(["pqm at pqm.ubuntu.com-20090514104039-kggemn7lrretzpvc",
> +            "jelmer at samba.org-20090510012654-jp9ufxquekaokbeo"],
> +            rev.parent_ids)
> +        eq("(Jelmer) Move dpush to InterBranch.", rev.message)
> +        eq("pqm at pqm.ubuntu.com-20090514113250-jntkkpminfn3e0tz", 
> +           rev.revision_id)
> +        eq({"branch-nick": u"+trunk"}, rev.properties)
> +        eq(3600, rev.timezone)

Why do you assign assertEqual to eq?  I think it makes it harder to read.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkokLz4ACgkQ0F+nu1YWqI36TQCeO8WnfVAiEKuzXUpwwTstOkuv
PfgAnj3UNLwIwc4iXLq2Ps02LLRGoSWb
=p3T+
-----END PGP SIGNATURE-----



More information about the bazaar mailing list