[MERGE] BEncode Revision Serializer
Aaron Bentley
aaron at aaronbentley.com
Mon Jun 1 20:42:57 BST 2009
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Jelmer Vernooij wrote:
> The attached patch uses bencode for revision serialization. It passes
> the testsuite but is still quite slow, about twice as slow as XML on my
> machine. I hope to put up a patch that uses Alexanders' Pyrex
> implementation next.
bb:tweak
> +class BEncodeRevisionSerializer1(object):
> + """Simple revision serializer based around bencode.
> +
> + It tries to group entries together that are less likely
> + to change often, to make it easier to do compression.
> + """
Is this accurate? I don't see how it would do this.
> +
> + def write_revision_to_string(self, rev):
> + encode_utf8 = cache_utf8.encode
> + ret = {
> + "revision-id": rev.revision_id,
> + "timestamp": "%.3f" % rev.timestamp,
> + "parent-ids": rev.parent_ids,
I thought revision-ids and parent-ids were stored as unicode in
revisions. They're not?
> + "inventory-sha1": rev.inventory_sha1,
> + "committer": encode_utf8(rev.committer),
> + "message": encode_utf8(rev.message),
> + }
> + revprops = {}
> + for key, value in rev.properties.iteritems():
> + revprops[key] = encode_utf8(value)
> + ret["properties"] = revprops
> + if rev.timezone is not None:
> + ret["timezone"] = str(rev.timezone)
Why stringify timezone instead of storing it as an int?
> +_working_revision_bencode1 = 'd9:committer54:Canonical.com Patch Queue Manager <pqm at pqm.ubuntu.com>14:inventory-sha140:4a2c7fb50e077699242cf6eb16a61779c7b680a77:message35:(Jelmer) Move dpush to InterBranch.10:parent-idsl50:pqm at pqm.ubuntu.com-20090514104039-kggemn7lrretzpvc48:jelmer at samba.org-20090510012654-jp9ufxquekaokbeoe10:propertiesd11:branch-nick6:+trunke11:revision-id50:pqm at pqm.ubuntu.com-20090514113250-jntkkpminfn3e0tz9:timestamp14:1242300770.8448:timezone4:3600e'
Please wrap long lines.
> +class TestBEncodeSerializer1(TestCase):
> + """Test BEncode serialization"""
> +
> + def test_unpack_revision(self):
> + """Test unpacking a revision"""
> + inp = StringIO()
> + rev = chk_bencode_serializer.read_revision_from_string(
> + _working_revision_bencode1)
> + eq = self.assertEqual
> + eq(rev.committer,
> + "Canonical.com Patch Queue Manager <pqm at pqm.ubuntu.com>")
> + eq(rev.inventory_sha1,
> + "4a2c7fb50e077699242cf6eb16a61779c7b680a7")
> + eq(["pqm at pqm.ubuntu.com-20090514104039-kggemn7lrretzpvc",
> + "jelmer at samba.org-20090510012654-jp9ufxquekaokbeo"],
> + rev.parent_ids)
> + eq("(Jelmer) Move dpush to InterBranch.", rev.message)
> + eq("pqm at pqm.ubuntu.com-20090514113250-jntkkpminfn3e0tz",
> + rev.revision_id)
> + eq({"branch-nick": u"+trunk"}, rev.properties)
> + eq(3600, rev.timezone)
Why do you assign assertEqual to eq? I think it makes it harder to read.
Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iEYEARECAAYFAkokLz4ACgkQ0F+nu1YWqI36TQCeO8WnfVAiEKuzXUpwwTstOkuv
PfgAnj3UNLwIwc4iXLq2Ps02LLRGoSWb
=p3T+
-----END PGP SIGNATURE-----
More information about the bazaar
mailing list