[MERGE] BEncode Revision Serializer
John Arbash Meinel
john at arbash-meinel.com
Mon Jun 1 21:31:19 BST 2009
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Aaron Bentley wrote:
> Jelmer Vernooij wrote:
>> The attached patch uses bencode for revision serialization. It passes
>> the testsuite but is still quite slow, about twice as slow as XML on my
>> machine. I hope to put up a patch that uses Alexanders' Pyrex
>> implementation next.
>
> bb:tweak
>
>> +class BEncodeRevisionSerializer1(object):
>> + """Simple revision serializer based around bencode.
>> +
>> + It tries to group entries together that are less likely
>> + to change often, to make it easier to do compression.
>> + """
>
> Is this accurate? I don't see how it would do this.
>
>> +
>> + def write_revision_to_string(self, rev):
>> + encode_utf8 = cache_utf8.encode
>> + ret = {
>> + "revision-id": rev.revision_id,
>> + "timestamp": "%.3f" % rev.timestamp,
>> + "parent-ids": rev.parent_ids,
>
> I thought revision-ids and parent-ids were stored as unicode in
> revisions. They're not?
In memory, revision_ids are meant to be kept as UTF-8 strings. Thus you
can store and restore them directly.
>
>> + "inventory-sha1": rev.inventory_sha1,
>> + "committer": encode_utf8(rev.committer),
>> + "message": encode_utf8(rev.message),
>> + }
>> + revprops = {}
>> + for key, value in rev.properties.iteritems():
>> + revprops[key] = encode_utf8(value)
>> + ret["properties"] = revprops
>> + if rev.timezone is not None:
>> + ret["timezone"] = str(rev.timezone)
>
> Why stringify timezone instead of storing it as an int?
>
>> +_working_revision_bencode1 = 'd9:committer54:Canonical.com Patch Queue Manager <pqm at pqm.ubuntu.com>14:inventory-sha140:4a2c7fb50e077699242cf6eb16a61779c7b680a77:message35:(Jelmer) Move dpush to InterBranch.10:parent-idsl50:pqm at pqm.ubuntu.com-20090514104039-kggemn7lrretzpvc48:jelmer at samba.org-20090510012654-jp9ufxquekaokbeoe10:propertiesd11:branch-nick6:+trunke11:revision-id50:pqm at pqm.ubuntu.com-20090514113250-jntkkpminfn3e0tz9:timestamp14:1242300770.8448:timezone4:3600e'
>
> Please wrap long lines.
Just to mention doing:
_working_revision_bencode1 = ('d9:...'
'more information...'
'finished')
Should work just fine, and certainly is nicer to read. I would recommend
breaking at reasonable item boundaries.
>
>> +class TestBEncodeSerializer1(TestCase):
>> + """Test BEncode serialization"""
>> +
>> + def test_unpack_revision(self):
>> + """Test unpacking a revision"""
>> + inp = StringIO()
>> + rev = chk_bencode_serializer.read_revision_from_string(
>> + _working_revision_bencode1)
>> + eq = self.assertEqual
>> + eq(rev.committer,
>> + "Canonical.com Patch Queue Manager <pqm at pqm.ubuntu.com>")
>> + eq(rev.inventory_sha1,
>> + "4a2c7fb50e077699242cf6eb16a61779c7b680a7")
>> + eq(["pqm at pqm.ubuntu.com-20090514104039-kggemn7lrretzpvc",
>> + "jelmer at samba.org-20090510012654-jp9ufxquekaokbeo"],
>> + rev.parent_ids)
>> + eq("(Jelmer) Move dpush to InterBranch.", rev.message)
>> + eq("pqm at pqm.ubuntu.com-20090514113250-jntkkpminfn3e0tz",
>> + rev.revision_id)
>> + eq({"branch-nick": u"+trunk"}, rev.properties)
>> + eq(3600, rev.timezone)
>
> Why do you assign assertEqual to eq? I think it makes it harder to read.
>
> Aaron
I believe he was copying code that I wrote for XML checks a *long* time
ago, where I did exactly this. Mostly because "self.assertEqual" was
long to write, and takes up a large portion of space for 80 byte wide
lines when you have entries like
"pqm at pqm.ubuntu.com-20090514104039-kggemn7lrretzpvc"
Anyway, I would like to postpone landing this until I've gotten some
numbers for comparing it to the RIO based serializer, and our existing
XML based one. (Size/speed/etc).
I'll try to get that done today.
BB:comment
(I'm not really sure that we have a BB:wait until I give a full review :)
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iEYEARECAAYFAkokOpcACgkQJdeBCYSNAAPw3gCfQPO6lTLgn1R/f+8E/WDXgHw6
QwIAoMzMsl+iG/fs4csPeeHtUnIbcpMI
=QD5a
-----END PGP SIGNATURE-----
More information about the bazaar
mailing list