[MERGE][RFC] Add simple revision serializer based on RIO.

Mon May 11 15:17:25 BST 2009

John Arbash Meinel wrote:
> Martin Pool wrote:
>> 2009/5/11 Matt Nordhoff <mnordhoff at mattnordhoff.com>:
>>> Martin Pool wrote:
>>> [snip]
>>>
>>>> However, before moving to RIO for future formats (and I say this
>>>> having added the code) I would think hard about whether it should use
>>>> bencode instead, which has the advantage of being able to represent
>>>> somewhat more complex nesting (like dicts inside dicts) without
>>>> needing a separate layer of encoding on top.  Revisions are pretty
>>>> simple but even there it may be useful.  I'm not sure about the
>>>> relative performance.
>>> If I understand correctly, RIO is line-based but bencode is not. Is the
>>> delta format still line-based? If so, using bencode would be more difficult.
>> We don't do line-by-line compression on revisions because generally
>> speaking there's not much in common between them.  zlib compression in
>> groupcompress will pick out common strings like committer names.  Good
>> question though.
>>
> 
> We do "line-by-line" delta compression in --dev6 because I found that
> this assumption was incorrect. We get 2:1 compression improvements by
> doing delta compression, since the only difference for 'revisions'
> fields is the fact that --dev6 puts them in groups w/ delta compression.

I guess it is possible to extend the bencode format to accept a \n
character between fields, so that bzr can still get the benefits of
line-by-line delta compression.

E.g., would encoding a dict as

d\n
9:committer10:John Smith\n
e

or something similar do the trick?

Just a crazy idea, in case bencode really is the faster choice.
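
To make the idea concrete, here is a rough Python sketch of such a
"line-broken" bencode. It is only an illustration: the names encode_lines
and decode_lines are made up, this is not bzr's bencode module, and the
newline separator is not part of standard bencode. It assumes a \n after
each container opener, after each dict key/value pair, and after each
list item. Strings stay length-prefixed, so a \n inside a string is still
read as data.

```python
def encode_lines(value):
    """Bencode `value`, adding b"\\n" after container openers and entries.

    Hypothetical variant: the newlines are NOT part of standard bencode.
    """
    if isinstance(value, bytes):
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, int):
        return b"i" + str(value).encode("ascii") + b"e"
    if isinstance(value, list):
        return b"l\n" + b"".join(encode_lines(v) + b"\n" for v in value) + b"e"
    if isinstance(value, dict):
        out = [b"d\n"]
        for key in sorted(value):  # bencode requires sorted dict keys
            out.append(encode_lines(key) + encode_lines(value[key]) + b"\n")
        out.append(b"e")
        return b"".join(out)
    raise TypeError("cannot encode %r" % type(value))


def _skip_newlines(data, pos):
    """Advance past separator newlines that sit between tokens."""
    while data[pos:pos + 1] == b"\n":
        pos += 1
    return pos


def decode_lines(data, pos=0):
    """Decode one value starting at `pos`; return (value, next_pos)."""
    pos = _skip_newlines(data, pos)
    lead = data[pos:pos + 1]
    if lead == b"i":
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead in (b"l", b"d"):
        items = []
        pos = _skip_newlines(data, pos + 1)
        while data[pos:pos + 1] != b"e":
            item, pos = decode_lines(data, pos)
            items.append(item)
            pos = _skip_newlines(data, pos)
        pos += 1  # consume the closing b"e"
        if lead == b"l":
            return items, pos
        # Dict entries were collected as key, value, key, value, ...
        return dict(zip(items[0::2], items[1::2])), pos
    # Otherwise a string: <length>:<bytes>. Newlines inside are data.
    colon = data.index(b":", pos)
    length = int(data[pos:colon])
    start = colon + 1
    return data[start:start + length], start + length
```

With this sketch, encode_lines({b"committer": b"John Smith"}) gives the
three-line form shown above, and each key/value pair lands on its own
line unless a string itself contains a \n. Whether the extra bytes and
the extra skipping in the decoder are worth it would need measuring.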