[rfc] bencode unicode strings

Andrew Bennetts andrew.bennetts at canonical.com
Tue Jun 16 09:25:12 BST 2009


Alexander Belchenko wrote:
> For QBzr I need to implement support for bencoding unicode
> strings. Standard bencode treats strings in the stream as byte strings.
>
> Because Qt works internally with pure unicode (not utf-8, as gtk does), I
> have to encode strings to utf-8 manually. I want bencode to handle this
> for me.
>
> I'm not sure how this is handled in the revision serializer. Does it make
> sense to have such support in the core?

The bencode format only has the concept of byte-strings, not unicode.  So
currently you need to explicitly encode (and decode).  Because bzr internally
tends to use utf-8 a fair bit this hasn't been a big burden so far... I'm
curious to know why explicitly encoding and decoding is so much more of an issue
for QBzr?  Perhaps naïvely I would have expected that data like commit messages
and committer names would already be encoded/decoded for you by bzrlib.  Why is
QBzr touching revision serialisation directly?
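
To make the explicit encode/decode step concrete, here is a minimal sketch of
bencoding a single string with the utf-8 conversion done by the caller.  The
function names are hypothetical and this is not the bzrlib bencode API; it only
assumes the standard bencode string form, <length>:<bytes>, and is written for
Python 3.

```python
def bencode_bytes(data):
    """Encode one byte string in bencode form: <length>:<bytes>."""
    if not isinstance(data, bytes):
        # bencode has no unicode type, so text must be encoded first.
        raise TypeError("expected bytes, got %s" % type(data).__name__)
    return str(len(data)).encode("ascii") + b":" + data


def bdecode_bytes(stream):
    """Decode a stream holding exactly one bencoded byte string."""
    length, sep, payload = stream.partition(b":")
    if not sep:
        raise ValueError("missing ':' after the length prefix")
    if int(length) != len(payload):
        raise ValueError("length prefix does not match the payload size")
    return payload


# Hypothetical convenience wrappers: the caller picks utf-8 explicitly.
def bencode_text(text):
    return bencode_bytes(text.encode("utf-8"))


def bdecode_text(stream):
    return bdecode_bytes(stream).decode("utf-8")


message = "na\u00efve commit message"
wire = bencode_text(message)
assert bdecode_text(wire) == message
```

Note that the length prefix counts utf-8 bytes, not characters, which is why
the conversion has to happen before the string reaches the encoder.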

-Andrew.



