[rfc] bencode unicode strings
Alexander Belchenko
bialix at ukr.net
Tue Jun 16 10:58:08 BST 2009
Andrew Bennetts wrote:
> Alexander Belchenko wrote:
>> Andrew Bennetts wrote:
>>> Alexander Belchenko wrote:
>>>> For QBzr I need to implement support for bencoding unicode
>>>> strings. Standard bencode treats the strings in the stream as byte strings.
>>>>
>>>> Because Qt internally works with pure unicode (not utf-8 like GTK),
>>>> I have to encode strings to utf-8 manually. I want bencode to handle
>>>> this for me.
>>>>
>>>> I'm not sure how this is handled in the revision serializer. Does it
>>>> make sense to have such support in the core?
>>> The bencode format only has the concept of byte-strings, not unicode. So
>>> currently you need to explicitly encode (and decode).
>> Isn't that exactly what I wrote in my first mail?
>
> Close. I wanted to emphasise that adding what you want to bencode would
> mean an incompatible change to the bencode format.
Not really.
The Python bencode implementation is highly modular, so I can subclass
Decoder and extend it to handle unicode strings, and then create an
additional function, say bdecodeu.
Similarly, I can extend the encoder to teach it to handle unicode,
and provide a new function bencodeu.
And everything will stay backward compatible, won't it?
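The subclassing approach described above can be sketched as follows. This is a minimal, hypothetical illustration in Python 3: `Decoder`, `Encoder`, `UnicodeDecoder` and `UnicodeEncoder` are stand-ins written for the example, not bzrlib's actual bencode module; only the names `bencodeu`/`bdecodeu` come from the proposal. Each unicode variant overrides a single hook, so the bytes on the wire are still ordinary bencode.

```python
# Hypothetical sketch: Decoder/Encoder are minimal stand-ins written for
# illustration, not bzrlib's actual bencode classes.

class Decoder:
    """Minimal bencode decoder; every string payload goes through decode_string()."""

    def __init__(self, data):
        self.data = data  # bencoded bytes
        self.pos = 0

    def decode_string(self, raw):
        return raw  # standard bencode: strings stay as byte strings

    def decode(self):
        value = self._decode_value()
        if self.pos != len(self.data):
            raise ValueError("trailing data after bencoded value")
        return value

    def _decode_value(self):
        data, pos = self.data, self.pos
        tag = data[pos:pos + 1]
        if tag == b"i":  # integer: i<digits>e
            end = data.index(b"e", pos)
            self.pos = end + 1
            return int(data[pos + 1:end])
        if tag in (b"l", b"d"):  # list or dict: items up to a closing "e"
            self.pos += 1
            items = []
            while self.data[self.pos:self.pos + 1] != b"e":
                items.append(self._decode_value())
            self.pos += 1
            if tag == b"l":
                return items
            if len(items) % 2:
                raise ValueError("dict key without a value")
            return dict(zip(items[0::2], items[1::2]))
        if tag.isdigit():  # string: <byte length>:<bytes>
            colon = data.index(b":", pos)
            start = colon + 1
            end = start + int(data[pos:colon])
            if end > len(data):
                raise ValueError("truncated string")
            self.pos = end
            return self.decode_string(data[start:end])
        raise ValueError("invalid bencode at offset %d" % pos)

class Encoder:
    """Minimal bencode encoder for strings, ints, lists/tuples and dicts."""

    def as_bytes(self, obj):
        """Return obj as a byte string, or None if it is not a string type."""
        return obj if isinstance(obj, bytes) else None

    def encode(self, obj):
        raw = self.as_bytes(obj)
        if raw is not None:
            return b"%d:%s" % (len(raw), raw)
        if isinstance(obj, int) and not isinstance(obj, bool):
            return b"i%de" % obj
        if isinstance(obj, (list, tuple)):
            return b"l" + b"".join(self.encode(item) for item in obj) + b"e"
        if isinstance(obj, dict):
            raw_keys = {self.as_bytes(k): v for k, v in obj.items()}
            if None in raw_keys or len(raw_keys) != len(obj):
                raise TypeError("dict keys must be distinct strings")
            # bencode requires dict keys sorted by their raw bytes
            body = b"".join(
                self.encode(key) + self.encode(raw_keys[key]) for key in sorted(raw_keys)
            )
            return b"d" + body + b"e"
        raise TypeError("cannot bencode %r" % (obj,))

class UnicodeDecoder(Decoder):
    def decode_string(self, raw):
        return raw.decode("utf-8")  # every string comes back as unicode

class UnicodeEncoder(Encoder):
    def as_bytes(self, obj):
        if isinstance(obj, str):
            return obj.encode("utf-8")  # unicode goes on the wire as UTF-8
        return super().as_bytes(obj)

def bencodeu(obj):
    return UnicodeEncoder().encode(obj)

def bdecodeu(data):
    return UnicodeDecoder(data).decode()
```

Because `bencodeu` emits plain bencode (UTF-8 payloads with byte-count prefixes), the base `Decoder` can still read its output and simply returns byte strings, while the base `Encoder` keeps rejecting unicode as before. The trade-off of this design is that `bdecodeu` assumes every string in the stream is UTF-8 text, so it is not suitable for data that mixes text with arbitrary binary strings.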
> [...]
>> See my explanations above: I have unicode strings everywhere in Qt and I
>> need to store some of these strings in config files (e.g. qbzr.conf,
>> branch.conf or tree.conf). ConfigObj requires me to provide unicode
>> strings to store, so I have to do a double conversion here:
>>
>> 1) Convert my unicode strings to utf-8
>> 2) Bencode them
>> 3) Convert the bencoded result from utf-8 to unicode
>> 4) Store them in the config file via ConfigObj
>>
>> And things get much worse when I need to bencode dicts or lists.
>> Something is wrong here, isn't it?
>>
>> I hope my intent is clearer now.
>
> Oh, I see. So this is for the inter-process communication that QBzr does?
> And/or for QBzr-specific values you are storing in configuration files? Or
> something else?
For storing data in config files, e.g. saving commit messages and the like,
or implementing "Recently used" support for branch URLs.
I find the bencode serializer very handy for such things because of its simplicity.
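For comparison, the four-step workaround for the "Recently used" branch URL case can be sketched like this. It is a hypothetical illustration: the list-only codec and the helpers `to_config_value`/`from_config_value` are made up for the example, and ConfigObj itself is not called; the function just produces the unicode string that would be handed to it. The sketch also shows why the stored value cannot be decoded as-is: bencode length prefixes count UTF-8 bytes, not characters.

```python
# Hypothetical illustration of the four-step workaround for a list of
# "recently used" branch URLs. The list-only codec and helper names are
# made up for this example; this is not bzrlib's bencode.

def bencode_str_list(items):
    """Bencode a list of byte strings as l<len>:<bytes>...e."""
    return b"l" + b"".join(b"%d:%s" % (len(s), s) for s in items) + b"e"

def bdecode_str_list(data):
    """Inverse of bencode_str_list (lists of byte strings only)."""
    if data[:1] != b"l" or data[-1:] != b"e":
        raise ValueError("not a bencoded list")
    items, pos = [], 1
    while pos < len(data) - 1:
        colon = data.index(b":", pos)
        end = colon + 1 + int(data[pos:colon])  # prefix counts bytes, not characters
        if end > len(data) - 1:
            raise ValueError("truncated string")
        items.append(data[colon + 1:end])
        pos = end
    return items

def to_config_value(urls):
    utf8_items = [url.encode("utf-8") for url in urls]  # 1) unicode -> UTF-8
    encoded = bencode_str_list(utf8_items)  # 2) bencode
    return encoded.decode("utf-8")  # 3) UTF-8 -> unicode; 4) hand this to ConfigObj

def from_config_value(value):
    # The stored unicode must be re-encoded first: reading the text directly
    # would misinterpret the byte-count length prefixes.
    return [item.decode("utf-8") for item in bdecode_str_list(value.encode("utf-8"))]
```

For example, `to_config_value(["bzr+ssh://host/трунк", "lp:qbzr"])` returns `"l25:bzr+ssh://host/трунк7:lp:qbzre"`: the first prefix is 25 although that URL is only 20 characters long, because each Cyrillic letter takes two bytes in UTF-8.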
More information about the bazaar mailing list