newformat format change

John A Meinel john at arbash-meinel.com
Tue Oct 4 15:33:02 BST 2005


Martin Pool wrote:
> On 01/10/05, Aaron Bentley <aaron.bentley at utoronto.ca> wrote:
>
>
>>>            print >>f, "RootID:", inv.root.file_id
>>
>>I don't see any character encoding handling, and the escaping appears
>>inconsistent (AFAIK, newlines and tabs are permitted in file_ids).
>
>
> They're probably not trapped at the moment but I think they shouldn't
> be permitted; I don't see a good reason to support them and in some
> contexts (e.g. text changesets) they will cause trouble.
>
> --
> Martin
>
>

One problem with the current trapping is that it uses a regular
expression to strip out everything that isn't a word character or a dot:

    name = re.sub(r'[^\w.]', '', name)

I believe this will catch newlines and tabs, but it also seems to catch
too much in the way of international characters.
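
As a rough illustration of that behaviour (a sketch, not the actual bzr
code): the helper name strip_name is made up, and the bytes pattern is an
assumption used to get ASCII-only \w matching, which is roughly what a plain
byte string sees. A unicode-aware pattern would treat Arabic letters as word
characters and would not show the problem.

```python
import re

# Bytes pattern: \w covers only ASCII letters, digits and "_" here.
# This is an assumption standing in for matching on a plain byte string.
ASCII_ONLY = re.compile(rb'[^\w.]')

def strip_name(name):
    """Hypothetical helper: apply the substitution to the UTF-8 bytes of name."""
    stripped = ASCII_ONLY.sub(b'', name.encode('utf-8'))
    # Only ASCII word characters and dots survive, so this decode is safe.
    return stripped.decode('ascii')

# Tabs and newlines are removed, as intended.
print(repr(strip_name('foo\tbar\n.txt')))           # 'foobar.txt'

# An Arabic base name is removed entirely; only the extension is left.
print(repr(strip_name('\u0645\u0644\u0641.txt')))   # '.txt'
```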

Back when I was testing with Arabic characters, it was essentially
generating file-ids with just the last portion (no filename part).
Now, maybe you feel that your unique identifier is sufficient (it could be).

I'm just worried a little about collisions in foreign texts where the
entire tree might have no filename in the file_id.
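
To make the worry concrete, here is a small sketch. The id layout
(stripped name, a dash, then random hex) and the names name_part and
make_file_id are assumptions for illustration only, not the real
generator. It shows that distinct non-ASCII names all end up with the same
empty name part, so uniqueness rests entirely on the random suffix.

```python
import re
from binascii import hexlify
from os import urandom

def name_part(name):
    """Hypothetical: strip non-word bytes (ASCII-only \\w via a bytes pattern)."""
    return re.sub(rb'[^\w.]', b'', name.encode('utf-8')).decode('ascii')

def make_file_id(name):
    """Hypothetical id layout: <stripped name>-<16 random hex digits>."""
    return name_part(name) + '-' + hexlify(urandom(8)).decode('ascii')

# Two different Arabic file names with no ASCII characters at all.
names = ['\u0645\u0644\u0641', '\u0643\u062a\u0627\u0628']

for n in names:
    # The name part is empty for both, so each id starts with "-".
    print(repr(name_part(n)), make_file_id(n))
```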

John
=:->


