Unicode through filesystem tricks

John Arbash Meinel john at arbash-meinel.com
Mon Jan 16 01:18:00 GMT 2006


Erik Bågfors wrote:
> 2006/1/15, Robert Collins <robertc at robertcollins.net>:
> 
>>On Fri, 2006-01-13 at 01:03 -0600, John A Meinel wrote:
>>
>>
>>>$ ls r*
>>>räksmörgås  räksmörgås
>>>
>>>(well, it looks correct on my terminal, that both files have the exact
>>>same name, in the same directory)
>>
>>Interestingly to me, the first raksmogas has a just-visible diaeresis,
>>whereas on the second one it's very visible. I guess that's the difference
>>between the combining and non-combining representations.
> 
> 
> The same for me.
> 
> Do I understand this correctly: the first one is just an a, an o
> and an a, followed by combining characters that put the dots and the
> ring on top of them, while in the second they are actually different
> characters (å, ä and ö)?
> 
> If so, I think the first representation is wrong. These three
> characters are distinct letters in their own right, not an a or an o
> with something added.
> 
> Regards,
> Erik

Welcome to Unicode, which allows you to specify it either way. The
first form is NFD normalized (decomposed: base letter plus combining
mark), and the second is NFC normalized (precomposed). XML documents are
generally supposed to be NFC normalized, and NFC text also translates
more directly into ISO-8859-* encodings, where å, ä and ö are each a
single byte.
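
A quick way to see the difference is Python's standard `unicodedata` module. This is a minimal sketch; the `\u` escapes spell räksmörgås in precomposed form so that the source file's own encoding cannot change the result. It shows that the two forms differ in length and compare unequal, lists the combining marks NFD introduces, and confirms that only the NFC form encodes to ISO-8859-1:

```python
import unicodedata

# Precomposed spelling written with escapes, so the editor cannot decompose it.
name = "r\u00e4ksm\u00f6rg\u00e5s"
nfc = unicodedata.normalize("NFC", name)
nfd = unicodedata.normalize("NFD", name)

# Same rendering, different code point sequences.
print(len(nfc), len(nfd))  # 10 13
print(nfc == nfd)          # False

# The extra code points in NFD are combining marks (nonzero combining class).
for ch in nfd:
    if unicodedata.combining(ch):
        print("U+%04X %s" % (ord(ch), unicodedata.name(ch)))

# Each NFC letter maps to one ISO-8859-1 byte; the NFD marks have no mapping.
print(nfc.encode("iso-8859-1"))  # b'r\xe4ksm\xf6rg\xe5s'
try:
    nfd.encode("iso-8859-1")
except UnicodeEncodeError as exc:
    print("NFD form is not encodable:", exc.reason)
```

Normalizing either string to the same form before comparing (for example `unicodedata.normalize("NFC", nfd) == nfc`) makes them equal again.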

If it weren't for the fact that the Mac filesystem (HFS+) stores
filenames in NFD form, I don't think I would ever have come across this.
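
A minimal sketch of one way to detect this, assuming the names come from something like `os.listdir()`: group names by their NFC form and report any group with more than one distinct spelling. The helper name `find_normalization_collisions` is hypothetical and not part of bzr. On a filesystem that treats names as opaque bytes, both spellings can sit in the same directory, which is what the `ls` output quoted above shows; HFS+ instead converts names to its NFD variant when they are created, so a name written in NFC comes back from a listing in decomposed form.

```python
import unicodedata
from collections import defaultdict


def find_normalization_collisions(names):
    """Group names that differ only in Unicode normalization.

    Returns a dict mapping each NFC form to the list of distinct raw
    names that normalize to it, keeping only groups with 2+ members.
    """
    groups = defaultdict(list)
    for name in names:
        key = unicodedata.normalize("NFC", name)
        if name not in groups[key]:
            groups[key].append(name)
    return {key: raw for key, raw in groups.items() if len(raw) > 1}


if __name__ == "__main__":
    composed = "r\u00e4ksm\u00f6rg\u00e5s"
    decomposed = unicodedata.normalize("NFD", composed)
    names = [composed, decomposed, "readme.txt"]
    for key, raw in find_normalization_collisions(names).items():
        # ascii() keeps the output printable on any terminal encoding.
        print(ascii(key), [len(n) for n in raw])
    # For a real directory: find_normalization_collisions(os.listdir("."))
```

Using NFC as the grouping key is an arbitrary choice; keying on NFD would find the same collisions.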

John
=:->

