Bazaar-NG traffic #2

Jan Hudec bulb at ucw.cz
Tue Oct 11 12:52:11 BST 2005


On Tue, Oct 11, 2005 at 10:23:09 +0200, David Allouche wrote:
> On Tue, 2005-10-11 at 03:21 -0400, James Blackwell wrote:
> > = _Always_ Unicode =
> > 
> > The Unicode discussions continued this week. Last week, Alexander Belchenko
> > referred to some bzr code that didn't handle Russian filenames properly.
> > This week Belchenko followed up a couple more times without a response. 
> 
> Something which has been somewhat nagging me...
> 
> I would like it if it were possible to have byte-stream file names. In
> some situations (e.g. automated imports from CVS) you might end up with
> file names containing non-ASCII characters without any encoding information.
> Trying to interpret those names as unicode is haphazard at best, and
> likely incorrect.

I recently used non-ASCII names (it was with svk, though). I needed
them to come out right on 3 systems: one using iso-8859-2, another
using windows-1250 (though I am not sure the names are not actually
stored in ucs2) and yet another using utf-8. Fortunately subversion
always uses utf-8, so it worked well.
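
To illustrate why a single storage encoding helps here, a minimal
Python sketch follows. The file name is a made-up example and the codec
names are Python's, not anything svk or subversion exposes. It only
shows that one name maps to three different byte strings on the three
systems, while a name stored as utf-8 can be converted to each of them.

```python
# Hypothetical file name with Central European letters (an assumption,
# not a name from the actual repository).
name = "žluť.txt"

# Python codec names for the three systems mentioned above.
encodings = ["iso-8859-2", "cp1250", "utf-8"]

# The same name gives a different byte sequence under each encoding.
on_disk = {enc: name.encode(enc) for enc in encodings}
for enc, raw in on_disk.items():
    print(enc, raw)

# If the repository always stores utf-8, each client can decode the
# stored bytes and re-encode them for its own filesystem.
stored = name.encode("utf-8")
round_trip = {enc: stored.decode("utf-8").encode(enc) for enc in encodings}
```

Without the common utf-8 form, bytes written on the iso-8859-2 system
would show up as different characters on the windows-1250 one.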

> Generally, when getting data from legacy sources, you cannot expect to
> have encoding information. I would like to read about how CVS handles
> non-ascii file names, from people who have direct experience with that.

That depends on whether the user starting the import can provide the
encoding. If he does not, it's ucs1 (i.e. byte values are code-points).

> To be honest we only once had non-ascii file names in source code
> repositories in a few hundred mainline imports, but the numbers are
> biased since we have been focusing on increasing the number of
> successful imports, disregarding (numerous) import failures.

The import can never fail if, in the absence of encoding information,
such names are treated as ucs1-encoded. It can produce weird names, which
the user has to fix, either by providing the right encoding during import,
or by renaming the files to proper names afterwards.
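
A minimal sketch of that fallback in Python, assuming "ucs1" corresponds
to Python's "latin-1" codec (each byte becomes the code point with the
same value). The function names and the sample file name are
hypothetical; this is not bzr code.

```python
def decode_name(raw, encoding=None):
    """Decode raw file-name bytes, falling back to latin-1 (ucs1)."""
    # latin-1 maps every byte 0x00-0xFF to the code point of the same
    # value, so the fallback accepts any byte string. An explicitly
    # given encoding can still raise UnicodeDecodeError if it is wrong.
    return raw.decode(encoding if encoding is not None else "latin-1")


def fix_name(weird, real_encoding):
    """Recover the proper name once the real encoding is known."""
    # Encoding back to latin-1 restores the original bytes exactly.
    return weird.encode("latin-1").decode(real_encoding)


# Hypothetical legacy name whose bytes are really iso-8859-2.
raw = "žluť.txt".encode("iso-8859-2")

weird = decode_name(raw)                # never fails, but looks wrong
fixed = fix_name(weird, "iso-8859-2")   # the later "rename" step
```

Because the fallback loses no information, a weird name can always be
repaired later without going back to the legacy source.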

--
						 Jan 'Bulb' Hudec <bulb at ucw.cz>