CVS migration help
John Arbash Meinel
john at arbash-meinel.com
Tue Oct 7 18:40:39 BST 2008
Thomas Manson wrote:
> Hi Brian,
>
> On my new system:
>
> LANG=en_US.UTF-8
>
> thomas@home:~/temp/cvsrepo/crf-irp/Ressources/documentation$ ll
> total 32
> drwxr-xr-x 2 thomas thomas 4096 2008-10-06 18:07 .
> drwxr-xr-x 9 thomas thomas 4096 2008-10-06 18:07 ..
> -r--r--r-- 1 thomas thomas 23274 2008-01-20 00:56 Sp?cifications.doc,v
> thomas@home:~/temp/cvsrepo/crf-irp/Ressources/documentation$ ls -N | hexdump -C
> 00000000  53 70 e9 63 69 66 69 63 61 74 69 6f 6e 73 2e 64  |Sp.cifications.d|
> 00000010  6f 63 2c 76 0a                                   |oc,v.|
> 00000015
>
> On my old system, where the files came from:
>
> LANG=fr_FR@euro
>
^- The fact that it is a single byte (0xe9) means that it *is not* in
UTF-8; it would take 2 bytes to encode é in UTF-8.
Now:
>>> print '\xe9'.decode('latin1')
é
>>> '\xe9'.decode('latin1').encode('utf-8')
'\xc3\xa9'
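For anyone reading this on Python 3, where bytes and text are separate
types, the same check might look like the sketch below. The variable
names are only for illustration.

```python
# Python 3 sketch: byte strings need a b'' prefix and must be decoded
# explicitly before they can be treated as text.
raw = b'\xe9'                    # the single byte seen in the hexdump

text = raw.decode('latin1')      # interpret the byte as Latin-1
utf8 = text.encode('utf-8')      # re-encode the same character as UTF-8

print(ascii(text))               # '\xe9', i.e. U+00E9 (e with acute accent)
print(utf8)                      # b'\xc3\xa9', two bytes

# On its own, the lone byte is not a valid UTF-8 sequence.
try:
    raw.decode('utf-8')
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print(valid_utf8)                # False
```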
Anyway, *most* current systems assume that paths are in UTF-8 (Linux
doesn't actually specify an encoding; every path is just a
NUL-terminated byte string), which causes problems because we have to
"guess" what the bytes really are.
In this case, your filename is probably in Latin-1 encoding.
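One way to do that kind of guessing is to try UTF-8 first and fall back
to Latin-1 when the bytes don't decode. This is only a sketch in
Python 3; guess_decode is a hypothetical helper, not something
cvsps-import provides, and the fallback encoding is an assumption you
would have to pick for your own repository.

```python
def guess_decode(raw, fallback='latin1'):
    """Decode a filename given as bytes, trying UTF-8 first.

    Hypothetical helper for illustration only. Latin-1 maps every byte
    to some character, so the fallback never raises, but the result is
    still just a guess at what the author meant.
    """
    try:
        return raw.decode('utf-8'), 'utf-8'
    except UnicodeDecodeError:
        return raw.decode(fallback), fallback


# Bytes taken from the hexdump quoted earlier in this thread.
name, used = guess_decode(b'Sp\xe9cifications.doc,v')
print(used)           # latin1
print(ascii(name))    # 'Sp\xe9cifications.doc,v'
```

A name that is already valid UTF-8 goes through the first branch
unchanged, so the same helper could be run over a mixed tree.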
This is partly why cvsps-import doesn't support non-ASCII filenames: we
don't really know what encoding to use for them. (Mostly, though, it is
because nobody who had non-ASCII filenames had asked us to make it work.)
For example, code like this *could* do what you want:
=== modified file 'cvsps/parser.py'
--- cvsps/parser.py 2007-02-08 22:33:44 +0000
+++ cvsps/parser.py 2008-10-07 17:39:30 +0000
@@ -174,6 +174,7 @@
         if ':' not in line:
             return
         fname, version = line[1:].rsplit(':', 1)
+        fname = fname.decode(self._encoding)
         fname = self._cache(fname)
         versions = version.split('->')
         assert len(versions) == 2
It just uses the same encoding for filenames that we use for the log
content and the committer names.
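To try the idea outside the importer, a stand-alone sketch in Python 3
could look like this. parse_member_line is a hypothetical function that
mirrors the lines in the patch; it leaves out the self._cache step, and
the revision numbers in the sample line are made up.

```python
def parse_member_line(line, encoding='latin1'):
    """Split one member line into (filename, old_rev, new_rev).

    Hypothetical stand-alone version of the patched code. 'line' is
    bytes: one leading whitespace character, then 'path:old->new'.
    """
    if b':' not in line:
        return None
    fname, version = line[1:].rsplit(b':', 1)
    fname = fname.decode(encoding)       # the one-line change from the patch
    versions = version.strip().decode('ascii').split('->')
    assert len(versions) == 2
    return fname, versions[0], versions[1]


# Sample line; the path comes from this thread, the revisions are invented.
sample = b'\tRessources/documentation/Sp\xe9cifications.doc:1.1->1.2\n'
result = parse_member_line(sample)
print(ascii(result[0]))
print(result[1], result[2])
```

If the encoding is wrong for a given repository, decode() either raises
UnicodeDecodeError or silently produces the wrong characters, so it is
worth checking a few imported names by hand.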
John
=:->