cvsps-import plugin on Cygwin [2]

John Arbash Meinel john at arbash-meinel.com
Thu Jan 11 16:20:59 GMT 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Alexander Belchenko wrote:
> Also, cvsps-import (on Cygwin) doesn't like Russian characters in filenames.
> It fails with a big, ugly traceback that ends with:
> 
> ...
>   File "\Bazaar\plugins.work/cvsps-import/cvsps/importer.py", line 550, in get_text
>     cvs_file = self._get_cvs_filename(filename)
>   File "\Bazaar\plugins.work/cvsps-import/cvsps/importer.py", line 511, in _get_cvs_filename
>     filename + ',v')
>   File "/tmp/python.340/usr/lib/python2.4/posixpath.py", line 65, in join
>     path += '/' + b
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xcf in position 1: ordinal not in range(128)
> 
> 
> IIRC Cygwin doesn't like non-ASCII characters, but MC itself is able
> to show Russian filenames correctly.
> 
> Alexander
> 

Cygwin (and Cygwin's python) operates in ASCII mode, and I haven't found a
way to avoid that.
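
To make the failure mode concrete, here is a small sketch of what decoding
a non-ASCII byte string as ASCII does. It is written for a current Python
rather than the python2.4 in the traceback, and the cp1251 encoding and
the sample filename are assumptions used purely for illustration; I don't
know what the real path or encoding was.

```python
# Illustration only: cp1251 and the filename are assumptions, not taken
# from the actual failing run.
raw_name = u'\u041f\u0440\u0438\u043c\u0435\u0440.txt'.encode('cp1251')
path = b'/' + raw_name

try:
    # This is roughly what happens implicitly when a byte string is
    # combined with a unicode string while the default codec is ASCII.
    path.decode('ascii')
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)
```

Decoding with the correct codec instead of 'ascii' is what avoids the
error, which is why the filename encoding matters.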

There shouldn't be a specific reason why you couldn't use win32 bzr with
win32 python controlling a cygwin cvsps program, though.

Now, I don't know how cvsps handles filenames, so we would have to look
into that.

The final point is that the filename encoding probably isn't going to be
the same as the Author and Log Message encoding. I'm not positive about
that, but that is my initial guess.

You would need to play with line 170 in cvsps/parser.py:

        fname, version = line[1:].rsplit(':', 1)
        versions = version.split('->')
        assert len(versions) == 2
        self._patchset.members.append((fname, versions[-1].strip()))

Basically you should be able to do something like:

        fname = fname.decode(self._encoding)

That might at least get you started. You'll have to tell me what
encoding the filenames are in, though.
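
As a rough, standalone sketch of that change (not the actual parser.py
code): the function name parse_member_line is hypothetical, the input
format is only what the snippet above implies (one leading character,
then "name:old->new"), and cp1251 is again just an assumed encoding for
the example.

```python
def parse_member_line(line, encoding):
    """Split a raw member line into (unicode filename, new revision).

    `line` is a byte string; its first character is skipped, as in the
    snippet from parser.py. This helper is hypothetical.
    """
    fname, version = line[1:].rsplit(b':', 1)
    versions = version.split(b'->')
    if len(versions) != 2:
        raise ValueError('unexpected member line: %r' % (line,))
    # Decode the filename with the (assumed) filename encoding; revision
    # numbers are plain ASCII.
    return fname.decode(encoding), versions[-1].strip().decode('ascii')


# Assumed sample input: a cp1251-encoded Russian filename.
sample_name = u'\u041f\u0440\u0438\u043c\u0435\u0440.txt'
sample_line = b'\t' + sample_name.encode('cp1251') + b':1.1->1.2\n'
example = parse_member_line(sample_line, 'cp1251')
```

If the filenames turn out to use a different encoding than the log
messages, the encoding passed here would need to be a separate setting.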

Oh, and if you are interested, we could save a little bit of the parsing
startup time by having a decoder object, like we switched to in bzrlib.
Specifically in Parser.__init__ just use:


  self._encoding = encoding
  self._decoder = codecs.getdecoder(encoding)

You may need to 'import codecs' at the top, and then where you see
"foo.decode(self._encoding)" just change that to
"self._decoder(foo)[0]". This is probably not a huge overhead at the
moment, but it is something I just realized while looking into it.
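
A minimal sketch of that idea, assuming a stand-in class (SketchParser
and its _decode helper are hypothetical names, not the real cvsps Parser).
Since we are turning bytes into unicode, the lookup to cache is the one
returned by codecs.getdecoder(), which gives back a
(decoded_text, bytes_consumed) tuple when called:

```python
import codecs


class SketchParser(object):
    """Hypothetical stand-in for the real Parser, showing only the caching."""

    def __init__(self, encoding):
        self._encoding = encoding
        # Look the codec up once instead of on every .decode() call.
        self._decoder = codecs.getdecoder(encoding)

    def _decode(self, raw):
        # The decoder returns (text, length consumed); keep only the text.
        return self._decoder(raw)[0]


# cp1251 is an assumed encoding, used only for the example.
parser = SketchParser('cp1251')
decoded = parser._decode(b'\xcf')
```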

I'm not sure what you mean by "MC", is that the name of a CVS front end?

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFpmPrJdeBCYSNAAMRApgyAKDN9BEXz5thTRHGLLL0UUMSehY3KQCgsCth
U9wPRWf0UnrlVIcUDInTKTs=
=Lzrg
-----END PGP SIGNATURE-----


