ignore files with invalid filenames

Mon Aug 6 19:07:52 BST 2007

Hi again Martin,

I found that "bzr pull" also has a problem with the existence of
unversioned files with invalid filenames, and I expect the same to happen
with many other commands.

I am wondering if it is a case of replacing all of the "os.listdir" calls
with something that already excludes these files, but I think that could
cost some performance, as there is a utf8 encoding cache that
would probably lose part or all of its performance gain.

Alternatively, the patches I submitted may already be going in the right
direction, so I will wait for someone to review them before trying to continue.
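
A minimal sketch of the first option, a replacement for os.listdir that
drops undecodable names, might look roughly like the following. This is
only an illustration, not the submitted patch: the names decodable_names
and listdir_decodable are hypothetical, the stdlib warnings module stands
in for bzr's trace.warn, and the code is written for Python 3, where
listing a bytes path returns raw bytes names.

```python
import os
import sys
import warnings


def decodable_names(raw_names, encoding):
    """Decode raw byte filenames, skipping any that are invalid in `encoding`."""
    names = []
    for raw in raw_names:
        try:
            names.append(raw.decode(encoding))
        except UnicodeDecodeError:
            # Hypothetical stand-in for bzr's trace.warn: report and move on
            # instead of failing the whole command.
            warnings.warn("ignoring file with invalid name: %r" % (raw,))
    return names


def listdir_decodable(path, encoding=None):
    """Like os.listdir, but silently excludes names that cannot be decoded."""
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    # Passing a bytes path makes os.listdir return undecoded bytes names.
    return decodable_names(os.listdir(os.fsencode(path)), encoding)
```

Every entry is decoded on each call here, which is the kind of extra work
that could eat into the gain from the encoding cache.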

Thanks,
Fábio

Martin Pool wrote:
> So, until someone writes a test, could you please attach your patch to
> the bug about decoding ignored filenames? I'm pretty sure there is a
> bug already. Thanks.
> 
> On 8/2/07, Fábio Machado de Oliveira <absfabio at terra.com.br> wrote:
>> Martin Pool wrote:
>>
>> On 7/31/07, Fabio Machado de Oliveira <absfabio at terra.com.br> wrote:
>>
>>
>> I put a try/except for bzr to ignore files that have filenames
>> containing characters invalid to the filesystem encoding, instead of
>> generating a bug report.
>>
>>
>> Hi Fabio,
>>
>> Thanks, that sounds like a really useful fix, I'm very happy to see
>> it.  I do think in general that we want to be skipping over
>> untouchable files rather than failing.  There are probably some other
>> places that need an analogous fix.
>>
>> I have two bits of feedback on this patch:
>>
>> You should be using trace.warn rather than printing directly to stderr.
>>
>> Also, we really need a test for this to make sure that it doesn't
>> regress in future.  To test this we need to be able to create a file
>> with an invalid name.  In some encodings (like iso-8859-1) there are
>> no invalid filenames since any combination of bytes can be
>> interpreted.  In ascii and utf-8 we can do it - I suspect '\xff\xff'
>> will be invalid everywhere.  The best thing is probably to check
>> whether that string can be decoded in osutils._fs_enc.  If it *can*
>> be, skip the test.  Otherwise, create a file and try to version the
>> directory.
>>
>> Now this probably should be a workingtree_implementation test, so that
>> it will be checked on all formats.  However, it will probably fail on
>> some old formats, and it's probably not worth fixing them.  So I'd
>> probably just check the format and if it's one of the old ones, skip
>> the test.
>>
>> This is one of those unfortunate cases where testing something
>> automatically seems much harder than just manually verifying it once.
>> But especially for environment-specific things like encoding it is
>> important that we have one to keep up the level of quality we want.
>>
>>
>>   I don't know how to write a new test for Bazaar. I was able to create an
>> invalid file this way:
>>
>>  In Vim, I set the file encoding to windows-1252 ( :set
>> fileencoding=windows-1252 ) and wrote a script like "echo test > óóó.txt".
>> The script created an óóó.txt file with a windows-1252 encoded name, which
>> made bzr status fail. os.listdir returned this name for it: '\xf3\xe9\xed.txt'
>>
>>  bzr.dev has since changed and the problem moved to a different place, so I
>> changed the patch to apply to working_tree.extras()
>>
> 
>
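
The check suggested in the quoted review, skipping the test whenever
'\xff\xff' can be decoded in the filesystem encoding, could be sketched
as below. This is a hedged illustration only: can_decode and
should_skip_invalid_name_test are hypothetical helpers, and
sys.getfilesystemencoding() is used in place of osutils._fs_enc. The
decode is strict on purpose; Python 3 itself decodes filenames with a
more lenient error handler, so this mirrors the Python 2 behaviour
discussed in the thread.

```python
import sys

# Byte string proposed in the thread as a likely-invalid filename.
INVALID_NAME = b'\xff\xff'


def can_decode(raw, encoding):
    """Return True if `raw` is valid in `encoding` under strict decoding."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


def should_skip_invalid_name_test(fs_encoding=None):
    """Skip when the filesystem encoding accepts every byte sequence we try."""
    if fs_encoding is None:
        fs_encoding = sys.getfilesystemencoding()
    return can_decode(INVALID_NAME, fs_encoding)
```

With iso-8859-1 every byte maps to a character, so the test would be
skipped; with ascii or utf-8 the bytes are rejected and the test can go
on to create the file and try to version the directory.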