[MERGE][BUG #77533][RFC] ignore files with invalid filenames

Fabio Machado de Oliveira absfabio at terra.com.br
Wed Aug 8 18:14:39 BST 2007


Fábio Machado de Oliveira wrote:
> John Arbash Meinel wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> Martin Pool wrote:
>>   
>>> On 8/7/07, Fabio Machado de Oliveira <absfabio at terra.com.br> wrote:
>>>     
>>>> Hi again Martin,
>>>>
>>>> I found that "bzr pull" also has a problem with the existence of
>>>> unversioned files with invalid filenames, and I expect it to happen
>>>> with many other commands.
>>>>
>>>> I am wondering if it's a case of replacing all of the "os.listdir"
>>>> calls with something that already excludes these files, but I think it
>>>> could decrease performance, as there is a utf8 encoding cache that
>>>> would probably lose part or all of its performance gain.
>>>>
>>>> Or perhaps the patches I submitted are already going the right way, so
>>>> I will wait for someone to review that patch before trying to continue.
>>>>       
>>> I think rather than replacing the calls individually, you probably
>>> want to put access to workingtree files under the control of the
>>> workingtree so that this policy is centralized.
>>>
>>> I think it would be nice if files with invalid/unrepresentable names
>>> were not seen outside of the workingtree.
>>>
>>> We need to decide just what should happen to files with invalid names.
>>>  Should they just be ignored entirely, or should we give the user some
>>> kind of notification.  I think a good tradeoff would be:
>>>
>>> 1 - if they explicitly name the file, give an error
>>> 2 - if it's just unknown or ignored, ignore it
>>>
>>> I think we can accomplish that by
>>>
>>> 1- when a filename is given, if we can't decode it on the command
>>> line, or can't convert it into the fsencoding, error
>>> 2- otherwise, when listing the workingtree, skip files that can't be decoded.
>>>
>>> Not totally sure though...
>>>
>>>     
>>
>> It would be nice if we could warn if the file is 'unknown' (not ignored,
>> not versioned) and cannot be interpreted. (It obviously can't be
>> versioned.)
>>
>> My idea is that you could ignore it, by using an appropriate regex which
>> leaves out those characters. So to ignore "fo\xff\xff" you could ignore
>> "fo??". Or something like that.
>>
>> I should also chime in a bit on implementation information.
>>
>> Python's os.listdir() API works like this: if you pass a Unicode string,
>> you get back Unicode paths. However, any path that cannot be decoded
>> to Unicode comes back as an 8-bit string instead.
>>
>> So actually, one way to detect bad filenames is to do:
>>
>> import os
>> for path in os.listdir(u'.'):
>>     if isinstance(path, str):
>>         # This cannot be represented as Unicode
>>         ...
>>
>>
>> However, our walkdirs_utf8 code doesn't do this, specifically because
>> converting every path we encounter to Unicode is slower than we would
>> like. So we have _walkdirs_utf8, which is designed such that if the
>> filesystem is (theoretically) utf-8 encoded, we just return the paths
>> 'as is'. So we have to do the detection later.
>>
>> Ultimately, I don't think we want an os.listdir() that returns utf-8
>> paths. I think catching it at an appropriate time (during _iter_changes,
>> etc) is fine. (Note that _iter_changes doesn't know whether files are
>> ignored or unknown, just that they are not versioned.)
>>
>> John
>> =:->
>> -----BEGIN PGP SIGNATURE-----
>> Version: GnuPG v1.4.6 (GNU/Linux)
>> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>>
>> iD8DBQFGudrzJdeBCYSNAAMRAlLNAJ9fnv1Ajo6GISSaljelh0AUuszEWgCgtSxa
>> JaULchzNtviXjjR7f9oA0p8=
>> =bwOL
>> -----END PGP SIGNATURE-----
>>
>>
>>   
> 
> There are some sorted(os.listdir()) calls that fail, where I used a
> function for filtering invalid filenames.
> 
> Now I think I need to replace that with:
> 
> names = os.listdir(...)
> try:
>     sorted_names = sorted(names)
> except UnicodeDecodeError:
>     sorted_names = sorted(filter_invalid_filenames(names))
> 
> And change the trace.warning call that I used so that it replaces the
> invalid characters with question marks.
> 
> In the tests, the way I mixed make_branch_and_tree with run_bzr doesn't
> seem right. How do I call "bzr status" from the API?
> 
> Fábio

I changed the patch to do what I said.
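
For readers without the attachment, here is a minimal sketch of the filtering idea discussed above. It is not the code from the attached patch. It assumes Python 3 and raw byte filenames (as os.listdir(b".") would return), whereas the thread is about the Python 2 behaviour. The name filter_invalid_filenames comes from the message above, but its body here is a guess; printable, sorted_valid_names, the warn callback, and the warning text are hypothetical.

```python
def filter_invalid_filenames(names, encoding="utf-8"):
    """Split raw byte filenames into (decoded_names, undecodable_names).

    Hypothetical body: the real helper lives in the attached patch.
    """
    valid, invalid = [], []
    for raw in names:
        try:
            valid.append(raw.decode(encoding))
        except UnicodeDecodeError:
            invalid.append(raw)
    return valid, invalid


def printable(raw, encoding="utf-8"):
    """Render a byte filename for a warning, using '?' where decoding fails."""
    # errors="replace" substitutes U+FFFD for undecodable input; map it to '?'.
    return raw.decode(encoding, errors="replace").replace("\ufffd", "?")


def sorted_valid_names(names, warn=print, encoding="utf-8"):
    """Return the decodable names sorted, warning once per skipped name."""
    valid, invalid = filter_invalid_filenames(names, encoding)
    for raw in invalid:
        warn("ignoring file with invalid filename: %s" % printable(raw, encoding))
    return sorted(valid)


if __name__ == "__main__":
    # Example input mirroring the "fo\xff\xff" name mentioned in the thread.
    print(sorted_valid_names([b"b.txt", b"fo\xff\xff", b"a.txt"]))
```

Passing the warning function in as a parameter keeps the sketch independent of bzrlib; in the real code the warning would presumably go through trace.warning.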
-------------- next part --------------
A non-text attachment was scrubbed...
Name: invalid_filenames.patch
Type: text/x-patch
Size: 15564 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20070808/37325ffe/attachment-0002.bin 

