<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=UTF-8" http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
John Arbash Meinel wrote:
<blockquote cite="mid46B9DAF3.6020406@arbash-meinel.com" type="cite">
<pre wrap="">-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Martin Pool wrote:
</pre>
<blockquote type="cite">
<pre wrap="">On 8/7/07, Fabio Machado de Oliveira <a class="moz-txt-link-rfc2396E" href="mailto:absfabio@terra.com.br">&lt;absfabio@terra.com.br&gt;</a> wrote:
</pre>
<blockquote type="cite">
<pre wrap="">Hi again Martin,

I found that "bzr pull" also has a problem with the existence of
unversioned files with invalid filenames, and I expect the same to
happen with many other commands.

I am wondering whether it is a case of replacing every "os.listdir"
call with something that already excludes these files. I think that
could cost some performance, though, as there is a utf8 encoding cache
that would probably lose part or all of its performance gain.

Or perhaps the patches I submitted are already going the right way, in
which case I will wait for someone to review them before trying to
continue.
</pre>
</blockquote>
<pre wrap="">I think rather than replacing the calls individually, you probably
want to put access to workingtree files under the control of the
workingtree so that this policy is centralized. I think it would be
nice if files with invalid/unrepresentable names were not seen outside
of the workingtree.

We need to decide just what should happen to files with invalid names.
Should they just be ignored entirely, or should we give the user some
kind of notification?
I think a good tradeoff would be:

1 - if they explicitly name the file, give an error
2 - if it's just unknown or ignored, ignore it

I think we can accomplish that by:

1 - when a filename is given, if we can't decode it on the command
    line, or can't convert it into the fsencoding, error
2 - otherwise, when listing the workingtree, skip files that can't be
    decoded.

Not totally sure though...
</pre>
</blockquote>
<pre wrap="">
It would be nice if we could warn if the file is 'unknown' (not
ignored, not versioned) and cannot be interpreted. (It obviously can't
be versioned.)

My idea is that you could ignore it by using an appropriate regex
which leaves out those characters. So to ignore "fo\xff\xff" you could
ignore "fo??". Or something like that.

I should also chime in a bit on implementation information.

Python's os.listdir() has the API that if you pass a Unicode string,
you get back Unicode paths. However, if you pass a Unicode string and
some paths cannot be represented, those paths come back as 8-bit
strings.

So actually, one way to detect bad filenames is to do:

for path in os.listdir(u'.'):
  if isinstance(path, str):
    # This cannot be represented as Unicode
    ...

However, our walkdirs_utf8 code doesn't do this, specifically because
converting every path we encounter to Unicode is slower than we would
like. So we have _walkdirs_utf8, which is designed such that if the
filesystem is (theoretically) utf-8 encoded, we just return the paths
'as is'. So we have to do the detection later.

Ultimately, I don't think we want an os.listdir() that returns utf-8
paths. I think catching it at an appropriate time (during
_iter_changes, etc) is fine. (Note that _iter_changes doesn't know
whether files are ignored or unknown, just that they are not
versioned.)

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - <a class="moz-txt-link-freetext" href="http://enigmail.mozdev.org">http://enigmail.mozdev.org</a>

iD8DBQFGudrzJdeBCYSNAAMRAlLNAJ9fnv1Ajo6GISSaljelh0AUuszEWgCgtSxa
JaULchzNtviXjjR7f9oA0p8=
=bwOL
-----END PGP SIGNATURE-----
</pre>
</blockquote>
There are some sorted(os.listdir()) calls that fail, where I used a function for filtering invalid filenames. Now I think I need to replace that with:
<pre wrap="">names = os.listdir(...)
try:
    sorted_list = sorted(names)
except UnicodeDecodeError:
    # Sorting mixes Unicode and undecodable 8-bit names, so drop the bad ones
    sorted_list = sorted(filter_invalid_filenames(names))
</pre>
I also need to change the trace.warning call I used so that it replaces the invalid characters with question marks.<br>
<br>
In the tests, the way I mixed make_branch_and_tree with run_bzr doesn't seem right. How do I call "bzr status" from the API?<br>
<br>
Fábio
</body>
</html>