[MERGE][BUG #77533][RFC] ignore files with invalid filenames

Thu Aug 9 04:08:36 BST 2007

Martin Pool has voted resubmit.
Status is now: Resubmit
Comment:
+import bzrlib.tests.workingtree_implementations
+
+class 
TestInvalidFilenames(bzrlib.tests.workingtree_implementations.TestCaseWithWorkingTree):
+    def test_invalid_filenames(self):
+        """Test that bzr status wont break with an invalid filename."""

You need another blank line before the class, and one before the first
method.

The fully qualified classname is a bit long there; I'd just do "from ..
import TestCaseWithWorkingTree".

+        except:
+            cant_test = False
+        if cant_test:
+            raise bzrlib.tests.TestSkipped("The invalid filename is 
valid for this filesystem.")

You should be catching just UnicodeDecodeError, not every error.
it would be simpler to just put the raise inside the except block.

+        file(invalidfilename, 'wb').write('foo')

You should use self.build_tree here and below to make sure it's closed
properly.

+        self.run_bzr('add')

I guess testing this at the level of the command line is ok, but it 
seems
like you should be looking at the output and making sure it's giving the
right results.  I suppose just making sure that it does not crash is an
improvement.

+def filter_invalid_filenames(listdir):

This should have a docstring, and probably wants a specific unit test in
test_osutils.  Possibly that should actually return two lists: valid and
invalid filenames, so we can defer the policy that we're going to give
warnings about them to a somewhat higher layer.

You need to document whether the returned lists are utf-8 or unicode.

Also since both calls to this are running it in the same way,
I think you want another function that lists a directory
and returns a sorted list of valid filenames.

Again there should be two blank lines between top-level declarations,
and one line between declarations inside a class.

          # directory file_id, relative path, absolute path, reverse 
sorted children
          children = os.listdir(self.basedir)
-        children.sort()
+        try:
+            children.sort()
+        except UnicodeDecodeError:
+            children = osutils.filter_invalid_filenames(children)
+            children.sort()

It seems odd that you look for decode errors during the sort.
I think you should just call the list_valid_filenames() method I
describe above.

                  if subf not in dir_entry.children:
-                    subf_norm, can_access = 
osutils.normalized_filename(subf)
-                    if subf_norm != subf and can_access:
-                        if subf_norm not in dir_entry.children:
-                            fl.append(subf_norm)
+                    try:
+                        if isinstance(subf, str):
+                            subf = subf.decode(osutils._fs_enc)
+                        subf_norm, can_access = 
osutils.normalized_filename(subf)
+                        if subf_norm != subf and can_access:
+                            if subf_norm not in dir_entry.children:
+                                fl.append(subf_norm)
+                    except UnicodeDecodeError:
+                        trace.warning('Invalid filename: ' + 
subf.decode(_fs_enc, 'replace'))
                      else:
                          fl.append(subf)

and here you can just replace the listdir preceding this hunk with
the filtered version.

+                                try:
+                                    yield (None,
+                                        (None, 
utf8_decode(current_path_info[0])[0]),
+                                        True,
+                                        (False, False),
+                                        (None, None),
+                                        (None, 
utf8_decode(current_path_info[1])[0]),
+                                        (None, current_path_info[2]),
+                                        (None, new_executable))
+                                except UnicodeDecodeError:
+                                    bzrlib.trace.warning('Invalid 
filename: ' + current_path_info[0].decode('utf8', 'replace'))
                              # dont descend into this unversioned path 
if it is

But this might be wrong if it's actually something else that couldn't be
decoded.  That seems unlikely as it should only be the current disk 
value
that's ever wrong, but still it's confusing when an error message is 
wrong.  I
think you should instead utf8_decode the two paths into temporary 
variables
before building the tuple.  Should we instead be making sure that 
invalid
names never get into the dirstate?

For details, see: 
http://bundlebuggy.aaronbentley.com/request/%3C46B9F9FF.1070206%40terra.com.br%3E