[MERGE][BUG #77533][RFC] ignore files with invalid filenames

Thu Aug 9 17:43:56 BST 2007

Martin Pool escreveu:
> Martin Pool has voted resubmit.
> Status is now: Resubmit
> Comment:
> +import bzrlib.tests.workingtree_implementations
> +
> +class 
> TestInvalidFilenames(bzrlib.tests.workingtree_implementations.TestCaseWithWorkingTree): 
> 
> +    def test_invalid_filenames(self):
> +        """Test that bzr status wont break with an invalid filename."""
> 
> You need another blank line before the class, and one before the first
> method.

Done

> 
> The fully qualified classname is a bit long there; I'd just do "from ..
> import TestCaseWithWorkingTree".
> 
> +        except:
> +            cant_test = False
> +        if cant_test:
> +            raise bzrlib.tests.TestSkipped("The invalid filename is 
> valid for this filesystem.")
> 
> You should be catching just UnicodeDecodeError, not every error.
> it would be simpler to just put the raise inside the except block.

Change the except. The raise cannot be inside the except block, because
it happens when that block is not evaluated. I want to make sure that
the file system encoding will throw an exception for that filename.

> 
> +        file(invalidfilename, 'wb').write('foo')
> 
> You should use self.build_tree here and below to make sure it's closed
> properly.

build_tree didn't accept the invalid filename. I changed it to close the file.

> 
> +        self.run_bzr('add')
> 
> I guess testing this at the level of the command line is ok, but it seems
> like you should be looking at the output and making sure it's giving the
> right results.  I suppose just making sure that it does not crash is an
> improvement.

Changed to test the results.

> 
> +def filter_invalid_filenames(listdir):
> 
> This should have a docstring, and probably wants a specific unit test in
> test_osutils.  Possibly that should actually return two lists: valid and
> invalid filenames, so we can defer the policy that we're going to give
> warnings about them to a somewhat higher layer.
> 
> You need to document whether the returned lists are utf-8 or unicode.
> 
> Also since both calls to this are running it in the same way,
> I think you want another function that lists a directory
> and returns a sorted list of valid filenames.
> 
> Again there should be two blank lines between top-level declarations,
> and one line between declarations inside a class.

I put that comment that the other functions start with. I'm not sure if
that is the docstring you mention.

Now I have a list_valid_filenames and sorted_list_valid_filenames, both returns
tuples with (valid, invalid) filenames.

The blank lines are done too.

>          # directory file_id, relative path, absolute path, reverse 
> sorted children
>          children = os.listdir(self.basedir)
> -        children.sort()
> +        try:
> +            children.sort()
> +        except UnicodeDecodeError:
> +            children = osutils.filter_invalid_filenames(children)
> +            children.sort()
> 
> It seems odd that you look for decode errors during the sort.
> I think you should just call the list_valid_filenames() method I
> describe above.

It's done. I try to sort os.listdir() direclty first because I dont
want people with valid directories to lose performance.

> 
>                  if subf not in dir_entry.children:
> -                    subf_norm, can_access = 
> osutils.normalized_filename(subf)
> -                    if subf_norm != subf and can_access:
> -                        if subf_norm not in dir_entry.children:
> -                            fl.append(subf_norm)
> +                    try:
> +                        if isinstance(subf, str):
> +                            subf = subf.decode(osutils._fs_enc)
> +                        subf_norm, can_access = 
> osutils.normalized_filename(subf)
> +                        if subf_norm != subf and can_access:
> +                            if subf_norm not in dir_entry.children:
> +                                fl.append(subf_norm)
> +                    except UnicodeDecodeError:
> +                        trace.warning('Invalid filename: ' + 
> subf.decode(_fs_enc, 'replace'))
>                      else:
>                          fl.append(subf)
> 
> and here you can just replace the listdir preceding this hunk with
> the filtered version.

Done

> 
> +                                try:
> +                                    yield (None,
> +                                        (None, 
> utf8_decode(current_path_info[0])[0]),
> +                                        True,
> +                                        (False, False),
> +                                        (None, None),
> +                                        (None, 
> utf8_decode(current_path_info[1])[0]),
> +                                        (None, current_path_info[2]),
> +                                        (None, new_executable))
> +                                except UnicodeDecodeError:
> +                                    bzrlib.trace.warning('Invalid 
> filename: ' + current_path_info[0].decode('utf8', 'replace'))
>                              # dont descend into this unversioned path 
> if it is
> 
> But this might be wrong if it's actually something else that couldn't be
> decoded.  That seems unlikely as it should only be the current disk value
> that's ever wrong, but still it's confusing when an error message is 
> wrong.  I
> think you should instead utf8_decode the two paths into temporary variables
> before building the tuple.  

Now its done first in a temporary variable.

 > Should we instead be making sure that invalid
> names never get into the dirstate?

I believe that it would be hard to do without going against what
John Meinel said:

“However, our walkdirs_utf8 code doesn't do this. Specifically because
converting every path we encounter to Unicode is slower than we would
like. So we have _walkdirs_utf8 which is designed such that if the
filesystem is (theoretically) utf-8 encoded, we just return the paths
'as is'. So we have to do the detection later.

Ultimately, I don't think we want a os.listdir() that returns utf-8
paths. I think catching it at an appropriate time (during _iter_changes,
etc) is fine. (Note that _iter_changes doesn't know whether files are
ignored or unknown, just that they are not versioned.)”

I also included a test and fix for an invalid directory name.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: invalid_filenames.patch
Type: text/x-patch
Size: 20348 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20070809/0c3e64ea/attachment-0002.bin