[PATCH][MERGE] Improvements to is_ignored

Wed Jan 11 15:37:11 GMT 2006

On Wed, Jan 11, 2006 at 07:00:19 -0600, John Arbash Meinel wrote:
> Jan Hudec wrote:
> > On Tue, Jan 10, 2006 at 22:35:07 -0600, John A Meinel wrote:
> >>>>I also found an interesting problem if you don't use (?:), specifically:
> >>>>bzr: ERROR: exceptions.AssertionError: sorry, but this version only
> >>>>supports 100 named groups
> >>>>  at /usr/lib/python2.4/sre_compile.py line 506
> >>>>  in compile
> > 
> > 
> > Ouch.
> > 
> > Well, at least we need to call is_ignored before calling is_ignored_by, so we
> > don't iterate over all the patterns when we are not going to find anything
> > (especially since that is the worst case).
> > 
> 
> Sure. I think most code doesn't call is_ignored_by until it has an
> idea that the path is ignored.
> But I would definitely call is_ignored at some point before iterating
> over the is_ignored_by patterns.

Well, there are just 2 users of is_ignored_by. One is smart_add_tree and
the other is cmd_ignored. So I'll make sure they only call is_ignored_by
on ignored entries. Maybe I'll rename the method and specify it must not
be called on non-ignored entries.
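
A minimal sketch of that call order, written against modern Python rather than the bzr code itself. The helper names compile_ignore_patterns and why_ignored are hypothetical, and fnmatch.translate stands in for whatever pattern translation bzr actually uses. The combined regex uses only non-capturing groups, so it never runs into a limit on the number of groups.

```python
import fnmatch
import re

def compile_ignore_patterns(patterns):
    """Combine glob patterns into one regex using only non-capturing groups."""
    parts = ['(?:%s)' % fnmatch.translate(p) for p in patterns]
    return re.compile('|'.join(parts))

def is_ignored(combined, path):
    """Cheap check: does any pattern match the path at all?"""
    return combined.match(path) is not None

def is_ignored_by(patterns, path):
    """Return the first matching pattern. Meant for paths already known to be ignored."""
    for pat in patterns:
        if re.match(fnmatch.translate(pat), path):
            return pat
    return None

def why_ignored(patterns, combined, path):
    """Run the cheap combined check first; iterate only when it succeeds."""
    if not is_ignored(combined, path):
        return None
    return is_ignored_by(patterns, path)

patterns = ['*.pyc', '*.o', 'tags']
combined = compile_ignore_patterns(patterns)
```

With this shape the worst case (a path that matches nothing) costs one regex match instead of one per pattern.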

> >>>So we should add a test with enough patterns to provoke this, calling
> >>>both bzr status and bzr ignored.
> > 
> > Ok, I'll look at it tomorrow.
> 
> Thanks. It would also be good to write some tests for matching behavior,
> to make sure that the (?:.*/) pattern always matches. I think paths
> always start with at least ./, but I won't guarantee that without some
> tests.

I will look into it later when I try to implement better pattern
translation.
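
A small sketch of the kind of matching test meant above. It assumes the translated regex is simply a directory prefix followed by fnmatch.translate output; basename_pattern_regex is a hypothetical helper, not the bzr implementation. It shows why a mandatory (?:.*/) prefix depends on every path containing at least one slash, while an optional prefix does not.

```python
import fnmatch
import re

def basename_pattern_regex(glob, optional_prefix=True):
    """Translate a glob that should match a file name in any directory."""
    prefix = '(?:.*/)?' if optional_prefix else '(?:.*/)'
    return re.compile(prefix + fnmatch.translate(glob))

# Mandatory prefix: the path must contain a slash before the file name.
required = basename_pattern_regex('Makefile', optional_prefix=False)
# Optional prefix: bare file names match as well.
optional = basename_pattern_regex('Makefile')

paths = ['Makefile', './Makefile', 'src/Makefile', 'src/notMakefile']
required_hits = [p for p in paths if required.match(p)]
optional_hits = [p for p in paths if optional.match(p)]
```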

> (I also can't guarantee that paths don't have \, but at that point they
> shouldn't have \.)
> I would also do a check for:
> if '\\' in pat:
>   pat = pat.replace('\\', '/')
> 
> Because I do believe the file paths have been normalized. (fnmatch might
> translate it correctly, but I don't think it does).

fnmatch sucks big time, unfortunately. It certainly does not translate
anything.
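
To illustrate, here is a sketch using the standard fnmatch module under modern Python. normalize_pattern is a hypothetical name wrapping the replace suggested in the quoted text, and fnmatchcase is used so that no platform-specific case or separator normalization gets in the way.

```python
import fnmatch

def normalize_pattern(pat):
    """Rewrite backslash separators in an ignore pattern as forward slashes."""
    if '\\' in pat:
        pat = pat.replace('\\', '/')
    return pat

raw = 'build\\*.o'        # pattern written with a backslash separator
path = 'build/main.o'     # path already normalized to forward slashes

# fnmatch treats the backslash as a literal character, so this does not match.
before = fnmatch.fnmatchcase(path, raw)
# After normalizing the pattern, the same path matches.
after = fnmatch.fnmatchcase(path, normalize_pattern(raw))
```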

> By the way, in general I like what you've done. I'm not sure if we want
> to add: re.UNICODE to the re.compile() command. It supposedly only
> changes the meaning of \w, etc, so it may not be necessary.

Well, the important thing is that a pattern like:
*.fň
should match files with that extension, and they should continue to
match if I branch, say, from an iso-8859-2 system to a utf-8 one.
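
A quick check of the re.UNICODE question, as a sketch under Python 3, where str patterns are Unicode by default. The Python 2.4 behaviour discussed in this thread may differ. The literal non-ASCII character in the pattern matches with or without the flag, which agrees with the flag only affecting classes such as \w.

```python
import fnmatch
import re

pattern = '*.f\u0148'                 # '*.fň', written with an escape
regex_src = fnmatch.translate(pattern)

plain = re.compile(regex_src)
flagged = re.compile(regex_src, re.UNICODE)

name = 'notes.f\u0148'
results = (plain.match(name) is not None,
           flagged.match(name) is not None)
```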

> I ended up looking deeper, and found that the problem with fnmatch() not
> matching unicode filenames is that we encode('utf-8') when we write
> .bzrignore, but we don't decode() when we read it.

The question is how the re engine will deal with unicode patterns. Also,
all the manipulation has to properly use unicode strings to avoid the
broken Python default conversion. I'll at least make sure of the
latter.
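
A sketch of the decode-on-read idea, written for Python 3. read_ignore_patterns is a hypothetical helper, and the assumption is that the ignore file is always stored as UTF-8 while file names arrive in whatever encoding the local system uses. Once both sides are decoded to text, the same pattern matches on either system.

```python
import fnmatch

def read_ignore_patterns(raw_bytes, encoding='utf-8'):
    """Decode ignore-file contents into a list of text patterns."""
    text = raw_bytes.decode(encoding)
    return [line.strip() for line in text.splitlines() if line.strip()]

# Ignore file contents as stored on disk ('*.fň' encoded as UTF-8).
stored = '*.f\u0148\n'.encode('utf-8')
patterns = read_ignore_patterns(stored)

# The same file name as raw bytes on two differently configured systems.
name_latin2 = 'soubor.f\u0148'.encode('iso-8859-2')
name_utf8 = 'soubor.f\u0148'.encode('utf-8')

# Compare text with text: decode each name with its own encoding first.
matches = [
    fnmatch.fnmatchcase(name_latin2.decode('iso-8859-2'), patterns[0]),
    fnmatch.fnmatchcase(name_utf8.decode('utf-8'), patterns[0]),
]
```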

-- 
						 Jan 'Bulb' Hudec <bulb at ucw.cz>