[PATCH][MERGE] Improvements to is_ignored

John Arbash Meinel john at arbash-meinel.com
Thu Jan 12 14:59:44 GMT 2006


Jan Hudec wrote:
> On Wed, Jan 11, 2006 at 15:27:36 -0600, John Arbash Meinel wrote:
> 
>>Jan Hudec wrote:
>>
>>>>Thanks. It would also be good to write some tests for matching behavior,
>>>>to make sure that the (?:.*/) pattern always matches. I think paths
>>>>always start with at least ./, but I won't guarantee that without some
>>>>tests.
>>>
>>>I will look into it later when I try to implement better pattern
>>>translation.
>>
>>How are you thinking to do it? I don't know if you saw my posts, as I
>>have implemented one, with a bunch of tests in my encoding branch. At
>>the very least you might want to grab my test cases.
> 
> 
> Thanks. I will look at that.
> 
> 
>>>Well, the important thing is, that it should work to have a pattern
>>>like:
>>>*.fň
>>>and have files with that extension and they should match. And they
>>>should continue to match if I branch say from iso-8859-2 system to a
>>>utf-8 one.
>>
>>In my testing, if you do
>>
>> >>> x = re.compile(u'fň')
>> >>> x.match(u'fň') is not None
>>
>>True
>>
>>So you don't have to use re.UNICODE. It only changes the meaning of
>>escape characters. We create utf-8 encoded entries in .bzrignore.
>>Probably this means we shouldn't be telling people to 'echo foo >>
>>.bzrignore' since 'foo' is likely to be in user encoding not in utf-8.
> 
> 
> Hm. According to the documentation, (?u) is global. I thought it would
> be group-local as it is in perl.
> 
> Hm, python does not seem to have the \P{...} escape. That's not exactly
> good, because it means we are limited to the properties it can do.
> 
> I am leaning towards using re.UNICODE, because one can always write
> [A-Za-z0-9_], so the [[:alnum:]] one should mean unicode letters.
> 

Except that we are really only building regexes from file globs, which
means we don't have that level of granularity. All we really have is:
* - match any string
? - match any single character
[] - match one character from this list
[!] - match one character not in this list

We might add:
** - match across directories, and change * to not match across directories
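
To make the mapping concrete, here is a minimal Python 3 sketch of
translating those glob constructs into a regex. It is not the bzr
implementation: the name glob_to_regex is hypothetical, it assumes the
proposed ** behaviour, it lets ? stop at '/' as well (my assumption), and
it prepends an optional (?:.*/)? so that a pattern can match at any depth.

```python
import re


def glob_to_regex(pattern):
    """Translate a file glob into a regex string (hypothetical helper)."""
    out = []
    i = 0
    n = len(pattern)
    while i < n:
        c = pattern[i]
        i += 1
        if c == '*':
            if i < n and pattern[i] == '*':
                i += 1
                out.append('.*')      # ** may cross directory separators
            else:
                out.append('[^/]*')   # * stays inside one path component
        elif c == '?':
            out.append('[^/]')        # assumption: ? does not match '/'
        elif c == '[':
            # Find the closing bracket, allowing a leading '!' and a
            # literal ']' as the first member of the set.
            j = i
            if j < n and pattern[j] == '!':
                j += 1
            if j < n and pattern[j] == ']':
                j += 1
            while j < n and pattern[j] != ']':
                j += 1
            if j >= n:
                out.append(re.escape('['))   # unterminated: literal '['
            else:
                body = pattern[i:j].replace('\\', '\\\\')
                i = j + 1
                if body.startswith('!'):
                    body = '^' + body[1:]    # glob negation -> regex negation
                elif body.startswith('^'):
                    body = '\\' + body       # a leading ^ is literal in a glob
                out.append('[' + body + ']')
        else:
            out.append(re.escape(c))
    # Assumption: an optional leading directory part, then the whole rest.
    return '(?:.*/)?' + ''.join(out) + r'\Z'


ignore = re.compile(glob_to_regex('*.fň'))
print(ignore.match('./dir/a.fň') is not None)
print(ignore.match('a.fn') is not None)
```

Nothing in the generated regex uses \w, \d or similar escapes, which is
why the choice of flag does not come into it.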

So we don't have to worry about whether or not to use re.UNICODE (at
least in these cases).
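
A quick way to check this is sketched below. It is written for Python 3,
where str patterns are Unicode-aware by default and re.ASCII switches that
off, so the flag is the reverse of the re.UNICODE case discussed here, but
the point is the same: literal characters match either way, and only
classes such as \w change meaning.

```python
import re

# Literal non-ASCII characters match with or without Unicode semantics.
literal_default = re.compile('fň')
literal_ascii = re.compile('fň', re.ASCII)
print(literal_default.match('fň') is not None)
print(literal_ascii.match('fň') is not None)

# Only the character-class escapes depend on the flag.
word_default = re.compile(r'\w+\Z')
word_ascii = re.compile(r'\w+\Z', re.ASCII)
print(word_default.match('fň') is not None)   # ň counts as a word character
print(word_ascii.match('fň') is not None)     # ň is outside ASCII \w
```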

John
=:->

