[MERGE/RFC] Content filters - take 6

Tue Feb 24 08:54:17 GMT 2009

Martin Pool wrote:
> 2009/2/21 John Arbash Meinel <john at arbash-meinel.com>:

>>> If the filters are asymmetrical, then I think we may get the
>>> following situation: when the tree is constructed it'll be given the
>>> canonical hashes, but if we re-hash a file we'll see the hash that
>>> comes from input filtering on the working copy.  That might be
>>> reasonable if we saw them as changed when the user specifically
>>> touched those files, but the problem is that we'll sometimes re-hash
>>> them because the original timestamp was not safe.
>> Do we want to support asymmetrical filters? It seems like we should only
>> really allow filters where "in_filt(out_filt(bytes)) == bytes". Or do
>> you have some other meaning here?
> 
> Well he does document later in the patch that such filters are allowed.
> 
> If we're going to run arbitrary code for them it seems a bit hard to
> forbid them.  And anyhow, there are interesting plausible cases, such
> as normalizing something on checkout but always committing precisely
> what was written.
> 
> A filter that makes all files modified whenever you check out would be
> strange I agree, but I'd like to at least know how it should work.
> 
>>> Asymmetric filters are not the common case but I hoped considering
>>> those cases would shed light on how it should work.
>> IMO, if you have an asymmetric filter, you have just modified the whole
>> working tree.
> 
> Agree - or at least, all the files for which it produced an asymmetric result.

I think asymmetric filters must be supported. It's easy to come up with
cases where they are useful, e.g. one for removing trailing whitespace on
commit. (That would certainly beat the current situation where PQM falls
over on a failed test_coding_style test.)

Furthermore, most filters will be supplied by plugins and we have no
control over whether such filters will be symmetric or not. It would
be extremely hard to enforce this: testing on some "standard"
dummy data isn't going to help, because most filters, by their nature,
are very data dependent.
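
To make the data dependence concrete, here is a rough sketch in plain
Python. It does not use the filter API from the patch; the function
names and sample data are made up for illustration. It checks John's
condition, in_filt(out_filt(bytes)) == bytes, for a trailing-whitespace
filter against two samples of stored content.

```python
def strip_trailing_whitespace(data: bytes) -> bytes:
    """Hypothetical commit-side (input) filter: drop trailing spaces/tabs per line."""
    return b"\n".join(line.rstrip(b" \t") for line in data.split(b"\n"))


def identity(data: bytes) -> bytes:
    """Hypothetical checkout-side (output) filter: write stored bytes unchanged."""
    return data


def round_trips(in_filt, out_filt, data: bytes) -> bool:
    """Check in_filt(out_filt(data)) == data for a single sample."""
    return in_filt(out_filt(data)) == data


# Assumed "standard" dummy data: no trailing whitespace anywhere.
clean = b"def f():\n    return 1\n"
# Assumed stored content committed before the filter was enabled.
dirty = b"def f():  \n    return 1\t\n"

print(round_trips(strip_trailing_whitespace, identity, clean))
print(round_trips(strip_trailing_whitespace, identity, dirty))
```

The filter pair looks symmetric on the clean sample and asymmetric on
the other one, so a check against fixed test data would have passed it.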

Ian C.