[MERGE/RFC] Content filters - take 6
John Arbash Meinel
john at arbash-meinel.com
Fri Feb 20 14:57:30 GMT 2009
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
...
> If there are no filters then it should be storing just the same hash
> that it does at present; we'd want to check that it's not doing so
> inefficiently.
>
> If there are filters and they're symmetric then when we check the
> working file we'll convert it back to the original form, and see
> that it hasn't changed.
>
> If the filters are asymmetrical, then I think we may get the
> following situation: when the tree is constructed it'll be given the
> canonical hashes, but if we re-hash a file we'll see the hash that
> comes from input filtering on the working copy. That might be
> reasonable if we saw them as changed when the user specifically
> touched those files, but the problem is that we'll sometimes re-hash
> them because the original timestamp was not safe.
Do we want to support asymmetrical filters? It seems like we should only
really allow filters where "in_filt(out_filt(bytes)) == bytes". Or do
you have some otehr meaning here?
>
> I'd like to check in to how the hashes are set when the working tree
> is first built.
>
> So this could lead to the strange situation where if you get a
> checkout with asymmetrical filters then immediately run status, some
> of the files are shown as modified and some are not. I'm not sure
> what should happen there, but consistently showing them all as
> modified is probably the best thing.
>
> Asymmetric filters are not the common case but I hoped considering
> those cases would shed light on how it should work.
IMO, if you have an asymmetric filter, you have just modified the whole
working tree.
...
> +def filtered_output_bytes(chunks, filters, context=None):
> + """Convert byte chunks from internal to external format.
> +
> + :param chunks: an iterator containing the original content
> + :param filters: the stack of filters to apply
> + :param context: a ContentFilterContext object passed to
> + each filter
> + :return: an iterator containing the content to output
> + """
> + if filters:
> + for filter in reversed(filters):
> + if filter.writer is not None:
> + chunks = filter.writer(chunks, context)
> + return chunks
>
> This would be a bit simpler without the 'if' statement, just letting
> the loop do nothing...
I suppose it depends if "filters" can be None rather than an empty list.
...
>
...
> - def get_file_with_stat(self, file_id, path=None, _fstat=os.fstat):
> + def get_file_with_stat(self, file_id, path=None, filtered=True,
> + _fstat=os.fstat):
> """See MutableTree.get_file_with_stat."""
> if path is None:
> path = self.id2path(file_id)
> - file_obj = self.get_file_byname(path)
> - return (file_obj, _fstat(file_obj.fileno()))
> -
> - def get_file_byname(self, filename):
> - return file(self.abspath(filename), 'rb')
> -
> - def get_file_lines(self, file_id, path=None):
> + file_obj = self.get_file_byname(path, filtered=False)
> + stat_value = _fstat(file_obj.fileno())
> + if self.supports_content_filtering() and filtered:
> + filters = self._content_filter_stack(path)
> + file_obj = filtered_input_file(file_obj, filters)
> + return (file_obj, stat_value)
>
> This method get_file_with_stat seems a bit dangerous because the
> size in the stat object may be inconsistent with what's actually
> returned; I'd like to see if we could deprecate this method.
The reason to have it was to have a "non-racing" implementation to get
the content and the stat. Otherwise if you read at a separate time than
you stat, something could have changed in the mean-time.
I believe the stat is only really used to pass back to the dirstate to
have it mark the file as "unchanged since last commit".
>
> So to sum up: I think we should move forward with this and I don't
> see anything actually that's actually buggy. It would be very nice
> if we could do this in a manner entirely separate from the tree,
> either in the current format or in a future dirstate update, and it
> seems just about possible. But perhaps the right way to go towards
> that is to merge it, and then pick it up.
>
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iEYEARECAAYFAkmexNoACgkQJdeBCYSNAAOhOACfTQfqb3LsBsNX4DQmzwSMqxt1
zpsAoMA0FK12KECIxAPzn0u2V+UsTJW5
=ugmy
-----END PGP SIGNATURE-----
More information about the bazaar
mailing list