[MERGE/RFC] Working tree content filtering
Andrew Bennetts
andrew at canonical.com
Fri Apr 18 08:36:44 BST 2008
Ian Clatworthy wrote:
> John Arbash Meinel wrote:
[...]
> Having spent time today trying to write a filter, I now think ...
>
> text = f.read()
>
> is definitely better. Asking filter writers to process a sequence of
> chunks makes their life *much* harder. With the exception of filters
> that match a *single* character with no other context, the filters
> basically need to always do
>
> ''.join(chunks)
I think just dealing with the full file all at once is the right approach, at
least for the first implementation of this feature. We already buffer entire
file texts in memory. We'd like to move away from that, but we're not there
yet. When we do start making a serious effort to reduce our memory footprint
when dealing with large files, I expect that would be a good time to consider a
more efficient chunking/streaming API for content filters. Presumably the
general work would have an impact on how best to do it here. Doing it now is
premature IMO.
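To make the trade-off concrete, here is a hypothetical sketch (not the actual bzr filter API, and the function names are invented) of a CRLF-to-LF normalising filter written both ways. With the full text the filter is trivial; with chunks it has to carry state across chunk boundaries, because a '\r' at the end of one chunk may be half of a '\r\n' pair that continues in the next:

```python
def filter_full_text(text):
    # With the whole text in memory the filter is a one-liner.
    return text.replace('\r\n', '\n')

def filter_chunks(chunks):
    # With a chunked API the filter must remember whether the previous
    # chunk ended in '\r', since the matching '\n' may start the next
    # chunk.
    pending_cr = False
    for chunk in chunks:
        if pending_cr:
            chunk = '\r' + chunk
            pending_cr = False
        if chunk.endswith('\r'):
            # Hold the trailing '\r' back until we see the next chunk.
            pending_cr = True
            chunk = chunk[:-1]
        yield chunk.replace('\r\n', '\n')
    if pending_cr:
        # A lone '\r' at end of input was not part of a '\r\n' pair.
        yield '\r'
```

Both give the same result, e.g. `''.join(filter_chunks(['a\r', '\nb']))` matches `filter_full_text('a\r\nb')`, but the chunked version is noticeably more error-prone to write, which is Ian's point.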
-Andrew.