Update on EOL progress; RFI on UI for general case

Fri May 2 02:23:53 BST 2008

I'm working on a new feature for Bazaar called "working tree content
filtering" (WTCF) and I'd like some input on what people expect. This
feature is the basis for end-of-line conversion. The intention, though,
is to generalise the design to support "arbitrary" input and output
processing of content in the working tree.

See below for a quick brain dump on some of the issues I'm currently
working through, together with some feedback from Robert. In a nutshell,
the point is that a user will be able to change (via configuration
settings initially) what filters are to be applied and Bazaar then needs
the necessary intelligence to Do The Right Thing. That may involve
changing existing commands, extending existing commands or adding new
ones. Some examples ...

1. You have a working tree with Unix line endings and you change your
setup so that *.txt files ought to have DOS line endings. After changing
that setting, what command do you expect to run to "refresh" the working
tree? (Note that 'bzr status' won't show anything as out-of-date in this
case because the right content is already committed.)

2. You've installed a plugin that strips trailing whitespace on commit
of *.py files. You run 'bzr st' and it shows the files that will be
committed (because the sha of the post-filtered content won't match that
of the basis tree content for the files in question). You run 'bzr
commit' and the right content is committed. So far, so good. Should 'bzr
commit' implicitly remove the trailing whitespace in the files in the
working tree or should it leave the user's files alone? (If you checkout
that branch to another location after the commit, the trailing
whitespace is gone of course. But that's different to in-situ updating
of files as commit finishes, and the potential for conflicts that implies.)
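
As a rough illustration of example 2, here is a minimal Python sketch.
It is not Bazaar code: strip_trailing_whitespace, sha1_of and
is_modified are hypothetical names made up for this note, and the
"basis" is just a byte string standing in for the committed text.

```python
import hashlib


def strip_trailing_whitespace(content: bytes) -> bytes:
    """Hypothetical input filter: drop trailing spaces/tabs on each line."""
    lines = content.split(b"\n")
    return b"\n".join(line.rstrip(b" \t") for line in lines)


def sha1_of(content: bytes) -> str:
    """Hex sha1 of a byte string."""
    return hashlib.sha1(content).hexdigest()


def is_modified(working_content: bytes, basis_sha1: str) -> bool:
    """Compare the sha of the post-filtered content with the basis sha."""
    return sha1_of(strip_trailing_whitespace(working_content)) != basis_sha1


# The basis text was committed before the filter existed, so it still
# carries trailing whitespace. The user has not edited the file.
basis = b"def f():  \n    return 1\t\n"
on_disk = basis

# Before the commit: the filtered sha differs, so the file is reported.
modified_before_commit = is_modified(on_disk, sha1_of(basis))

# After the commit the basis holds the filtered text. The file on disk
# is unchanged (whitespace still present) yet no longer looks modified.
new_basis = strip_trailing_whitespace(on_disk)
modified_after_commit = is_modified(on_disk, sha1_of(new_basis))
```

The second result is the crux of the question: the status check is
satisfied while the bytes on disk still differ from what a fresh
checkout would contain.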

To generalise, I think the input filtering cases are pretty
straightforward - it's the updating of the working tree that gets ugly. Once
output content filtering exists, the working tree can have "outdated"
files even though 'bzr st' will correctly think their sha is ok. Maybe
'bzr update' ought to find and refresh them but how does it do that
quickly and how does it detect "outdated" vs "user made changes"? I'm
not sure if 'bzr revert' is the answer or not. Would a user expect to
use it in this case? (Either way, we need to define its behaviour.)

If you have an opinion on the above, now is a good time to speak up!

Ian C.

-------- Original Message --------
Subject: Re: Might not be around for the stand-up call this Friday
Date: Fri, 02 May 2008 09:48:31 +1000
From: Robert Collins <robertc at robertcollins.net>
To: Ian Clatworthy <ian.clatworthy at internode.on.net>
CC: Martin Pool <mbp at sourcefrog.net>, John Arbash Meinel
<john at arbash-meinel.com>, Andrew Bennetts <andrew at canonical.com>
References: <4819CC28.1060707 at internode.on.net>

On Thu, 2008-05-01 at 23:56 +1000, Ian Clatworthy wrote:
> Hi guys,
> 
> I need to drop the kids off (and pick them up) Friday so I may not be
> home in time for the daily call. If I am, I'll jump on IRC and let you know.
> 
> FWIW, today was all about EOL support. I'm still not ready to put up a
> content filtering patch because some important tests I've been adding
> don't work yet. In particular, the case where a filter-pair has a write
> filter that changes content doesn't update the working tree when it should.
> 
> For example, imagine a filter pair that converts to uppercase on input
> and lowercase on output. The input side is good. The output side needs
> to take effect at times, implicitly and/or explicitly, e.g.:

> * just on files changed at the end of a commit

???

> * after running update?
> * after running revert?
> 
> Imagine if I change the output rules in a config file. If I run update,
> it ought to do something I'm thinking. 

I don't think update/pull/push are good places to handle this case.
Running shelve + revert is a reasonable idiom to 'correct' a tree that
hasn't had the right filter applied to it, but we'll need to fix revert
in some manner to allow it to detect unfiltered output texts.

> It currently doesn't. Running
> revert probably ought to do something intelligent as well. (It might now
> but I suspect not.) I'm wondering whether a --refresh option is needed
> on one or more commands?

There are two different cases here: adding a filter and removing one. By
definition adding a filter won't change the canonical text hash held in
the dirstate (because what is on disk will have been the content from
the repository). Removing a filter will however have the potential to
change the canonical text hash (because what is on disk has been
filtered, and the tree no longer knows how to undo that filtering).

In the former case we want to inspect all files, modified or not, and
check that their *filtered* content is correct. Unless we modify
dirstate to hold two hashes, the only way to do this is to extract every
file from the repository that will be filtered, filter them, and then
compare to the file on disk to decide if it is different. If it is
different, then if the file on disk is the repository's canonical
version, replace it with the filtered version; otherwise merge and put
conflict markers in.
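
The per-file decision for the added-filter case can be sketched roughly
as follows. This is only an illustration under stated assumptions, not
Bazaar's implementation: Action, refresh_action and unix_to_dos are
hypothetical names, and the canonical and on-disk texts are passed in
as plain byte strings rather than read from a repository or tree.

```python
from enum import Enum
from typing import Callable


class Action(Enum):
    UP_TO_DATE = "up to date"
    REPLACE = "replace with filtered text"
    MERGE = "merge, possibly leaving conflict markers"


def refresh_action(canonical: bytes, on_disk: bytes,
                   write_filter: Callable[[bytes], bytes]) -> Action:
    """Decide what to do with one file after an output filter is added."""
    expected = write_filter(canonical)
    if on_disk == expected:
        # The working file already matches the filtered text.
        return Action.UP_TO_DATE
    if on_disk == canonical:
        # Unedited but unfiltered: safe to overwrite.
        return Action.REPLACE
    # Anything else means the user changed the file.
    return Action.MERGE


def unix_to_dos(content: bytes) -> bytes:
    """Hypothetical output filter: normalise line endings to CRLF."""
    return content.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


canonical = b"one\ntwo\n"
unfiltered = refresh_action(canonical, b"one\ntwo\n", unix_to_dos)
already_done = refresh_action(canonical, b"one\r\ntwo\r\n", unix_to_dos)
edited = refresh_action(canonical, b"one\nTWO\n", unix_to_dos)
```

Note that every filtered file has to be extracted and passed through
the filter to reach a decision, which is the cost being weighed against
holding two hashes in dirstate.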

In the latter case just ignoring the hash cache will result in every
file that was previously filtered showing up as different. We have a
significant problem correcting this though, because unlike the case of
an added filter, we can't tell 'edited' from 'filtered'.
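
A small sketch of why the removed-filter case is ambiguous, using the
uppercase-on-input / lowercase-on-output pair from the quoted message.
The names here (sha1_of, looks_modified, dirstate_sha1) are invented
for illustration and say nothing about how dirstate actually stores or
checks hashes.

```python
import hashlib


def sha1_of(content: bytes) -> str:
    """Hex sha1 of a byte string."""
    return hashlib.sha1(content).hexdigest()


def looks_modified(on_disk: bytes, recorded_sha1: str) -> bool:
    """With the filter removed, the on-disk bytes are hashed as they are."""
    return sha1_of(on_disk) != recorded_sha1


# Canonical (repository) text and the hash recorded for it.
canonical = b"HELLO WORLD\n"
dirstate_sha1 = sha1_of(canonical)

# Written out by the old output filter and never touched by the user.
filtered_only = canonical.lower()
# Written out by the old output filter and then edited by the user.
user_edited = b"hello there\n"

results = [
    looks_modified(filtered_only, dirstate_sha1),
    looks_modified(user_edited, dirstate_sha1),
]
```

Both files come out as different, and without the removed filter there
is nothing left to compare against that would separate one from the
other.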

> Anyhow, I'm progressing the 'assign properties using globs' patch while
> I'm mulling over the stuff above. It's independent and also necessary
> before I glue Alexander's filters over the top to pull eol support
> together. It's going well so far ...

Cool.

-Rob
-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.
