Content filter sample: eol plugin

John Arbash Meinel john at arbash-meinel.com
Mon May 19 16:22:37 BST 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Martin Pool wrote:
| On Sat, May 17, 2008 at 1:43 AM, Ian Clatworthy
| <ian.clatworthy at internode.on.net> wrote:
|> For those interested, I've attached the README.txt and core code for the
|> upcoming eol (end of line) plugin. This shows two important things:
|>
|> 1. How easy it is to write and register content filters.
|>
|> 2. How rule-based properties are used to configure it.
|>
|> DO NOT RUN THIS code and tell me it's broken! I'll make a proper branch
|> or patch (if we think it ought to be a standard plugin) available soon
|> once I've tested it further.
|>
|> That warning aside, I hope people find it interesting. :-)
|>
|> Ian C.
|>
|> bzr-eol: End Of Line conversion plugin
|> ======================================
|>
|> Overview
|> --------
|>
|> This plugin adds EOL conversion to selected files. This makes it easier to
|> work on text files in projects where developers are on multiple platforms or
|> developing for multiple platforms.
|>
|>
|> Installation
|> ------------
|>
|> The easiest way to install this plugin is to either copy or symlink the
|> directory into your ~/.bazaar/plugins directory. Be sure to rename the
|> directory to eol (instead of bzr-eol).
|>
|> See http://bazaar-vcs.org/UsingPlugins for other options such as
|> using the BZR_PLUGIN_PATH environment variable.
|>
|>
|> Testing
|> -------
|>
|> To test the plugin after installation:
|>
|>    bzr selftest eol
|>
|>
|> Documentation
|> -------------
|>
|> EOL conversion is provided as a content filter where Bazaar internally
|> stores a canonical format but outputs a convenience format. See
|> ``bzr help content-filters`` for general information about using these.
|>
|> EOL conversion needs to be enabled for selected branches and files using
|> rules. Default rules for all branches can be defined in ``bazaar.rules``.
|> Alternatively, branch-specific rules can be defined in ``branch.rules``.
|> See ``bzr help rules`` for general information on defining rules.
|>
|> To configure which files to filter, set ``eol`` to one of the values below.
|>
|>  ====== ======================= ========================================
|>  Value  Commit newlines as      On checkout, convert newlines to
|>  ====== ======================= ========================================
|>  exact  exactly as in file      No conversion
|>  ------ ----------------------- ----------------------------------------
|>  native lf                      crlf on Windows, otherwise lf
|>  ------ ----------------------- ----------------------------------------
|>  dos    crlf                    crlf on Windows, otherwise lf
|>  ====== ======================= ========================================
|>
|> Note: For safety reasons, no conversion is applied to any file where a null
|> character is detected in the file.
|
| I think we should choose _either_ to call the modes by the line ending
| convention (crlf) or by the platform where they are typically used
| (dos) but not some of one and some of the other.  Naming by platform
| is probably more recognizable, but does mean there's no obvious name
| for just "cr" ("old mac"?)
|
| It seems to me what we want is
|
|   exact -- no changes; default
|   native -- produce whatever is the normal convention for the platform
| where the checkout was made; internally always stored with just lfs --
| this is suitable for files in cross-platform projects that will work
| with either line ending
|   cr -- should always have cr
|   lf -- suitable for eg unix shell scripts that will be broken by
| carriage returns
|   crlf -- suitable for dos batch files that need carriage returns
|
| I can see room for another option that says what to do if the working
| copy is not as expected, or if it has mixed line endings - it could
| either do nothing, warn, or error.  That could be added later of
| course.

Warning on encountering mixed line endings seems nice, as you almost never
want them. However, I'm not sure it is strictly needed. If we break it down by
mode:

exact      - Certainly we don't want to warn here, since .PNG might have
             anything
native     - Maybe, as doing 'rm foo; bzr revert foo' will change the line
             endings of the file. Heck, potentially just 'bzr revert foo' will
             change the file. Even if 'bzr commit' wouldn't have anything to
             commit.
cr/lf/crlf - Probably here as well.

It seems fine to have the cr/lf/crlf check on commit if they actually had to
change anything in the file. If they did, warn that the line endings are
incorrect, and prompt for 'bzr revert'. Or whatever will actually affect the
file. (rm foo; bzr revert foo; etc.)
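
As a rough illustration of that commit-time check (a sketch only;
check_fixed_eol and its lookup table are hypothetical names, not anything in
bzrlib or Ian's plugin), the filter could convert the content and report
whether it had to change anything. It skips files containing a null byte, per
the safety note in the README:

```python
import re

# Hypothetical helper for illustration; not part of bzrlib or the eol plugin.
_EOL_BYTES = {"cr": b"\r", "lf": b"\n", "crlf": b"\r\n"}
# CRLF must come first so it is matched as one line ending, not two.
_ANY_EOL = re.compile(br"\r\n|\r|\n")


def check_fixed_eol(content, mode):
    """Return (converted, changed) for a file whose rule is cr, lf or crlf.

    'changed' is True when the working copy did not already use the
    requested line ending everywhere, i.e. when a warning would be due.
    """
    if b"\x00" in content:
        # Null byte found: treat as binary and leave the content alone.
        return content, False
    eol = _EOL_BYTES[mode]
    # A function replacement avoids any escape processing of the bytes.
    converted = _ANY_EOL.sub(lambda match: eol, content)
    return converted, converted != content
```

A caller would commit `converted` and, when `changed` is true, warn that the
working file has the wrong line endings and suggest 'bzr revert'.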

I *am* a little curious how 'bzr revert' is going to interact with content
filtering. Because 'bzr status' may not want to show that these files are
modified, and certainly 'bzr commit' wouldn't want to mark them as changed. But
should 'bzr revert' clean up the line endings?

Do we need a new "iter_changes(include_content_filter_unclean=True)"? It might
be reasonable to extend the content filter code so that the filters could give a
hint as to whether their input was clean.

(e.g., native would convert LF => LF, and CRLF => LF, but if it found a plain LF
when it was expecting CRLF it could set the 'unclean_input' flag.)
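
A minimal sketch of that 'unclean_input' hint for native, assuming the filter
is handed the whole file as bytes. native_to_canonical and expected_eol are
made-up names for illustration; the real content filter API may look nothing
like this. Treating a CRLF on an LF platform as unclean is an extra assumption
beyond the case above, and lone CRs are ignored:

```python
import os


def native_to_canonical(content, expected_eol=None):
    """Convert working-tree bytes to LF and report whether input was clean.

    Returns (canonical, unclean_input). Hypothetical helper, not bzrlib API.
    """
    if expected_eol is None:
        # Default to the convention of the platform we are running on.
        expected_eol = os.linesep.encode("ascii")
    # LF => LF and CRLF => LF, as described above.
    canonical = content.replace(b"\r\n", b"\n")
    if expected_eol == b"\r\n":
        # Every LF should be part of a CRLF; a bare LF means unclean input.
        unclean_input = content.count(b"\n") != content.count(b"\r\n")
    else:
        # Assumption: on an LF platform, any CRLF counts as unclean.
        unclean_input = b"\r\n" in content
    return canonical, unclean_input
```

Something like iter_changes could then surface files whose canonical content
is unchanged but whose unclean_input flag is set, so that 'bzr revert' knows
there is something to clean up even when 'bzr commit' has nothing to commit.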


This sort of complexity is what has caused us to avoid EOL conversion in the
first place, so maybe we can take the limited answer other programs have
offered and just run with it.

|
|> Here is the suggested rule for users on Windows working on a cross-platform
|> project::
|>
|>  [*]
|>  eol = native
|
| I see what you mean but that is a somewhat risky setting; I don't know
| if anything is really safe for *.
|
|> If you think it would be clearer, I'll rename "native" to "unix".
|
| But this sounds really odd to me - surely "native" is only "unix" if
| you are actually on unix?
|

Well, it was as opposed to 'dos'. And it basically changed the flag to mean
"what is stored internally" rather than "what is presented on disk". Which I
disagree with. I can understand the argument "I'm primarily developing on
Windows, why should it have to filter at every commit, and then on every
checkout", but I'm guessing it isn't a lot of overhead. And it results in a
simpler configuration.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkgxmz0ACgkQJdeBCYSNAAPEkACfbh2GVAHBCBlM1fZ3MJF2fiBO
+vkAoLLYkGuJBzut9Lx4LByQObc1SDeK
=iR8o
-----END PGP SIGNATURE-----


