[MERGE] integrated EOL conversion support

John Yates jyates at netezza.com
Mon Mar 30 18:36:07 BST 2009


First let me start by saying that I am not sure I follow completely all of the intricacies of the current model.  My hunch is that the complexity in the current model comes from a focus on the need to accommodate existing repositories.  As such it represents an instance of "the tail wagging the dog".  Such lack of obviousness bodes poorly for less sophisticated users and projects not yet under bzr.

I recommend thinking about how one would design this feature if it had been included from day one.  That ought to be the (initial, simplified) model presented to those who have not painted themselves into a corner by already having repositories with awkward line endings.  With that simple, clear model articulated one can work out an extended or superset model to accommodate awkward legacy cases.

My suggestion is that there are three classes of text files that users care about:

1) Line-oriented.  These are by far the most common and should represent the default.  For files of this class a user does not care how line endings are encoded/stored within the repository.  On commit he expects his VCS robustly to identify line boundaries and to remember their locations.  When rematerializing such files he expects line endings convenient to his chosen tool chain.  (While the expectations of the user's tool chain will frequently match those of the native OS utilities, that last statement attempts to capture the need to accommodate both cygwin and virtual machines.)

2) Specified-eol.  These are less common, deriving from cases where some inflexible tool demands a certain line ending.  For files of this class the user does not care how line endings are encoded/stored within the repository.  On commit he expects his VCS robustly to identify line boundaries and to remember their locations.  When rematerializing such files he expects the lines to be delimited by the endings required by the inflexible tool.

3) Hybrid.  These are files with mixed line endings that must be preserved.  (I do not have a specific example in mind but believe that this case must be handled to be fully general.)  For files of this class the user expects faithful preservation between commit and rematerialization.
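
To make "robustly identify line boundaries" a little more concrete, here is a rough Python sketch of how one might count the kinds of line endings in a file and flag the hybrid case.  The function names are hypothetical, invented for this illustration; none of this is taken from bzr.

```python
import re
from collections import Counter

# Match CRLF first so it is not counted as a CR followed by an LF.
_EOL = re.compile(rb"\r\n|\r|\n")


def count_line_endings(data: bytes) -> Counter:
    """Count each kind of line ending (b"\\r\\n", b"\\r", b"\\n") found in data."""
    return Counter(m.group() for m in _EOL.finditer(data))


def is_hybrid(data: bytes) -> bool:
    """Return True if the data mixes more than one kind of line ending."""
    return len(count_line_endings(data)) > 1
```

A file for which is_hybrid() returns True is a candidate for class 3; whether its mixed endings actually must be preserved is still the user's call.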

Under this model a file is either verbatim or line-oriented, distinct from binary versus text.  Verbatim is more or less what bzr does today.  Line-oriented means that on commit all possible line endings (lf, crlf, or cr-only) get canonicalized.  On rematerialization canonical line endings get converted to that file's specific line ending if it exists or else to the user's chosen default line ending.
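
As a minimal sketch of the line-oriented case, assuming lf as the canonical stored form (the choice of canonical form is my assumption for illustration, as are the function and parameter names), the two conversions might look like this in Python:

```python
import re
from typing import Optional

# Match CRLF first so it becomes one canonical ending, not two.
_ANY_EOL = re.compile(rb"\r\n|\r|\n")


def canonicalize(data: bytes) -> bytes:
    """On commit: turn every lf, crlf, or cr-only ending into a canonical lf."""
    return _ANY_EOL.sub(b"\n", data)


def rematerialize(canonical: bytes,
                  file_eol: Optional[bytes],
                  default_eol: bytes) -> bytes:
    """On checkout: write the file's specified ending if it has one,
    otherwise the user's chosen default ending."""
    eol = file_eol if file_eol is not None else default_eol
    return canonical.replace(b"\n", eol)
```

Here file_eol would be set only for specified-eol files and left as None for ordinary line-oriented ones.  Verbatim files would bypass both functions and be stored and restored byte for byte.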

/john

