[MERGE] integrated EOL conversion support

Stephen J. Turnbull stephen at xemacs.org
Mon Mar 30 08:03:32 BST 2009


Ian Clatworthy writes:

 > Thanks for the feedback Jonathan. I think you're confirming what John is
 > saying: people mentally think about the working tree format as their
 > priority even if the collective "project" cares more about what's stored
 > in the VCS repository.

I still don't understand this claim.  The only project that ever cares
about the repo format per se is Bazaar.  Nobody else ever reads or
writes repo storage, except for a few thrill seekers with hex editors.
Us users would like to have repo formats just plain go away, eol
format, rich root, locks, stock and barrel, thank-you-very-much!  They
are a major source of complexity, not to mention confusion, in Bazaar.

The "project" cares about two things: it wants its developers to see
no extraneous control characters or formatting bogosity in their
language translators, editors and viewers, and it doesn't want
non-text files changed at all except by an explicit act of a
developer.

So IMHO what you really want here is the following:

(1) A user-visible flag that says whether the file is text or binary,
    probably defaulting to binary.  bzr should never munge binary
    files, on pain of being dpkg --purge'd with extreme prejudice.
    (It might be reasonable to have a pre-commit check warning "if you
    commit this file, all the line endings will change!")  It should
    be moderately annoying to change this flag to text, i.e., the user
    should be asked to confirm "If you set this to text, on checkout and commit
    bzr will silently change the content of the affected files so that
    characters that look like line endings conform to the platform
    default or user settings.  It is safe and convenient for most text
    files.  BUT THIS CAN RESULT IN IRRETRIEVABLE DATA CORRUPTION!  Are
    you sure you want to do this?"  This setting is propagated to
    branches when pulled, pushed, or cherry-picked.

(2) If the file is text, a user-visible option that sets the
    checked-out file's line ending format.  This option may get its
    value from several places: an explicit flag to the checkout
    command (syntax will be painful since this should be a per-file
    setting, not an "everything in this command" setting); an explicit
    configuration in the workspace; an explicit user preference; or a
    default for the platform where bzr is being executed.  If a value
    is stored in the part of the branch that gets communicated to
    other branches, it should be very low priority (higher than
    platform default, but lower than any user setting).  It might be
    useful to warn if a repo default is set, and the user sets EOL
    otherwise (e.g., in the case of MSVC project files).

(3) An internal (not user-visible) per-file flag that determines the
    storage format of non-binary files.  (By "not user-visible" I
    suppose I mean that there is exactly one command that can change
    the internal representation of EOLs for a given file, and it does
    absolutely nothing else, and it scolds you for even dreaming of
    using it.)  This is necessary so that changing from binary to
    text, or changing the internal representation, doesn't affect
    annotate results.  You'd probably have to maintain a history of
    such property changes, but I haven't thought carefully about that.
    This is propagated from branch to branch.
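
To make (1) and (2) concrete, here is a minimal Python sketch of how
the checkout EOL style might be resolved and applied.  Everything in
it is hypothetical: the function and parameter names are invented for
illustration and are not bzr's API, the relative order of the three
user-level sources is my assumption (the text above only fixes the
branch default as lower than any user setting and higher than the
platform default), and lone CR line endings are ignored.

```python
import os

# Hypothetical names throughout; none of this is bzr's actual API.
EOL_BYTES = {"lf": b"\n", "crlf": b"\r\n"}


def platform_default():
    """EOL style of the platform where the tool is running."""
    return "crlf" if os.linesep == "\r\n" else "lf"


def resolve_eol(command_flag=None, workspace=None, user_pref=None,
                branch_default=None, platform=None):
    """Return the first EOL style that is set, highest priority first.

    Assumed order: command flag, workspace config, user preference,
    then the value propagated with the branch, then the platform.
    """
    for value in (command_flag, workspace, user_pref, branch_default):
        if value is not None:
            return value
    return platform if platform is not None else platform_default()


def checkout_bytes(stored, is_text, eol_style):
    """Convert line endings on checkout; binary content is never touched."""
    if not is_text:
        return stored
    # Normalize CRLF to LF first, then emit the requested style.
    normalized = stored.replace(b"\r\n", b"\n")
    return normalized.replace(b"\n", EOL_BYTES[eol_style])
```

With these definitions, resolve_eol(user_pref="crlf",
branch_default="lf") picks the user's "crlf", and a file flagged as
binary passes through checkout_bytes unchanged whatever style is
resolved.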


 > Several people, John included from memory, have suggested using compound
 > names like native-lf, crlf-crlf, etc. where one part is the WT format and
 > another part is the repo format. If we do this and put the WT format first,
 > we get:
 > 
 > * native-lf   (instead of lf)
 > * native-crlf (instead of crlf)
 > * lf-lf       (instead of lf-always)
 > * crlf-crlf   (instead of crlf-always)
 > * exact.
 > 
 > Would those names be better and reduce the potential for confusion?

I think they present an unnecessary potential for confusion.  The user
only cares about the file he edits and compiles (or whatever), not
what's in the repository.  Admins care about what's in the repository
because of the diff/annotate problem, but only to the extent that they
don't want it to change by accident.

Isn't exact another name for 'binary'?
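
For reference, the compound names quoted above can be read as a pair
(working tree format, repo format).  The following Python sketch only
illustrates that reading; parse_compound is a hypothetical helper, not
anything in bzr, and treating "exact" as "no conversion on either
side" is my interpretation of the proposal.

```python
# Hypothetical helper; not part of bzr.
WT_FORMATS = {"native", "lf", "crlf"}
REPO_FORMATS = {"lf", "crlf"}


def parse_compound(name):
    """Split a 'wt-repo' style name into (working_tree, repo) formats."""
    if name == "exact":
        # Assumed meaning: no conversion in either direction.
        return ("exact", "exact")
    wt, sep, repo = name.partition("-")
    if not sep or wt not in WT_FORMATS or repo not in REPO_FORMATS:
        raise ValueError("unknown EOL style: %r" % (name,))
    return (wt, repo)
```

Under this reading, parse_compound("native-lf") gives ("native", "lf"),
and "exact" is the one name where the user never has to think about
the second half at all.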



