my strategy on implementing line-endings (eol) support

Alexander Belchenko bialix at ukr.net
Mon Apr 14 13:50:31 BST 2008


Robert Collins пишет:
> I realise I'm replying quite late to this thread. I think its great that
> you are working on eol support. The squid windows developers use VS and
> are crying out for this.

Yes, your comments is a bit late, but thank you anyway.

> In terms of code in bzrlib, we did discuss eol support at the London
> sprint (I really wish you had been able to attend).
> 
> http://bazaar-vcs.org/SprintLondonMarch08/Brainstorms#head-f6668b591465639502e563d4321d1a28fd7f0b10 is the notes from the sprint.

I read those notes several times, and in some aspects my work followed them, but in some
is not. I think some points unclear or probably overcomplicated. At least I can't follow them
directly all the time.

Just to be sure we are talking about the same things here is my comments on that notes.

 >== Line Endings ==
 >
 > * trivial changes resulting in entire file diffs. Hard to read, blows up repository size, invalidates sha1 caching in 
 >dirstate.

I suppose you were talking about mangling of line-endings in working tree. Completely agree.
This is main problem I wanna to solve.

 > * file writes to go through tree API, as reads do now.

Currently (in bzr.dev) all writes goes through TreeTransform.create_file API. It's almost the only one
function that write new content of file to disk. It's good enough for my purposes
so currently I decided to not change this design, and provide support for eol in create_file
and around it.

 >   * infrastructure to transform content, write native endings to working tree, convert to canonical text file 
representation on commit

I'm not sure I understand this part: "infrastructure to transform content".
Other parts is already implemented: write goes via small changes in TreeTransform.create_file,
read normalized content from disk is in WorkingTree.get_file/get_file_text.

 >     * could also do things like keyword expansion

Could, but I don't want to implement it right now. Keywords expansion is often asked
but I personally don't like it. I still had bad feelings of it back to time
when I used CVS.

 >     * some way to control this behaviour, for example ignore style regexes to determine what files to do it for

I don't understand this sentence. In my work eol support is controlled by 'eol' versioned property.

 >   * Test that every command goes through the tree api by having a tree implementation that horribly mangles everything.

I have dedicated tests for eol support, including mangling of eol,
so I will not planning to write other tests how your notes describe it.

 >   * native in working tree not always the right thing to do

Agree, that's why I have choice: exact/native/LF/CRLF/CR.


> The key elements (not all are in the notes :[) are:
>  - in memory files will always have canonical line endings. The line
> ending conversion will be hidden behind the working tree barrier.

Yes, this is how my code is working now.

>  - We couldn't decide on how to sort out the numerous issues related to
> historical control of eol conversion, so we proposed that in the first
> implementation there would be nothing in the historical database to
> control line endings.

My simple versioned properties implementation deal with this very simple:
if revision tree does not have file .bzrprop inside then there is no
versioned properties, and default value for 'eol' will be used.

> We'd just have a config setting to set eol for a
> given path/file type/regex. Once we have solid user feedback we can
> extend this to historical operations. Doing this avoids having to deal
> with merging different eol options for a given file, for instance.

I disagree on this point, mostly because versioned properties support
is a separate thing. Merging of versioned properties is separate thing too.

>  - the sha1 stored in the dirstate cache should be that of the
> canonicalised file content

Yes, my code use different sha_file_by_name routines for the case 'exact' eols
and non-exact. In non-exact case file will be read from disk in 'rU' mode.

>  - users that share working tree across operating systems *and* use eol
> feature, get to keep both pieces.

I don't understand this sentence.

>  - we need a new workingtree format to enable this feature

Sorry Robert, but I did all my work with current WT formats and I don't see any
special reason why I should invent new WT format.
And what is more, I don't want and don't planning to write new WT format
and all supporting code with upgrade path, many new tests and other stuff.

My approach is backward compatible with any old clients. Old clients just
can't do any eol conversion or provide eol support "auto-magically"
of course. But I don't see critical problems there. Only if people
will commit their changes blindly without inspecting them first they
potentially could have problems because they will not catch the fact
that file will be committed with wrong line-endings.
Otherwise any user will easily catch the fact that there is something wrong,
because according to your brainstorm notes:
"...trivial changes resulting in entire file diffs..."

> I suspect you're doing quite a lot more than this, and would like to
> suggest that getting the very core bit - the abstraction for working
> tree and some methods to call to determine EOL setting inside the tree
> for each file into bzr itself would help create a smaller more
> reviewable patch.

May be. My code for eol support is based on versioned properties, but
actually is not tightly coupled to it. I just need the way to get 'eol'
property for the pair (file_id, filename) or even just for filename,
and core eol support should work. If you wanna to have just bare infrastructure
for eol support it's possible to replace code like this

eol = self.properties.get_by_id_name(file_id, abspath)

with

eol = None

And leave other code as is. But this code can't do anything useful.
Why this needed if we can do better?

Of course I can provide smaller patch, but because I did all my work
on top of versioned properties, I'd like to get some decision about
versioned properties first.



More information about the bazaar mailing list