[MERGE/RFC] Revert reporting #3707

Wed Jan 10 22:32:34 GMT 2007

Side note: you got the bug number slightly wrong. It is 3707, not 307;
307 is a Baz 1.x bug.

Aaron Bentley wrote:
> Hi all,
> 
> A long-standing bug is that revert doesn't report on the changes it makes.
> 
> This patch adds such reporting to 'revert'.
> 
> Because revert must handle a wide variety of situations, our existing
> reporting methods are inadequate.
>  - they have no way to represent a 'kind' change
>  - they don't differentiate between 'renamed' and 'renamed and modified'.
>  - they can't indicate file versioning operations separately from file
> creation/deletion
> 
> This bundle adds reporting and introduces a new format, with three
> columns for version-change/rename, content-change, and execute-bit change.
> 
> Some examples:
> 

I don't really like having separate fields for 'added' and 'created',
since you can't really have something that is both added and deleted.

I think you are doing it to distinguish between 'added' and 'not present'
(missing). And maybe you feel the need to distinguish 'added and
missing' from plain 'missing'.

Certainly, whatever format we decide on should work for 'bzr status --short'
as well as 'bzr revert', 'bzr update', 'bzr merge', and 'bzr pull'.

Until now, we've always treated those as printing the log of the merged
revisions, rather than the delta they generate in the working tree.
But the delta is probably more informative, or at least closer to what
people coming from svn/cvs expect.

> +N  dir1/
> A directory was added and created
> 
> -D  file1
> A file was removed and deleted
> 
> RM* dir1/file1 => dir2/file1
> A file was moved, modified, and had the execute bit changed
> 
>  K  name1 => name1/
> A file was changed into a directory
> 
> Version-change and rename can share a column because a file must start
> versioned and stay versioned in order for us to detect a rename.  So
> 'add', 'remove' and 'rename' are mutually exclusive, as long as we are
> detecting changes.
> 
> This implementation reports on the differences between the working tree
> and the target tree, rather than reporting on the actual changes made.
> 
> Another possible implementation would be to derive the changes from the
> TreeTransform.  This would have slightly higher fidelity:
>  - Its output could reflect conflict resolution outcomes
>  - The same output could be produced by "merge", "checkout" and other
>    TreeTransform operations.
> 
> But there are also disadvantages:
>  - It would need to be able to indicate simultaneous 'add'/'remove' and
> 'rename'.
>  - It would need to display renames due to conflict resolution in
>    a reasonable way
>  - Some effort is needed to implement _iter_changes on TreeTransform
>  - It would report the creation of backup files
> 
> The creation of backup files is especially problematic, because we don't
> technically create backups-- we rename and unversion the existing file,
> then create and version a file with the desired text.  So we would get
> 
> - R  file => file.~1~
> +N   file
> 
> When what we want is really:
> M    file
> (note that file.~1~ isn't mentioned)
> 
> So folks, what shall we do?
> 
> Aaron

Your notation here doesn't seem consistent with your earlier notation,
since your columns seem to mean something different.

But certainly we don't want to show a file as unversioned and then
versioned because we are creating a backup.
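To make the backup problem concrete, here is a minimal Python sketch of hiding it at the reporting layer. The `Op` record and its fields are hypothetical (this is not TreeTransform's API); the sketch assumes backups are always named `file.~N~`, as in Aaron's example, and folds "move file aside, then create file again" into a single `M` line.

```python
import re
from typing import List, NamedTuple, Optional


class Op(NamedTuple):
    """Hypothetical low-level operation record (not a bzrlib class)."""
    kind: str                       # "rename" or "create"
    path: str                       # path produced by the operation
    old_path: Optional[str] = None  # source path, for renames only


# Assumed backup naming scheme: "file.~1~", "file.~2~", ...
BACKUP_RE = re.compile(r"^(?P<orig>.+)\.~\d+~$")


def is_backup_rename(op: Op) -> bool:
    """True if op moves a file aside to its own numbered backup name."""
    m = BACKUP_RE.match(op.path)
    return op.kind == "rename" and m is not None and m.group("orig") == op.old_path


def collapse_backups(ops: List[Op]) -> List[str]:
    """Turn filesystem-level ops into logical status lines."""
    backed_up = {op.old_path for op in ops if is_backup_rename(op)}
    lines = []
    for op in ops:
        if is_backup_rename(op):
            continue  # never mention the backup file itself
        if op.kind == "rename":
            lines.append(f" R  {op.old_path} => {op.path}")
        elif op.kind == "create":
            # Re-creating a file we just moved aside is really a modification.
            status = "M" if op.path in backed_up else "A"
            lines.append(f"{status}   {op.path}")
    return lines
```

So `[Op("rename", "file.~1~", "file"), Op("create", "file")]` reports only `M   file`.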

My gut says that TreeTransform is going to be too low level, and
describe things in terms of how the filesystem was modified, rather than
in logical terms of what happened to the tree.

If I understand correctly, what you are trying to propose is:

1) +-R
  inventory entry was added, removed, renamed
2) NDMK
  file itself is considered new, deleted, modified, kind change
3) *
  executable bit changed
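A minimal Python sketch of the three-column prefix as summarized above, checked against Aaron's four examples. `proposed_prefix` and `report_line` are hypothetical helpers, not bzrlib functions; the point is that putting '+', '-' and 'R' in a single slot makes them mutually exclusive by construction.

```python
from typing import Optional

# Column letters as summarized above; " " means nothing to report there.
VERSIONING = {" ", "+", "-", "R"}    # added, removed, renamed (one slot, so exclusive)
CONTENT = {" ", "N", "D", "M", "K"}  # new, deleted, modified, kind change


def proposed_prefix(versioning: str, content: str, executable: bool) -> str:
    """Fixed-width three-column prefix followed by a separating space."""
    if versioning not in VERSIONING or content not in CONTENT:
        raise ValueError(f"unknown status letters {versioning!r}, {content!r}")
    return versioning + content + ("*" if executable else " ") + " "


def report_line(versioning: str, content: str, executable: bool,
                path: str, old_path: Optional[str] = None) -> str:
    shown = f"{old_path} => {path}" if old_path else path
    return proposed_prefix(versioning, content, executable) + shown
```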

As I mentioned, I don't really like that +N is used to indicate an added
file. It would be a lot nicer to just have "A " like other systems.

While I don't think we want to reproduce all of SVN's idiosyncrasies, it
at least gives us a point of reference.

' ' no modifications
'A' Added
'C' Conflicted
'D' Deleted
'I' Ignored
'M' Modified
'R' Replaced
'X' item is unversioned, but is used by an externals definition
'?' item is not under version control
'!' item is missing (removed by non-svn command) or incomplete
'~' versioned item obstructed by some item of a different kind


We don't need "Replaced", though if you did delete + add we would need a
way to indicate that. So maybe we do want Replaced... That said, we also
need Renamed. And maybe something for "illegal" (as in something that we
cannot version, like a socket/fifo/illegal filename)

We don't need '~', since that is equivalent to 'K'.

They use '!' to indicate missing. To me, it is fine to use that whether
it is missing because it was newly added, or whether it was added a long
time ago, and just now cannot be found. I'm not super happy with '!',
but it is ok.

You also want to be able to have a different indication for Renamed than
for Renamed + Modified. I'm not positive that is strictly useful, but we
can do that.

SVN uses the second column to indicate information about the entry
properties and the third for locking; the fourth indicates whether the
item was a copy ("history scheduled with commit"). The fifth is
"switched", which means very little to me. The sixth deals with
repository locks, the seventh seems to always be blank, and the eighth
is used to compare against the repository.

Compared to SVN, we only really need 2 columns, 1 for the entry state,
and 1 for the metadata state.

What if, instead of putting "renamed" in the first column, we put it
into the second column? Then you would have:


A   added
D   deleted
M   modified
MR  renamed => and/modified
!   missing
MR* renamed => modified/and/executable
KR  renamed => kind/change/
!R  missing => and/renamed


'Kind' is a superset of Modified, and you can't have an Added + Kind
change, unless you want to treat delete file, add directory as a Kind
change instead of a new entry.

We also need to figure out how we want to handle the dichotomy between
file ids and file paths. So if I do "bzr rm x; bzr mkdir x; bzr status"
what should we get?

D   x
A   x/

or should it be

RK  x/

That is, 'Replaced' x with a new kind. We have a similar problem if you
just do "bzr rm x; bzr add x", though if we do:

D   x
A   x

it simplifies the case of "bzr rm x; bzr mv y x"
D   x
 R  y => x
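A minimal Python sketch of the file-id view, assuming each tree is a plain dict from file id to a `(path, kind)` pair (a hypothetical stand-in for a real inventory) and ignoring content changes. Because entries are matched by file id, `bzr rm x; bzr mkdir x` yields a D line and an A line rather than `RK  x/`, since the new directory gets a fresh id.

```python
from typing import Dict, List, Tuple

Entry = Tuple[str, str]  # (path, kind); kind is "file" or "directory"


def display(path: str, kind: str) -> str:
    # Directories get a trailing slash, as in the examples above.
    return path + "/" if kind == "directory" else path


def compare_by_file_id(old: Dict[str, Entry], new: Dict[str, Entry]) -> List[str]:
    """One status line per changed file id, sorted by path (deletions first)."""
    rows = []  # (sort path, rank, line)
    for file_id in old.keys() | new.keys():
        if file_id not in new:
            path, kind = old[file_id]
            rows.append((path, 0, "D   " + display(path, kind)))
        elif file_id not in old:
            path, kind = new[file_id]
            rows.append((path, 1, "A   " + display(path, kind)))
        else:
            (old_path, old_kind), (new_path, new_kind) = old[file_id], new[file_id]
            col1 = "K" if old_kind != new_kind else " "
            col2 = "R" if old_path != new_path else " "
            if col1 == " " and col2 == " ":
                continue  # unchanged (content changes are ignored here)
            shown = display(new_path, new_kind)
            if col2 == "R":
                shown = display(old_path, old_kind) + " => " + shown
            rows.append((new_path, 1, f"{col1}{col2}  {shown}"))
    return [line for _, _, line in sorted(rows)]
```

With `old = {"x-1": ("x", "file"), "y-1": ("y", "file")}` and `new = {"y-1": ("x", "file")}` (i.e. `bzr rm x; bzr mv y x`), this gives the two lines `D   x` and ` R  y => x`.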

Arguably, we should put the new filename first and do:

D   x
 R  x <= y

Though I think I prefer the new name last, since it gives a better sense
of time flowing from left to right, and the newest thing is the last.

Another representation of "bzr mv y x" would be:

RR  y => x

This says that we replaced x with y. But then you get into more
combinatorial problems, since y could be a directory, so you may need to
represent the kind change.

So this is what I would propose. Entries are sorted by path, but
grouped by file id. So if two different file ids have the same path
(one in the old tree, one in the new), we get two lines of output.

column 1:
  A	Added
  D	Deleted
  M	Modified
  !	Missing
  K	Kind change (only if the file id was the same)
column 2:
  R	Renamed, also implies there will be an "old =>"
column 3:
  *	Executable bit changed
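The column layout above, sketched in Python. `Change` and `format_change` are hypothetical names; the checks encode two constraints from this thread: an added or deleted entry can't also be renamed, and because A, D, M, ! and K share column 1, Added + Kind change simply cannot be expressed.

```python
from dataclasses import dataclass
from typing import Optional

# Column 1 letters from the proposal above; " " means no change in this column.
STATUS = {
    " ": "unchanged",
    "A": "Added",
    "D": "Deleted",
    "M": "Modified",
    "!": "Missing",
    "K": "Kind change",  # only when the file id stayed the same
}


@dataclass
class Change:
    """Hypothetical per-entry change record (not a bzrlib class)."""
    path: str
    status: str = " "
    old_path: Optional[str] = None  # set only if the entry was renamed
    exec_changed: bool = False


def format_change(change: Change) -> str:
    """Render column 1 (status), column 2 (R), column 3 (*), then the path."""
    if change.status not in STATUS:
        raise ValueError(f"unknown status {change.status!r}")
    renamed = change.old_path is not None
    if renamed and change.status in ("A", "D"):
        # An entry must exist in both trees for a rename to be detected.
        raise ValueError("an added or deleted entry cannot also be renamed")
    col2 = "R" if renamed else " "
    col3 = "*" if change.exec_changed else " "
    shown = f"{change.old_path} => {change.path}" if renamed else change.path
    return f"{change.status}{col2}{col3} {shown}"
```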

It is arguable that we could actually get rid of column 2, and just have
the presence of "old =>" indicate that the entry was renamed. It *is*
ambiguous if someone versions a file with " => " in its name (no quotes).
The only real way to avoid that is to use '//' since that is illegal in
a filename. So we could do:
M path/x // new/path/y

But '//' doesn't really give the feeling of being moved. Maybe
M path/x //> new/path/y
M path/x \\> new/path/y
M path/x \ path/y
M path/x //=> path/y

Probably '//' isn't terrible. It isn't very obvious, but you could
probably pick it up quickly. Still, we really do want the tool to be as
easy to understand as possible.

M          ,-> path/y
  path/x //

^- Would at least be unambiguous :)
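A small Python sketch of the ambiguity argument, assuming displayed paths are normalized (relative, no empty components), so a separator containing '//' can never occur inside a path. The helper names are hypothetical. With ' => ', a file literally named `a => b` is misparsed; with ' // ' the round trip is exact.

```python
from typing import Optional, Tuple

ARROW = " => "
SLASHES = " // "


def is_normalized(path: str) -> bool:
    # Assumption: relative paths with no empty components, hence no "//".
    return path != "" and not path.startswith("/") and "" not in path.split("/")


def split_rename(text: str, sep: str) -> Tuple[Optional[str], str]:
    """Split 'old<sep>new' at the first separator; (None, text) if absent."""
    if sep in text:
        old, new = text.split(sep, 1)
        return old, new
    return None, text


def join_rename(old: str, new: str, sep: str) -> str:
    if not (is_normalized(old) and is_normalized(new)):
        raise ValueError("paths must be normalized")
    return old + sep + new
```

For example, renaming a file called `a => b` to `c` and parsing it back with `ARROW` yields `("a", "b => c")`, while `SLASHES` yields `("a => b", "c")`.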

We unfortunately don't have any tool to compare ourselves against for
renames. At least, none of the ones I can find print rename information
in a single-line format: SVN tracks renames as D+A, and CVS doesn't
track them at all. So I tried my best with other RCSs; this is the
closest I could get with each of them. In all of these cases I did my
best to imitate:

% bzr init
% echo a > a
% bzr add a
% bzr commit -m a
% bzr mv a b
% bzr commit -m rename
% bzr mv b c
% echo d >> c
% echo b >> b
% bzr add b
% bzr status
% bzr st
added:
  b
renamed:
  b => c
modified:
  c
% bzr status --short
A  b
R  b => c
M  c
% bzr log -v
------------------------------------------------------------
revno: 2
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: test-bzr
timestamp: Wed 2007-01-10 16:27:20 -0600
message:
  rename
renamed:
  a => b
------------------------------------------------------------
revno: 1
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: test-bzr
timestamp: Wed 2007-01-10 16:27:12 -0600
message:
  a
added:
  a

%%%%%%%%%%%%%% DARCS %%%%%%%%%%%%%%%%%%%%
% darcs whatsnew
{
move ./b ./c
addfile ./b
hunk ./b 1
+b
hunk ./c 2
+d
}

% darcs whatsnew -s
 ./b -> ./c
A ./b
M ./c +1


%%%%%%%%%%%%%% GIT %%%%%%%%%%%%%%%%%%%%
# The git status documentation doesn't have any indication of what flags
# are possible. And 'git log' has incomplete documentation, at least as
# of 1.4.1 which is in Edgy.
% git status
#
# Updated but not checked in:
#   (will commit)
#
#       modified: b
#       new file: c

Though possibly:

% git log -C
commit 1f60121c9df49ed363ba21ce4c6cdb6a01b05983
Author: John Arbash Meinel <jameinel at liliana.(none)>
Date:   Wed Jan 10 15:59:50 2007 -0600

    rename

:100644 100644 7898192... 7898192... R100 a  b

commit 75aceb02a3b323de0ed5ebe89d3a788398e3ae54
Author: John Arbash Meinel <jameinel at liliana.(none)>
Date:   Wed Jan 10 15:59:14 2007 -0600

    a

or
% git log -p
commit 1f60121c9df49ed363ba21ce4c6cdb6a01b05983
Author: John Arbash Meinel <jameinel at liliana.(none)>
Date:   Wed Jan 10 15:59:50 2007 -0600

    rename

diff --git a/a b/a
deleted file mode 100644
index 7898192..0000000
--- a/a
+++ /dev/null
@@ -1 +0,0 @@
-a
diff --git a/b b/b
new file mode 100644
index 0000000..7898192
--- /dev/null
+++ b/b
@@ -0,0 +1 @@
+a

commit 75aceb02a3b323de0ed5ebe89d3a788398e3ae54
Author: John Arbash Meinel <jameinel at liliana.(none)>
Date:   Wed Jan 10 15:59:14 2007 -0600

    a

% git log -p -C
commit 1f60121c9df49ed363ba21ce4c6cdb6a01b05983
Author: John Arbash Meinel <jameinel at liliana.(none)>
Date:   Wed Jan 10 15:59:50 2007 -0600

    rename

diff --git a/a b/b
similarity index 100%
rename from a
rename to b

commit 75aceb02a3b323de0ed5ebe89d3a788398e3ae54
Author: John Arbash Meinel <jameinel at liliana.(none)>
Date:   Wed Jan 10 15:59:14 2007 -0600

    a


%%%%%%%%%%%%%% HG %%%%%%%%%%%%%%%%%%%%
% hg status
A b
A c

# Yes, I did move b => c first
% hg log
changeset:   1:cc6d573297ee83a07a1fec3b2261cdceb986a7fe
tag:         tip
user:        John Arbash Meinel <john at arbash-meinel.com>
date:        Wed Jan 10 15:58:35 2007 -0600
files:       a b
description:
rename

# Shows that file a and b were involved, but not what happened to them


%%%%%%%%%%%%%% MONOTONE %%%%%%%%%%%%%%%%%%%%
% mtn status

new_manifest [ad93ff0117c539418bab4d324e5e11784046bc7d]

old_revision [ee4fb088bc89b4e142e55d11dea6ddf77d2f4bf0]
old_manifest [21d0a96b91831da8dfcce5d6ad0c5e4c40757d1b]

rename_file "b"
         to "c"

add_file "b"

patch "b"
 from []
   to [89e6c98d92887913cadf06b2adb97f26cde4849b]

patch "c"
 from [3f786850e387550fdab836ed7e6dc881de23001b]
   to [40926bdbd7eca725dc49fcea77f07f73693fa3b8]

% mtn status --brief
renamed b
     to c
added   b
patched c


%%%%%%%%%%%%%% BAZ %%%%%%%%%%%%%%%%%%%%
% baz status
* looking for foo at bar/test--project--0--patch-1 to compare with
* comparing to foo at bar/test--project--0--patch-1

A   .arch-ids/b.id
A   b
R   .arch-ids/b.id => .arch-ids/c.id
R   b => c
 M  c

So if anything, baz was the closest to what we might want to do: it
uses a separate line for renames, though it doesn't handle the ambiguity of '=>'.

Monotone is interesting in that it uses a fixed-size prefix, but makes
it more human-readable. It uses 3 lines to indicate a rename +
modification: 2 lines for the rename and 1 for the modification. But it
has the distinct advantage of being unambiguous.
darcs uses an extra line for renames, and uses '->' even though it is
ambiguous.

git status can only say "renamed" if the source file doesn't exist. And
I'm not sure what "git log" is spitting out. I assume it might be the
source and target modes, sha hash prefixes, and maybe size.
hg will give you a delete + add pair if it is able to, but it might also
give you an add + add pair.

Honestly, I think our current "bzr status --short" output works very
well. It represents renames on a line of their own, which seems to be
one of the best ways to avoid ambiguity when faced with lots of changes.
Also, it avoids having really long prefixes, preferring to just get to
the point about what is going on.

John
=:->