RFC: handling ignored files in deleted directories.

John Arbash Meinel john at arbash-meinel.com
Wed Jul 30 15:40:20 BST 2008



James Westby wrote:
| On Wed, 2008-07-30 at 16:49 +1000, Robert Collins wrote:
|> In short:
|>  - do we want to fix the bug, or should we say WONTFIX?
|
| It would be nice, but I don't think it's critical.
|
|>  - is the potential for data loss too high to just delete ignored files in
|> deleted directories?
|
| I think it is too high. While it is likely to be a rare occurrence,
| it would have a big impact, meaning that it is high risk.

I also think you don't use looms like Robert does :).

Any time you add a new package to bzrlib (say, the
intertree_implementations tests), moving between loom threads means bzr
has to either create that directory or remove it (because it isn't in
bzr.dev, but it *is* in your branch).

And if you have been diligent and run your test suite, it will have
created .pyc files in that directory.

So every other time he switches, he gets a path conflict, which he then
has to "rm path/to/garbage; bzr resolve path/to/garbage".

It is a bit of a pain. It has also caused people who were simply
*mirroring* bzr.dev to get a path conflict because of .pyc files left
in a directory.

At a *minimum*, I think our error message makes it difficult to
understand why there was a conflict. We also struggle with how to
actually resolve the conflict. (Plain 'bzr resolve' can't resolve path
conflicts, so you have to do it manually. I suppose 'bzr resolve
--all' would sort of sledgehammer it.)


|
|>  - should the default ignore mean 'garbage file' or 'private file' ?
|
| I think it should mean 'private file.' I believe only arch has another
| class of files, and so we don't want users to have to learn about that
| before their ignored files are safe.

I played around with what our "competitors" do. Basically, neither of
them versions directories, so they have no notion that you removed a
directory. They just leave the directory on disk with garbage in it.
The good and the bad:

1) It didn't leave a conflict that you have to resolve. This is good and
bad. Good for the common case where having an extra directory doesn't
hurt anything, though it is leaving garbage on your FS that you probably
don't even know about. (If the files are ignored, hg status won't ever
tell you about them.)

2) Which leads to the bad. Say you were referencing those .py files (now
compiled to .pyc). Python will continue to let you use the .pyc files in
lieu of the .py files, and you won't directly know why it works here but
is breaking for other people.
Or, say you have an auto-loader that traverses the filesystem, and loads
the should-be-deleted modules.


|
|>  - how should we implement
|
| I don't really mind this. Listing garbage files wouldn't be bad.
| I rarely add an ignore, and since most of the default ignore list
| will become a default garbage list, that will still work.
|

I hesitate at the complexity of adding a .bzrgarbage file. I completely
understand why we might want it, and I think it has some really nice
properties. It is just one-more-thing to deal with. We *could* modify
the .bzrignore entries to clarify whether something is garbage vs
precious via an extra flag. (Obviously these don't have to be *in-tree*,
but I'm using that as reference.)

I think the default should be precious, because elmo would start
executing us one-by-one if we destroyed something important out of his
/etc. Or at least be unhappy, and we want happy sysadmins.
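To make the "extra flag" idea concrete, here is a minimal sketch. It assumes a hypothetical "garbage:" prefix on an ignore entry (no such syntax exists in .bzrignore today; the names are made up for illustration). Unflagged entries, and files matching no entry at all, are treated as precious, which is the default argued for above.

```python
import fnmatch
from dataclasses import dataclass

# Hypothetical prefix marking an ignore entry as disposable. This syntax is
# an assumption for illustration, not an existing .bzrignore feature.
GARBAGE_PREFIX = "garbage:"


@dataclass(frozen=True)
class IgnoreRule:
    pattern: str
    garbage: bool  # False means "precious", the proposed safe default


def parse_ignore_lines(lines):
    """Parse ignore entries; anything without the flag is precious."""
    rules = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith(GARBAGE_PREFIX):
            rules.append(IgnoreRule(line[len(GARBAGE_PREFIX):].strip(), True))
        else:
            rules.append(IgnoreRule(line, False))
    return rules


def is_garbage(filename, rules):
    """True only if the first matching rule is explicitly flagged garbage."""
    for rule in rules:
        if fnmatch.fnmatch(filename, rule.pattern):
            return rule.garbage
    return False  # unmatched or precious: never delete automatically
```

With `rules = parse_ignore_lines(["garbage:*.pyc", "*.~1~", ".shelf"])`, only `foo.pyc` would be considered safe to delete; `.shelf` and `foo.~1~` stay precious because nobody flagged them.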


| I think the lost+found idea is pretty simple. It's easy to understand,
| and should be pretty easy to implement. It doesn't require learning
| a new concept. I think the name of the directory should be different,
| but that's a minor detail.
|
| Thanks,
|
| James


lost+found is, indeed, an interesting method. It probably handles the
"treat ignored as garbage, but put it in the trash rather than the
incinerator" case.
The problem I see is lost+found filling up with repeated paths as you
switch around a lot. It also doesn't quite solve the conflict issue:
should it conflict, or should it just note that the files have been
moved to lost+found?
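One way to cope with repeated paths is to never overwrite anything already in lost+found and instead pick the next free numbered name. The sketch below borrows the `.~N~` style of the backup files mentioned later in this mail; that choice, the function names and the "lost+found" directory name are all assumptions for illustration, not how bzr behaves.

```python
import os
import shutil


def unique_destination(path):
    """Return `path`, or `path.~N~` for the smallest N not yet taken.

    The .~N~ suffix is an assumed naming scheme, chosen to resemble the
    backup files bzr already leaves behind.
    """
    if not os.path.lexists(path):
        return path
    n = 1
    while os.path.lexists(f"{path}.~{n}~"):
        n += 1
    return f"{path}.~{n}~"


def move_to_lost_found(tree_root, relpath, lost_found="lost+found"):
    """Move tree_root/relpath under tree_root/lost_found, keeping relpath."""
    src = os.path.join(tree_root, relpath)
    dest = unique_destination(os.path.join(tree_root, lost_found, relpath))
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    shutil.move(src, dest)
    return dest
```

Switching back and forth twice over the same stray `pkg/foo.pyc` would then leave `lost+found/pkg/foo.pyc` and `lost+found/pkg/foo.pyc.~1~`, which is exactly the "filling up" problem, just without data loss.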

There is also a small security issue, if you think about /etc. For
example, you could have /etc/myfoo with permissions "rwx------". Then
normally nobody can see/read/write those files. If they then get moved
to /etc/lost+found/precious_passwd, they might become visible. Without a
conflict to force the admin to realize this, he may not realize the
password file just became public. (Not necessarily worth designing
around, but we should at least be aware of the use case.)
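If we did go the lost+found route, one cheap mitigation for the /etc case is to make the lost+found directory itself owner-only (rwx------), so that files rescued from a private directory do not become reachable by other users just because they moved. A minimal sketch, assuming a POSIX filesystem; the helper name is hypothetical.

```python
import os
import stat


def make_private_lost_found(tree_root, name="lost+found"):
    """Create (or tighten) lost+found as an owner-only directory.

    Hypothetical helper: without execute permission on the directory, other
    users cannot reach the files inside, whatever their own modes are.
    """
    path = os.path.join(tree_root, name)
    os.makedirs(path, mode=0o700, exist_ok=True)
    # makedirs' mode is filtered by the umask and is ignored if the
    # directory already exists, so set the permissions explicitly too.
    os.chmod(path, stat.S_IRWXU)  # stat.S_IRWXU == 0o700
    return path
```

This only protects the moved files; it does nothing to tell the admin that they were moved, so it is not a substitute for a conflict or a warning.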

The other interesting bit is "bzr clean-tree" which could know to remove
garbage files, but not precious ones.

Of course, for *me* there are 3 levels. There is stuff like ".shelf"
which is ignored and truly precious. Then there is ".pyc" which can be
regenerated, but I generally actually want them around, so that I don't
have to recreate them. And then there is ".~1~" which I don't really
care about (at all). Some people do; *I* commit often, just so I have a
point to bzr revert to if I want. For my local branches I don't worry
about "must pass the test suite on every commit"; it's more "should have
something interesting done that I want to preserve".

I can wedge that into a 2-level system, but I could also see:

bzr clean-tree --level 2 # Clean up the "most garbage" files
versus
bzr clean-tree --level 0 # Clean up everything
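A rough sketch of how such levels could select files, using my three examples. Note that `--level` is only a proposal in this mail, not an existing clean-tree option, and the pattern-to-level table is an illustrative assumption: 2 is junk, 1 is regenerable, 0 is precious, and `--level N` removes everything at level N or above.

```python
import fnmatch

# Illustrative garbage levels: 2 = junk, 1 = regenerable, 0 = precious.
# The patterns and numbers are assumptions, not real bzr configuration.
GARBAGE_LEVELS = [
    ("*.~*~", 2),   # revert backups such as foo.~1~
    ("*.pyc", 1),   # regenerable, but nice to keep around
    (".shelf", 0),  # ignored, yet truly precious
]


def garbage_level(filename):
    """Level of the first matching pattern; unknown files count as precious."""
    for pattern, level in GARBAGE_LEVELS:
        if fnmatch.fnmatch(filename, pattern):
            return level
    return 0


def files_to_clean(filenames, level):
    """Files a hypothetical `clean-tree --level N` would remove."""
    return [name for name in filenames if garbage_level(name) >= level]
```

So `--level 2` would only touch `foo.~1~`, `--level 1` would add the .pyc files, and `--level 0` would take everything, including `.shelf`.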


Overall, lost+found is a simpler solution to the problem. It can be
layered on top of existing working trees without causing a lot of
hardship, and it doesn't introduce a new concept. I don't think it solves
the problem quite as elegantly, but probably sufficiently.

As for a name:
    bzr-lost+found
    bzr-missing-parents
    bzr-did-you-want-this
    lost-their-home

Most of these are a bit cutesy; I would generally just recommend that we
make it clear that bzr created this directory. I would kind of like it
to not be ignored, so "bzr st" reminds you about it (though at the same
time, I don't want it to ever be *added*). So maybe just a special case in
"bzr status" which checks for a lost+found directory and warns you about it.
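The status special case could be as small as the check below. This is a sketch only: the directory name is still undecided, and the function and message wording are made up for illustration rather than taken from bzr's status code.

```python
import os

LOST_FOUND = "lost+found"  # the real name is still undecided; assumed here


def lost_found_warning(tree_root):
    """Return a warning if tree_root/lost+found exists, else None."""
    path = os.path.join(tree_root, LOST_FOUND)
    if not os.path.isdir(path):
        return None
    # Count every file anywhere under lost+found.
    count = sum(len(files) for _, _, files in os.walk(path))
    return (f"warning: {count} file(s) in {LOST_FOUND}/ were moved there "
            "from deleted directories; review and remove them.")
```

Status would print this (if not None) after its normal output, which keeps it a reminder rather than a conflict the user must resolve.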

Warning during status is preferred to a strict conflict (IMO).

Also, would you re-create the full path under lost+found? So you would have:

lost+found/bzrlib/tests/treeimplementations/foo.pyc.~1~

The good/bad is that files often don't have enough context without the
directory, but the full path makes it harder to dig down and see the
actual files.

John
=:->