[MERGE] Fix for bug 183831

Fri Apr 17 23:28:25 BST 2009

>> Robert said something interesting earlier on this thread: "I'm pro
>> this change because it fixes the bug and doesn't break other tests".
>> Isn't that what Test-driven Development is about? We don't try and get
>> every pathological case right at once: we proceed in small steps and
>> fix one issue at a time.
>
> TDD isn't always a complete solution.  TDD works when you're reasonably
> sure what you need to test.  When you're changing the kind of operation
> a function does, it's quite easy to expose gaps in coverage.
>
> Say I patched osutils.pathjoin to replace all instances of 'ı' with 'y'.
>  I wouldn't be surprised if all our tests passed.  Sure, we test that
> our code works with unicode, but we don't do exhaustive tests with every
> character.

Well, yes, TDD isn't a complete solution in that you also need to
engage your brain. But I disagree that you need a long period of
worrying about side-effects after the tests go through. To me, TDD
means that if we're afraid we've broken something, we write more tests
until we aren't afraid any more. We don't engage in speculation about
what might possibly be broken.

>
>> There are 17000 tests: if they all work then surely that's a pretty
>> good indication we can proceed.
>
> I don't think it is, because I don't think the behavior of symlinks with
> relpath is well tested.  Because symlinks are not supported on all
> platforms, we rarely use them when we're not specifically testing
> symlinks.  And until now, there's been no reason to test relpath
> behaviour with symlinks.
>

Surely what's interesting is whether there is decent blackbox test
coverage of general symbolic link usage. Relpath behaviour with
symlinks is not an end in itself: we need to know we haven't broken
other actual usage.

If there isn't good test coverage of this functionality, then sure,
let's get writing some tests for it. That'd be a lot more useful than
rejecting all changes to it because it might break. It's true that one
of my assumptions was that the existing 17000 tests are appropriate
and have good coverage, which I really know nothing about. It seems a
very unfortunate situation to have that amount of test code to
maintain and still not feel able to rely on the tests.

>> How do we know when we're done
>> otherwise? We could sit in our rooms and dream up undesirable side
>> effects until next year, but we still won't think of everything that
>> will happen, and we'll probably think of plenty of things that won't.
>
> Whereas, if we make a change whose scope is limited to the problem we
> are trying to solve, it is much easier to predict the effects.

Sure, if you can sensibly do that, if the system is simple enough and
if you know it well enough that analysing the scope and effects is
straightforward.  That's 3 pretty big ifs though.

>
> I have tried to point out where I think this should be fixed:
> tree_files_for_add.

When I look at tree_files_for_add, I see a function with 6 levels of
indentation, 6 or 7 branch points, quite a few of which deal with
files in some way, and no documentation beyond the fact that it's a
"custom implementation", which seems to be a nice way of saying that
it's been copied and pasted from internal_tree_files. Even the main
branches of these functions contain a fair amount of identical code.
And "log" has another implementation again, which is apparently not
based on this code...

Copy-paste code scares me, because it's hard to get an overview of how
many of the copies I should be changing and exactly how and why they
differ. I scanned this earlier but decided it was no place for a
newbie, for the above reasons. I've now spent a fair while browsing it
again, as it seemed like this was the only way to actually move this
forward.

Fundamentally, I am not aware of any reason why "add" or "log" should
handle symbolic links in a different way to any other command. It
seems the "general" code (in internal_tree_files) always follows
symbolic links currently, irrespective of whether it's the "terminal
path element" in your terminology or a working tree is present, which
is wrong in a lot of cases. What "log" does I can't really fathom,
but it seems to work best of the three implementations from
experimenting (I haven't succeeded in breaking it, whereas breaking
the other two is easy). Maybe finding its symlink code would be a good
idea, but I've failed to do that unfortunately. Can you help?

It seems wrong to me to just exacerbate this situation by inventing
some more complicated symlink rules of my own that apply only to
"add". That can be done if necessary, of course, but I don't really
see why it's less of a hack than my current implementation. It could
be the case that my current change will only affect "add" anyway as
things stand (the only concrete problem you found was with "add", so
could equally apply to a change here), but I cannot prove this and am
not sure it's worth the trouble...

>> and then you tell me it's me that wants to fix everything at once!
>
> Yes.  Because you changing osutils.relpath instead of changing the add
> operation, I thought you were trying to fix all other operations that
> can have problems when Bazaar is given a working tree path containing
> symbolic links.

The difference is that you have a code-centric view, whereas I have a
behaviour-centric view. For me, the code is only a means to an end,
not an end in itself, and in a system as complex as this one I don't
think it's possible to analyse risk impacts of changes just by talking
abstraction levels.

To put it another way, just because code is called in cases other than
the one we're trying to fix doesn't make it "overkill" to change it.
If it did, we'd never share any code at all, but create new copies
each time to be sure we were making changes with limited scope.

/Geoff