<div dir="ltr"> <div class="gmail_quote">On Thu, Aug 28, 2008 at 5:47 PM, John Arbash Meinel <<a href="mailto:john@arbash-meinel.com">john@arbash-meinel.com</a>> wrote: <blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"> Geoff Bache wrote: ... <div class="Ih2E3d"> > > But you need a lot fewer tests this way. Because you can have non-coders > involved, and because you necessarily focus on possible usage of the system > and interesting usage of the system, you don't end up writing thousands of > tests for things that will never happen in practice (or that nobody will > care about if they do) </div>If experience has taught me anything, it is that there is very little that "will never happen in practice". Users are *extremely* good at finding all of the edge cases in your code. <div class="Ih2E3d"></div></blockquote><div class="Ih2E3d"> Well yes, but my point is that there are many "whitebox things" you can do with any given collection of code that are totally impossible via the external interface, so no user can do them however hard they try. Then there is another class of operations that can be done, but it's just taken as read that they aren't interesting. I bet I can trigger a few internal errors in bzr by deleting half of the files in my repository, and it probably won't give me nice error messages (actually I triggered a nice Python stack trace by turning some of it into symbolic links while writing these tests...) but I doubt you're too worried about that. At the level of classes and methods this kind of thing can seem important to test, but in practice such tests just bloat the test suite most of the time.    > > Leveraging code structure is good if you want to provide well isolated units > that can seamlessly be used in a different context. In practice though most > code will only ever be used for one purpose as part of one system. 
It has > the big disadvantage that it "holds the code hostage" after a while: > nobody's going to redesign the code, however necessary it gets, if doing so > means rewriting 300 tests. </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">That is also usually a sign of poorly written tests: ones that test "more than they should". </blockquote><div> True, but in my experience it seems to be very easy to write whitebox tests "poorly" in this respect, and very difficult to do it well. If you've managed to write 15000 whitebox tests and avoid this problem then all power to you, but my experience and word from those I know suggest this is a fairly common problem. > (And blackbox tests tend to fall under this because they test > the whole stack at once.) But if they're properly blackbox they don't depend on the internal structure, so changes in design can't cause them to need rewriting, surely? The basic issue is how much stuff they depend on that is extraneous to the purpose of the test, not how much stuff in total they depend on. </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div class="Ih2E3d"> > If in-process speed is seen as all-important, yes. I personally think that > should be well down the list of priorities for an automated test suite, > especially if it's already well beyond being able to be run interactively. > And I think parallelising is easier than you might think and a better > generic solution to slow tests: hardware is cheap these days (though I > admit this equation is more complicated if all the developers are working at > home). </div>And yes, we are. Not to mention... this is open source, and we want to encourage people who submit patches to run the test suite. 
I'm in the US; Robert, Martin, Andrew, and Ian are in Australia; Vincent is in France; Aaron is in Canada; HQ is in London; etc. The PQM (the automated system that runs the test suite and, if it passes, commits to mainline) is in London, and *is* in a data center with a lot of machines (somewhere around 100+). However, the machines are dedicated to specific uses, and the admins are fairly strict. <div class="Ih2E3d"></div></blockquote><div class="Ih2E3d"> OK. That's still not necessarily fatal to parallelism: you don't need a data centre with 100 machines. I have 3 machines at home, and making use of those would near enough triple the speed of a large enough test suite, which is still pretty useful. But I take the point. > >   - IMO user acceptance tests are not equivalent to black box tests - >>   they are testing that the user's goals really are satisfied; so they >>   may sometimes be best represented by driving the UI, but may also >>   at other times be best represented by driving the API. (For starters, >>   it depends on the 'user' - the folk writing qbzr and bzr-gtk need >>   API level tests, folk driving the CLI probably want UI tests, except >>   when the thing they are asking about really isn't a UI problem.) > > > Yes, this is true. And IMO user acceptance tests are the most important and > most effective form of testing (though not usually the fastest..) > </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">They are *a* good part of testing. However, we currently support 9 repository formats which can be accessed over 5 transports (bzr+ssh, sftp, http, ftp, local). And you don't really want to run a set of heavyweight tests against all formats for all possible permutations of commands. We also support 4+ branch formats and 3 working tree formats, which (in general) can be mixed and matched. 
We certainly have default configurations, but when you do "bzr status" it has to extract data from the repository and compare it to the working tree. So you should make sure that sort of thing works. We do that internally with white-box interface testing. So all repository implementations have 400+ tests run against them. However, this saves us from having to run each test 3*4*9*5=540 times. It is also really helpful when *implementing* a new Repository format, because when the 400+ tests pass for the new format, you can be quite sure that it conforms. </blockquote><div> Sure. Doubtless a good idea.  </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"> <div class="Ih2E3d">>> Two related projects to bzr that may well love testtext are the bzr-gtk >> and qbzr projects, which are writing GUIs - and unlike bzr's core, >> don't have a really slick way to write comprehensive tests for gui >> interacting code (or didn't last time I looked - I may be out of date). >> > > OK. If I feel inspired I might try and write a little test suite for bzr-gtk > : it's always easier to convince people who don't already have lots of tests > :) </div>Though you may have to convince them to... run the test suite. I wrote quite a few tests for bzr-gtk when I did some work on it. It turns out that the gtk gui code is fairly easy to whitebox test, because you don't actually have to run the event loop to inject changes and have it propagate values. (You can simulate button presses, etc, and then check the values in the text areas.) However, the test suite has broken a few times, because people never actually ran it before bringing in new patches. The breakages were usually trivial things, but they are a sign that the devs in the project don't commonly run the test suite. </blockquote><div> OK. I'll see if I feel like it's worth the effort. 
Thanks for an interesting discussion anyway; it's given me some insights into the Bazaar project and the open source world in general. Regards, Geoff   </div></div></div>