[MERGE] timing sensitive test can cause spurious failures

Martin Pool mbp at canonical.com
Fri Sep 1 10:13:06 BST 2006

On 31 Aug 2006, Andrew Bennetts <andrew at canonical.com> wrote:
> John Arbash Meinel wrote:
> > Andrew Bennetts wrote:
> [...]
> > > Timing-sensitive tests suck.  They will always break for someone somewhere,
> > > because there's always a system that's faster or slower than you ever expected.
> > > 
> > > Also, it's probably a good habit to think of time.sleep as a low-resolution
> > > hint, rather than something to be trusted.


> > Well the test under consideration should actually be taking ~1ms. It is
> > something *really* simple. We've already bumped it from being <10ms to
> > being <100ms. We discussed it, and we don't really want to go above
> > that. Robert explicitly didn't want it to be too loose, and miss a real bug.

One can find bugs by wondering "hm, why is that trivial test taking 10
seconds?"  But that's a bit separate from the assertions of the test
itself; as Andrew says it's entirely reasonable for tests to proceed
slowly because of environmental factors.
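To make that concrete: time.sleep only promises a lower bound, so the
only thing a test can safely assert is that at least the requested time
passed.  A quick illustration (the 7ms value just echoes the test under
discussion):

```python
import time

# time.sleep is a low-resolution hint: the OS resumes the process no
# earlier than requested, but possibly much later if the machine is
# busy.  Only the lower bound is dependable.
requested = 0.007  # 7ms, as in the test under discussion
start = time.monotonic()
time.sleep(requested)
elapsed = time.monotonic() - start
# Safe assertion: at least roughly the requested time elapsed.
# An upper-bound assertion here would be the timing-sensitive mistake.
assert elapsed >= requested * 0.9
```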

> > Your machine is taking 100 times too long to respond to a simple
> > time.sleep() call. It might be better to use a loop, or something like
> > that, which should take a little while, but maybe be less sensitive than
> > time.sleep().
> Sorry, I just don't think you can assume that a reliable amount of time will
> pass between one python bytecode and the next, even if it's invoking time.sleep.


> I wasn't paying close attention, but I suspect what happened is something like a
> sudden burst of IO activity happened elsewhere on my laptop, causing other
> processes to not be scheduled for >10ms.  A sudden swap storm, spam filtering,
> cron job, annoying flash animation on a web browser, running two test suites
> simultaneously, all sorts of things, could cause this.  In the extreme, I might
> suspend my laptop and wake it up again an hour later, or NTP might have adjusted
> my system clock...
> What are we trying to test here?  It surely isn't that the nanosleep syscall
> will return just after the requested 7ms within a minuscule tolerance.

Good point.  I'd state it even more strongly: this test has nothing to
do with timing.  It's about the way things are written into the output
file.  Really we are just being a bit lazy by actually running a test,
rather than testing the output formatter specifically.  

Andrew's initial fix, to be more lax and just assert that integers are
present, would be OK with me.
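Something along these lines, say (the line format here is invented for
illustration; the real benchmark output may differ):

```python
import re

def check_benchmark_line(line):
    """Return True if the line records an integer millisecond duration.

    We deliberately do not check the magnitude of the number, only that
    an integer duration was written at all, so the test cannot fail just
    because the machine was slow.
    """
    return re.match(r'\s*\d+ms\s+\S+', line) is not None

sample_output = [
    "   1ms test_trivial_thing",
    " 104ms test_slower_thing",
]
assert all(check_benchmark_line(line) for line in sample_output)
```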

Alternatively, we could feed it a canned result containing specific
fixed values, and check that they are written properly.  If John or
Robert -1 your patch, then that would be the better approach.
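A sketch of what I mean by a canned result, with the formatter and its
output format invented for illustration (bzrlib's real classes differ):

```python
from io import StringIO

class BenchmarkFormatter:
    """Toy stand-in for the output formatter under discussion."""

    def __init__(self, stream):
        self.stream = stream

    def report(self, test_name, duration_ms):
        self.stream.write("%5dms %s\n" % (duration_ms, test_name))

# Feed fixed, known values: no sleeping, no clocks, no scheduler
# involvement, so the assertion on the output is exact and repeatable.
out = StringIO()
formatter = BenchmarkFormatter(out)
formatter.report("test_simple", 7)
assert out.getvalue() == "    7ms test_simple\n"
```

This tests what we actually care about, how the values are written into
the output, without ever running a real timed test.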

Anyhow, thanks for bringing it up.

> Finally, zero tolerance for intermittently failing tests is a healthy attitude.
> One or two of them that fail rarely, and you barely notice... then one day
> there's twenty different ones that might fail with varying frequency, and half
> your test runs on certain systems fail for no good reason, and you tear your
> hair out trying to fix it all.  This happened to Twisted (in part because of
> tests using time.time/time.sleep), and we're *still* trying to undo all the
> damage years later.

I agree.  Also you can learn interesting things as a group by talking
about how to avoid having any intermittent failures.

