this week in ~canonical-bazaar

Stephen J. Turnbull stephen at xemacs.org
Tue Oct 25 12:41:27 UTC 2011


I must say I'm pleased to see practitioners actually using statistics
in their work.

But you might want to get a pro to help you with it. :-)

John Arbash Meinel writes:

 > As a very quick example, you would like the 95% confidence interval of
 > the mean to not include the other mean.

As stated, this is incorrect.  The 95% confidence interval around one
mean can only validly be compared to constant values like the mean of
a population (which is unknown, and therefore cannot be tested), not
to realizations of random variates like the mean of another sample.

The correct procedure is to use the 95% confidence interval around
zero based on the standard error of the difference (the two standard
errors combined in quadrature, not simply summed), and test whether
the algebraic difference of the sample means falls in that confidence
interval.
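
Concretely, something along these lines (a rough Python sketch,
assuming scipy is available; the timings and the sample size of 16 are
invented for illustration, and Welch's t-test is included only because
it packages the same comparison as a p value):

    # Sketch of the procedure described above: build a 95% interval
    # around zero from the standard error of the difference of the two
    # sample means, and check whether the observed difference of means
    # falls outside it.  The data here are invented for illustration.
    import math
    import random

    from scipy import stats

    random.seed(42)
    a = [random.normalvariate(10, 4) for _ in range(16)]   # command A times
    b = [random.normalvariate(12, 4) for _ in range(16)]   # command B times

    def mean(xs):
        return sum(xs) / len(xs)

    def sample_sd(xs):
        m = mean(xs)
        return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

    # Standard errors combine in quadrature for independent samples.
    se_a = sample_sd(a) / math.sqrt(len(a))
    se_b = sample_sd(b) / math.sqrt(len(b))
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)

    diff = mean(a) - mean(b)
    half_width = 1.96 * se_diff      # 95% interval, normal approximation
    print("difference of means: %.3f  (95%% interval around zero: +/- %.3f)"
          % (diff, half_width))
    print("significant at the 5% level:", abs(diff) > half_width)

    # Welch's two-sample t-test gives the equivalent answer as a p value.
    t, p = stats.ttest_ind(a, b, equal_var=False)
    print("Welch t = %.3f, p = %.3f" % (t, p))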

 > When I tried it, with artificial numbers from
 > random.normalvariate(10+i*2, 4), I found some interesting results.
 > With the *same* underlying random functions:

From the numbers you've posted, I assume that i = 0 for command A, and
i = 1 for command B?

 > The variation in p is pretty surprising to me.

It doesn't surprise me.  With a standard deviation of 4 in the
population and 16 iid observations in each sample, the standard error
of the mean of each sample is 4/sqrt(16) = 1.  With a difference of 2
in the population means, the sample mean of command A should come out
*greater* than the sample mean of command B a non-negligible fraction
of the time (surely > 3%).[1]  So A and B should be statistically
close, with a p value near 1, fairly frequently.[2]
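
A quick simulation makes the point, too.  Assuming means of 10 and 12
(per my guess about i above) and sd 4, a hypothetical sketch like this
shows how widely the p value swings from run to run, and how often the
sample mean of A lands above the sample mean of B:

    # Monte Carlo sketch: how much the two-sample p value bounces
    # around when the population difference is 2, the sd is 4, and
    # each command gets only 16 rounds.  (Means of 10 and 12 are
    # assumed here; only the difference of 2 really matters.)
    import random
    from scipy import stats

    random.seed(0)
    nrounds = 16
    pvalues = []
    a_wins = 0          # runs where mean(A) comes out above mean(B)
    for _ in range(1000):
        a = [random.normalvariate(10, 4) for _ in range(nrounds)]
        b = [random.normalvariate(12, 4) for _ in range(nrounds)]
        _, p = stats.ttest_ind(a, b, equal_var=False)
        pvalues.append(p)
        if sum(a) / nrounds > sum(b) / nrounds:
            a_wins += 1

    pvalues.sort()
    print("median p: %.3f" % pvalues[len(pvalues) // 2])
    print("90th percentile p: %.3f" % pvalues[int(0.9 * len(pvalues))])
    print("runs with mean(A) > mean(B): %.1f%%" % (100.0 * a_wins / 1000))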

 > From 0.018 which is almost VERY SIGNIFICANT down to 0.967
 > very-very-not-signficant. This is using nrounds = 16.
 > 
 > Even with 50 rounds, I could get p=0.88, though most of the time it
 > was between 0.01 and 0.10.

With 50 rounds, the standard error of the mean of each sample is
4/sqrt(50) ~ 0.57, and the likelihood that the mean of command A comes
out greater than the mean of command B drops to well under 1%.

 > Anyway, just a thought, using Power to indicate the confidence in your
 > confidence can be useful.

Yeah, it can, but power is controversial even among statisticians,
because in most cases you have to make strong assumptions, which you
have no way of justifying, just to compute it at all.

If you want to compute power, probably the most plausible strategy is
to use the observed means.  But then interpretation is slippery:

    Statistician: "Although we could not reject the hypothesis that
    the means are the same, the power of the test is small because
    even if the means are different, the difference is very small."

    User: "If the difference is that small, I don't care, anyway."

Footnotes: 
[1]  For normal random variables, the sample mean of A exceeds 12 + 1
about 1/6 of the time, and the sample mean of B falls below 14 - 1
about 1/6 of the time.  The probability of both events happening is
(1/6)*(1/6) = 1/36 ~ 3%.  In addition, there are many other ways for
A > B to happen, but I don't feel like doing the double integration
right now, so I can't estimate how much weight to put on them.
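
As it happens, the double integral can be sidestepped: the difference
of the two sample means is itself normal, so P(mean of A > mean of B)
is a single tail probability.  Under the assumptions above (population
means 2 apart, sd 4) it comes out a bit under 8% with 16 rounds and
well under 1% with 50.  A sketch:

    # The difference of the two sample means is normal with mean equal
    # to the population gap and standard deviation sqrt(2) * 4 / sqrt(n),
    # so P(mean(A) > mean(B)) is one normal tail probability; no double
    # integration needed.
    import math
    from scipy.stats import norm

    sd = 4.0
    mean_gap = 2.0      # population mean of B minus population mean of A
    for n in (16, 50):
        se_diff = sd * math.sqrt(2.0 / n)
        p_a_above_b = 1 - norm.cdf(mean_gap / se_diff)
        print("n = %2d: P(mean A > mean B) = %.4f" % (n, p_a_above_b))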

[2]  This requires a double integration too, so I'm not even going to
try to estimate it, except to repeat "I'm not surprised".<wink>


