Random numbers

Wed Nov 3 15:07:02 UTC 2010

C de-Avillez wrote:
> It is common to see that on very busy HTTPS sites without a
> good PRNG.
>   

Aren't the ones running Linux going to use /dev/urandom?  If so--modulo 
any bugs in /dev/urandom that I don't know about--it is a very good 
random source.

> Verifying true randomess on PRNGs is considered a hard problem.
>   

Verifying any purported randomness is difficult.  Another definition of 
randomness is "unpredictability".  It depends on the skill and insight 
of whomever is doing the predicting, proving something can't be 
predicted is trying to prove a negative.  Logical difficulties lie there.

20-some years ago there was a decent distinction between pseudo-random 
number generators and "true" random number generators, but things have 
evolved.

"True" random number generators had flaws.  Consider a random number 
generator that actually flipped a coin.  Pretty solid, real randomness!  
Except any real coin probably has a little bias, given enough output one 
can start to make predictions that will be ~slightly~ better than 50-50, 
its output should be cleaned up, run through an algorithm, probably 
"whitened" a little.

Alternately, because PRNGs are all predicated on some starting secret 
state and the longer they are run the more information about that 
starting state will necessarily leak out, it was obvious that we can 
make them better by mixing up that state in an ongoing-basis by looking 
for some ongoing source of entropy to mix in.  Even if the entropy 
source is not perfectly unpredictable, any unpredictability can be made 
to help.

So real random number generators get post-processing that looks like 
PRNGs and PRNGs get their internal state updated with new random 
data...and they start to converge.

The Linux random device is such a hybrid design.  It looks for sources 
of entropy to stir its entropy pool, and it uses cryptographic-quality 
SHA-1 hashing as a PRNG to turn that pool state into a stream of random 
bits.

A modern server that appears to not have any good entropy sources 
actually does: It has a fast system clock, the clock is generated with 
analog PLL circuitry from a lower frequency clock, and when interrupts 
happen the exact number of system clock ticks at that moment is used as 
entropy source to mix the pool.  Sure, from the outside, one can measure 
how long the computer has been running and one can note when the 
ethernet packet comes in, but it is not easy for an outside observer to 
know the least significant bit or two of that high speed counter at the 
instant the CPU looks at the counter as part of servicing that 
interrupt.  And unless you can know every bit's value in the counter at 
the moment the CPU reads it, at least one bit of entropy will slip past.

That greater-than-GHz clock, the counter that increments from that 
clock, and the CPU's interrupt processing, are all inside one chip, 
remote from external prying eyes.  If every ethernet packet supplies at 
least one bit of new entropy, and if SHA-1 is mostly good at hiding 
input data and mixing up its output (and the system is generally 
secure)...then we get dang good random numbers from the Linux kernel. 

If https is using /dev/urandom, I don't see what is lacking there.

Note that not all uses of random data want cryptographic-quality random 
data.  Traditional-style PRNGs are great for Monte Carlo simulations.  
Something like Mersenne Twister has an enormous number of internal 
states (and so can run a very long time before repeating) yet by 
resetting the internal state to a known value one can get a repeatable 
sequence.  (Change your algorithm and do another test run...was the 
output different because of different random data or because of your 
code changes?  By being able to have the same random data for each run, 
you can control what changed.)

Mersenne Twister does not go to great effort to hide its internal state, 
and so is not suited to cryptographic uses where humans might be working 
hard to deduce the internal state and win the lottery, but, say you are 
doing a biological simulation, it is reasonable to assume that some 
virus simulation isn't going to deduce the internal state.

Yes, proving randomness is hard, but if a random number generator is (a) 
designed with thought, (b) carefully constructed, and (c) built out of 
good parts (i.e., SHA-1), the result can be very good.  The Linux random 
number generator makes a good stab at all three of those points.

-kb