Friday, November 28, 2008

Random Access Pseudo-Random Numbers

Since my previous post I was approached by a fellow-in-keyboards in need of a "random-access" pseudo-random number generator which could generate the i'th generated random r number directly, without computing the intermediate i-1 random numbers since seeding. Would a hash function for integers be a good RAPRNG?

No! Good hash functions avoid funnels and are thus bijective, which for 32-bit hash functions would imply only a 2^32 period. Consequently, if we draw N < 2^32 consequtive random numbers, then hashing would make r could occur once or never, but not twice or more frequently. A correct random number generation would result in an exponential distribution of the frequency of r.

We can fix this by increasing the period, which is equivalent to increasing the number of output bits of the hash, but use only 32 bits of the output. Alternatively we could hash i with two independent integer hash functions and xor the result. Either way, we deliberately create just the right amount of final funnelling.

Practical tests, such as Dieharder, are surprisingly difficult. I tried xoring results of several pairs of known integer hash functions, and eventually found a reasonable solution based my own hash_32_to_64 mentioned in my previous post.
uint32_t raprng(uint64_t i, uint64_t seed)
{
uint64_t r = (2857720171ULL * ((uint32_t) i)) ^ 0x1EF57D8A7B344E7BULL;
r ^= r >> 29;
r += r << 16;
r ^= r >> 21;
r += r >> 32;
r = (2857720171ULL * ((uint32_t) (i ^ r))) ^ (0xD9EA571C8AF880B6ULL + seed);
r ^= r >> 29;
r += r << 16;
r ^= r >> 21;
return r + (r >> 32);
}


The hypothetical period of raprng is 2^96 if i were of sufficient width. This period is barely enough to be respectable, at least compared to some serious PRNG's like the Mersenne Twister. raprng is also slower. But it is random-accessible.

Having gone through all this I need to point out that there is a less-known PRNG called the explicit inversive congruential generator (EICG), where the i'th output is defined as modular inverse of a*i+b for some a and b, where a should obviously be non-zero. The problem herein is that modular inversion is by no means cheap, and it isn't clear to me that some other very non-linear bijection - such as a very good hash function - wouldn't do.

8 Comments:

Anonymous Anonymous said...

Thanks! There are quite a few applications for an insecure RAPRNG and I have trouble finding anything online about it other than this blog post.

(For a secure RAPRNG, use a block cipher in counter mode I guess.)

For instance, you want to generate a very large, randomised map, through which someone can navigate, but don't want to pregenerate the whole thing and store it, nor even generate on the fly and store what was generated.

Couple of questions:

Despite that the index is declared 64 bits, it looks like only 32 bits of i are being used? (Always downcast to uint32_t). Is this a mistake?

Also why is it "barely respectable" for a random-access generator to have a 2^96 period? Surely it is totally irrelevant as long as it exceeds the index width a little.

-Dan

11:01 PM  
Blogger cessu said...

The downcast is dubious, I admit. However, I suggest you instead use the upper 32 bits of i in the xor with r instead of performing a 64x64 multiplication. The multiplication might be costly and doesn't result in equal impact of uppermost bits in i.

2^96 is "barely respectable" compared to some serious PRNGs such as the Mersenne twister with a 2^19937 cycle.

2:09 PM  
Blogger Unknown said...

Hi, I came across your post and I am trying to understand the sigificance of the RAPRN question. What if you took a well known PRNG and reordered the sequence by the seed values. For example take CMWC4096 and use the 4096 array as the index into the sequence. Is it possible that the reordering screws up some statistical property? Does not seem likely. Thanks, Fred

5:42 PM  
Blogger cessu said...

Fred, I'm not sure I understood your solution correctly. With "reordering the sequence", do you mean sorting an array of PRNGs? How exactly does your algorithm work when the i'th PRNG is requested?

The point of raprng is to not generate any PRNGs that are not requested for, neither to reseed in vain. Note that contrary to conventional PRNGs, raprng is entirely stateless.

11:36 PM  
Blogger Unknown said...

Hi, thanks for the reply. My interpretation of a generator like CMWC4096 is that each number in the sequence is obtained by applying a couple of arithmetic operations (involving some constants) to an array Q of 4096 numbers (called the state or the seed). WIth each call the state is updated and the next prn is returned. So if we think of i (or some simple 1-1 function of i) as the seed, then it looks like a RAPRNG, since there is no state. I guess the important question is whether the (reordered) sequence alters the statistical properties of the original sequence.

BTW, it seems that RAPRNG are easily derived from "linear congruential generators", since the recurrence x(i) = a x(i-1) + b is closed form solvable.

6:00 PM  
Blogger cessu said...

Fred, if you use i as a seed for a normal PRNG, then for each requested random number with distinct i you will have to reseed that PRNG. For CMWC 4096 each reseed requires obtaining some 4096 random numbers somehow into the array Q. Regardless of how you generate those 4096 random numbers, the few multiplications and a handful of other bit operations in my solution are surely going to be significantly faster.

9:34 PM  
Blogger Unknown said...

Ah, yes I do see your point. However, the 4096 is rather arbitrary, chosen to provided the largest period using 32-bit arithmetic. I think Marsaglia's "multiply with carry method" will work for a range of smaller values, with smaller periods of course. Here is a link to the method.
interstat.statjournals.net/YEAR/2005/articles/0510005.pdf
Cheers. Fred

10:48 PM  
Blogger cessu said...

Yes, you could reduce Q size and thereby speed up reseed. But soon the quality of reseeding becomes the crucial point. Marsaglia suggests initializing Q with the KISS algorithm, and one could replace for example the variable x with i. I know Marsaglia's reputation in PRNG design, but one would have to check whether the modified KISS would pass Diehard for successive values of i. In any case the resulting algorithm is no longer far from my code.

11:17 PM  

Post a Comment

<< Home