How to benchmark random number generators?

Today at work I felt like the pseudo random number generator I am using (C++; rand()) is far from producing good random numbers. I remember having read that it is better to use
(rand()/(RAND_MAX+1) )*N
than it is to use
rand()%N
in order to get a number in the range [0, N-1].
Basically I wanted to know how I could compare the outcome of both methods in terms of randomness. A bunch of ideas came to mind, like plotting a histogram, computing the average and standard deviation, or counting the number of identical consecutive numbers, but I think all those methods fall short. Is there any proven scientific method of comparing the randomness of a sequence of numbers?


I did omit the cast to higher precision on purpose. The purpose is better readability. In practice one would probably use
(int)(((double)rand()/((double)RAND_MAX+1.0))*(double)N)
Better now?

@seriousJoker said:
I did omit the cast to higher precision on purpose. The purpose is better readability. In practice one would probably use
(int)(((double)rand()/((double)RAND_MAX+1.0))*(double)N)
Better now?
Actually, a simple decimal point after the 1 (and a cast to int at the end) would have been enough:
(int)(rand()/(RAND_MAX+1.0)*N)

Now that we have cleared up this issue, do you have any ideas to contribute to the original problem which is benchmarking random number generators?

@seriousJoker said:
Today at work I felt like the pseudo random number generator I am using (C++; rand()) is far from producing good random numbers.
You have interesting senses. I'm able to feel, however, that an xkcd reference is coming. Then a Dilbert one.
Try [url=http://www.fourmilab.ch/random/][b]ent[/b][/url].

@Spectre said:
I'm able to feel, however, that an xkcd reference is coming.
I sense that anyone who makes a reference to that comic will have their face cut by Morb.

@seriousJoker said:
Basically I wanted to know how I could compare the outcome of both methods in terms of randomness.
rand() ===== rand() ====== rand()
You're not going to get more randomness from an algorithm if the randomness component is the same as the old one. You can scale up a photo, but you won't get any more detail.
Nonetheless, your question is still valid. If your randomness components are actually different, how does one compare randomness? I never really troubled myself to find out, because any language you'll use has randomness that will exceed your randomness needs for most non-serious practical purposes (such as selecting a bunch of random ads for a website).
If you want true chaos, get your data from real sources, such as http://www.random.org/

@Spectre said:
...
Try ent.
At last a useful answer. Thanks.
@dhromed said:
@seriousJoker said:
Basically I wanted to know how I could compare the outcome of both methods in terms of randomness.
rand() ===== rand() ====== rand()
...
http://members.cox.net/srice1/random/crandom.html
"Do NOT use y = rand() % M; as this focuses on the lower bits of rand(). For linear
congruential random number generators, which rand() often is, the
lower bytes are much less random than the higher bytes. In fact
the lowest bit cycles between 0 and 1. Thus rand() may cycle between
even and odd (try it out).
Note rand() does not have to be a linear congruential random
number generator. It's perfectly permissible for it to be something
better which does not have this problem."
I did not make this up. As I said in my first post, I read this up. Using the tool "ent", however, showed me that both methods generate basically the same result in terms of entropy and serial correlation. So I reckon that by now more sophisticated versions of rand() ship with the latest compilers.

@seriousJoker said:
"Do NOT use y = rand() % M; as this focuses on the lower bits of rand(). For linear
congruential random number generators, which rand() often is, the
lower bytes are much less random than the higher bytes. In fact
the lowest bit cycles between 0 and 1. Thus rand() may cycle between
even and odd (try it out).
Note rand() does not have to be a linear congruential random
number generator. It's perfectly permissible for it to be something
better which does not have this problem."
I did not make this up. As I said in my first post, I read this up. Using the tool "ent", however, showed me that both methods generate basically the same result in terms of entropy and serial correlation. So I reckon that by now more sophisticated versions of rand() ship with the latest compilers.
If you're using glibc rand(), it is a linear congruential PRNG and will suffer this problem. Quite frankly, if you are concerned enough with the quality of the random output, you should not be using rand().
Generally only cryptographic applications are going to need this much randomness, which means you should be using a cryptographically secure PRNG. If you are on L00nix, then /dev/urandom should be a good source and there should be libraries for most languages. One note about using /dev/urandom: the kernel entropy tables are generated over time by sampling data from various sources. As such, if the computer is freshly started the kernel will not have much entropy. Most libs that try to read /dev/urandom just block if the requested number of random bits are not available, which will leave your program waiting until the kernel entropy tables build up.
This can be a problem with a machine that is using SSL/TLS from OpenSSL and is starting services on boot, as the OpenSSL libs will just sit and wait for the kernel to build up the entropy tables, which can take several minutes at times. What's worse is that it is damn near impossible to tell what is going on, as there are no error messages; the daemon just appears to be locked up. To relieve this, there are daemons that will keep /dev/urandom at a minimum amount of available random data. They do this either by sampling additional sources of pseudorandom data that the kernel does not, or by sacrificing some randomness when the number of available bits gets too low by sampling more frequently than the kernel does. The daemon I'm running is just called "rngd" but I believe there are a few others out there.

@morbiuswilters said:
If you're using glibc rand(), it is a linear congruential PRNG and will suffer this problem. Quite frankly, if you are concerned enough with the quality of the random output, you should not be using rand().
I should also add: you should not be benchmarking this stuff yourself. If the randomness of the data is that important, you should be using a proven, standard PRNG. If you are running glibc and ent did not detect a lack of randomness in the low bits of rand(), then something is amiss. However, the only time randomness is critical in an application is with cryptography, in which case you should be using a standard crypto library and PRNG. If you're just trying to learn, there are plenty of good resources out there, but just keep in mind that only a fool would roll their own crypto stack or try to "tweak" their PRNG. If your PRNG is not cryptographically secure, don't use it for cryptography, period.
I don't mean to sound like a jerk, I'm just pointing out the number one rule of practical crypto, which is: Never implement your own crypto system. Always use a proven, standard library.

@morbiuswilters said:
I don't mean to sound like a jerk, I'm just pointing out the number one rule of practical crypto, which is: Never implement your own crypto system. Always use a proven, standard library.
That's cool, I'm always willing to learn.
But don't worry, the work I deal with is not security related at all and I'm far from implementing my own rand function. It's just something I got curious about for no specific reason. Now that I am curious, however, I'm wondering about this issue of linear congruential PRNGs. The page I posted earlier says that these sorts of PRNGs would always return an alternating even/odd pattern. I could not find this issue (using Visual Studio 2005 btw, not glibc). So I found this page [1] which claims to show the code for one such specific PRNG. However, running that code didn't show the even/odd pattern either. So I'm wondering if the even/odd pattern is indeed a problem with linear congruential PRNGs or if that other page was just telling bs?

@seriousJoker said:
But don't worry, the work I deal with is not security related at all and I'm far from implementing my own rand function. It's just something I got curious about for no specific reason. Now that I am curious, however, I'm wondering about this issue of linear congruential PRNGs. The page I posted earlier says that these sorts of PRNGs would always return an alternating even/odd pattern. I could not find this issue (using Visual Studio 2005 btw, not glibc). So I found this page [1] which claims to show the code for one such specific PRNG. However, running that code didn't show the even/odd pattern either. So I'm wondering if the even/odd pattern is indeed a problem with linear congruential PRNGs or if that other page was just telling bs?
Oh, and this would have quite an impact even on the low-security stuff I'm doing. After all, it would mean that rand()%2 would always output 0,1,0,1,0,1,0,... if I got that right. And in response to an older posting in here: it doesn't take any special senses to feel like something is wrong with that random number sequence, right? No hard feelings about the posting though.

@morbiuswilters said:
One note about using /dev/urandom: the kernel entropy tables are generated over
time by sampling data from various sources. As such, if the computer is
freshly started the kernel will not have much entropy. Most libs that try
to read /dev/urandom just block if the requested number of random bits are not
available, which will leave your program waiting until the kernel entropy tables
build up.
I object.
@random(4) said:
When the entropy pool is empty, reads from [b]/dev/random[/b] will block until additional environmental noise is gathered. A read from the [i]/dev/urandom[/i] device [b]will not block waiting[/b] for more entropy.
Emphasis mine; verified empirically.

@Spectre said:
I object.
Oh noes, I made a mistake and said /dev/urandom instead of /dev/random!
@Spectre said:
Emphasis mine; verified empirically.
You didn't really need to do that: the "u" in urandom means that it won't block. I was just lazy and distracted when I wrote that and didn't bother rereading it carefully. Anyway, thanks for pointing it out.

@seriousJoker said:
And in response to an older posting in here: It doesn't take quite any special senses to feel like something is wrong with that random number sequence, right? No hard feelings about the posting though.
I would say it is the other way around. The human brain is trained to see patterns even if there are none. So you need 'special' senses if you DON'T feel that something is wrong with random functions.
Hey, my iPod prefers to play Bruce Springsteen every time I'm on the train!

@bjolling said:
So you need 'special' senses if you DON'T feel that something is wrong with random functions.
Heh, I'm special then. I hear people complaining about RNGs all the time, and I only reply with "WTF are you talking about? So what if it produced 2 five times in a row, it's perfectly random".

@bjolling said:
Hey, my iPod prefers to play Bruce Springsteen every time I'm on the train!
Put more than 1 song/album/artist on it.
It totally depends on what you need it for. A few bits, but they need to be very random: use /dev/random.
Lots of bits, but not too predictable: /dev/urandom.
I don't care, as long as it's fast: rand()

@morbiuswilters said:
@seriousJoker said:
"Do NOT use y = rand() % M; as this focuses on the lower bits of rand(). For linear
congruential random number generators, which rand() often is, the
lower bytes are much less random than the higher bytes. In fact
the lowest bit cycles between 0 and 1. Thus rand() may cycle between
even and odd (try it out).
Note rand() does not have to be a linear congruential random
number generator. It's perfectly permissible for it to be something
better which does not have this problem."
I did not make this up. As I said in my first post, I read this up. Using the tool "ent", however, showed me that both methods generate basically the same result in terms of entropy and serial correlation. So I reckon that by now more sophisticated versions of rand() ship with the latest compilers.
If you're using glibc rand(), it is a linear congruential PRNG and will suffer this problem. Quite frankly, if you are concerned enough with the quality of the random output, you should not be using rand().
I wonder how much that is still true. The man page says
NOTES
The versions of rand() and srand() in the Linux C Library use the same random number generator as random() and srandom(), [b]so the lower-order bits should be as random as the higher-order bits[/b]. However, on older rand() implementations, the lower-order bits are much less random than the higher-order bits.
Has anybody recent reliable info on what is correct?
And regarding the rand()%M vs (int)((M*rand())/(1.0+RAND_MAX)) issue: if M is odd, the higher-order bits contribute as much to rand()%M as the lower-order bits.

@Ilya Ehrenburg said:
Has anybody recent reliable info on what is correct?
Cut from Glibc CVS HEAD:
[code]
/* Return a random integer between 0 and RAND_MAX.  */
int
rand ()
{
  return (int) __random ();
}

long int
__random ()
{
  int32_t retval;
  __libc_lock_lock (lock);
  (void) __random_r (&unsafe_state, &retval);
  __libc_lock_unlock (lock);
  return retval;
}

static struct random_data unsafe_state =
  {
/* FPTR and RPTR are two pointers into the state info, a front and a rear
   pointer.  These two pointers are always rand_sep places aparts, as they
   cycle through the state information.  (Yes, this does mean we could get
   away with just one pointer, but the code for random is more efficient
   this way).  The pointers are left positioned as they would be from the
   call: initstate(1, randtbl, 128);
   (The position of the rear pointer, rptr, is really 0 (as explained above
   in the initialization of randtbl) because the state table pointer is set
   to point to randtbl[1] (as explained below).)  */
    .fptr = &randtbl[SEP_3 + 1],
    .rptr = &randtbl[1],

/* The following things are the pointer to the state information table,
   the type of the current generator, the degree of the current polynomial
   being used, and the separation between the two pointers.
   Note that for efficiency of random, we remember the first location of
   the state information, not the zeroth.  Hence it is valid to access
   state[-1], which is used to store the type of the R.N.G.
   Also, we remember the last location, since this is more efficient than
   indexing every time to find the address of the last element to see if
   the front and rear pointers have wrapped.  */
    .state = &randtbl[1],
    .rand_type = TYPE_3,
    .rand_deg = DEG_3,
    .rand_sep = SEP_3,
    .end_ptr = &randtbl[sizeof (randtbl) / sizeof (randtbl[0])]
};

int
__random_r (buf, result)
     struct random_data *buf;
     int32_t *result;
{
  int32_t *state;

  if (buf == NULL || result == NULL)
    goto fail;

  state = buf->state;
  if (buf->rand_type == TYPE_0)
    {
      int32_t val = state[0];
      val = ((state[0] * 1103515245) + 12345) & 0x7fffffff;
      state[0] = val;
      *result = val;
    }
  else
    {
      int32_t *fptr = buf->fptr;
      int32_t *rptr = buf->rptr;
      int32_t *end_ptr = buf->end_ptr;
      int32_t val;

      val = *fptr += *rptr;
      /* Chucking least random bit.  */
      *result = (val >> 1) & 0x7fffffff;
      ++fptr;
      if (fptr >= end_ptr)
        {
          fptr = state;
          ++rptr;
        }
      else
        {
          ++rptr;
          if (rptr >= end_ptr)
            rptr = state;
        }
      buf->fptr = fptr;
      buf->rptr = rptr;
    }
  return 0;

 fail:
  __set_errno (EINVAL);
  return -1;
}
[/code]
I say TRWTF is the use of old-style parameter declarations.

@Spectre said:
Heh, I'm special then. I hear people complaining about RNGs all the time, and I only reply with "WTF are you talking about? So what if it produced 2 five times in a row, it's perfectly random".
That's incorrect. If the random data was actually random, then sure. However, "random" number generators are deterministic and many that were believed to be random were later found to be predictable and weak. So producing 2 five times in a row might actually be a sign of a much bigger problem.

@morbiuswilters said:
@Spectre said:
Heh, I'm special then. I hear people complaining about RNGs all the time, and I only reply with "WTF are you talking about? So what if it produced 2 five times in a row, it's perfectly random".
That's incorrect. If the random data was actually random, then sure. However, "random" number generators are deterministic and many that were believed to be random were later found to be predictable and weak. So producing 2 five times in a row might actually be a sign of a much bigger problem.
Anyone who takes only two points of data in order to make any conclusion should be led out into the street and shot, whether that conclusion happens to be correct or not.
Anyway, has a distinction been made between RNGs and PRNGs here? Or are we assuming them to be equivalent?

@Daid said:
@bjolling said:
I can't be the only one to remember this from a few years ago: iPod Shuffle and Randomness
Hey, my iPod prefers to play Bruce Springsteen every time I'm on the train!
Put more than 1 song/album/artist on it.
It totally depends on what you need it for. A few bits, but they need to be very random: use /dev/random.
Lots of bits, but not too predictable: /dev/urandom.
I don't care, as long as it's fast: rand()
“Our brains aren’t wired to understand randomness; there’s even a huge industry that takes advantage of people’s inability to deal with random distributions. It’s called gambling.”

@ajp said:
Anyone who takes only two points of data in order to make any conclusion should be led out into the street and shot, whether that conclusion happens to be correct or not.
Nobody is talking about drawing a conclusion from 2 points of data. What we are talking about is whether having a "sense" something is wrong due to seemingly non-random distributions of numbers is rational. Spectre said it is not, because the data is perfectly random. I say it can be, because many PRNG algos have ended up having flaws that have given them non-random distributions.
@ajp said:
Anyway, has a distinction been made between RNGs and PRNGs here? Or are we assuming them to be equivalent?
We're only talking about PRNGs here. However, even an RNG that uses some physical source of information is not truly random and can possibly be interfered with or predicted.

@ajp said:
anyone who takes only two points of data in order to make any conclusion should be led out into the street and shot.
O RLY?

@Ilya Ehrenburg said:
Has anybody recent reliable info on what is correct? And regarding the rand()%M vs (int)((M*rand())/(1.0+RAND_MAX)) issue: if M is odd, the higher-order bits contribute as much to rand()%M as the lower-order bits.
The reason to use (int)((M*rand())/(1.0+RAND_MAX)) instead of (rand()%M) is simple.
Imagine the silly case where M is 10 and RAND_MAX is 65535. Now most if not all PRNG functions have a uniform distribution, so the chance for every number is just as large. But with M = 10, the distribution of rand()%M is as follows:
0:6554
1:6554
2:6554
3:6554
4:6554
5:6554
6:6553
7:6553
8:6553
9:6553
While with (int)((M*rand())/(1.0+RAND_MAX)) it is:
0:6554
1:6554
2:6553
3:6554
4:6553
5:6554
6:6554
7:6553
8:6554
9:6553
So rand()%M favors lower numbers more than the more 'complex' (int)((M*rand())/(1.0+RAND_MAX)).

Well, you still have the same flaw, right? Now 0/1/3/5/6/8 are favored instead of 0/1/2/3/4/5...

@HypocriteWorld said:
Well, you still have the same flaw, right? Now 0/1/3/5/6/8 are favored instead of 0/1/2/3/4/5...
The difference is that using rand()%M, you have a systematic overrepresentation of the numbers 0 .. RAND_MAX%M, while using (int)(M*rand()/(1.0+RAND_MAX)) distributes the necessary bias over the whole range. So using the latter is indeed preferable. My point, however, was that the often-given reason that "the lower-order bits are less random than the higher-order bits" doesn't hold if M is odd, even if the claim is true.
However, in my libc, RAND_MAX == INT_MAX == 0x7fffffff, so the skew of rand()%M is negligible for small enough M if you're just doing a simple simulation.

On actual benchmarking of random number generators:
Try building empirical distributions for different values derived from the RNG (be it the generated value, the diff from the last value, the number of bit transitions from the last value, and so on) over large data sets. Then plot a histogram. Maybe you'll get some interesting results.

IDK about "proven scientific" anything very much, but the best practical solution to this one I have come across is using your generator to plot "random" pixels onto bitmaps of varying dimensions. The human eye will quickly pick up visible patterns from many "random" number generators.

@Blat said:
The human eye will quickly pick up visible patterns from many "random" number generators.
And the human brain is fairly adept at 'recognising visible patterns' in truly random distributions.

True, but when your ink blot looks like a zebra crossing, it's time to ask questions.

@Blat said:
... using your generator to plot "random" pixels onto bitmaps of varying dimensions. The human eye will quickly pick up visible patterns from many "random" number generators.
Just thought that a presentation may be useful: http://lcamtuf.coredump.cx/newtcp/

@viraptor said:
Just thought that a presentation may be useful: http://lcamtuf.coredump.cx/newtcp/
+1 wba

I was looking at the NIST website today and came across this:
http://csrc.nist.gov/publications/nistpubs/800-22-rev1/SP800-22rev1.pdf
Reminded me of this thread

@savar said:
I was looking at the NIST website today and came across this:
http://csrc.nist.gov/publications/nistpubs/800-22-rev1/SP800-22rev1.pdf
Reminded me of this thread
GOD why doesn't the forum format plaintext posts?!!
I was looking at the NIST website today and came across this:
http://csrc.nist.gov/publications/nistpubs/800-22-rev1/SP800-22rev1.pdf
Reminded me of this thread

@PJH said:
@Blat said:
The human eye will qucikly pick up visible patterns from many "random" number generators.
And the human brain is fairly adept at 'recognising visible patterns' in truly random distributions.
Well, we aren't talking about Rorschach here. But plotting your random numbers is a good way to see if your random routine is random enough. What you are looking for is a fairly even distribution of the dots over your sample space. You aren't really looking for patterns, but you shouldn't be seeing any straight lines.
Yes, there are a LOT of times when you care that you are getting random-enough results but aren't using a crypto-secure random number.

@seriousJoker said:
Today at work I felt like the pseudo random number generator I am using (C++; rand()) is far from producing good random numbers. I remember having read that it is better to use
(rand()/(RAND_MAX+1) )*N
than it is to use
rand()%N
in order to get a number in the range [0, N-1].
To answer the original question, you should never use rand()%N, no matter how random the lower bits are, unless (RAND_MAX+1) is evenly divisible by N.
Let's assume that RAND_MAX = 7, so rand() generates 0 - 7.
If N is 5 then you get 0,1,2,3,4,0,1,2 as possible values. You should be able to spot the problem.

@chrismcb said:
@seriousJoker said:
I remember having read that it is better to use
(rand()/(RAND_MAX+1) )*N
than it is to use
rand()%N
in order to get a number in the range [0, N-1].
To answer the original question, you should never use rand()%N, no matter how random the lower bits are, unless (RAND_MAX+1) is evenly divisible by N.
Let's assume that RAND_MAX = 7, so rand() generates 0 - 7.
If N is 5 then you get 0,1,2,3,4,0,1,2 as possible values. You should be able to spot the problem.
But if you use (rand()/(RAND_MAX+1) )*N, you get 0,0,1,1,2,3,3,4 as possible values, which, although slightly better, still has problems. If (RAND_MAX+1) is not divisible by N, you have to do something more complex (e.g. throw away some values and regenerate another number).

@gremlin said:
But if you use (rand()/(RAND_MAX+1) )*N, you get 0,0,1,1,2,3,3,4 as possible values, which, although slightly better, still has problems. If (RAND_MAX+1) is not divisible by N, you have to do something more complex (e.g. throw away some values and regenerate another number).
Actually, the problem is in which bits you are using. With a % operation, you are always using only the lowest-order bits. With (rand()/(RAND_MAX+1) )*N, you're using all of them. So the overall entropy of the PRNG is in your calculation, no matter to which bits that entropy may have been localised. Many PRNGs have localised entropy at certain points in the cycle, and when you're only using the lower-order bits, you may get no entropy at all for a while.
Imagine a PRNG which textually reorders digits. If it doesn't happen to reorder the final digit in a given call, any N <= 10 will generate the same number as the previous call.

@CDarklock said:
Actually, the problem is in which bits you are using. With a % operation, you are always using only the lowest-order bits.
Not always, only if N is a divisor of (RAND_MAX+1). Otherwise you are using all the bits. But then you are biased towards certain numbers (the lower ones).
@CDarklock said:
With (rand()/(RAND_MAX+1) )*N, you're using all of them.
Again, not always. If N is a divisor of (RAND_MAX+1), then you are only using the highest-order bits. And if not, you are still biased towards certain numbers, but they are more evenly spread throughout the range.
@CDarklock said:
So the overall entropy of the PRNG is in your calculation, no matter to which bits that entropy may have been localised. Many PRNGs have localised entropy at certain points in the cycle, and when you're only using the lower order bits, you may get no entropy at all for a while.
No, neither method guarantees you use all of the bits. But many PRNGs have less entropy in the lower-order bits, so restricting yourself to the higher-order bits is better than restricting yourself to the lower-order bits.
@CDarklock said:
Imagine a PRNG which textually reorders digits. If it doesn't happen to reorder the final digit in a given call, any N <= 10 will generate the same number as the previous call.
Only if N==10, not for all N<=10. If N==3 it would use all the digits. And if it doesn't happen to reorder the first digit in a given call, then the other method would also generate the same number twice for N==10. But that's not necessarily a problem: a good PRNG should generate the same number twice sometimes, otherwise it's not very random.