How to generate a random number



  • I present to you, with all comments intact, a routine for generating random integers in the range [1..iUpper)

    static double S_ran( void )
    {
    	static long int iy = 100001L;
    
    	iy *= 125;
    	iy -= (iy/2796203L) * 2796203L;
    	return(double) iy/2796203.0;
    }
    
    int RandomBetweenOneAnd( int iUpper )
    {
    	int iVal;
    	double d;
    	
    	d = S_ran();
    	if (  0.0 > d )
    		d = d * -1.0;
    		
    	if ( 0.0 == d )
    		d = .123;
    		
    	while ( d < 0.01 )
    		d = d * 10.0;
    		
    	d = ((double)iUpper + 0.99) * d;
    	
    	iVal = (int) d;
    	
    	while ( iVal > iUpper )
    		iVal = iVal / 10;
    	
    	if ( iVal < 1 )
    		iVal = 1;
    		
    	return iVal;
    }
    

    Now, consider that function as applied in this code fragment:

    if(1 == RandomBetweenOneAnd(2))


  • Am I missing something? Those arithmetic operations seems complete static. No random at all. Please tell me I'm missing something.



  • my favorite part is how it casts it to an int, and then divides it by ten, and then expects it to still return an int.



  • @freelancer said:

    Am I missing something? Those arithmetic operations seems complete static. No random at all. Please tell me I'm missing something.

    You've probably missed the static variable "iy" in the function S_ran(). The value of that variable gets changed with each successive call to the function.



  • @freelancer said:

    Am I missing something? Those arithmetic operations seems complete static. No random at all. Please tell me I'm missing something.
    Calling rand() without seeding the random number generator will result in completely static results too.

    @Carnildo said:

    You've probably missed the static variable "iy" in the function S_ran(). The value of that variable gets changed with each successive call to the function.

    You'll get the same results the next time you run the program, though.
     



  • @freelancer said:

    Am I missing something? Those arithmetic operations seems complete static. No random at all. Please tell me I'm missing something.

    static is exactly what you are missing. 



  • Related.



  • @misguided said:

    my favorite part is how it casts it to an int, and then divides it by ten, and then expects it to still return an int.

    Is there something funny about C's integer division operator? Any int divided by 10 is an int. If you want a floating point result, you must divide by 10.0 instead. 



  • @misguided said:

    my favorite part is how it casts it to an int, and then divides it by ten, and then expects it to still return an int.

    [0..9] / 10 = 0

    0 = int 

     

    Just to be reeaaly sure, I tested it in C#. 

    It's sane. 



  • @Carnildo said:

    @freelancer said:
    Am I missing something? Those arithmetic operations seems complete static. No random at all. Please tell me I'm missing something.
    You've probably missed the static variable "iy" in the function S_ran(). The value of that variable gets changed with each successive call to the function.

    @asuffield said:

    @freelancer said:

    Am I missing something? Those arithmetic operations seems complete static. No random at all. Please tell me I'm missing something.

    static is exactly what you are missing. 

    Now I feel stupid. Or tired. Probably a little of both. 

     

     

    @Jivlain said:

    @freelancer said:

    Am I missing something? Those arithmetic operations seems complete static. No random at all. Please tell me I'm missing something.
    Calling rand() without seeding the random number generator will result in completely static results too.

    Except you can't seed this one (right? or am I [i]that[/i] tired?)

     



  • @freelancer said:

    @Jivlain said:


    @freelancer said:
    Am I missing
    something? Those arithmetic operations seems complete static. No random
    at all. Please tell me I'm missing something.

    Calling rand() without seeding the random number generator will result in completely static results too.

    Except you can't seed this one (right? or am I [i]that[/i] tired?)

    It seeds itself.  With the exact same value, every time you launch the application.  It's the equivalent of calling srand(100001); instead of srand(time()); 

    In some cases, this isn't a huge problem.  In fact, some security architectures depend on what's effectively a pair of RNGs with a shared seed -- like two-factor authentication keyfobs.  Of course, you wouldn't want to hard code the seed in these cases, as they should be set on a per-installation basis, not per-application. 

     

    In most cases, this isn't the behavior you want from your RNG, and may actually be a security issue -- e.g., a low volume gambling site, or cryptographic software using a fixed seed would be very, very bad.



  • @freelancer said:

    Am I missing something? Those arithmetic operations seems complete static. No random at all. Please tell me I'm missing something.

    It's random in the same way that doing srand(<constant value>); rand(); is random - you get a "random" sequence, but it's the same each time the program runs.

    Interesting fact: RandomBetweenOneAnd(2) does in fact return different values each time. I made a quick program that called it 16 times, and got this sequence: 1, 2, 2, 1, 2, 2, 2, 1, 1, 2, 1, 2, 1, 2, 1, 1. So it's not as bad as the OP made it look like (that was my impression anyway: the last sentence had me suspecting it'd be returning 1 or 2 every time, but it turns out it doesn't.)

    It seems the only thing WTF-y about this is the odd way to do modulus (I guess return (iy %= 2796203L)/2796203.0; was too easy), and the ignoring of the random-number stuff that the standard library for any language this could reasonably be (guessing C or C++ based on the 'static double' function and the 'static long int' variable, and the explicit 'void' in the parameter list, so srand() and rand()) would provide.



  • (damn edit timeout expired)

    Damn I took too long posting that :/ .



  • @aquanight said:

    Interesting fact: RandomBetweenOneAnd(2) does in fact return different values each time. I made a quick program that called it 16 times, and got this sequence: 1, 2, 2, 1, 2, 2, 2, 1, 1, 2, 1, 2, 1, 2, 1, 1. So it's not as bad as the OP made it look like (that was my impression anyway: the last sentence had me suspecting it'd be returning 1 or 2 every time, but it turns out it doesn't.)

    I haven't tested it, but it should show a bias towards returning "1".

    It seems the only thing WTF-y about this is the odd way to do modulus (I guess return (iy %= 2796203L)/2796203.0; was too easy)

    The result of "iy = iy %2796203" is compiler- and CPU- dependant for negative values of "iy"; the result of "iy -= (iy/2796203L) * 2796203L" isn't.

    and the ignoring of the random-number stuff that the standard library for any language this could reasonably be (guessing C or C++ based on the 'static double' function and the 'static long int' variable, and the explicit 'void' in the parameter list, so srand() and rand()) would provide.

    C++, and yes, that's something of a WTF: even the worst implementation of rand() has generally better characteristics than S_ran() does. What you're missing is the effects of the if()s and while()s in RandomBetweenOneAnd(): those bias the numbers returned towards the ends of the range, and prevent certain numbers from being returned at all.



  • @freelancer said:

    Except you can't seed this one (right? or am I [i]that[/i] tired?)

     

    You can seed it by calling it an arbitrary number of times when the application is launched (say, based on the number of seconds since January 1, 1970 expressed as an unsigned short), and that's what the application does.



  • @asuffield said:

    @misguided said:

    my favorite part is how it casts it to an int, and then divides it by ten, and then expects it to still return an int.

    Is there something funny about C's integer division operator? Any int divided by 10 is an int. If you want a floating point result, you must divide by 10.0 instead. 

    Awww, what fun are you guys?  Can't you let red herrings lie for one damn page?  I didn't SAY there was anything funny about it.  I just said I liked that part.  heh ;) 



  • @aquanight said:

    It seems the only thing WTF-y about this is the odd way to do modulus (I guess return (iy %= 2796203L)/2796203.0; was too easy), and the ignoring of the random-number stuff that the standard library for any language this could reasonably be (guessing C or C++ based on the 'static double' function and the 'static long int' variable, and the explicit 'void' in the parameter list, so srand() and rand()) would provide.

    Oh, the algorithm is complete crack. It's just so utterly stupid that it's really not worth the time and effort needed to list all the ways in which it is wrong. Smells like student code. 



  • @Carnildo said:

    @aquanight said:

    Interesting fact: RandomBetweenOneAnd(2) does in fact return different values each time. I made a quick program that called it 16 times, and got this sequence: 1, 2, 2, 1, 2, 2, 2, 1, 1, 2, 1, 2, 1, 2, 1, 1. So it's not as bad as the OP made it look like (that was my impression anyway: the last sentence had me suspecting it'd be returning 1 or 2 every time, but it turns out it doesn't.)

    I haven't tested it, but it should show a bias towards returning "1".

    Yeah, my sample of 16 isn't big enough to determine bias. So I ran it with a bigger sample (1 million, then 1 billion), and got about 66.9% '1' returns, so you would be right on this.

    @Carnildo said:

    It seems the only thing WTF-y about this is the odd way to do modulus (I guess return (iy %= 2796203L)/2796203.0; was too easy)

    The result of "iy = iy %2796203" is compiler- and CPU- dependant for negative values of "iy"; the result of "iy -= (iy/2796203L) * 2796203L" isn't.

    Since the RandomBetweenOneAnd() (which is what's calling S_ran) forcibly converts anything negative to positive, I'd see no problem with just making iy unsigned and avoid the odd modulus-of-negative stuff.

    @Carnildo said:

    and the ignoring of the random-number stuff that the standard library for any language this could reasonably be (guessing C or C++ based on the 'static double' function and the 'static long int' variable, and the explicit 'void' in the parameter list, so srand() and rand()) would provide.

    C++, and yes, that's something of a WTF: even the worst implementation of rand() has generally better characteristics than S_ran() does. What you're missing is the effects of the if()s and while()s in RandomBetweenOneAnd(): those bias the numbers returned towards the ends of the range, and prevent certain numbers from being returned at all.

    My comment in regards to srand()/rand() was to the whole thing collectively, not specifically S_ran. But yeah, looking at the code, it wasn't that obvious to me at first glance that all that stuff would bias everything. Also, having tested this (using 1 million samples and varying ranges), it is specifically biased toward the lower ends, which generally wound up with twice what each of the other values had with low maximums (like 20, or 6). When I tried a higher limit like 2000, the lower end had a whole block (1~19) that was 0, but the rest was reasonably even (lowest I've found this is at a range of 1000). Either way, I've yet to find a range that causes any biasing toward the upper end. I've yet to find anything that causes a gap in the middle or anything like that, but I'm not sure I want to play with it too much. I've probably wasted too much time on this as it is ...



  • There's a lot of kludges in this code.  Some of the more obvious ones:'

     

    (1)  The rng function returns a double, but it can only return a bit over two million diferent values, wasting half the precison of a typical double.

     

    (2)  then it multiplies the 0..1 value by the wrong scale factor.  To get a number between one and x you want to multiply the 0..1 rng by (x-1), not (x+0.99).

     

    (3)  Then since the result can be somewhat out of range, there's the kludge of dividing by ten  if it's too big.

     

    (4)  And also the strange business of scaling up results under 0.01. 

     
    Just plain badly thought out code, then badly patched  to boot.
     



  • @aquanight said:

    ...

    and the ignoring of the random-number stuff that the standard library for any language this could reasonably be (guessing C or C++ based on the 'static double' function and the 'static long int' variable, and the explicit 'void' in the parameter list, so srand() and rand()) would provide.

    RAND_MAX is 0x7FFF in the C implementation I currently use, and there are times I wish it was higher. It's far too late for me to calculate if rand()<<16 | rand() would have the same distribution (it should.... random bits, no?) but why should I have to resort to that?

    In any case they should have just used a simple PRNG of the sort rand[x] = A*rand[x-1] % B, and then maybe added wrappers for floating-point types.

    Or, you know, used one of the million Mersenne Twister implementations available.



  • @aib said:

    @aquanight said:

    ...

    and the ignoring of the random-number stuff that the standard library for any language this could reasonably be (guessing C or C++ based on the 'static double' function and the 'static long int' variable, and the explicit 'void' in the parameter list, so srand() and rand()) would provide.

    RAND_MAX is 0x7FFF in the C implementation I currently use, and there are times I wish it was higher.

    Check whether it implements drand48() and lrand48(), that's the usual solution. A lot of systems only have a 16-bit rand() function, but also provide the old 48-bit functions, despite them being listed as obsolete.

     

    It's far too late for me to calculate if rand()<<16 | rand() would have the same distribution (it should.... random bits, no?)

    The distribution of a single value is the same, but the distribution of a sequence of values drawn in this form is neither statistically nor cryptographically sound in the general case. Don't use it for simulation purposes, they require more subtle approaches (obviously can't use rand() for cryptography). For anything else, it is adequate.

    Individual implementations may use PRNGs where it is actually okay,
    but most don't - they're slower than the common algorithms - and you can't rely on it in vaguely portable code.



  • function getRandom()
    {
        return 3; // chosen by fair dice roll, guarantied to be random
    }
     


  • Discourse touched me in a no-no place



  • @aib said:

    Or, you know, used one of the million Mersenne Twister implementations available.

    This code pre-dates the development of the Mersenne Twister by at least a year and possibly by as much as a decade.

    The Real WTFTM is that people expect old code to have been written using new software libraries.



  • @Carnildo said:

    @aib said:

    Or, you know, used one of the million Mersenne Twister implementations available.

    This code pre-dates the development of the Mersenne Twister by at least a year and possibly by as much as a decade.

    And how was I supposed to know that?



  • @aib said:

    In any case they should have just used a simple PRNG of the sort rand[x] = A*rand[x-1] % B, and then maybe added wrappers for floating-point types.
    That's called a Linear Congruential Generator, and that is what the S_ran function implements! (Hint: A=125, B=2796203.) He just returns the result as a double in [0,1) rather than a long in [0,2796203). His WTFs lie in the scaling function, which uses the hacks listed above, not in the PRNG...



  • @random_garbage said:

    His WTFs lie in the scaling function, which uses the hacks listed above, not in the PRNG...

    Except for the bit where he uses the same seed every time. That one's in the PRNG. 



  • @random_garbage said:

    @aib said:
    In any case they should have just used a simple PRNG of the sort rand[x] = A*rand[x-1] % B, and then maybe added wrappers for floating-point types.
    That's called a Linear Congruential Generator, and that is what the S_ran function implements! (Hint: A=125, B=2796203.) He just returns the result as a double in [0,1) rather than a long in [0,2796203). His WTFs lie in the scaling function, which uses the hacks listed above, not in the PRNG...

    Yeah I know. I meant (tried to) that the stuff following the generation was unnecessary (and wrong in any case). And thanks for the Wikipedia link, you had to do that to me on my free day, didn't you?

     

    rand() / (double)RAND_MAX



  • @aib said:

    RAND_MAX is 0x7FFF in the C implementation I currently use, and there are times I wish it was higher. It's far too late for me to calculate if rand()<<16 | rand() would have the same distribution (it should.... random bits, no?) but why should I have to resort to that?

    If RAND_MAX is 0x7FFF, it's probably using the classic buggy generator. In that case, the low-order bit of rand()<<16 | rand() will always be the same: it will only ever generate even numbers or odd numbers, depending on the seed. Bit 16 will also be constant, but that's less of a problem.



  • @Carnildo said:

    @aib said:

    RAND_MAX is 0x7FFF in the C implementation I currently use, and there are times I wish it was higher. It's far too late for me to calculate if rand()<<16 | rand() would have the same distribution (it should.... random bits, no?) but why should I have to resort to that?

    If RAND_MAX is 0x7FFF, it's probably using the classic buggy generator. In that case, the low-order bit of rand()<<16 | rand() will always be the same: it will only ever generate even numbers or odd numbers, depending on the seed. Bit 16 will also be constant, but that's less of a problem.

    I'm using lcc-win32, it seems fine. I think I'd have noticed rand()&1 not working, though of course that doesn't mean there aren't any other bugs.

    Also, given: 

    #define RAND_MAX 0x7FFF

    ... 

    #ifndef RAND_MAX
    #define RAND_MAX 0x7fff
    #endif
     

    I'm less inclined to trust the implementation now.

    Though of course as long as it doesn't have any obvious bugs like the one you've suggested, it's good enough for quick and dirty stuff.



  • @aquanight said:

    My comment in regards to srand()/rand() was to the whole thing collectively, not specifically S_ran. But yeah, looking at the code, it wasn't that obvious to me at first glance that all that stuff would bias everything. Also, having tested this (using 1 million samples and varying ranges), it is specifically biased toward the lower ends, which generally wound up with twice what each of the other values had with low maximums (like 20, or 6). When I tried a higher limit like 2000, the lower end had a whole block (1~19) that was 0, but the rest was reasonably even (lowest I've found this is at a range of 1000). Either way, I've yet to find a range that causes any biasing toward the upper end. I've yet to find anything that causes a gap in the middle or anything like that, but I'm not sure I want to play with it too much. I've probably wasted too much time on this as it is ...

    Man you got some serious free time on your hands :) 


Log in to reply