V8's implementation of Math.random() was fucked



  • TL;DR: Google's JavaScript engine used an old, not-particularly-good PRNG to implement Math.random(), and they used an old version of it that was known-buggy when they integrated it. The result was random output that looks like this:
    http://1.bp.blogspot.com/-92e22bUMwwk/VnKQn2VT3EI/AAAAAAAAB-E/8XD12xuGTXc/s1600/Untitled%2Bdrawing.png


  • Discourse touched me in a no-no place

    I remember bolting the Mersenne Twister into a roguelike about 15 years ago for basically that reason.



  • Man that explains why that WebGL shader I wrote ages ago couldn't produce any decent noise.



  • With the Mersenne Twister being so good, well tested and widely available, why would anyone bother using anything else?

    Heck, V8 is written in C++, why not just call rand() and let the standard library do it for you?
    (edit: yes, yes, I get it, rand() sucks)

    Well, at least it's not as bad as Java's "SecureRandom".



  • @anonymous234 said:

    why not just call rand() and let the standard library do it for you?

    Because rand() is notoriously bad (far worse than the broken algorithm that was in use) and is going to be deprecated soon if it isn't already. Instead, we now use the random number generators which were introduced in C++11:



  • Mersenne Twister is relatively slow compared to other PRNG implementations, and it has a relatively large amount of state (~2 kilobytes).

    But yes, it would have been a way better decision.

    (in addition to rand() often being famously terrible like LB_ pointed out, IIRC the algorithm isn't specified, and the V8 developers probably don't want to rely on the standard library they end up linking with having a sane implementation)



  • @jmp said:

    V8 developers probably don't want to rely on the standard library they end up linking with having a sane implementation

    So they decided to be sure and wrote their own insane implementation.



  • Oh, also RAND_MAX is only guaranteed to be >= 32767, whereas JS rand() is required to return values up to 2**32. Expanding the former to the latter could result in some distribution and periodicity issues. 😛


  • Discourse touched me in a no-no place

    @anonymous234 said:

    With the Mersenne Twister being so good, well tested and widely available, why would anyone bother using anything else?

    Generally because you're talking about software written before MT existed, or that was written by people who didn't know about it, or because it wasn't necessarily a good fit for the application.

    @anonymous234 said:

    Heck, V8 is written in C++, why not just call rand() and let the standard library do it for you?

    It's well-known that lots of C and C++ implementations historically have had bad PRNGs behind rand() et all.


  • Discourse touched me in a no-no place

    @jmp said:

    RAND_MAX is only guaranteed to be >= 32767

    Galactic Bloodshed was written with the expectation of a 48-bit RNG being available and (for example) universe generation failed horribly with a 16-bit RNG.



  • Because it's slow. I would recommend an LFSR or xorshift.




  • Fake News

    Hanzo'd by LB_, it got only introduced in C++11.



  • C++0x is ancient. It's like a quarter of my age. That's pretty old for software.



  • @JBert said:

    Hanzo'd by LB_, it got only introduced in C++11.

    The RNGs were already included in the TR1.

    @ben_lubar said:

    C++0xTR1 is ancient. It's like a quarterhalf of my age.

    FTFY.



  • @FrostCat said:

    Galactic Bloodshed

    Yes and obviously this site (game?) nobody's ever heard of should be the driving force behind ALL technology decisions everywhere.


  • Discourse touched me in a no-no place

    @blakeyrat said:

    Yes and obviously this site (game?) nobody's ever heard of should be the driving force behind ALL technology decisions everywhere.

    In a way similar to the way everyone's heard of "retsupurae", right?

    You don't need to be familiar with it for the purpose of the reason I mentioned it.


  • ♿ (Parody)

    @FrostCat said:

    In a way similar to the way everyone's heard of "retsupurae", right?

    That always sounds like some kind of gross medical condition.



  • @anonymous234 said:

    why not just call rand() and let the standard library do it for you?

    As far as the C/C++ standard is concerned that would be fine. It doesn't specify any algorithm. Unfortunately, POSIX specifies an algorithm, and it's not a very good one. So people on POSIX-compliant systems are SOL if they use rand().

    POSIX long ago introduced random() which is a much better algorithm. Why they didn't simply change the algorithm used by rand() (as many non-POSIX platforms did) instead is a good question. I assume they were concerned that some apps might be dependent on the specific algorithm used and didn't want to risk breaking them.

    The C++11 approach is an interesting solution to this problem. Instead of providing a single implementation-defined algorithm, there is a basket full of different well-defined algorithms to choose from. Great if you're doing something where it's important, but a bit confusing if you don't care and just want to get a reasonably-random sequence of numbers.



  • @David_C said:

    you don't care and just want to get a reasonably-random sequence of numbers.

    That's what std::default_random_engine is for. There's no excuse to call rand() in modern C++.



  • Just call it 2-3 times in a loop, what's the big deal, eh?


  • area_pol

    @anonymous234 said:

    Heck, V8 is written in C++, why not just call rand() and let the standard library do it for you?

    Because rand() is not a C++ way to generate random numbers. rand() is C and it sucks balls.


  • Discourse touched me in a no-no place

    @David_C said:

    I assume they were concerned that some apps might be dependent on the specific algorithm used and didn't want to risk breaking them.

    Historically, that seems to be the reason for a lot of seemingly strange decisions by both the C and C++ committees.



  • @FrostCat said:

    @David_C said:
    I assume they were concerned that some apps might be dependent on the specific algorithm used and didn't want to risk breaking them.

    Historically, that seems to be the reason for a lot of seemingly strange decisions by both the C and C++ committees.

    Looking a bit further, this situation (rand) appears to not be the fault of any standards body but a de-facto standard by C library implementers who decided for some reason (probably backward compatibility) that it would be better to leave the broken alorithm in-place and tell developers to use a new function for new applications.

    ISO-9899:2011 (C language) doesn't mandate any particular algorithm for rand() - it only says (section 7.22.2.1):

    • There is no requirement to avoid race conditions with other pseudo-random functions
    • The implementation shall behave as if no library function calls rand
    • RAND_MAX shall be at least 32767

    C++'s ISO standard doesn't mention rand() at all. Its support is only implied by the fact that <cstdlib> is meant to wrap the entire contents of C's <stdlib.h>.

    I had thought POSIX specified an implementation, but when I looked up [URL=http://pubs.opengroup.org/onlinepubs/9699919799/functions/rand.html]that standard[/URL], the definition is a clone of the C language definition, and explicitly states that any differences are unintentional.

    So there really is no excuse for a compiler/library to ship with the well-known-broken version of rand() that is commonly (but not universally) shipped. Fortunately, it appears that those implementations that use the broken algorithm also include the set of POSIX random number functions as well, so portable code written in C (or that needs to be portable to a pre-11 C++ compiler) can work around the problem without providing a private implementation. (For example, write a wrapper function that either calls random() or rand() depending on what's available) But there is no good reason why developers should have to do this.

    Looking a bit further, it appears that the Linux community did decide to fix this at long last. The [URL=http://linux.die.net/man/3/rand]man page[/URL] states that rand() and random() use the same generator these days and that only older implementations use the broken algorithm.

    Of course, neither of these algorithms is cryptographically secure. But that's not a problem if the number are not being used for security purposes.



  • That “before” image looks a lot better to replicate snow on a TV screen than pure random noise does.


  • Discourse touched me in a no-no place

    @David_C said:

    Looking a bit further, this situation (rand) appears to not be the fault of any standards body but a de-facto standard by C library implementers who decided for some reason (probably backward compatibility) that it would be better to leave the broken alorithm in-place and tell developers to use a new function for new applications.

    Yeah. I'm just saying, whenever you see something weird like that ("why wasn't this fixed?") in C, it probably was due to a major customer of a major compiler vendor who "needed" it not to change after it was discovered to be broken. Because from what I've heard over the years, that happens a lot, and therefore much of the UB in C is left in because someone's application depended on it decades ago.



  • If you were using noise1() or something in the fragment shader, that wouldn't have been using this; instead, it's up to the driver vendor for your GPU. It's so inconsistent and frequently flat out bad that most shaders use a texture when they need noise.



  • @nexekho said:

    frequently flat out bad

    To translate "flat out bad" into something more tangible: most GL implementations have noise*() functions that just always return 0.


  • Discourse touched me in a no-no place

    @David_C said:

    I assume they were concerned that some apps might be dependent on the specific algorithm used and didn't want to risk breaking them.

    Some people doing Monte Carlo simulation techniques want random numbers, yet exactly reproducible results so that others can redo their work and get the same answers out, down to the bit.

    I do not think the word “random” means what they think it means.



  • @dkf said:

    Some people doing Monte Carlo simulation techniques want random numbers, yet exactly reproducible results so that others can redo their work and get the same answers out, down to the bit.

    This is also true of most procedural methods (procedural terrain, textures, geometry, whatever). For the same input sequence, you want to get the same outputs.

    The above-mentioned noise*() was clearly intended for this purpose. It doesn't have any internal state, rather the user passes some value to it and noise*() returns a some other value with some properties.

    As for Monte Carlo simulations ... you should kinda get the same answers regardless of the sequence of random numbers, right? Because otherwise it's sort-of bad (like, not properly converged).


  • Discourse touched me in a no-no place

    @cvi said:

    As for Monte Carlo simulations ... you should kinda get the same answers regardless of the sequence of random numbers, right? Because otherwise it's sort-of bad (like, not properly converged).

    Yes. But a lot of people who use MC techniques also want exact reproducibility (as opposed to just to some probably-physically-reasonable number of significant figures). That the concepts are opposed to each other in a deep way doesn't seem to impinge upon their thought processes.


  • ♿ (Parody)

    @dkf said:

    That the concepts are opposed to each other in a deep way doesn't seem to impinge upon their thought processes.

    Meh...if you're using pseudo random numbers, you're using pseudo random numbers.



  • The concepts are not opposed, if you define the algorithm and the starting state, you can get the same sequence of (for all intents and purposes) random numbers.

    But you have to make sure the library actually guarantees the same output for the same seed, even across platforms. 90% of the time it does not, but since it happens to work on the computer they test it on, people just assume it will work everywhere.



  • Well, looking at those images actually illustrates nothing. Both are completely valid examples of a random output, it's just that people tend to spot patterns in anything and deem anything with a pattern as not random.

    Now I'm not saying that the original Math.random() implementation was fine, just that the example image that's meant to show the problem isn't really showing anything.



  • From the link:
    "The more significant upper half of the result is almost entirely dependent on the value of state0. The period length would be at most 232, but instead of few large permutation cycles, there are many short ones. With a badly chosen initial state, the cycle length could be less than 40 million." (my bold)

    "It fails many statistical tests in the TestU01 suite."

    There is an actual problem here.

    If your randomness troll just says "9" all the time, well maybe it's using a random process with uniform probability to generate its results, but it probably isn't. If your random numbers have structure when viewed through some lense, well there goes your uniform probability.

    Looking at the code for the visualization: http://bl.ocks.org/mmalone/bf59aa2e44c44dde78ac it looks like it's doing something vaguely similar to the spectral test: https://en.wikipedia.org/wiki/Spectral_test . Looks like there's some correlation between two random draws one after the other; that causes the 'lines' in the image.

    tl;dr if you have a = random(); b = random() it's desirable that knowing a shouldn't let you infer much about what b is; the lines in the image represent information you can infer about what b is.



  • If I roll a die 100 times in a row and get 6's each time, that is still random and doesn't mean it's weighted. I'm saying that you can't tell if something is truly random from the results only. You have to look at the underlying implementation of the random engine to see if it is giving random results (to within acceptable parameters, these are computers after all).



  • @Ashley_Sheridan said:

    If I roll a die 100 times in a row and get 6's each time, that is still random and doesn't mean it's weighted.

    that might be the case, but I can pretty much guarantee that if you manage it in just about any casino you won't be welcome back. An outcome like that could be random, but is far more likely to be foul play.



  • The probability of that outcome is 1/699 (assuming it's any particular number 100 times in a row, not just 6), so a Bayes factor of ~1078 towards a perfectly-weighted die that always rolls a particular number. I'm not saying that's /certain/, I'm just saying if you had rolled a d6 a hundred times every second since the start of the universe you still wouldn't expect it to have happened.

    If you can compress a long sequence of numbers, it was almost certainly not generated by a random process.

    Yes this is statistical, but it's a pretty good bet.



  • That's the thing! It is a perfectly random possibility, but people have a bad habit of seeing patterns where none exist!



  • @jmp sorry, but you just don't understand probability then. Any series of 100 dice rolls, there's the same chance of 100 6's as any other combination. Any combination has the same 1/6**99 (assuming your math is correct). It's a fallacy that random numbers ought to be random by a general persons observation. There's a big difference between actual random and perceived random.



  • The probability I gave was for "every number the same", not for "the specific sequence 6, 6, 6...6". The distinction is important; it dodges the issue where any given sequence has a very low likelihood.

    Thinking about this in terms of 'compressibility' is a better way; a sequence of 100 6s is very compressible, an actually random sequence is much less so.

    The image on the left is more compressible than the image on the right, because you can exploit the lines.

    Random number generators that produce output like that do not pass statistical tests for randomness. That's what the statistical tests /look for/.



  • No, I'd still say that @jmp is correct. Given any 100-long sequence of numbers ∈ {1..6}, the probability of observing that sequence is around 1e-78 with unbiased dice.

    Essentially, if I give you such a sequence and you then roll your dice and repeat that sequence, it would be a good assumption that your dice are biased because it's very unlikely (prob = 1e-78) that you would roll that sequence with unbiased dice.

    The fact that the example sequence is a repetition of a single number doesn't matter.



  • @Ashley_Sheridan said:

    @jmp sorry, but you just don't understand probability then. Any series of 100 dice rolls, there's the same chance of 100 6's as any other combination. Any combination has the same 1/6**99 (assuming your math is correct). It's a fallacy that random numbers ought to be random by a general persons observation. There's a big difference between actual random and perceived random.

    Here:

    Randomness is a mathematically well-defined concept, even if hard to actually make deductions about it in practice.

    And deductions about an algorithm from its output can be done, the same way we make deductions about the underlying "algorithm" of the universe by looking at its output, with bayesian inference.

    If there's a simple algorithm A that produces an output with relatively high probability, and another simple algorithm B that produces the same output with very low probability, and we observe that output somewhere, we can say that algorithm A is much more likely to be the one behind it.

    For example, since the probability of a fair die giving the 100 sixes in a row is 0.0000000000000000000000000000000000000000000000000000000000000000000000000000015, and let's say the probability of that random die you bought is somewhere being weighted is in the order of 0.01%, then we can say with 99.999999999999999999999999..9% confidence that the second hypothesis is the true one. That level of confidence is much higher than many things we already assume to be true (like your RAM not spontaneously flipping some bits and returning entirely different results when you run the function) so it's entirely fine to say we "know" the die to be loaded.

    So TL;DR: we can see patterns in the first picture because there are patterns, and we can deduce the algorithm is not random from that.



  • If you squint your eyes at the first one, you'll see Abraham Lincoln.


  • Discourse touched me in a no-no place

    @anonymous234 said:

    That level of confidence is much higher than many things we already assume to be true (like your RAM not spontaneously flipping some bits and returning entirely different results when you run the function) so it's entirely fine to say we "know" the die to be loaded.

    No, we just have to consider whether we believe that the dice are loaded or that we've seen a very rare event. In the space of all possible universe configurations, both hypotheses have non-zero probability. One way to determine this is to observe even more rolls; if the sixes keep coming, the confidence in the “loaded” possibility grows, whereas if there is a range of numbers being generated then it looks more like we just observed something rare.

    Or that someone's doctored the dice so that the loading of them can be disabled remotely. 😉



  • @Ashley_Sheridan said:

    If I roll a die 100 times in a row and get 6's each time, that is still random and doesn't mean it's weighted.

    It might instead mean that reality is defective:



  • what the...?

    binary die?



  • d4



  • oh, so the side facing down is the number

    This one is more intuitive



  • @xaade said:

    binary die?

    Also known as "a coin".


Log in to reply