17 September 2025

In this post I want to talk about something that I'm calling the "Humans are bad at randomness" fallacy. If you've read any discussions on a game with random mechanics, you've almost certainly seen a conversation that goes something like this:

Person 1: I just had [some unlikely outcome] happen to me multiple times in a row! Something has to be broken with the RNG.

Person 2: Nah, that's just how randomness works. Humans are bad at intuitively understanding randomness. It's just your confirmation bias / negativity bias driving you to think it's bugged when there's actually nothing wrong.

It's not wrong to say that humans have biases that cause them to have a hard time judging whether something is properly random, but in my observations of these discussions, this argument is invoked almost as a reflex. There's seemingly no real consideration given to the complaint before dismissing it as just "Humans are bad at randomness."

This often results in either no investigation or only a very cursory investigation into the randomness, and as a result, even when there are, in fact, bugs that significantly affect the random outcomes, those bugs can live for an extremely long time.

Let's take a look at some real-life examples.

Pokémon Red/Blue/Yellow

This story takes place around 2013. The speedrun route for Pokémon Yellow (the plan for beating the game as fast as possible) involves catching a level 6 Nidoran Male in the grass before Viridian Forest. Unfortunately, the chance for an encounter in that grass to be the desired level 6 Nidoran Male is only about 5% (13/256, to be precise). Since each extra encounter adds a lot of time to a run where the goal is to be as fast as possible, speedrunners would typically only check a couple of encounters before starting over from the beginning.

The most active Pokémon Yellow speedrunner at the time started noticing something interesting. If the first encounter they got was a level 6 Nidoran Female, and then they got the second encounter on the third step afterward (the game prevents you from getting an encounter on the first two steps, so this is the earliest possible time), then they swore that it was almost always a level 6 Nidoran Male.

In this story, I was Person 2, at least at first. I said that they must have just been remembering the coincidences. But, as will surprise no reader given the topic of this post, I was wrong.

It turned out that the Tool-Assisted Speedrun community had noticed similar weirdness in the encounters being generated a few years prior. Tool-assisted speedruns, or TASs for short, are created with the help of emulators, which allow the authors to select exactly which frame each button press lands on. For most random behaviors in the game, adjusting a button press by a frame or two could change the result entirely. But for encounters, while such changes might cause encounters to appear or disappear as expected, whenever there were encounters, the sequence of which encounters they were would only change very mildly.

The culprit is that the game maintains two random bytes. Although the game isn't using anything like a modern RNG, each of the random bytes on its own is unpredictable enough to be considered random. However, due to the specifics of the RNG, the sum of the two random bytes only changes relatively slowly.

Most randomness in the game used only one of the two bytes, and so moving an event by a frame would produce an essentially unrelated result (although, in theory, it would be possible to observe some patterns if you looked at the results on enough consecutive frames). But encounters used both random bytes: one to determine whether an encounter happens, and the other to determine which encounter it is.

The strange behavior that was noticed arose because the fact that an encounter happens at all constrains the first random byte, which, in combination with the sum, also constrains the second random byte. Since the sum only changes slowly over time, so too does the subset of encounter types that are possible.
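
To make the mechanism concrete, here's a simplified model in Java of how I understand the two bytes to behave: one byte has a fast-changing hardware timer value added to it, and the other has (nearly) the same value subtracted, so their sum drifts only slowly. The starting values and tick rate are made up for illustration; this is a sketch of the idea, not the actual Z80 code.

public class Gen1RngModel {
    int add = 0x12, sub = 0x34; // the two "random" bytes (arbitrary starting values)
    int timer = 0;              // stand-in for the hardware divider register

    void update(int frame) {
        timer = (timer + 0x47) & 0xFF;        // made-up tick rate between calls
        add = (add + timer) & 0xFF;           // first byte: timer value added
        int secondRead = (frame % 5 == 0)     // the second read happens a few cycles
                ? (timer + 1) & 0xFF          // later, so it occasionally differs by one
                : timer;
        sub = (sub - secondRead) & 0xFF;      // second byte: timer value subtracted
    }

    public static void main(String[] args) {
        Gen1RngModel rng = new Gen1RngModel();
        for (int frame = 0; frame < 12; frame++) {
            rng.update(frame);
            // Each byte on its own jumps around, but (add + sub) mod 256 barely moves.
            System.out.printf("add=%02X sub=%02X sum=%02X%n",
                    rng.add, rng.sub, (rng.add + rng.sub) & 0xFF);
        }
        // If triggering an encounter requires add to fall below some threshold,
        // the mere fact that an encounter happened constrains add, and the
        // nearly constant sum then constrains sub, which selects the encounter.
    }
}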

Release date: February 1996 (JP), September 1998 (US)

Behavior figured out: August 2007

Slay the Spire

The bug in this case was significantly more obvious than the one in the Pokémon case, but given the ease with which it could have been identified, I think the time it lasted past the game's release is still surprisingly long!

Slay the Spire is a card game where each card has an energy cost to play. There are relics in the game which provide various effects. One of them, Snecko Eye, randomizes the cost of each drawn card to a uniform number from 0 to 3.

In a pattern quite typical of these issues, some players hated the relic, saying that they felt it kept putting them in situations where all their cards cost 3 energy (the baseline maximum for a turn), resulting in them losing the game. Other players countered with mathematical calculations showing that on average you'd be able to play more cards with Snecko Eye than without (with costs uniform from 0 to 3, the expected cost is 1.5 energy per card), and that it's just negativity bias causing the dislike.

The developer of the game also posted the line of code where the energy costs were being generated:

int newCost = AbstractDungeon.cardRandomRng.random(3); // random between 0-3

It's so simple, it couldn't possibly be wrong.

It turned out that the RNG in the game was being reseeded on every floor in such a way that the sequence of generated numbers was exactly the same, offset by one result per floor. So for fights on two adjacent floors, if you had Snecko Eye, you'd get the exact same sequence of energy costs, but offset by one card. There could be further deviations if other parts of the fight consumed random numbers as well.

The result of this bug is that if you look at the overall distribution of energy costs across a large number of runs, you'll see each one generated one quarter of the time, as expected. However, within a single run, since the generated costs repeated floor after floor, lopsided distributions where one cost was favored over the others were significantly more likely than they would have been with independent random draws.
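
Here's a minimal Java sketch of that effect, based on the description above: a per-run stream of costs where each floor starts one draw later than the previous floor. The java.util.Random generator and the seeding scheme are stand-ins for illustration, not the game's actual RNG class.

import java.util.Arrays;
import java.util.Random;

public class SneckoOffsetSketch {
    // Simplified model: the per-run card RNG is reseeded each floor with the
    // same run seed, but advanced by one extra draw per floor.
    static int[] sneckoCosts(long runSeed, int floor, int cardsDrawn) {
        Random rng = new Random(runSeed);     // stand-in generator, not the game's
        for (int i = 0; i < floor; i++) {
            rng.nextInt(4);                   // burn one draw per floor (the offset)
        }
        int[] costs = new int[cardsDrawn];
        for (int i = 0; i < cardsDrawn; i++) {
            costs[i] = rng.nextInt(4);        // uniform 0-3, like Snecko Eye
        }
        return costs;
    }

    public static void main(String[] args) {
        long runSeed = 123456789L;
        System.out.println(Arrays.toString(sneckoCosts(runSeed, 5, 8)));
        System.out.println(Arrays.toString(sneckoCosts(runSeed, 6, 8)));
        // The floor-6 costs are the floor-5 costs shifted over by one: a hand
        // full of 3-cost cards on one floor tends to repeat on the next.
    }
}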

Release date (Early Access): November 2017

Bug noticed and elaborated: February 2018

MapleStory

One of the features that MapleStory added is called "inner ability". Your character gets three lines, each of which can provide a stat bonus. You can reroll the lines to try to find more desirable stats, and while rerolling you can lock the lines that you like by paying a bit more of the reroll currency.

I don't remember any suspicion that something was wrong with the system until, seemingly out of nowhere, a Korean post showed a method of increasing the chances of getting the lines that you want, called the "recipe method".

Behind the scenes, there's a big list of all the possible inner ability lines. To illustrate the issue, let's just imagine that the list has 100 items, so we're generating three distinct numbers from 1 to 100.

When generating the first two numbers, everything was uniform as expected. However, when generating the last number, instead of picking uniformly from the remaining 98 numbers, the game would first make a 50-50 choice of whether the third number would fall between the other two or outside them.

So if you locked lines 40 and 50, you'd have a 50 percent chance that the rolled third line would be between 41 and 49 (uniform among those nine options), and a 50 percent chance that it would be one of the other 89 options (1-39 and 51-100). This means that you're significantly more likely to get what you want if you first roll the two of your desired lines that "sandwich" your third desired line in a small interval.
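
A hypothetical reconstruction of that logic in Java, using the illustrative 100-line pool from above, shows how large the advantage is. The helper below is made up to match the description, not MapleStory's actual code.

import java.util.Random;

public class InnerAbilitySketch {
    // The third line is chosen by first flipping a coin for "between the two
    // locked lines or not", then picking uniformly within whichever side the
    // coin chose (modeled here by rejection sampling).
    static int rollThirdLine(int lockedA, int lockedB, Random rng) {
        int lo = Math.min(lockedA, lockedB), hi = Math.max(lockedA, lockedB);
        boolean between = rng.nextBoolean();   // the buggy 50-50 split
        while (true) {
            int candidate = 1 + rng.nextInt(100);
            if (candidate == lockedA || candidate == lockedB) continue;
            boolean isBetween = candidate > lo && candidate < hi;
            if (isBetween == between) return candidate;
        }
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        int lockedA = 40, lockedB = 50, wanted = 45;
        int trials = 1_000_000, hits = 0;
        for (int i = 0; i < trials; i++) {
            if (rollThirdLine(lockedA, lockedB, rng) == wanted) hits++;
        }
        // A fair roll would give 1/98, about 1.0%; the "sandwich" gives 50% / 9, about 5.6%.
        System.out.printf("P(third line = %d) = %.2f%%%n", wanted, 100.0 * hits / trials);
    }
}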

Similar to the Slay the Spire example, if you were to look at the probabilities of each individual line appearing on no-lock rolls, you'd see a uniform distribution. You would see lopsided distributions if you looked specifically at rolls that locked lines, especially ones that locked two lines, because not all lines would be equally likely to be locked by players.

After this got publicized, it got patched out within a few months.

Inner ability added to the game: July 2012

Bug figured out: February 2021

Destiny 2

I don't personally play Destiny 2, so my narration for this example may be less reliable than the others.

Destiny 2 weapons have "perks", which are extra effects from a list of possibilities. Each weapon can have two ability perks, and players would grind looking for the best combination. At one point, a bunch of players started feeling that they couldn't find the combination they were looking for.

Each of a weapon's two perk slots typically has 6 options (although some have more or fewer), so a specific desired combination is expected to appear with a 1/36 chance. Some people started feeling like they were not getting it in a reasonable amount of time and started thinking that the perks were weighted.

In keeping with the pattern, the initial responses were that the engine literally can't weight perks, and it's "just RNG" and "statements of feel".

Some members of the community were still not convinced, and built tools to crowd-source drop rates. Eventually the effort culminated in very convincing data indicating that perk combinations are non-uniform. (The linked document is probably a better narrator for this issue than my post).

Notice that it's perk combinations that are non-uniform. The individual perks are uniform, much like what happened in the other examples. Eventually, Bungie took a closer look at the perk generation code and confirmed that the issue was real. To their credit, they also posted a nice article describing the cause of the issue.
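
Since "marginals uniform, combinations lopsided" can sound contradictory, here's a toy Java illustration of how it can happen when the two perk rolls aren't independent. This is not Bungie's actual mechanism, just a demonstration that uniform per-column rates prove nothing about combinations.

import java.util.Arrays;
import java.util.Random;

public class PerkComboSketch {
    public static void main(String[] args) {
        Random rng = new Random(7);
        int n = 6;                              // assume 6 perks per column
        int[] col1 = new int[n], col2 = new int[n];
        int[][] combos = new int[n][n];
        int rolls = 360_000;
        for (int i = 0; i < rolls; i++) {
            int p1 = rng.nextInt(n);
            int p2 = (p1 + rng.nextInt(2)) % n; // correlated: second roll stays "near" the first
            col1[p1]++;
            col2[p2]++;
            combos[p1][p2]++;
        }
        System.out.println("Column 1 counts: " + Arrays.toString(col1));
        System.out.println("Column 2 counts: " + Arrays.toString(col2));
        System.out.println("Combination counts (fair would be ~" + rolls / (n * n) + " each):");
        for (int[] row : combos) {
            System.out.println(Arrays.toString(row));
        }
        // Each individual perk shows up at the expected 1/6 rate in its column,
        // yet 24 of the 36 combinations never appear at all.
    }
}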

As an external observer, it's not clear to me exactly how long the bug was present in the game. It sounds like it may have been there all the way from release.

Release: September 2017

Bug found: October/November 2024

Takeaways

None of these examples appear to involve the game companies having any intent to deceive the players. Furthermore, looking at the random events in a one-dimensional fashion (looking at the distribution of a single variable) would not have revealed any of these problems.

Unfortunately, every analysis I've seen a game company attempt of its own random systems without external guidance has investigated solely along these one-dimensional lines. They collect the aggregate appearance rate of individual items, and if each of them appears at the expected rate, they conclude that there are no issues.
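
To make that gap concrete, here's a toy Java sketch (not taken from any of the games above) of a deliberately broken generator that passes a marginal-rate check while failing a check on consecutive pairs of outcomes.

import java.util.Arrays;
import java.util.Random;

public class JointCheckSketch {
    public static void main(String[] args) {
        Random rng = new Random(1);
        int n = 400_000;
        int[] singles = new int[4];
        int[][] pairs = new int[4][4];
        int prev = rng.nextInt(4);
        for (int i = 0; i < n; i++) {
            // Deliberately broken: 70% of the time, just repeat the previous outcome.
            int cur = rng.nextInt(10) < 7 ? prev : rng.nextInt(4);
            singles[cur]++;
            pairs[prev][cur]++;
            prev = cur;
        }
        System.out.println("Single-outcome counts (expected ~" + n / 4 + " each):");
        System.out.println(Arrays.toString(singles));
        System.out.println("Consecutive-pair counts (expected ~" + n / 16 + " each):");
        for (int[] row : pairs) {
            System.out.println(Arrays.toString(row));
        }
        // The single-outcome counts look fine, but the diagonal of the pair
        // table is roughly ten times larger than the off-diagonal entries.
    }
}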

While it's true that people tend to be bad at evaluating randomness, it's important to realize that this applies just as much to the developers as to the players. The fact that the developers are bad at distinguishing fair randomness from unfair randomness means that when a bug is introduced that causes unfair randomness, that bug is disproportionately likely to go through testing without being noticed, get shipped to the final product, and then escape notice for a long time even after that.

Furthermore, game developers are generally not statisticians, cryptographers, randomized algorithm specialists, or anything similar. It's not expected that they have a lot of experience in the myriad ways that random number generation can go subtly wrong. So when they miss these kinds of bugs, it's very understandable. But I do object to the way that the "Humans are bad at randomness" line frequently gets delivered in a very condescending manner, despite the fact that many of these games probably do have statisticians, cryptographers, and the like in their audience.

To sum up my current thoughts on the subject:

  • If the developers aren't acting in bad faith (e.g. intentionally publishing false or misleading rates), then the appearance rates of individual items are likely to be correct, since they're relatively easy to test and deviations are easy to notice.
  • We can expect that joint distributions of items are probably not tested, and because of the combinatorial explosion, each individual combination is less frequent, making it hard to notice if the rates are off without playing a lot. Players tend to play the game more than developers do, and sometimes the amount of data required is beyond what any single player can gather.
  • The fact that humans are bad at intuiting fair randomness means that the randomness can deviate from fair by a considerable amount before people start noticing it. There will be a baseline frequency of complaints due to the people who perceive fair as unfair, but when the complaints rise above that baseline, it probably means things are very wrong.
  • If a developer says they've investigated the RNG without explaining their methodology, it's likely that the investigation only looked at the most obvious ways that things could be broken (like individual item rates being wrong), and shouldn't be considered strong evidence that there aren't less obvious issues.
  • Even bugs that produce quite significant deviations from fairness in joint distributions can live for months or years while the people complaining are dismissed as simply being humans who are bad at understanding randomness.