There's got to be a better way of doing a random choice from a weighted set (C#)...

Benjamin Hall

So for a side project, I'm doing random generation based on d% tables.

Something like

Roll	Outcome
1-3	"foo"
4-50	"bar"
51-100	"baz

except generally with a lot more possibilities.

Long ago (when I was young and stupid) I developed a data structure I called WeightedChoiceSet.

public class WeightedChoiceSet<T>
    {
        public WeightedChoiceSet(Dictionary<T, double> input)
        {
            var currentSum = 0.0;
            foreach (var kvp in input)
            {
                currentSum += kvp.Value;
                table[kvp.Key] = currentSum;
            }
            Normalize();
        }

        public int Count => table.Count();

        public T Match(double r)
        {
            
            return table.First(x => x.Value >= r).Key;
        }

        public void AddOrModify(T key, double value)
        {
            table[key] = value;
            Normalize();
        }

        public void Remove(T key)
        {
            table.Remove(key);
            Normalize();
        }

        private Dictionary<T, double> table = new Dictionary<T, double>();

        private void Normalize()
        {
            if (table.Count == 0) { return; }
            var sum = table.Values.Max();
            
            foreach (var key in table.Keys.ToList())
            {
                table[key] = table[key] / sum;
            }
        }

        public Dictionary<T, double> ToDictionary() => table;
    }

used like

var foo = new WeightedChoiceSet(new Dictionary<string, double>{ {"foo", 3.0},{"bar", 47.0}, {"baz", 50.0}});
var outcome = foo.Match(r.NextDouble());

The disadvantage is that the outcomes have to be unique (obviously, they're keys in that input dictionary). The tables usually come from a configuration file somewhere, and need to be accessible to be changed at run-time.

But this whole thing feels so super clunky. Is there no standard function/data structure for something like this? Or something better to do? A better way of approaching the problem?

I mean I could stick the random generator in the WeightedChoiceSet and just call foo.Choose() or something like that to get a new value. But that's a tiny thing.

apapadimoulis

The class WeightedChoiceSet is a bit complicated.

This is a standard problem, just involves using an ordered list, summing the weights, picking a number from 0 to max, and counting up until you hit that number.

var randomDouble = 0.4265612; 

var choice = dict.Values.Sum() * randomDouble;
var counter = 0;
foreach (var kvp in dict){
  counter += kvp.Value;
  if (counter >= choice)
    return kvp.Key;
}
return dict.Keys.Last();

If you don't want to be unique, then just use a List<Tuple<T,double>>.

In either case, you could add it as an extension method for Dict<T,double> or something like that.

Tsaukpaetra

@Benjamin-Hall said in There's got to be a better way of doing a random choice from a weighted set (C#)...:

this whole thing feels so super clunky

It's not super clunky. Just wasteful and inefficient I guess.

I did something quite similar when I made the loot box pull logic in the database...

Watson

Last time I had the job I did it in a manner that was roughly similar.

For each item I calculated and stored the conditional probability of selecting that item at random given that I didn't select any items that came before it in the list.

Then I'd decide at random whether or not to pick the first item; if not, I'd then decide at random whether or not to pick the second item (using the probability of picking the second item given that I hadn't picked the first), and so on until (in the worst case) I'd come to the last item which by definition I'd pick with certainty.

I was pulling a new random number for each item until I picked one, which was fine for the comparatively small list I had. I don't suppose it would have made much difference if I'd just picked a single random number at the start and reused it throughout, effectively making it a differently-scaled version of this here.

apapadimoulis

@Watson said in There's got to be a better way of doing a random choice from a weighted set (C#)...:

Then I'd decide at random whether or not to pick the first item; if not, I'd then decide at random whether or not to pick the second item (using the probability of picking the second item given that I hadn't picked the first), and so on until (in the worst case) I'd come to the last item which by definition I'd pick with certainty.

I wonder what the actual distribution of this would look like, compared to predicted/indented distribution?

I find the pseudorandom stuff interesting, and just seeing from past WTFs how things like "random * random + random / random" leads to uneven distributions for some reason.

cartman82

When I needed similar system for producing weighted choices, I initially tried designing some kind of a fancy bell curve-like math function. But in the end, I realized I feel a lot more comfortable just setting a segmented line and pulling numbers from that, similar to your implementation. KISS.

Rhywden

@apapadimoulis said in There's got to be a better way of doing a random choice from a weighted set (C#)...:

@Watson said in There's got to be a better way of doing a random choice from a weighted set (C#)...:

Then I'd decide at random whether or not to pick the first item; if not, I'd then decide at random whether or not to pick the second item (using the probability of picking the second item given that I hadn't picked the first), and so on until (in the worst case) I'd come to the last item which by definition I'd pick with certainty.

I wonder what the actual distribution of this would look like, compared to predicted/indented distribution?

I find the pseudorandom stuff interesting, and just seeing from past WTFs how things like "random * random + random / random" leads to uneven distributions for some reason.

Probably rather different in practice - after all, instead of having one random event you have multiple ones. Let's say you have four items with equally distributed chances - with a "flat" model each gets 25%.

Now, with his model the first roll gets 25% for the first item. But what chance do we actually apply to the second roll for the second item? Cannot be 25% but neither 50% - would need to be 33% to get a 25% total chance. Next one would be 50:50.

And that's for an easy case.

I did a simple tree for those four and I dare say that a lookup list is the better way to go about it. Would also be faster.

edit: Then again, you could use a recursive function. Might look like this:

double[] chances = [0.25, 0.25, 0.25, 0.25]

int determineWin(int index, double loserChance) {
   if(index == chances.length - 1) return index;
   double roll = Math.random();
   if(roll > loserChance) return index; // this index "won"
   double newLoserChance = 1.0 - (chances[index] / loserChance);
   return determineWin(index + 1, newLoserChance);
}

int result = determineWin(0, 0.75);

You can probably do that better but I'm hungry and can't think straight.

Watson

@apapadimoulis

I decided to re-run the numbers, and also try that idea I had. In hindsight it's clear that it wouldn't have worked out. But the original still holds up.

Key	Weight	Conditional probability
A	1	¹/₆₁
B	7	⁷/₆₀
C	3	³/₅₃
D	5	⁵/₅₀
E	9	⁹/₄₅
F	4	⁴/₃₆
G	7	⁷/₃₂
H	2	²/₂₅
I	9	⁹/₂₃
J	4	⁴/₁₄
K	0	⁰/₁₀
L	3	³/₁₀
M	7	⁷/₇

A million variates gave the histogram:

A	1.002535
B	7.013048
C	3.001993
D	4.990898
E	8.997195
F	3.998306
G	6.98389
H	1.990491
I	8.98286
J	4.01685
K	0
L	3.000651
M	7.021283

Close enough to the desired weights for my purposes (≪a million variates). Several such runs and the only (obvious) systematic bias that showed up was in how many times K would be selected (always exactly zero).

Jaloopa

Eric has you covered

ericlippert / Jan 31, 2019

Fixing Random, part 1

[Code is here.] How to say this delicately? I really, really dislike System.Random. Like, with a passion. It is so… awful. The C# design team tries hard to make the language a “pit of s…

ObjectMike

I did something like this in some university projects and a couple data generators I've written in the real world:

double r = rand()
if (r < 0.03) print("foo")
else if (r < 0.5) print("bar")
else print("baz")

dkf

@Benjamin-Hall said in There's got to be a better way of doing a random choice from a weighted set (C#)...:

Is there no standard function/data structure for something like this?

The two options I've seen are either:

Have a sequence of value-probability pairs do basically what you've already done. You would need to decide if the probabilities in the sequence are cumulative or not, but otherwise it's pretty simple. Note that this is a sequence of pairs, not a dictionary; there's no uniqueness requirement.
Have a big list of all the options with each outcome put N times in the list according to its weight. Then you can use a simple pick-uniform-random-item algorithm.

Hybrid approaches are usable too, but more complex to configure.

Approach #1 uses more time per pick. Approach #2 uses more space except in the degenerate case of uniform probabilities for all items.

PleegWat

As mentioned, dictionary is the wrong datatype here. I'd use an array (of tuple<T, double>) and do a linear search. Switch to binary search if you have thousands of items, but you probably don't, and binary search can be tricky if you're not testing for direct equality (YMMV depending on the language you're working in).

Applied Mediocrity

@Jaloopa Fucking hell, that's one for the books. 40 blog posts about that one goddamn pony. I didn't think even Eric could do that.

First, I want to summarize, second I want to describe a whole lot of interesting stuff that I did not get to, and third, I want to give a selection of papers and further reading that inspired me in this series.

topspin

@Applied-Mediocrity said in There's got to be a better way of doing a random choice from a weighted set (C#)...:

40 blog posts about that one goddamn pony

Huh, I've only made it to part 3 so far. Not sure I want to finish now.

Jaloopa

@Applied-Mediocrity his current series on Conway's game of life is at 36, but the next one is apparently going to be the last. I wasn't able to follow the random one. It quite quickly got beyond my statistical experience

error

@topspin said in There's got to be a better way of doing a random choice from a weighted set (C#)...:

@Applied-Mediocrity said in There's got to be a better way of doing a random choice from a weighted set (C#)...:

40 blog posts about that one goddamn pony

Huh, I've only made it to part 3 so far. Not sure I want to finish now.

I'm going to add it to my list of Things I'm Not Actually Going To Read, but I Tell Myself I Will So I Can Feel Smart.

topspin

@topspin said in There's got to be a better way of doing a random choice from a weighted set (C#)...:

I've only made it to part 3 so far

Speaking of: Not being fluent in C#, what is the point of this:

public sealed class Normal : IDistribution<double>
{
  // snip...
  public static Normal Distribution(
      double mean, double sigma) =>
    new Normal(mean, sigma);
  private Normal(double mean, double sigma)
  {
    this.Mean = mean;
    this.Sigma = sigma;
  }

Assuming Distribution is a static constructor-like function and not an instance variable, what is the point of the private constructor and calling Normal.Distribution(1.0, 1.5) instead of Normal(1.0, 1.5) (or renaming the class to make it NormalDistribution(1.0, 1.5))?

error

@error said in There's got to be a better way of doing a random choice from a weighted set (C#)...:

I'm going to add it to my list of Things I'm Not Actually Going To Read, but I Tell Myself I Will So I Can Feel Smart.

I lied, by the way: this series is addictive.

(But actually what I did was employ reverse-psychology on my own self-sabotaging tendencies. )

error

@topspin said in There's got to be a better way of doing a random choice from a weighted set (C#)...:

Assuming Distribution is a static constructor-like function and not an instance variable, what is the point of the private constructor and calling Normal.Distribution(1.0, 1.5) instead of Normal(1.0, 1.5) (or renaming the class to make it NormalDistribution(1.0, 1.5))?

The other classes in this series seem to follow the pattern of a factory method that performs parameter validation and mapping. Here, the factory is just delegating to the constructor. Other classes do things like returning specialized subclasses. This is just the degenerate case.

I usually just put parameter validation inside the constructor.

error

It looks like Part 7 is where it starts getting relevant to OP:

ericlippert / Feb 21, 2019

Fixing Random, part 7

[Code is here.] A few episodes back I made a version of the Bernoulli distribution that took integer “odds”. That is, the distribution randomly produces either zero or one, and we can s…

topspin

@error said in There's got to be a better way of doing a random choice from a weighted set (C#)...:

Other classes do things like returning specialized subclasses

Yeah, the factory for discrete Bernoulli actually returns instances of the Singleton distribution class if possible. Clever. Not sure I like it.

Might be lack of familiarity but I'm starting to feel he's overusing the lambda operator or whatever it is:

private Singleton(T t) => this.t = t;

This seems no more readable than (assuming it's actually the same)

private Singleton(T t) { this.t = t; }

Benjamin Hall

@topspin said in There's got to be a better way of doing a random choice from a weighted set (C#)...:

@error said in There's got to be a better way of doing a random choice from a weighted set (C#)...:

Other classes do things like returning specialized subclasses

Yeah, the factory for discrete Bernoulli actually returns instances of the Singleton distribution class if possible. Clever. Not sure I like it.

Might be lack of familiarity but I'm starting to feel he's overusing the lambda operator or whatever it is:
private Singleton(T t) => this.t = t;
This seems no more readable than (assuming it's actually the same)
private Singleton(T t) { this.t = t; }

I know that it's something VS nags about on recent versions. Same with doing getters/setters for properties that way:

public int Foobar 
{
    get => _foobar;
    set => _foobar = value;
}

Not sure the benefit other than a few lines of brackets saved.

ObjectMike

@Benjamin-Hall said in There's got to be a better way of doing a random choice from a weighted set (C#)...:

@topspin said in There's got to be a better way of doing a random choice from a weighted set (C#)...:
@error said in There's got to be a better way of doing a random choice from a weighted set (C#)...:

Other classes do things like returning specialized subclasses

Yeah, the factory for discrete Bernoulli actually returns instances of the Singleton distribution class if possible. Clever. Not sure I like it.

Might be lack of familiarity but I'm starting to feel he's overusing the lambda operator or whatever it is:
private Singleton(T t) => this.t = t;
This seems no more readable than (assuming it's actually the same)
private Singleton(T t) { this.t = t; }
I know that it's something VS nags about on recent versions. Same with doing getters/setters for properties that way:
public int Foobar 
{
    get => _foobar;
    set => _foobar = value;
}
Not sure the benefit other than a few lines of brackets saved.

Do you usually define trivial getters and setters? Or is that just for this example? If you usually do that, why not just do public int Foobar {get; set;}?

HardwareGeek

@Benjamin-Hall said in There's got to be a better way of doing a random choice from a weighted set (C#)...:

Not sure the benefit other than a few lines of brackets saved.

Won't somebody think about the poor code monkeys paid by LOC?

No? I didn't think so.

Benjamin Hall

@mikehurley said in There's got to be a better way of doing a random choice from a weighted set (C#)...:

@Benjamin-Hall said in There's got to be a better way of doing a random choice from a weighted set (C#)...:
@topspin said in There's got to be a better way of doing a random choice from a weighted set (C#)...:
@error said in There's got to be a better way of doing a random choice from a weighted set (C#)...:

Other classes do things like returning specialized subclasses

Yeah, the factory for discrete Bernoulli actually returns instances of the Singleton distribution class if possible. Clever. Not sure I like it.

Might be lack of familiarity but I'm starting to feel he's overusing the lambda operator or whatever it is:
private Singleton(T t) => this.t = t;
This seems no more readable than (assuming it's actually the same)
private Singleton(T t) { this.t = t; }
I know that it's something VS nags about on recent versions. Same with doing getters/setters for properties that way:
public int Foobar 
{
    get => _foobar;
    set => _foobar = value;
}
Not sure the benefit other than a few lines of brackets saved.
Do you usually define trivial getters and setters? Or is that just for this example? If you usually do that, why not just do public int Foobar {get; set;}?

That was just that example. Although lots of UI patterns need getters and setters that aren't much more than that but can't be auto-properties. Sadly. It's one thing I'd love for them to provide some syntactic sugar around, something like public int Foobar {get; notify set;} or something.

ObjectMike

@Benjamin-Hall said in There's got to be a better way of doing a random choice from a weighted set (C#)...:

@mikehurley said in There's got to be a better way of doing a random choice from a weighted set (C#)...:
@Benjamin-Hall said in There's got to be a better way of doing a random choice from a weighted set (C#)...:
@topspin said in There's got to be a better way of doing a random choice from a weighted set (C#)...:
@error said in There's got to be a better way of doing a random choice from a weighted set (C#)...:

Other classes do things like returning specialized subclasses

Yeah, the factory for discrete Bernoulli actually returns instances of the Singleton distribution class if possible. Clever. Not sure I like it.

Might be lack of familiarity but I'm starting to feel he's overusing the lambda operator or whatever it is:
private Singleton(T t) => this.t = t;
This seems no more readable than (assuming it's actually the same)
private Singleton(T t) { this.t = t; }
I know that it's something VS nags about on recent versions. Same with doing getters/setters for properties that way:
public int Foobar 
{
    get => _foobar;
    set => _foobar = value;
}
Not sure the benefit other than a few lines of brackets saved.
Do you usually define trivial getters and setters? Or is that just for this example? If you usually do that, why not just do public int Foobar {get; set;}?
That was just that example. Although lots of UI patterns need getters and setters that aren't much more than that but can't be auto-properties. Sadly. It's one thing I'd love for them to provide some syntactic sugar around, something like public int Foobar {get; notify set;} or something.

I think that'd be tricky since the notify stuff (at least from what I remember of WPF) is a library feature not a language one like events are. Isn't there a "Property" type of some kind that the databinding stuff recognizes and doesn't need you to fire the property changed event. That's about the limit of what I remember.

error

@Jaloopa said in There's got to be a better way of doing a random choice from a weighted set

ericlippert / Jan 31, 2019

Fixing Random, part 1

[Code is here.] How to say this delicately? I really, really dislike System.Random. Like, with a passion. It is so… awful. The C# design team tries hard to make the language a “pit of s…

It's weird that this series is framed as "we sure botched the standard library for random in .NET; here's how you can fix it yourself..." and not "here's my new random library on Nuget, it works like this..."

Benjamin Hall

@mikehurley said in There's got to be a better way of doing a random choice from a weighted set (C#)...:

@Benjamin-Hall said in There's got to be a better way of doing a random choice from a weighted set (C#)...:
@mikehurley said in There's got to be a better way of doing a random choice from a weighted set (C#)...:
@Benjamin-Hall said in There's got to be a better way of doing a random choice from a weighted set (C#)...:
@topspin said in There's got to be a better way of doing a random choice from a weighted set (C#)...:
@error said in There's got to be a better way of doing a random choice from a weighted set (C#)...:

Other classes do things like returning specialized subclasses

Yeah, the factory for discrete Bernoulli actually returns instances of the Singleton distribution class if possible. Clever. Not sure I like it.

Might be lack of familiarity but I'm starting to feel he's overusing the lambda operator or whatever it is:
private Singleton(T t) => this.t = t;
This seems no more readable than (assuming it's actually the same)
private Singleton(T t) { this.t = t; }
I know that it's something VS nags about on recent versions. Same with doing getters/setters for properties that way:
public int Foobar 
{
    get => _foobar;
    set => _foobar = value;
}
Not sure the benefit other than a few lines of brackets saved.
Do you usually define trivial getters and setters? Or is that just for this example? If you usually do that, why not just do public int Foobar {get; set;}?
That was just that example. Although lots of UI patterns need getters and setters that aren't much more than that but can't be auto-properties. Sadly. It's one thing I'd love for them to provide some syntactic sugar around, something like public int Foobar {get; notify set;} or something.
I think that'd be tricky since the notify stuff (at least from what I remember of WPF) is a library feature not a language one like events are. Isn't there a "Property" type of some kind that the databinding stuff recognizes and doesn't need you to fire the property changed event. That's about the limit of what I remember.

In theory it's a library thing. But there's a built-in interface INotifyPropertyChanged that everything and their mother hooks into for that. But you have to write your own event handlers. Which everyone does by either loading a framework (there are 2 big ones out there) or copying their implementations verbatim. They should move that into the standard library and be done with it, instead of having a bunch of 99.99999% identical implementations. Those that want versions can do it the same way.

Jaloopa

@error said in There's got to be a better way of doing a random choice from a weighted set (C#)...:

@Jaloopa said in There's got to be a better way of doing a random choice from a weighted set

ericlippert / Jan 31, 2019

Fixing Random, part 1

[Code is here.] How to say this delicately? I really, really dislike System.Random. Like, with a passion. It is so… awful. The C# design team tries hard to make the language a “pit of s…

It's weird that this series is framed as "we sure botched the standard library for random in .NET; here's how you can fix it yourself..." and not "here's my new random library on Nuget, it works like this..."

A recent post is him introducing Bean Machine, a python based DSL using a lot of the concepts from this series

loopback0

@Jaloopa said in There's got to be a better way of doing a random choice from a weighted set (C#)...:

A recent post is him introducing Bean Machine

sockpuppet7

@Benjamin-Hall this thing seems very similar to this one:

Fitness proportionate selection - Wikipedia