# Distributing pupils onto lectures

• Every year we have a day for our pupils where there are numerous lectures / discussion groups by various employers/educators (e.g. Metro - a huge wholesaler, the local universities, the police, several logistics companies).

Our pupils then can choose which event to attend. They have three primary choices and two secondary ones.

Of the groups holding the events there were 19 last year (this year should be similar).

There are three "slots" for each event (i.e. 08:00h to 09:00h, 09:30h to 10:30h and 11:00h to 12:00h). Not all groups offer an event for all the slots - some of them only for the first, some for second and third, some for all of them.

Each slot is allowed a maximum of 30 participants. There were 256 pupils last year.

The question now becomes: How do I distribute the pupils at least somewhat fairly onto the events?

I mean, sure, there's always brute force but given the number of possible combinations...

• @rhywden Does your school already have a software to create time tables for teachers/students at the start of year? This might be formulated as a similar problem?

Like, each employer is a teacher, who has 1/2/3 periods of teaching available, of a topic of which he is the only teacher (there are normally several math teachers, but here only "employer A" can teach "employer A"), and the pupils' wishes are similar to the number of hours of various topics that they must have (e.g. instead of saying that a pupil must have 3 hours of maths, he now must have 1 hour of "employer A")? If the software is properly designed to handle a wide variety of options for students, including potentially handling conflicts (e.g. not being able to honour all options for all students), this might not be so much out of its original scope...

(for the lulz:

There were 256 pupils last year.

First, remove one pupil to fit them onto 8 bits...)

Otherwise, I don't have your answer. I mean, I would start some kind of heuristic like take the employers with the less available periods and fill them in first (they are the less flexible) but that'll probably quickly go down into brute force.

Oh, hang on, I'd probably start by looking at the data: if you count the number of students who chose each employer vs. the number of available slots for that employer, how much over-crowding do you have? If none or very little, then this isn't a guarantee that things will work out (such as two employers that are present only once and at the same time), but my gut feeling would be that a brute force approach + tiny bit of manual fiddling might be enough?

• I once worked at a steel factory where they used a simplex algorithm to create the optimal recipe for steel of a requested quality. I didn't implement it (so I can't give much practical advice), but your problem statement looks like it could match.

• Oh, hang on, I'd probably start by looking at the data: if you count the number of students who chose each employer vs. the number of available slots for that employer, how much over-crowding do you have? If none or very little, then this isn't a guarantee that things will work out (such as two employers that are present only once and at the same time), but my gut feeling would be that a brute force approach + tiny bit of manual fiddling might be enough?

Yeah, good idea that. I'll probably try to distribute the pupils onto the "non-contested" events first (i.e. those where there are less pupils than places). Then a distribution for the "contested" primary choices. And after that the secondary choices for the ones who don't have yet three events.

And regarding our software: While we have one of those, it's not exactly practical for this purpose.

• Yeah, good idea that. I'll probably try to distribute the pupils onto the "non-contested" events first (i.e. those where there are less pupils than places). Then a distribution for the "contested" primary choices. And after that the secondary choices for the ones who don't have yet three events.

If there isn't too much overlap, I guess this should more or less work. At the end you might need to fuddle a bit manually with the results (because after all, there is no guarantee that there is a solution to your problem!). Also, I'm guessing that you will probably need some manual fiddling anyway because out of your 256 students there will necessarily be at least one who changes his mind (and for good enough reasons that you won't be able to just tell him to fuck off), or some other last minute adjustment.

And regarding our software: While we have one of those, it's not exactly practical for this purpose.

I'm not really surprised by that. From what I've heard here, these kind of softwares, when they exist (and are used), only really give a first draft that is always in the end heavily twisted and edited by the assistant principals (that seems to be their main job... probably so that the principal can wash his hands of all the errors saying that he delegated to the assistant, who's usually a relatively junior person), and it takes a couple of days/weeks after the start of class to iron out the last wrinkles (two classes in the same room at the same time, or a lab session that is in a non-lab room).

In other words, they are probably just thinly veiled Excel spreadsheets copying what was done the year before and not really full multi-condition software with a nice underlying mathematical model!

• I'm not really surprised by that. From what I've heard here, these kind of softwares, when they exist (and are used), only really give a first draft that is always in the end heavily twisted and edited by the assistant principals (that seems to be their main job... probably so that the principal can wash his hands of all the errors saying that he delegated to the assistant, who's usually a relatively junior person), and it takes a couple of days/weeks after the start of class to iron out the last wrinkles (two classes in the same room at the same time, or a lab session that is in a non-lab room).

In other words, they are probably just thinly veiled Excel spreadsheets copying what was done the year before and not really full multi-condition software with a nice underlying mathematical model!

Well, over here it's actually taken a bit more seriously - the ones doing the time tables are always senior staff and the software seems to be good enough to actually not require that much fiddling. For instance, this year I haven't heard of anyone's timetable changing in a major way and even minor changes are rare.

And we're not a small school - 1500 pupils with a staff of 130 teachers.

I still remember the tales of our vice-principal who had to do that stuff completely manually. It usually required a week's worth of effort, a lot of swearing and a full bottle of whiskey.

• Well, over here it's actually taken a bit more seriously - the ones doing the time tables are always senior staff and the software seems to be good enough to actually not require that much fiddling. For instance, this year I haven't heard of anyone's timetable changing in a major way and even minor changes are rare.

I think that part of the changes that are required are due to weird constraints that either can't really be expressed inside the software, or that can't really be officially accepted as constraints but that some teachers manage to sneak in afterwards (e.g. "if I don't have any courses on Mondays then I can get away for a long week-end").

But, if stereotypes are even a tiny bit true, cultural differences between our countries probably explain why this is less (or not at all!) an issue for you

• Do whatever you did last year, since it's obvious you've done it before. Or at least tell us what was wrong with that that you're trying to fix.

• I'm not really surprised by that. From what I've heard here, these kind of softwares, when they exist (and are used), only really give a first draft that is always in the end heavily twisted and edited by the assistant principals (that seems to be their main job... probably so that the principal can wash his hands of all the errors saying that he delegated to the assistant, who's usually a relatively junior person), and it takes a couple of days/weeks after the start of class to iron out the last wrinkles (two classes in the same room at the same time, or a lab session that is in a non-lab room).

In other words, they are probably just thinly veiled Excel spreadsheets copying what was done the year before and not really full multi-condition software with a nice underlying mathematical model!

Well, over here it's actually taken a bit more seriously - the ones doing the time tables are always senior staff and the software seems to be good enough to actually not require that much fiddling. For instance, this year I haven't heard of anyone's timetable changing in a major way and even minor changes are rare.

And we're not a small school - 1500 pupils with a staff of 130 teachers.

I still remember the tales of our vice-principal who had to do that stuff completely manually. It usually required a week's worth of effort, a lot of swearing and a full bottle of whiskey.

At my school (~700 students over 7 grades, about 50ish faculty), it's one of the major jobs of the Dean of Students (one of the senior admin team). The software gives a good first pass, but there are always students who want to take classes that conflict, or other screwy constraints (teachers who teach both upper and middle classes which don't have the same schedules, AP lab sciences that take 2 periods and only have one section, students in Bowling who want study hall 7th period because otherwise they'd miss ~60% of 7th period classes, etc).

And then there are the complaints and requests for changes from parents and students. It's a pareto optimal problem--there comes a point at which any further change would hurt more people than it would help, but nobody wants to be that last person who doesn't get their slot.

It's one of many admin jobs I'm very grateful isn't in my job description. It's a completely thankless job.

• upper and middle classes which don't have the same schedules

Posh kids have an extra hour for Polo classes?

• Finally a chance for Prolog to shine.

• upper and middle classes which don't have the same schedules

Posh kids have an extra hour for Polo classes?

Not quite--middle school (grades 6, 7, and have different lunch than upper (high) school and have mandatory sports 7th period while upper has class.

Our schedule is screwy for many other reasons--there isn't a normal week in October, but that's a separate issue.

• The question now becomes: How do I distribute the pupils at least somewhat fairly onto the events?

Queue? Initialize with a random shuffle, choose most limited available event from primary, then secondary choices, move to back of queue.

Granted, you'll probably end up with some conflicts, so will most other non-knapsack problem solutions

• I still remember the tales of our vice-principal who had to do that stuff completely manually. It usually required a week's worth of effort, a lot of swearing and a full bottle of whiskey.

Used to be our physic's teacher, and he took a lot longer than a week. (7 years (11-18 yr olds), generally 6 classes in each year, 8 periods per day.)

• Do whatever you did last year, since it's obvious you've done it before. Or at least tell us what was wrong with that that you're trying to fix.

Last year was a big-ass Excel sheet, a flask of whiskey and much grumbling. I.e. completely manual.

• Here's an algorithm I haven't tested or written non-pseudocode for, but it's worth a try:

Shuffle the primary choices.
Shuffle the secondary choices.
Shuffle the student list.

Go through the list and give each student the first choice in their primary choice list if possible, or the second, or the third, or the first secondary choice, or the second secondary choice, in that order.
Remove the choice that each person was selected for from their list and repeat as many times as needed.

• @ben_lubar Also maintain a list of failure cases (e.g., student/primary choices that weren't picked). Give it an once-over to see if somebody will be screwed over totally.

(This sounds oddly familiar... Oh, yeah. Scheduling conference speakers. Excel+a few Lua scripts+lots of swearing.)

• Last year was a big-ass Excel sheet, a flask of whiskey and much grumbling. I.e. completely manual.

That answers half my question.

EDIT: Sorry I'm annoyed by all these questions that ask for advice but don't provide enough information to actually give the advice. What exactly was wrong with the Excel method that you want to improve on? I mean, I can ass-pull some assumptions but I'd rather actually know the answer.

• Last year was a big-ass Excel sheet, a flask of whiskey and much grumbling. I.e. completely manual.

That answers half my question.

EDIT: Sorry I'm annoyed by all these questions that ask for advice but don't provide enough information to actually give the advice. What exactly was wrong with the Excel method that you want to improve on? I mean, I can ass-pull some assumptions but I'd rather actually know the answer.

That it was completely manual? Have you ever tried distributing 250+ pupils onto 18+ events with side conditions? And just to make it crystal-clear: No, we cannot "copy over" anything from last year's run.

Strangely enough, the other people in this thread seem to have understood why someone might not want to do that on a regular basis.

• @rhywden

Good insightful post there. +1

• That it was completely manual?

Ok... but why is that not ideal?

It takes too long? There were too many mistakes? What algorithm did you use to do it manually, and why can't you automate it, especially if you're using Excel with has more than enough scripting competence to handle a problem like this?

Have you ever tried distributing 250+ pupils onto 18+ events with side conditions?

Nope.

And just to make it crystal-clear: No, we cannot "copy over" anything from last year's run.

Not even the Excel schema? Why not? Was it a different problem last year?

Strangely enough, the other people in this thread seem to have understood why someone might not want to do that on a regular basis.

I think they just looked at it and say "oh fun computer computer computer code code code" without taking the holistic view I take.

(Also how often is "a regular basis"? I thought this was yearly.)

For example, if you did it manually with some whiskey and it took 5 hours, but building a program/script to do it automatically would also take 5 hours, do you really save anything? (Perhaps the answer is yes, I don't know.) There are a lot of problems I can solve manually far quicker than going to the IDE and writing out some code, this might be one of those.

Or, if you did it manually in Excel and it took 5 hours, but re-arranging some of the cells and writing a few formulas in others would have saved you 3 of those hours, wouldn't that be a worthwhile solution to investigate?

The goal here is to solve the problem, not to write computer code. That's why I'm trying to gather enough information to help you solve the problem. If you don't want my help, then fine. I apologize for trying to understand the problem you're trying to solve.

• @blakeyrat I'm interested in what Excel formulas are capable of dealing with a variant of a Travelling Salesman problem.

• @rhywden It's exactly as good at it as any other programming environment, just with crappier syntax.

I'm thinking through the problem right now. Stay tuned. Unless my boss calls me up and assigns me something.

• How many events is each student supposed to attend? Do we have to slot them into 3 events each (the maximum)? Are 2, 1 or 0 events acceptable schedules for unlucky students? If we can't fit someone into any of his choices, do we leave them out of the event, or fit them into a completely unrelated event?

• How many events is each student supposed to attend? Do we have to slot them into 3 events each (the maximum)? Are 2, 1 or 0 events acceptable schedules for unlucky students? If we can't fit someone into any of his choices, do we leave them out of the event, or fit them into a completely unrelated event?

As I said, three primary choices, as in: Three events are mandatory. If one or two of their primary choices cannot be fulfilled then the secondary choices come into play. And if it simply doesn't work any other way then it is possible to have only two events. No shoving into unwanted events, though.

• @kian To give you an impression of what my colleague did last year:

From row 9 onwards the pupils can be found. An "x" represents a primary choice, a "y" a secondary one. Column F to X denotate the events ("Brillux" to "Polizei") and the rows below that represent how many slots exist for a given event and when. Example: "KSP" has "a a a" below it, which means that it has all three slots available. Meanwhile, "B&P" only provides for the first two slots and "Metro" only has one slot at the very beginning.

The upper limit of attendees for a slot is 30.

The result then looked like that:

A red "x" means an unfulfilled primary choice.

• sounds like a transport problem.

sadly, i don't remember enough to be useful. maybe someone else can give you a hint about how'd you solve it.

EDIT: maybe more like an https://en.wikipedia.org/wiki/Assignment_problem

• @Rhywden Here are a couple useful prioritization questions, I think:

• Would you prefer that anything without contention (i.e., `students.PrimaryChoiceIs(x) <= x.Slots`) is filled, or would you sacrifice that to make other things possibly work better?
• Would you prefer to have every student in as many slots as possible, even if this requires some of them to have more secondary choices than they would if you didn't care if each student was in the most possible slots? (I.e., max slots filled vs max primaries)

Edit: More generally, what things can be considered "fixed" once they're achieved? For example, flood filling slots without contention is an easy first step, but can that be considered "fixed" once done (aside from shifting around within timeslots of the same choice to accommodate later steps), or should it be considered available to be changed until some later step is done (or everything is done)?

• @dreikin Yeah, those occured to me as well:

I'll probably go for: Fill uncontested slots. Try to achieve as many slots as possible even if that means more secondary choices.

Just remembered: I'll probably have to set a lower boundary as well - i.e. not less than 5 pupils for a given slot or something. Don't know yet, though, have to ask.

• I mean, sure, there's always brute force but given the number of possible combinations...

Metaheuristics, then? You're probably not going to find a perfect solution, but the problem seems to yield itself fine to some sort of randomized optimization.

• @rhywden I think I'd split the problem in two parts. First, attempt to fit students into slots. One possibility would be:

``````// Shuffle students, shuffle primary and secondary preferences, attempt to fit student into any of their primary events, in a random time.
//   If successful move student to low priority list
//   If failed move student to high priority list
// After going through the list, shuffle high priority list and attempt to fit students there into a secondary event in a random time.
//   If successful and both secondaries met remove student.
//   If failed remove student from list.
// Re-run high priority list until empty.
// Run through low priority list, attempting to fit students into either primary or secondary events.
//   If successful and three choices met remove student.
//   If failed, remove student from list.
// Re-run low priority list until empty.
``````

This prioritizes trying to get students into at least one favored event.
Next, a sort of negotiating step where you try to trade slots between students who got lucky and students that didn't:

``````// Sort students in ascending order by score. Primary slots are worth 2, secondary slots are worth 1, empty slots are worth 0. If minimum score is 4 we're done.
// Take student from the bottom, look for a student from the top attending an event the bottom student wants and with a preference for an event with vacancies.
//   If you find one, move the top student from the bottom's preference to the vacancy, and give the new spot to the bottom student.
//   If we fail, select all the students attending a wanted slot by the bottom student, sort them, pick one with the highest score
//     Look for a student attending an event the top student has a preference for and who has a preference for an event with a vacancy
//       If we succeed, give the top student's slot to the bottom student, give the found student slot to top student, and give the vacancy to the found student.
//       If we fail, try with the other preferences
//         If we're at the last preference, try with a different top student
//           If there are no remaining students to shuffle with, increase the depth of the trading chain
//             If the depth of the chain reaches a given max, skip the bottom student.
// Repeat until either minimum or maximum score is 4 or max number iterations.
``````

There's no guarantee that your students' preferences and the availability of events actually result in a distribution that gets everyone into 3 slots they chose, so any solution you come up with has to do a "best effort" and figure a reasonable cutoff point in trying to improve the result.

• I got this far until Excel began complaining about circular references which, to be fair, it's 100% right. You really need to attach the code to a button or something so Excel doesn't try to real-time it, but also I had a 1:1 with my boss and now I'm too busy to continue working on it.

• @blakeyrat Will have a look, thanks!

• @jaloopa my score says 0!

I don't remember pressing any buttons to reply. I think i was going to talk about Arrow's theorem.

• So, I have a working implementation. Basically, any iteration does two passes: First it tries to fulfill the primary wishes, the second pass then goes over everyone not yet with three events to attend and tries to fulfill the secondary wishes.

I then went for a two phase calculation: First phase does about 100,000 random shuffles, second phase takes the best result from the first phase and then gos through every possible iteration (which, of course, is cancellable due to the fact that the factorial of 258 is a bit of a largish number).

It seems, however, that the random walk will usually yield an optimal (or at least near-optimal) result because several tries have yielded the same score - for the current set of data, I get 88.49% of pupils have three events fulfilled. A 100% wouldn't be possible anyway because some pupils only have two wishes.

It also showed the speed differential between debug and release build - debug takes 48 seconds for 100,000 permutations. Release only takes 17. This is single-threaded, by the way. Considering that I reached the optimum rather quickly, I don't consider it worth the hassle to add multi-threading.

• I once worked at a steel factory where they used a simplex algorithm to create the optimal recipe for steel of a requested quality. I didn't implement it (so I can't give much practical advice), but your problem statement looks like it could match.