Theoretical musings on language design
-
With Python generators, the call stack is essentially a stack of frame objects, which are just another type of Python object. Code can access them and play around with them if you know the right arcana to invoke. This makes coroutines (i.e. generators) very simple to implement: when you hit a `yield`, just store the current frame somewhere and then come back to it.

In compiled languages, things are a lot trickier, because the stack frame is a CPU stack frame. Implementing coroutines generally involves a bunch of convoluted work to transform the method into a state machine. (See generators and `async/await` in C# for examples.) This adds a lot of overhead, and makes the code harder to debug.

That's got me wondering if it would be possible to build a compiler that can do coroutines the Python way: take the CPU stack frame and store it as a field on the current object, which is reloaded each time you re-enter. That would probably be simpler to implement in a compiler than doing all the state machine transformations, and it would definitely be more efficient at runtime without all the state machine overhead!

Anyone have any examples where this has already been done, or explanations as to why this idea is a massive mistake?
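To illustrate the Python side of this: the frame of a suspended generator really is an ordinary, inspectable Python object (a minimal sketch; the `counter` generator is just an example):

```python
def counter():
    total = 0
    while True:
        total += 1
        yield total

gen = counter()
next(gen)  # run the generator up to its first yield

# The suspended frame is just a Python object we can poke at.
frame = gen.gi_frame
print(frame.f_locals["total"])  # the saved local state: 1
print(frame.f_lineno)           # the line the generator is paused at
```

Resuming with `next(gen)` reloads that frame and continues from the stored position.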
-
@masonwheeler One of the answers on Stack Overflow claims that the Boost coroutine2 library does it
-
@masonwheeler Wikipedia lists some implementations
-
In JavaScript, they break each function into a huge switch statement, where each line of code is one case statement (at least in the Babel implementation).
Would that fit the Python model or the compiled model?
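A hand-written Python sketch of that style of lowering (class and state names are invented for illustration, not Babel's actual output): the generator body is split into cases dispatched on a saved state variable.

```python
# Source being lowered (conceptually):  yield "a"; yield "b"
class LoweredGenerator:
    def __init__(self):
        self.state = 0  # which resume point we're at

    def next(self):
        # The "huge switch statement": dispatch on the saved state.
        if self.state == 0:
            self.state = 1
            return "a"
        if self.state == 1:
            self.state = 2
            return "b"
        raise StopIteration

gen = LoweredGenerator()
print(gen.next())  # a
print(gen.next())  # b
```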
-
@cartman82 That sounds like the "worst of both worlds" model! x.x
-
@masonwheeler said in Theoretical musings on language design:
This adds a lot of overhead
To the compilation or to the compiled executable?
-
@masonwheeler said in Theoretical musings on language design:
In compiled languages, things are a lot trickier, because the stack frame is a CPU stack frame.
Yes, but you've got a lot more opportunity to analyse things. For example, you know on entry that you potentially need a saveable “stack frame”, and in fact you don't use a real stack frame for variables that need to be live over the `yield`, but rather a `malloc()`ed structure (or whatever the runtime calls it). The real stack then is just used to hold variables that aren't live over the `yield` and, of course, the reference to the malloced structure. What's more, the compiler should have a very good idea what variables are live across the `yield`. Now, where there are several `yield`s with different sets of live variables, there's a decision to make whether each ought to use a different structure, or whether there should be a single context structure that holds the union; I suspect the latter will be better in most common cases, but I've not analysed my hunch. ;) I believe it would be possible to avoid the `malloc()` overhead in some cases, e.g., if there are one or more code paths that definitely never `yield` (argument precondition checks are a good example of this) but again this is mere supposition.

(Context: I'm working with a friend on a compiler for a language that will do this sort of analysis, but we're working on a different area and probably won't get to the `yield` handling this year. Unless it starts raining a lot in the Catskills and my friend is forced to stay off the mountains more weekends.)
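A rough Python rendering of that split (all names invented): only the variables live across the `yield` go into the allocated context, while locals that die before the next suspension could stay on the real stack.

```python
class SquaresContext:
    """Stands in for the malloc()ed structure: state live across yields."""
    def __init__(self, n):
        self.i = 0  # live across yields
        self.n = n  # live across yields

def squares_resume(ctx):
    """One re-entry between yields; returns the next value or None."""
    if ctx.i >= ctx.n:
        return None
    value = ctx.i * ctx.i  # `value` is dead after we return, so a compiler
    ctx.i += 1             # could keep it in a register / on the real stack
    return value

ctx = SquaresContext(3)
print([squares_resume(ctx) for _ in range(3)])  # [0, 1, 4]
```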
-
@dreikin both! :P
-
@dkf The saved structure is what I tend to call a context. If you're hiding that structure from the calling code, then it will need to be allocated by the initial function call which initializes the generator. If the size and members of the context are in a public header, you can just allocate it on the 'real' stack.
I don't think I'd use a union for the context type. Either I'm calling a specific generator and I use its context, or I don't know in advance and I'd probably just use a void pointer. I doubt the machine code differs between a union and a void pointer. I've worked on something like this once, without yield-like shenanigans, and I think I used void pointers there because for one of the generators the only context needed was a `FILE *`.
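That `FILE *`-only case can be sketched in Python, with the stream handle as the entire context and a closure standing in for the resume function (names are illustrative):

```python
import io

def make_line_reader(stream):
    # The stream handle is the whole "context"; no other state
    # survives between calls.
    def next_line():
        line = stream.readline()
        return line.rstrip("\n") if line else None  # None == exhausted
    return next_line

reader = make_line_reader(io.StringIO("first\nsecond\n"))
print(reader())  # first
print(reader())  # second
print(reader())  # None
```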
-
@pleegwat said in Theoretical musings on language design:
The saved structure is what I tend to call a context.
Not just you. I did a bit of research, and routines for context manipulation, which were called "context" routines (the `getcontext()`/`setcontext()` family), were part of the POSIX spec, but since a lot of devs couldn't be bothered to implement them correctly (or at all, in some cases), they were removed about 10 years ago, in POSIX.1-2008.
-
Here's a super simple implementation of Go's concurrency system: https://github.com/BenLubar/coolc/blob/master/coroutine.cool.go
-
@pleegwat said in Theoretical musings on language design:
If you're hiding that structure from the calling code, then it will need to be allocated by the initial function call which initializes the generator. If the size and members of the context are in a public header, you can just allocate it on the 'real' stack.
The whole point is that you don't allocate it on the stack. If you were able to do that, it'd just be a stack frame (or part of one). Once you start doing `yield`, i.e., allowing a single function to return multiple times, the standard stack stops being nearly so helpful. (There are languages which try to work out exact lifespans for these sorts of things that relate to actual particular stack frames — well, probably only Rust really goes for that level of anal-retentiveness — but most language designers know when to stop working so hard.) The actual contents of these contexts/frames are only important to the code that reaches into them; everyone else needs only an opaque handle (i.e., probably a pointer, but don't peek!) and as long as they can just pass it around correctly, that's fine. Remember though, the code that generates and manipulates these things isn't really all that much like the sort of code that you write; this is all machine-authored stuff and the results can look quite weird when you disassemble it.

Compilers really do some clever stuff under the covers.
-
@dkf I do this kind of stuff in C - I'll have a callback with accompanying context structure, and I'll know the lifetime is local so I keep the whole thing on the stack. In C# and similar the generator will be an object, and I do believe the C# compiler does analysis to detect if it can keep object data on the stack. If you're not passing the generator out anywhere, it will qualify.
-
@masonwheeler said in Theoretical musings on language design:
In compiled languages, things are a lot trickier, because the stack frame is a CPU stack frame.
I've seen people implement this with "microthreads" or "fibers". Basically, you create one or more additional stacks, and switch over to those in your `yield`/`await` (etc.) methods. Switching stacks (in user space) is only a few instructions, so it's actually fairly efficient. You also don't really need support from the compiler, assuming you know the calling conventions (i.e., what registers you need to preserve and so on), as the `yield` (etc.) looks just like a normal function call.

On the downside, the implementations are a bit fragile and non-portable. I'm also not sure how a microthread interacts with exceptions these days. It used to be a bit dicey when I experimented with them.
-
@cvi Wanted to edit in that a microthread/fiber is essentially a user-space cooperative thread, but NodeBB kept throwing an error.
-
@pleegwat said in Theoretical musings on language design:
I do this kind of stuff in C - I'll have a callback with accompanying context structure, and I'll know the lifetime is local so I keep the whole thing on the stack.
In many ways, some of the most interesting applications of these things do the opposite, moving things off the stack onto the heap. This is of interest because the heap is a lot larger than the stack, especially in multithreaded applications, and it therefore enables applications that use heavily recursive algorithms. When you're using the stack as your memory manager, you've got some critical tuning parameters to worry about that it is very difficult to work around (the usual key one being the size of each thread's stack). Alas, it turns out that those parameters are exactly the sort of thing that it is very difficult for a compiler to analyse, as they depend on the state of the whole program, and it's that which a compiler usually tries to avoid looking at (because it is where the demons of the halting problem and its more-evil relatives lurk).
Until one has written a production-grade language implementation, there are a lot of details in there where the trade-offs of why some things are done that way are really not very obvious. I guess it's like the reputed complexity of running a large ISP that way…
-
@cvi said in Theoretical musings on language design:
I'm also not sure how a microthread interacts with exceptions these days. Used to be a bit dicey when I experimented with them.
C++ doesn't have a reputation for liking exceptions that get thrown on multiple threads. Other languages use different implementations that are more stable in that regard (in part by having different memory management constraints and so on).
-
@cvi I don't think there's much need to switch stacks though. Not in the case of `yield`, where there is only one function whose state needs to be saved. In that case, saving (part of) the local variables and the last place yielded from suffices.

I've contemplated switching stacks in one case where the issue was the other way around - a yield would have been required when new input was needed, which would potentially have been several function calls deeper in. That was a thought experiment though - I had existing code which was riddled with state machines and state flags instead. And I do not think allocating tens of thousands of stacks would have been effective.
-
@dkf said in Theoretical musings on language design:
In many ways, some of the most interesting applications of these things do the opposite, moving things off the stack onto the heap. This is of interest because the heap is a lot larger than the stack, especially in multithreaded applications, and it therefore enables applications that use heavily recursive algorithms.
Ah, I could see that. I tend not to have problems that suit recursion very often.
-
@pleegwat said in Theoretical musings on language design:
I've contemplated switching stacks in one case where the issue was the other way around - a yield would have been required when new input was needed, which would potentially have been several function calls deeper in. That was a thought experiment though - I had existing code which was riddled with state machines and state flags instead. And I do not think allocating tens of thousands of stacks would have been effective.
Experience suggests otherwise. There are quite a few problems where being able to yield from places other than the topmost function makes things a whole lot simpler. Languages without the capacity to make those micro-stacks (or equivalent) end up needing a complex and messy stack of yields/awaits. The people who write those things tend to say “Yay! Look at what we can do!” and while it is much simpler than the equivalents with state machines (which tend to be deep into greybeard gnosticism territory), they're pretty horrible when compared with what the more sophisticated approaches can do.
The more languages you know, the more you can see the warts and infelicities.
-
@dkf said in Theoretical musings on language design:
C++ doesn't have a reputation for liking exceptions that get thrown on multiple threads. Other languages use different implementations that are more stable in that regard (in part by having different memory management constraints and so on).
Pre-C++11, the standard didn't really acknowledge multi-threading that much, so that was part of the problem. IME, if you made sure to catch all exceptions in the top-level frame of each thread, things were OK.
From C++11 onward things look better, with multi-threading making its way into the standard. You also get tools to transport exceptions between threads. In some cases this is even automatic (so a future may throw when it's accessed, if the thread producing the future threw, and so on).
For micro-threads, it's probably still a bit dicey, at least if you implement them on your own via assembler (but then again, each time you implement something on your own in assembler, you're entering "it's-probably-dicey" land).
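Python's `concurrent.futures` has that same automatic transport: an exception raised in a worker thread is stored on the future and re-raised when the result is accessed.

```python
from concurrent.futures import ThreadPoolExecutor

def worker():
    raise ValueError("boom")

with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(worker)
    try:
        future.result()  # the worker's exception is re-raised here
    except ValueError as exc:
        print("caught:", exc)  # caught: boom
```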
-
@pleegwat said in Theoretical musings on language design:
I don't think there's much need to switch stacks though. Not in case of yield, where there is only one function whose state needs to be saved. In that case, saving (part of) the local variables and the last place yielded from suffices.
I think it could be neat to be able to `yield` from further down the call stack. As a simple example, if you want to concatenate the output from two generators, you just write a function that calls the first generator (which internally `yield`s) until it's done, and then the other one.

Edit: Then there's what @dkf says. I've not used microthreads in that way really, so maybe listen to him. ;-)
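In Python, that concatenation is spelled `yield from`, which forwards the inner generator's values without the caller re-yielding each one by hand:

```python
def g1():
    yield 2
    yield 3
    yield 4

def concat(first, second):
    yield from first   # every value g1 yields passes straight through
    yield from second

print(list(concat(g1(), iter([5]))))  # [2, 3, 4, 5]
```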
-
@dkf I'm having trouble reading your stance on the matter? I've written a macro-powered prototype of `yield` in C for fun once - took 3 or 4 rather simple macros, if I remember correctly. It uses `switch` to jump to the appropriate location, so it could not jump into `switch` statements. I haven't run into a situation where I'd actually consider using the thing.

I never played with switching stacks. Not sure I'd deploy that in production. Meanwhile what I've got is a few tens of thousands of lines of flag- and enum-based state machines which I'm not fond of but am not going to be rewriting any time soon either - it's just not a big enough problem when observed from the outside.
-
@dkf said in Theoretical musings on language design:
being able to yield from places other than the topmost function makes things a whole lot simpler.
Languages without the capacity to make those micro-stacks (or equivalent) end up needing a complex and messy stack of yields/awaits.
C#?
Ugh. The old one looks a lot nicer to me.
-
@dreikin Conceptually it would be like having a producer and a consumer running in separate threads communicating via a queue. But instead of them running in parallel, they alternate - each time the producer generates a value, control switches to the consumer, and each time the consumer needs a new value, control switches to the producer. But unlike a setup where one of these calls the other (and so the other always starts at the same entry point), they each have their own stack and both can resume at arbitrary points.
With `yield` or even `yield*`, you don't quite have that, since all intermediate functions have to be yield-aware. You still can't pass a callback into a library function and yield from the callback.
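Python's `send()` gives a single-stack approximation of that alternation: control ping-pongs between consumer and producer, with the producer resuming mid-body each time. A toy running-average coroutine:

```python
def running_average():
    total = count = 0
    average = None
    while True:
        value = yield average  # pause here until the consumer sends a value
        total += value
        count += 1
        average = total / count

coro = running_average()
next(coro)            # prime: run up to the first yield
print(coro.send(10))  # 10.0
print(coro.send(20))  # 15.0
```

The limitation stands, though: the `yield` must appear lexically in `running_average` itself, not in anything it calls.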
-
@cvi said in Theoretical musings on language design:
@pleegwat said in Theoretical musings on language design:
I don't think there's much need to switch stacks though. Not in case of yield, where there is only one function whose state needs to be saved. In that case, saving (part of) the local variables and the last place yielded from suffices.
I think it could be neat to be able to `yield` from further down the call stack. As a simple example, if you want to concatenate the output from two generators, you just write a function that calls the first generator (which internally `yield`s) until it's done, and then the other one.

That's what JavaScript's `yield*` does. Example from MDN:

```javascript
function* g1() {
  yield 2;
  yield 3;
  yield 4;
}

function* g2() {
  yield 1;
  yield* g1();
  yield 5;
}

var iterator = g2();

console.log(iterator.next()); // {value: 1, done: false}
console.log(iterator.next()); // {value: 2, done: false}
console.log(iterator.next()); // {value: 3, done: false}
console.log(iterator.next()); // {value: 4, done: false}
console.log(iterator.next()); // {value: 5, done: false}
console.log(iterator.next()); // {value: undefined, done: true}
```
-
@pleegwat said in Theoretical musings on language design:
@dkf I'm having trouble reading your stance on the matter? I've written a macro-powered prototype of `yield` in C for fun once - took 3 or 4 rather simple macros, if I remember correctly. It uses `switch` to jump to the appropriate location, so it could not jump into `switch` statements. I haven't run into a situation where I'd actually consider using the thing.

That's how you do it in C, but C really isn't a good place to start. Higher-level languages have the advantage on you in that they've got more information available to the compiler and can do a better job of hiding the awful bits. With a higher-level language, you can do it by creating a trampoline that tail-calls to the implementation function for the section that should be called at this particular state point (which is something you'd encode in the state handle that would be a mandatory argument). The compiler then just needs to convert the code that you called into a collection of functions (plus that trampoline) that describe how the code executes between the yields. I believe it to be relatively simple for a compiler, and I'm absolutely certain that it is a horrible transformation to do by hand; continuation-passing-style code is something I've written a fair bit of, and really don't want to do again.

Stack switching is one of those things that is, again, nasty and tricky in C. In fact, I'm not quite sure how `setcontext` actually works, but I suspect it involves diddling the return information on the stack, as then at least it won't require turning off interrupts to be safe. (Well, safe-ish. This whole area is pretty loopy in the first place.) Leave that sort of thing to threading libraries and other truly brave individuals. Higher-level languages that do magic in this area tend to instead move things to the heap, as that's merely somewhat time-costly; it doesn't drive the implementer mad as well…

Well, not as mad anyway. Those nice men with the jacket with long arms and padded cell tell me this.
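A toy Python rendering of that trampoline scheme (all names invented, not real compiler output): each inter-yield section becomes its own function returning the yielded value plus the next section, and the trampoline dispatches through the state handle.

```python
# Sections of a generator that yields 1 then 2; each returns
# (yielded_value, next_section).
def section_0(handle):
    return 1, section_1  # code before the first yield

def section_1(handle):
    return 2, None       # code between the first and second yield

def trampoline(handle):
    """Run the section recorded in the handle, then record its successor."""
    if handle["next"] is None:
        raise StopIteration
    value, handle["next"] = handle["next"](handle)
    return value

handle = {"next": section_0}
print(trampoline(handle))  # 1
print(trampoline(handle))  # 2
```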
-
@dreikin But can you do this?
```javascript
function* g1() {
  yield 2;
  yield 3;
  yield 4;
}

function unrelated(cb) {
  /* unrelated code */
  cb();
}

function* g2() {
  yield 1;
  yield* unrelated(g1);
  yield 5;
}
```
-
@pleegwat said in Theoretical musings on language design:
But can you do this?
That depends on the language implementation. Some can, but it isn't a particularly common capability.
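For illustration, the capability can be emulated in stdlib Python by giving the producer its own real thread as a crude stand-in for a micro-stack; the `emit` callback can then be invoked at any depth, even from code that knows nothing about yielding. All names here are invented, and this is a sketch of the semantics, not an efficient implementation:

```python
import queue
import threading

def make_deep_generator(body):
    """Run body(emit) on its own thread; emit may be called at any depth."""
    q = queue.Queue(maxsize=1)  # size 1 forces strict alternation
    DONE = object()

    def run():
        body(q.put)  # any callee may call emit == q.put
        q.put(DONE)

    threading.Thread(target=run, daemon=True).start()
    while True:
        item = q.get()
        if item is DONE:
            return
        yield item

def unrelated(cb):
    cb()  # a library function unaware of any yielding

def body(emit):
    emit(1)
    unrelated(lambda: [emit(2), emit(3)])
    emit(5)

print(list(make_deep_generator(body)))  # [1, 2, 3, 5]
```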
-
@pleegwat said in Theoretical musings on language design:
@dreikin But can you do this?
```javascript
function* g1() {
  yield 2;
  yield 3;
  yield 4;
}

function unrelated(cb) {
  /* unrelated code */
  cb();
}

function* g2() {
  yield 1;
  yield* unrelated(g1);
  yield 5;
}
```
I'm...not quite sure how you would expect that to work from the language's standpoint. It seems ambiguous: did you really want just the one value (what you ought to get), or did you want the full yielded sequence? It seems to be a matter of defaults: the current setup gives you the single result unless you specifically ask for the full sequence, but it sounds like you want it the other way around?
ETA: actually, you didn't put a return statement there. Was that an accidental omission was it intentional?
-
@dreikin said in Theoretical musings on language design:
Was that an accidental omission was it intentional?
Was that an accidental omission or was it intentional?
Dagnabbit, let me edit stuff!
-
@dreikin I'm illustrating a limitation. You can't yield from `g1` into `g2`, or the original caller, because `unrelated` is unaware there's any yielding going on. `unrelated` is simply a normal function from a different library which does something, calls the callback zero or more times, and returns a normal value. `g2` calls `unrelated` with `g1` as an argument, expecting `g1` to yield values to `g2` without `unrelated` interfering.
-
@dreikin said in Theoretical musings on language design:
I'm...not quite sure how you would expect that to work from the language's standpoint.
The key is whether there's been a micro-stack introduced or not. Without one, the call from `unrelated` to `g1` has to make a new yielding context, and therefore there'll only be a single value produced. With a micro-stack introduced by `g2` (but I don't know how “introduce a new micro-stack” is indicated in this particular syntax; see next sentence) then it becomes possible for that code to work. If `function*` means “make a new micro-stack when you call this” then the code won't work, because the call from `unrelated` to its callback will result in another micro-stack being made.

Micro-stacks are non-trivial objects, BTW… and there's a big split between programmers who originally approached this all as a means of doing generators and are trying to build out from there, and programmers who approached this as a general micro-stack mechanism and who specialised to generators afterwards. The latter have quite a different set of underlying semantics.
-
@dkf said in Theoretical musings on language design:
Stack switching is one of these things that is, again, nasty and tricky in C. In fact, I'm not quite sure how setcontext actually works, but I suspect it involves diddling the return information on the stack, as then at least it won't require turning off interrupts to be safe.
Conceptually, it's not that difficult. Setting up and switching to a microthread consists of the following steps:
1. Allocate memory for the new stack (`malloc()` or whatever).
2. Initialize the memory with an initial stack frame. This is a bit system-dependent (calling conventions and so on). What I've seen is that you set it up so that when you return the first time, it will jump to the beginning of the target function that you want to execute.
3. Call a method (in ASM) that pushes any need-to-preserve state to the stack, switches the stack pointer, pops the preserved state and then returns.

(3) (i.e. `setcontext`) is, in the simplest implementation, just a bunch of `push`es and `pop`s. You can do some additional magic to tell e.g. gdb what's going on, which can be useful for debugging.
-
@dkf Except I expect in JavaScript each function's local variables are in a new heap object anyway, in order for closures to work? Though of course that could be optimised out by the compiler. I guess this may make more sense:

```javascript
function unrelated(cb) {
  cb();
}

function* g1() {
  yield 1;
  unrelated(function() {
    yield 2;
    yield 3;
    yield 4;
  });
  yield 5;
}
```
-
@pleegwat said in Theoretical musings on language design:
I guess this may make more sense
Alas, you throw some non-trivial promise resolution in there and I can guarantee that your brain will hurt…
-
@dkf Because `unrelated` may call its callback after `g1` has already `return`ed, and the iterator is `done`? I see your problem. You'd have to keep the iterator active until the closure has been freed, but in a garbage-collected environment that'd be undesirable.
-
@pleegwat said in Theoretical musings on language design:
but in a garbage-collected environment that'd be undesirable
Depends what semantics you want, Shirley?
-
@dkf Well, it would mean that the time the generator closes depends on when the garbage collector runs, and I was under the impression that it's undesirable for the garbage collector to directly affect the program like that?
-
@pleegwat said in Theoretical musings on language design:
@dkf Except I expect in JavaScript each function's local variables are in a new heap object anyway, in order for closures to work? Though of course that could be optimised out by the compiler. I guess this may make more sense:

```javascript
function unrelated(cb) {
  cb();
}

function* g1() {
  yield 1;
  unrelated(function() {
    yield 2;
    yield 3;
    yield 4;
  });
  yield 5;
}
```
The more I try to figure out what you want, the more it makes me think it will lead to spaghetti. I'm fine leaving it at "Either I don't know what @PleegWat wants, or I do and I think it's a terrible idea", but if you don't mind, could you try making a different example? Or explaining how you think the logic should flow in the one(s) you gave?
-
@dreikin I suspect when `yield` was first introduced, people thought it would lead to spaghetti too.

The idea would be that yields in the anonymous callback would yield a value on the iterator, just as yields in the body of `g1` would. Though I have, by now, forgotten what I was getting at anyway, so never mind.
-
@pleegwat said in Theoretical musings on language design:
The idea would be that yields in the anonymous callback would yield a value on the iterator
That can be made to work provided:
1. A stack (or stack-let, or whatever) is allocated on entry to `g1` and used for the calls it makes, and
2. All `yield`s happen when executing within that stack. Jumping elsewhere won't work (hence the problem with promises I alluded to earlier).

If the semantics of `function*` are that it makes a stack-let and then runs its body on that stack, and the semantics of `yield` are such that it pauses the current stack-let and errors if there's no current stack-let (this isn't the way that, say, Python works), then it should hold together.
- A stack (or stack-let, or whatever) is allocated on entry to
-
@dkf said in Theoretical musings on language design:
@pleegwat said in Theoretical musings on language design:
The idea would be that yields in the anonymous callback would yield a value on the iterator
That can be made to work provided:
1. A stack (or stack-let, or whatever) is allocated on entry to `g1` and used for the calls it makes, and
2. All `yield`s happen when executing within that stack. Jumping elsewhere won't work (hence the problem with promises I alluded to earlier).

If the semantics of `function*` are that it makes a stack-let and then runs its body on that stack, and the semantics of `yield` are such that it pauses the current stack-let and errors if there's no current stack-let (this isn't the way that, say, Python works), then it should hold together.

On further thinking: I think 2 prevents most generators that depend on async IO? I don't know the details well enough to say.
I also think this approach wouldn't work very well for finite generators, as it would be nigh impossible to ensure no references exist that could yield from the closure, given that external references to the generator remain. So it does not trivially depend on the garbage collector.
You may need an explicit syntax to finalize the generator.
- A stack (or stack-let, or whatever) is allocated on entry to
-
@pleegwat said in Theoretical musings on language design:
You may need an explicit syntax to finalize the generator.
Not necessarily; it'd depend on the references from outside going away. However, the whole thing gets more than a bit complicated (and I'm pretty tired this evening), to the point where I can't argue in the abstract about it any more, and would need an actual example to work through.