Deadlock Empire


  • Banned

    Having debugged race conditions for hundreds of hours in total, it was pretty easy for me - but I imagine it might be very eye-opening to those less accustomed to the dangers of concurrency. But either way, it's rather fun. It's different from most "learn programming" games in that you don't write code - you make the code crash.


  • BINNED

    @Gąska Fun so far.
    Reading the instructions takes longer than making it crash. I'll finish the rest after the upcoming meeting. (Or during :rolleyes:)



  • @Gąska There are so many dumb things I broke my brain.
    Though I did work explicitly on fixing other people's broken parallel code for a couple of years and saw some brain-breakingly dumb things in live code, so I guess it's all credible enough.


  • Considered Harmful

    @Gąska That was fun.



  • Condition variables are, unfortunately, still a rather difficult topic. We won't even try to get you a confusing story here, they're just hard. Try. If you fail, skip.

    *snerk*



  • A nice little diversion. Screwed up the final level a bunch of times, but otherwise it went pretty smoothly.


  • BINNED

    @Gąska said in Deadlock Empire:

    Slay dragons, learn concurrency! Play the cunning Scheduler, exploit flawed programs and defeat the armies of the Parallel Wizard.

    Oh, looks fun!


  • Java Dev

    @cvi said in Deadlock Empire:

    Condition variables are, unfortunately, still a rather difficult topic. We won't even try to get you a confusing story here, they're just hard. Try. If you fail, skip.

    *snerk*

    Conditions aren't difficult if you use the right patterns. Always wait on conditions in a loop:

    /* Waiting thread */
    Lock();
    while( flag == false )
    {
        Wait();
    }
    /* Code to be run when flag is true */
    Unlock();
    
    /* Signaling thread */
    Lock();
    flag = true;
    Signal();
    Unlock();
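
    For the record, a direct translation of that pattern into C++ (a sketch using std::condition_variable; flag is the same boolean as above):

    #include <condition_variable>
    #include <mutex>
    
    std::mutex m;
    std::condition_variable cv;
    bool flag = false;
    
    void waiter() {
        std::unique_lock<std::mutex> lk(m);  // Lock()
        while( flag == false )
        {
            cv.wait(lk);                     // Wait(): releases m, reacquires on wakeup
        }
        /* Code to be run when flag is true */
    }                                        // Unlock(), via RAII
    
    void signaler() {
        std::lock_guard<std::mutex> lk(m);   // Lock()
        flag = true;
        cv.notify_one();                     // Signal(), while holding the lock
    }                                        // Unlock()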
    

  • Banned

    @PleegWat If I'm not mistaken, Signal() could be moved outside the lock, right? Nevermind, see below.


  • Java Dev

    @Gąska It looks like it. But certainly the access to flag must be locked, since the other thread is relying on the value of the flag not changing in the code segment. I'd play safe and signal before releasing the lock.



  • @PleegWat said in Deadlock Empire:

    Always wait on conditions in a loop:

    ... assuming you care about spurious wakeups (and your threading API has them)?

    @Gąska said in Deadlock Empire:

    If I'm not mistaken, Signal() could be moved outside the lock, right?

    In the APIs that I know, yes (but double-check the relevant documentation). In those APIs, having the signal inside of a lock is a bit of a pessimization, since the wait will try to continue, but immediately be stopped by having to wait to re-acquire the lock.


  • Java Dev

    @cvi said in Deadlock Empire:

    ... assuming you care about spurious wakeups (and your threading API has them)?

    I'm used to pthreads (on linux) which can indeed wake up too many threads. It may also not wake up enough threads, if the imagined receiver is not yet waiting.

    @cvi said in Deadlock Empire:

    In the APIs that I know, yes (but double-check the relevant documentation). In those APIs, having the signal inside of a lock is a bit of a pessimization, since the wait will try to continue, but immediately be stopped by having to wait to re-acquire the lock.

    Now I remember. The signal() must be locked, to ensure the imagined receiver thread is waiting. Otherwise, the receiving thread could be between the test and the wait when the signal arrives, and the signal is lost.

    I am not familiar enough with the internals (i.e. futexes, and how linux pthreads uses them) to say whether the pessimization is there in that case.

    Your imagined use case may be more suitable for semaphores.



  • @PleegWat said in Deadlock Empire:

    I'm used to pthreads (on linux) which can indeed wake up too many threads. It may also not wake up enough threads, if the imagined receiver is not yet waiting.

    You might not care about waking up too often. For example, being woken up and then discovering that there are zero pending events is OK in some use cases. (There's essentially an outer loop and the flag is implicit.)

    But I'm mainly railing against the "do it this way, always, no reasons given".

    Now I remember. The signal() must be locked, to ensure the imagined receiver thread is waiting. Otherwise, the receiving thread could be between the test and the wait when the signal arrives, and the signal is lost.

    That requires additional coordination beforehand, though. In your example, Thread 2 could get through the whole Lock(); flag = true; Signal(); Unlock(); sequence before the first thread ever gets to its first Lock().

    Edit2: ... of course, then the Wait() won't happen, thanks to the flag. :doh:. I'm going to go to bed now, clearly it's getting too late for me.

    Edit:

    Your imagined use case may be more suitable for semaphores.

    Ain't got no fancy semaphores in the C++ standard library. We make do with mutexes and condition variables. 😞


  • Banned

    @cvi said in Deadlock Empire:

    @PleegWat said in Deadlock Empire:

    Always wait on conditions in a loop:

    ... assuming you care about spurious wakeups (and your threading API has them)?

    Not just spurious wakeups, but also when you skip the wait because the flag is already true at the start (which you most likely want to do if you don't know when the condition variable will next be signaled while the flag is raised). Without the loop, the following scenario is possible (a code sketch follows the list):

    1. Flag is false.
    2. Thread 1 acquires lock.
    3. Thread 1 checks flag.
    4. Thread 1 waits on CV and releases lock.
    5. Thread 2 acquires lock.
    6. Thread 2 raises the flag.
    7. Thread 3 wants to acquire the lock, waits on mutex.
    8. Thread 2 signals CV.
    9. Thread 2 releases the lock.
    10. Thread 3 acquires the lock.
    11. Thread 3 executes the critical section.
    12. Thread 3 lowers the flag.
    13. Thread 3 releases the lock.
    14. Thread 1 acquires the lock.
    15. Thread 1 executes the critical section when the flag is already lowered. Boom.
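
    For what it's worth, std::condition_variable's predicate overload bakes exactly this loop in, so the wait can't return until the flag is actually true (a sketch; m, cv and flag as in the pseudocode above):

    #include <condition_variable>
    #include <mutex>
    
    std::mutex m;
    std::condition_variable cv;
    bool flag = false;
    
    void waiter() {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [] { return flag; });  // equivalent to: while (!flag) cv.wait(lk);
        // flag is guaranteed true here, even after the thread 3 interleaving above
    }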

  • Java Dev

    @cvi said in Deadlock Empire:

    @PleegWat said in Deadlock Empire:

    I'm used to pthreads (on linux) which can indeed wake up too many threads. It may also not wake up enough threads, if the imagined receiver is not yet waiting.

    You might not care about waking up too often. For example, being woken up and then discovering that there are zero pending events is OK in some use cases. (There's essentially an outer loop and the flag is implicit.)

    But I'm mainly railing against the "do it this way, always, no reasons given".

    In this case (in places where my code uses the pattern) there is no flag. The condition means "queue is not empty", and the lock protects the queue.

    Now I remember. The signal() must be locked, to ensure the imagined receiver thread is waiting. Otherwise, the receiving thread could be between the test and the wait when the signal arrives, and the signal is lost.

    That requires additional coordination beforehand, though. In your example, Thread 2 could get through the whole Lock(); flag = true; Signal(); Unlock(); sequence before the first thread ever gets to its first Lock().

    That's why flag is checked before waiting - if flag is true when thread 1 gets to its locked section, then thread 1 must not wait at all (because flag is already true, and the condition variable may not be signaled again until after flag has been set false elsewhere).

    In a queue case, this means the consumer should not wait for the condition if the queue is not empty, because the producer may not signal the queue being non-empty if it wasn't empty on insert.

    Edit:

    Your imagined use case may be more suitable for semaphores.

    Ain't got no fancy semaphores in the C++ standard library. We make do with mutexes and condition variables. 😞

    I don't think pthreads has semaphores, but posix does, and if I remember correctly there is a linux-specific semaphore mechanism as well.



  • @PleegWat said in Deadlock Empire:

    In this case (in places where my code uses the pattern) there is no flag. The condition means "queue is not empty", and the lock protects the queue.

    OK - we're on the same page then.

    That's why flag is checked before waiting - if flag is true when thread 1 gets to its locked section, then thread 1 must not wait at all (because flag is already true, and the condition variable may not be signaled again until after flag has been set false elsewhere).

    Realized it shortly after posting. Well, at least the only consequence of making those mistakes is feeling slightly stupid for posting them publicly, and not hours of debugging threaded code. A fair trade, I think.


  • Discourse touched me in a no-no place

    @cvi said in Deadlock Empire:

    In the APIs that I know, yes (but double-check the relevant documentation).

    That's actually unsafe if the condition isn't volatile. You don't just need access to the shared state to be inside the locked region, you also need a memory barrier in there so that the shared state isn't cached. The unlock acts as a memory barrier.



  • @dkf said in Deadlock Empire:

    That's actually unsafe if the condition isn't volatile.

    Hmm? You sure? Rereading the C++ memory model specification is a bit out-of-scope for now, but none of the examples on e.g. cppreference make the shared variable volatile. I'm also not sure if that's needed, since (as you say) Unlock()/Lock() act as a memory barrier. In either version (Signal inside locked region or outside), there's no way of accessing the flag without passing the memory barriers.


  • Discourse touched me in a no-no place

    @cvi said in Deadlock Empire:

    Hmm? You sure?

    If it's volatile then the writes have to happen at that point anyway. Touching a volatile basically ends up with what might as well be a memory barrier, since the memory must be read or written as directed (this is vital for memory-mapped hardware FWIW, where reads and writes can have all sorts of special meanings; caching semantics of any form are wrong in those cases). The flip side of this is that volatile has a lot more overhead than is immediately apparent.

    The whole point of this is that it is the real shared state that must be read from and written to inside the locked region, and not just some local cached version of that state.


  • BINNED

    @dkf AFAICT, you shouldn't be using volatile for things that aren't actually volatile, i.e. memory-mapped stuff. For synchronization there's atomic and memory_order.


  • Discourse touched me in a no-no place

    @topspin That's because they impose less stringent conditions on the compiler than volatile does. The latter basically totally prohibits all trickery, and trickery is how compilers get speed; without it, you'd be almost so slow you'd think you were writing Python.

    Well, kidding a bit about that. Python's even slower than that.


  • Banned

    @dkf technically, yes, volatility is important and should always be kept in mind. But practically, in the case of multithreading, all that is taken care of by synchronization primitives, so the programmer doesn't even have to think about volatility. See C++17 spec, section 4.7.1 (similar language appears in earlier revisions too).


  • BINNED

    @dkf said in Deadlock Empire:

    @topspin That's because they impose less stringent conditions on the compiler than volatile does.

    They're also correctly expressing intent (right tool for the job etc.) instead of just sprinkling volatile everywhere.



  • @dkf said in Deadlock Empire:

    That's because they impose less stringent conditions on the compiler than volatile does. The latter basically totally prohibits all trickery, and trickery is how compilers get speed; without it, you'd be almost so slow you'd think you were writing Python.

    I don't think volatile is sufficient, though, in general. Volatile guarantees that the write is emitted by the compiler, but it doesn't guarantee that the write will be globally visible (i.e., doesn't just hang around in a cache, if the memory is cached). Atomics will additionally ensure the latter and that writes to them aren't reordered by the CPU.

    For example, on x86, writing to a volatile variable is just a mov. Writing to an atomic is a mov + mfence. (It's probably also less of an issue with x86, which has a ton of cache coherency stuff in hardware.)
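
    A minimal sketch of that difference (hypothetical names; compile with optimizations and compare the generated code):

    #include <atomic>
    
    volatile int v_flag = 0;
    std::atomic<int> a_flag{0};
    
    void set_volatile() { v_flag = 1; }       // x86-64: typically just a mov
    void set_atomic()   { a_flag.store(1); }  // seq_cst store: mov + mfence (or xchg)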


  • Considered Harmful

    BaaderMeinhofException 😕

    Just last night our main service crashed hard with SynchronizationLockException because something in my threading code is obviously wrong. I haven't quite figured out what exactly, but other than that it's obvious.


  • Banned

    @Applied-Mediocrity

    The exception that is thrown when a method requires the caller to own the lock on a given Monitor, and the method is invoked by a caller that does not own that lock.

    SynchronizationLockException is thrown by calling the Exit, Pulse, PulseAll, and Wait methods of the Monitor class from an unsynchronized block of code.

    Should be fairly easy to debug, if you have a call stack.

    ...You do have a call stack, don't you?


  • Java Dev

    @cvi said in Deadlock Empire:

    But I'm mainly railing against the "do it this way, always, no reasons given".

    As this discussion proves, being certain that a specific construct works is quite complicated. That's why I tend to initially point to known patterns which are safe.

    Another thing like that is locking order. You can reason about it all you want, but in the end it is easiest and least error-prone to just ensure every single pair of locks A and B in your application falls into one of the following three mutually exclusive categories (a sketch follows the list):

    • A and B are never taken at the same time.
    • There is code which holds both A and B, but A is never taken when B is already held.
    • There is code which holds both A and B, but B is never taken when A is already held.
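
    A sketch of what that discipline looks like in C++ (lock names hypothetical):

    #include <mutex>
    
    std::mutex queueLock;  // by convention, always acquired first
    std::mutex statsLock;  // only ever acquired while queueLock is already held
    
    void drainAndCount() {
        std::lock_guard<std::mutex> a(queueLock);  // first, per the fixed order
        std::lock_guard<std::mutex> b(statsLock);  // second; no code path does the reverse
        // ... work on both structures ...
    }
    
    // If two locks genuinely must be taken from both directions, std::scoped_lock
    // (C++17) sidesteps the ordering question by acquiring both atomically with a
    // deadlock-avoidance algorithm:
    void eitherDirection() {
        std::scoped_lock both(statsLock, queueLock);
        // ...
    }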

  • Java Dev

    @cvi said in Deadlock Empire:

    @dkf said in Deadlock Empire:

    That's because they impose less stringent conditions on the compiler than volatile does. The latter basically totally prohibits all trickery, and trickery is how compilers get speed; without it, you'd be almost so slow you'd think you were writing Python.

    I don't think volatile is sufficient, though, in general. Volatile guarantees that the write is emitted by the compiler, but it doesn't guarantee that the write will be globally visible (i.e., doesn't just hang around in a cache, if the memory is cached). Atomics will additionally ensure the latter and that writes to them aren't reordered by the CPU.

    For example, on x86, writing to a volatile variable is just a mov. Writing to an atomic is a mov + mfence. (It's probably also less of an issue with x86, which has a ton of cache coherency stuff in hardware.)

    I really should look into that stuff more. I've got a ringbuffer implementation in my code, single producer single consumer, which does not use locking and does not have a single volatile or similar declaration. It's never caused problems that I know of. Yet I cannot get rid of the nagging feeling that it cannot possibly be safe.
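
    For reference, the textbook way to make that hand-off explicit is a pair of acquire/release atomic indices (a sketch, not my actual code):

    #include <array>
    #include <atomic>
    #include <cstddef>
    #include <optional>
    
    template <typename T, std::size_t N>
    class SpscRing {
        std::array<T, N> buf_;
        std::atomic<std::size_t> head_{0};  // only written by the consumer
        std::atomic<std::size_t> tail_{0};  // only written by the producer
    
    public:
        bool push(const T& v) {  // producer thread only
            std::size_t t = tail_.load(std::memory_order_relaxed);
            std::size_t next = (t + 1) % N;
            if (next == head_.load(std::memory_order_acquire))
                return false;                              // full
            buf_[t] = v;
            tail_.store(next, std::memory_order_release);  // publish the element
            return true;
        }
    
        std::optional<T> pop() {  // consumer thread only
            std::size_t h = head_.load(std::memory_order_relaxed);
            if (h == tail_.load(std::memory_order_acquire))
                return std::nullopt;                       // empty
            T v = buf_[h];
            head_.store((h + 1) % N, std::memory_order_release);  // free the slot
            return v;
        }
    };
    
    Without the release/acquire pairing, nothing stops the compiler or CPU from making the index update visible before the element write, which is exactly where the nagging feeling comes from.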


  • 🚽 Regular

    @Gąska said in Deadlock Empire:

    so the programmer doesn't even have to think about volatility. See C++17 spec, section 4.7.1 (similar language appears in earlier revisions too

    You still do have to think about it for things like interrupts, even with that. Like, say, loops that touch a variable that is changed by an interrupt.

    A good example of that was my coworker who wrote a loop that counted changes of a (non-volatile) flag ^='d by an interrupt. It always came back as the initialised value of the variable because the optimisation had just removed the check from the loop, as the flag 'couldn't' change there.

    🧓 Stupid thing's forgotten how to count!
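
    A sketch of that failure mode (names made up): with a plain flag the optimiser may read it once, decide the loop can't change it, and hoist the check right out:

    bool irqFlag = false;  // toggled by an interrupt handler: irqFlag ^= 1;
    
    int countToggles(long iterations) {
        int count = 0;
        bool last = irqFlag;
        for (long i = 0; i < iterations; ++i) {
            // Non-volatile: the compiler may load irqFlag once and never again,
            // so count stays at its initialised value forever.
            if (irqFlag != last) { ++count; last = irqFlag; }
        }
        return count;
    }
    // The fix: declare it `volatile bool irqFlag;` so every iteration re-reads it.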


  • Discourse touched me in a no-no place

    Awesome find! That'll be really handy for going over with the junior devs.

    It's a good illustration of the hazards of multithreading, and of using low-level tooling such as Monitor.Enter, etc.

    That's why I prefer to see people using locks, unless we've got some really specialised high performance scenario. They're simpler to understand and maintain. If you add in some best practices, like using a dedicated lock for each job, you can minimise problems.

    I really like the double checked lock pattern, because it eliminates the need for even acquiring the lock if your condition is not met.

    // don't even bother acquiring the lock if it's already been loaded
    if (!isLoaded)
    {
       // remember that, when isLoaded is false, multiple threads can wait for this lock, so it's
       // important to check isLoaded again, to ensure we don't end up with multiple calls to Load
       lock (loadLock)
       {
           // we need to check if we were the first one to acquire the lock, or if we were just queuing 
           // while another thread was doing the load
           if (!isLoaded)
           {
               Load();
               isLoaded = true;
           }
       }
    }
    

    It's a slight shame that there are no examples of lock abuse in the challenges.


    Fun fact: in Java, double checked locking didn't use to work, because the compiler saw both if statements, and optimised one of them away! You had to work around it by farming your if logic out to another method, so they looked like two different checks. I don't believe this is still a problem in Java, but I've not worked with Java in over 10 years.


  • Discourse touched me in a no-no place

    @cvi said in Deadlock Empire:

    Volatile guarantees that the write is emitted by the compiler, but it doesn't guarantee that the write will be globally visible (i.e., doesn't just hang around in a cache, if the memory is cached). Atomics will additionally ensure the latter and that writes to them aren't reordered by the CPU.

    This clarifies things a lot: you have obviously never actually worked with memory-mapped I/O devices.


  • Banned

    @dkf or maybe he did and he also worked with multithreaded code and he knows that what works for memory mapped devices doesn't work for cross-thread communication?


  • Java Dev

    @DoctorJones said in Deadlock Empire:

    I really like the double checked lock pattern, because it eliminates the need for even acquiring the lock if your condition is not met.

    That looks like a rather specific case for library initialisation. I'd recommend pthread_once in linux C code.


  • Discourse touched me in a no-no place

    @Gąska Not really; memory-mapped devices have to be both read from and written to exactly as described in the code (or weird stuff happens, such as your network adaptor hanging or your message not being sent), and must be truly visible in all threads that access them because they are the implementation of that memory location.
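
    A sketch of what that looks like in code (the address and status bit are made up):

    #include <cstdint>
    
    // Hypothetical memory-mapped UART status register. volatile forces every
    // access to actually hit the device: reads may clear hardware flags and
    // writes may trigger actions, so nothing may be cached, elided or reordered.
    volatile std::uint32_t* const UART_STATUS =
        reinterpret_cast<volatile std::uint32_t*>(0x40001000);
    
    bool txReady() {
        return (*UART_STATUS & 0x1u) != 0;  // a fresh read from the device, every call
    }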


  • Discourse touched me in a no-no place

    @PleegWat said in Deadlock Empire:

    @DoctorJones said in Deadlock Empire:

    I really like the double checked lock pattern, because it eliminates the need for even acquiring the lock if your condition is not met.

    That looks like a rather specific case for library initialisation. I'd recommend pthread_once in linux C code.

    You'd be surprised how often the need for it crops up, especially in web development. Unfortunately, I've not encountered many devs who are aware of the pattern. I've also been dismayed by the number of web developers who fail to realise that their backend code is a highly multithreaded application, with each request being another thread. They usually have the attitude of "this is web stuff, we don't need to do any multithreading here, so we don't need to bother about thread safety". :facepalm:


  • Considered Harmful

    @Gąska said in Deadlock Empire:

    ...You do have call stack, don't you?

    I do. I have no idea, however, how it even got there. It's pretty much the recommended lock-with-timeout pattern, is it not?

    bool StatusLockTaken = false;
    try
    {
        Monitor.TryEnter(StatusLock, (int)TR_LOCK_TIMEOUT, ref StatusLockTaken);
        if (StatusLockTaken)
        {
            // Read status.
        }
    }
    finally
    {
        if (StatusLockTaken)
            Monitor.Exit(StatusLock); // this caused the WHARRGARBL.
        else
            ; // Log that the lock could not be acquired and carry on. Almost never happens.
    }
    

    And I just thought it's sort of funny. We're... well, you're having this discussion here about concurrency and -bam!- my code acts up with a very related error all of a sudden. Are aliens real?


  • Banned

    @dkf said in Deadlock Empire:

    @Gąska Not really; memory-mapped devices have to be both read from and written to exactly as described in the code (or weird stuff happens, such as your network adaptor hanging or your message not being sent), and must be truly visible in all threads that access them because they are the implementation of that memory location.

    But doesn't that rely on some extra OS machinery that detects that the memory write goes to a memory-mapped device and triggers extra synchronization/publication work to make the value visible to the device, which doesn't get triggered on volatile writes to regular memory? I don't really know, but I always imagined that actually being mapped to a device is a critical part of the process.


  • Java Dev

    @DoctorJones said in Deadlock Empire:

    @PleegWat said in Deadlock Empire:

    @DoctorJones said in Deadlock Empire:

    I really like the double checked lock pattern, because it eliminates the need for even acquiring the lock if your condition is not met.

    That looks like a rather specific case for library initialisation. I'd recommend pthread_once in linux C code.

    You'd be surprised how often the need for it crops up, especially in web development. Unfortunately, I've not encountered many devs who are aware of the pattern. I've also been dismayed by the number of web developers who fail to realise that their backend code is a highly multithreaded application, with each request being another thread. They usually have the attitude of "this is web stuff, we don't need to do any multithreading here, so we don't need to bother about thread safety". :facepalm:

    Yup, in web anything concerning cross-request state is going to be a multithreading situation.

    I'm calling it an initialization pattern because (at first glance) it is only safe if isLoaded never becomes false again once it has been true.


  • Banned

    @PleegWat well, in this specific case, it is initialization code (because of the names). But the pattern is very general, and can be applied in many different situations, including when the condition goes true and false repeatedly.


  • Discourse touched me in a no-no place

    @PleegWat said in Deadlock Empire:

    Yup, in web anything concerning cross-request state is going to be a multithreading situation.

    I'm calling it an initialization pattern because (at first glance) it is only safe if isLoaded never becomes false again once it has been true.

    My example was initialisation, but it's good for protecting any synchronised if statement. It's safe, because that's the whole point of the double check, i.e. if the value of the condition changes, we check it again.

    if (theThingNeedsDoing)
    {
        lock (theLockForTheThing)
        {
            if (theThingNeedsDoing)
            {
                DoTheThing();
            }
        }
    }
    

    The outer if statement is merely an optimisation of the following:

    lock (theLockForTheThing)
    {
        if (theThingNeedsDoing)
        {
            DoTheThing();
        }
    }
    

    That's totally thread safe*, but inefficient. Hence the optimisation.

    *assuming the "theLockForTheThing" object is not being abused.


  • Banned

    @DoctorJones also, assuming theThingNeedsDoing is itself synchronized/atomic.
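
    In C++ terms that usually means making the flag an atomic, so the unguarded first check is well-defined (a sketch; names hypothetical):

    #include <atomic>
    #include <mutex>
    
    std::atomic<bool> thingDone{false};
    std::mutex thingLock;
    
    void doTheThingOnce() {
        if (!thingDone.load(std::memory_order_acquire)) {    // unguarded fast path
            std::lock_guard<std::mutex> g(thingLock);
            if (!thingDone.load(std::memory_order_relaxed))  // recheck under the lock
            {
                // DoTheThing();
                thingDone.store(true, std::memory_order_release);  // publish
            }
        }
    }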



  • @dkf said in Deadlock Empire:

    This clarifies things a lot: you have obviously never actually worked with memory-mapped I/O devices.

    Only on ones that do not do out-of-order execution nor have caches, so that problem disappears there. I'm also betting that on something like x86, you would have to be rather careful about setting up the memory region for the IO with the right memory type settings (via mtrr) to avoid these kinds of problems.

    When uploading data to a GPU you run into the same issues. If you write to host-memory, you need to either ensure that caches get flushed so that the writes become visible in main memory (which the GPU can see), or you need to make sure you use the right type of memory (typically something that isn't cached). Same mechanism on x86 as far as I know. Volatile doesn't help with any of those, other than making sure that the instructions for the writes are emitted by the compiler.

    I'm holding fast to my claims, though. volatile only ensures that the compiler doesn't optimize away reads/writes (nor reorders them?), but doesn't emit additional instructions to ensure that the hardware doesn't do stuff like that.

    This is especially visible on ARM64. A write to a volatile variable produces a str. A write to an atomic produces stlr (store-release). The latter ensures that all writes from before the stlr are visible (i.e., hardware may not reorder writes so that ones from before the stlr occur after it).

    If you're only working with very simple chips (no out-of-order execution, for example), you may never run into these issues, though.


  • Discourse touched me in a no-no place

    @Gąska It also depends on whether the theThingNeedsDoing flag is assigned only once or not. If it is (because we're dealing with the “expensive initialization” scenario) then the nasty failure mode of the double-checked locking pattern doesn't apply. That failure mode is where the thing enters the state where the task needs doing between the check and the point where the end of the block is reached; in the initialization scenario, that doesn't happen because an incorrect read just wastes a bit of time (i.e., once a thread has seen that there is no work to do, that'll remain true for the rest of the lifetime of the thread, and in the converse case a correctly-locked read is the next step).


  • Discourse touched me in a no-no place

    @cvi said in Deadlock Empire:

    (nor reorders them?)

    Correct. Volatile reads and writes can't be reordered past each other at all and much hardware depends on exactly that.


  • BINNED

    @DoctorJones said in Deadlock Empire:

    Fun fact: in Java, double checked locking didn't used to work

    True for both old Java as well as C++ pre-11. With C++11 you can just use static initializers and the compiler will do the right thing.
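
    For example (a sketch; the type and path are made up): since C++11, function-local statics are initialised exactly once even under concurrency, with the compiler doing the double-checking for you:

    #include <string>
    
    struct Config { std::string path; };
    
    const Config& config() {
        // "Magic statics": the first caller runs the initialiser; any concurrent
        // callers block until it finishes. No hand-rolled locking needed.
        static const Config instance{"/etc/app.conf"};
        return instance;
    }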



  • You don't need this stuff! Just use a SynchronizedQueue or something similar and let the standard library worry about implementing it! </blakeyrant>


  • ♿ (Parody)

    @anonymous234 said in Deadlock Empire:

    You don't need this stuff! Just use a SynchronizedQueue or something similar and let the standard library worry about implementing it! NONE OF THIS IS FUNNY STUFF! </blakeyrant>


  • Discourse touched me in a no-no place

    @boomzilla See definition #2 at:

    2. Difficult to explain or understand; strange or odd.

      ‘I had a funny feeling you'd be around’
      ‘it's a funny old world’
      ‘I do get some funny looks’
      ‘the funny thing is I can't remember much about it’
      ‘that's funny!—that vase of flowers has been moved’


  • ♿ (Parody)

    @dkf I hear you, but that wouldn't change blakey's mind.


  • Banned

    @boomzilla at least the back-and-forth was between 3-5 people now, so he wouldn't even have an excuse to use that gif of his.

