Am I the dumb? Why are we rolling our own Destroy logic?
-
So, small backstory. This is a UE4 module written in-house to handle player chatter in the game (because, apparently, the built-in version sucked somehow).
We're currently experiencing an issue where the game servers occasionally crash on exit because the VoIP client on the server is still processing data from the chat server as it's being destroyed. It's not too big a deal, since we're obviously intending the server to go down, but annoying since crashing could disrupt other shutdown sequences (like database calls and the like).
Apparently we're manhandling the destruction process, even so far as to call the destructors directly:
Am I wrong in wearing this face: ?
-
@tsaukpaetra And I fully expect this to keep crashing :)
What could help instead is to wait in the main (Tick etc.) thread for the client thread to finish shutting down.
In my project there is a thread that receives and processes video from a camera; sometimes it crashes at exit for that reason.
Is that what EnsureCompletion is supposed to do? But I would not be sure that the thread is actually destroyed.
Manually calling the destructor seems a great way to double-delete stuff.
-
@tsaukpaetra "We tried using built in language features, and it crashed all the time." I wonder if anyone actually got around to, y'know, debugging those crashes? Because that's what really stands out about this whole exchange.
-
@adynathos said in Am I the dumb? Why are we rolling our own Destroy logic?:
Is that what EnsureCompletion is supposed to do? But I would not be sure that the thread is actually destroyed.
Yes, the idea is that you signal in some way that the thread should die and wait for it to do so. Ours looks like this:
And inside the thread's running function:
The WaitForCompletion is Engine code that (on Windows) does this:
@pie_flavor said in Am I the dumb? Why are we rolling our own Destroy logic?:
@tsaukpaetra "We tried using built in language features, and it crashed all the time." I wonder if anyone actually got around to, y'know, debugging those crashes? Because that's what really stands out about this whole exchange.
What if I told you, the normal go-to solution for fixing these kinds of things was to make everything static?
-
@tsaukpaetra said in Am I the dumb? Why are we rolling our own Destroy logic?:
Am I wrong in wearing this face: ?
If the ugly hack worked and is not causing additional trouble, trying to fix the program to destroy the right way would be a waste of time.
-
@sockpuppet7 said in Am I the dumb? Why are we rolling our own Destroy logic?:
@tsaukpaetra said in Am I the dumb? Why are we rolling our own Destroy logic?:
Am I wrong in wearing this face: ?
If the ugly hack worked and is not causing additional trouble, trying to fix the program to destroy the right way would be a waste of time.
Yes, the current path of least resistance is to fix one of the loops in the thread so it would exit properly (when asked), and move the destructor call up two lines so it's always called regardless.
-
@tsaukpaetra How do you ensure that the thread picks up the change to the ThreadShouldRun variable promptly? Without a synchronisation barrier of some kind, there's no guarantee at all that you'll get that variable change visible in the other thread promptly; it'll depend on when the CPU decides to actually flush that write through from cache to real memory.
You might find that it is enough to declare the variable to be volatile, or you might need to use some sort of signalling condition variable. (There's a whole bunch of ways to implement this sort of thing, depending on language and platform.)
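A minimal sketch of the "signalling condition variable" option, assuming plain C++11 threading rather than UE4's own primitives. StopSignal and RunWorkerUntilStopped are made-up names for illustration, not anything from the module being discussed:

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical helper (not UE4 API): a stop request that a worker can wait on.
class StopSignal {
public:
    // Called by the thread that wants the worker to exit.
    void RequestStop() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopRequested_ = true;
        }
        cv_.notify_all();
    }

    // Called by the worker: sleeps for up to `timeout`, waking early if a stop
    // is requested. Returns true once a stop has been requested.
    bool WaitForStop(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        return cv_.wait_for(lock, timeout, [this] { return stopRequested_; });
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool stopRequested_ = false;
};

// Starts a worker that wakes every 10 ms, asks it to stop, and joins it.
// Returns how many loop iterations the worker completed.
int RunWorkerUntilStopped() {
    StopSignal signal;
    int iterations = 0;
    std::thread worker([&] {
        while (!signal.WaitForStop(std::chrono::milliseconds(10))) {
            ++iterations;  // stand-in for "process one batch of network data"
        }
    });
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    signal.RequestStop();
    worker.join();  // after join(), reading `iterations` is race-free
    assert(!worker.joinable());
    return iterations;
}
```

The mutex gives the visibility guarantee being asked about, and the worker wakes as soon as the stop is requested instead of at its next poll.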
-
@dkf said in Am I the dumb? Why are we rolling our own Destroy logic?:
@tsaukpaetra How do you ensure that the thread picks up the change to the ThreadShouldRun variable promptly? Without a synchronisation barrier of some kind, there's no guarantee at all that you'll get that variable change visible in the other thread promptly; it'll depend on when the CPU decides to actually flush that write through from cache to real memory.
You might find that it is enough to declare the variable to be volatile, or you might need to use some sort of signalling condition variable. (There's a whole bunch of ways to implement this sort of thing, depending on language and platform.)
We don't. To my knowledge, nobody that came before me (or after) knows about locks, and this code was written before me.
-
@dkf said in Am I the dumb? Why are we rolling our own Destroy logic?:
You might find that it is enough to declare the variable to be volatile
That's never a good idea in C++. It may be true if you compile with MSVC with the correct flags to enable this MSVC-specific behavior, but I'd say that's a very bad idea. Even for a flag that's written by exactly one thread and read by exactly one other, volatile isn't guaranteed to work in C++.
-
@dfdub Oh yes, that's Java where that works. C++ uses the C model where volatile means “don't optimise these writes out or reorder them” (which is totally utterly vital for dealing with memory-mapped hardware).
@Tsaukpaetra In any case, the key thing is that some memory barriers are needed; without them, you've got a race between threads. The memory barriers will slow the code down. Another option might be to use CPU pinning so that the various threads are guaranteed to run on the same CPU as the thread that asks for them to shut down (so stopping there from being cache consistency problems, which might be the cause of the overruns) but that's a pretty horrible way to crack this nut and an indication that you're doing something vastly wrong.
Bugs that bust assumptions about the system memory model are always fun to hunt. You always end up at some point thinking you're going completely crazy until you remember that the computer is often telling you little lies about when it actually writes to real memory and when it notices that some other thread has written, and that these are (usually) for your benefit…
-
@tsaukpaetra What happens if you call Dequeue() on an empty queue? Who frees the memory after the dtor has been called? It might be a better idea to use InterlockedExchange or an event or a critical section or something similar to synchronize access to that flag.
-
@sockpuppet7 said in Am I the dumb? Why are we rolling our own Destroy logic?:
@tsaukpaetra said in Am I the dumb? Why are we rolling our own Destroy logic?:
Am I wrong in wearing this face: ?
If the ugly hack worked and is not causing additional trouble, trying to fix the program to destroy the right way would be a waste of time.
But do you really know it works right? Or did your "fix" just make the crash less likely to happen? Sometimes going that extra step pays off.
-
@tsaukpaetra calling a destructor and then setting the pointer to null?!
It's late and I'm tired, but this reeks of resource management done wrong.
-
@fatbull said in Am I the dumb? Why are we rolling our own Destroy logic?:
What happens if you call Dequeue() on an empty queue?
Presumably nothing, it will return false and the reference will remain whatever it was before the call.
@fatbull said in Am I the dumb? Why are we rolling our own Destroy logic?:
Who frees the memory after the dtor has been called?
UE4 magic. So long as it's not leaking handles (which, as far as I know, that particular issue is now fixed), I don't care who frees the memory.
@fatbull said in Am I the dumb? Why are we rolling our own Destroy logic?:
It might be a better idea to use InterlockedExchange or an event or a critical section or something similar to synchronize access to that flag.
I really don't care if the flag gets accessed properly the moment it gets updated, just within a minute or so. As argued back in other (forum) threads, so long as the new value is reflected sometime soon-ish (in human timescale, not CPU), I really don't care that instruction-by-instruction it sees it on the very next cycle.
@topspin said in Am I the dumb? Why are we rolling our own Destroy logic?:
@tsaukpaetra calling a destructor and then setting the pointer to null?!
It's late and I'm tired, but this reeks of resource management done wrong.
Not my code. It's also a static variable, so that makes things more fun.
-
@sockpuppet7 said in Am I the dumb? Why are we rolling our own Destroy logic?:
If the ugly hack worked and is not causing additional trouble, ~~trying to fix the program to destroy the right way would be a waste of time~~ @Tsaukpaetra wouldn't be posting about it here.
FTFT, I think. At least I read the OP as saying that it's crashing with the hack.
-
@tsaukpaetra said in Am I the dumb? Why are we rolling our own Destroy logic?:
Am I wrong in wearing this face: ?
No, it's completely appropriate. If I were to guess, the "problem" is that calling delete results in the VC++ library doing error checking, at least in debug. Manually calling the destructor bypasses that. Incidentally, this will also leak the memory because nothing ever calls free().
EDIT: I see you say that the latter is not a problem due to UE4 magic but I would not be so sure. Unless FVoidClientTCPThread has counted references or something.
-
@hardwaregeek said in Am I the dumb? Why are we rolling our own Destroy logic?:
@sockpuppet7 said in Am I the dumb? Why are we rolling our own Destroy logic?:
If the ugly hack worked and is not causing additional trouble, ~~trying to fix the program to destroy the right way would be a waste of time~~ @Tsaukpaetra wouldn't be posting about it here.
FTFT, I think. At least I read the OP as saying that it's crashing with the hack.
Something like that.
It's crashing because the hack wasn't used for all cases, and the default behavior causes a crash because natch, the thing wasn't set up properly because the built-in destructor doesn't know about the sub-object it should destroy when itself gets destroyed.
@Deadfast said in Am I the dumb? Why are we rolling our own Destroy logic?:
@tsaukpaetra said in Am I the dumb? Why are we rolling our own Destroy logic?:
Am I wrong in wearing this face: ?
No, it's completely appropriate. If I were to guess, the "problem" is that calling delete results in the VC++ library doing error checking, at least in debug. Manually calling the destructor bypasses that. Incidentally, this will also leak the memory because nothing ever calls free().
EDIT: I see you say that the latter is not a problem due to UE4 magic but I would not be so sure. Unless FVoidClientTCPThread has counted references or something.
I doubt it's counting references itself, but it is static, which predecessor used as a crutch to avoid doing such things. It's impressive stuff works at all in some areas...
-
@tsaukpaetra said in Am I the dumb? Why are we rolling our own Destroy logic?:
I doubt it's counting references itself, but it is static, which predecessor used as a crutch to avoid doing such things. It's impressive stuff works at all in some areas...
Static as in:
static FVoidClientTCPThread thread
or
static FVoidClientTCPThread* thread
?
If the former, you really should not be calling delete on that. If you were, that would explain things crashing.
If the latter, then it is still leaking, albeit only at the end of the process. That is still not a practice you should be encouraging.
-
@deadfast said in Am I the dumb? Why are we rolling our own Destroy logic?:
@tsaukpaetra said in Am I the dumb? Why are we rolling our own Destroy logic?:
I doubt it's counting references itself, but it is static, which predecessor used as a crutch to avoid doing such things. It's impressive stuff works at all in some areas...
Static as in:
static FVoidClientTCPThread thread
or
static FVoidClientTCPThread* thread
?
If the former, you really should not be calling delete on that. If you were, that would explain things crashing.
If the latter, then it is still leaking, albeit only at the end of the process. That is still not a practice you should be encouraging.
Oh, sorry, the thread itself isn't static, the UVoipClient that holds onto it is.
Actually, wait... rifles through code Yeah, it's the UVoipClient:
So, when UVoipClient gets destroyed using the normal method, it's kinda trying to destroy the FVoipClientTCPThread, but not well, and so... yeah.
-
@tsaukpaetra said in Am I the dumb? Why are we rolling our own Destroy logic?:
Am I wrong in wearing this face: ?
This:
clientTCPThread->~FVoipClientTCPThead(); clientTCPThread = NULL;
certainly looks fishy.
As others pointed out, depending on where you get clientTCPThread from, this either leaks the memory (dynamic memory) or runs the destructor twice (automatic/static storage elsewhere, and clientTCPThread just being a pointer).
At first glance, I would say that this code isn't doing the right thing (but it depends on the rest of the code). Even if the rest of the code does things properly, I'd still say that this is bad design. You should not have to manually call destructors in this kind of client code.
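To make the lifetime-versus-storage point concrete, here is a small sketch. FakeThread is a hypothetical stand-in that only counts destructor runs; it is not the real FVoipClientTCPThread, and nothing here uses UE4's allocator. The first function shows the one situation where an explicit destructor call is the right tool (storage you obtained and release yourself); the second shows a single owner doing both jobs:

```cpp
#include <cassert>
#include <memory>
#include <new>

// Hypothetical stand-in for the thread wrapper; counts destructor runs.
struct FakeThread {
    static int destructorCalls;
    ~FakeThread() { ++destructorCalls; }
};
int FakeThread::destructorCalls = 0;

// Explicit destructor call: ends the object's lifetime only. The storage is a
// separate responsibility; skipping the operator delete line would leak it.
int ManualLifetimeAndStorage() {
    FakeThread::destructorCalls = 0;
    void* raw = ::operator new(sizeof(FakeThread));
    FakeThread* thread = new (raw) FakeThread();  // placement new
    thread->~FakeThread();                        // destructor runs once
    thread = nullptr;
    assert(thread == nullptr);
    ::operator delete(raw);                       // storage released separately
    return FakeThread::destructorCalls;
}

// Single owner: destructor and deallocation both happen when the owner dies.
int OwnedByUniquePtr() {
    FakeThread::destructorCalls = 0;
    {
        std::unique_ptr<FakeThread> thread(new FakeThread());
    }  // ~FakeThread() runs and the memory is freed here
    return FakeThread::destructorCalls;
}
```

Both functions report exactly one destructor run; the difference is how many places have to cooperate to get there.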
-
@cvi said in Am I the dumb? Why are we rolling our own Destroy logic?:
Even if the rest of the code does things properly, I'd still say that this is bad design. You should not have to manually call destructors in this kind of client code.
This is my feeling as well, and once we go back to normal development (we're prepping for release, so nothing but major fixes, and no big rewrites like the one this will probably need), I may make a case to go back and do it properly.
-
@tsaukpaetra Well, there is your problem. According to the comment, Dequeue is supposed to be called from the consumer thread only. If two threads call Dequeue simultaneously during cleanup, some items could be dequeued twice or one thread could try to dequeue an item from an empty queue or worse.
Next question: What happens if Enqueue and Dequeue run in parallel? How is this synchronized?
-
@fatbull said in Am I the dumb? Why are we rolling our own Destroy logic?:
@tsaukpaetra Well, there is your problem. According to the comment, Dequeue is supposed to be called from the consumer thread only. If two threads call Dequeue simultaneously during cleanup, some items could be dequeued twice or one thread could try to dequeue an item from an empty queue or worse.
Point. However, the actual crash is happening when attempting to enqueue on an object that no longer exists, but it's much the same problem since the thread is still running when it's not supposed to and trying to shove data on the main thread's object that's now destroyed.
@fatbull said in Am I the dumb? Why are we rolling our own Destroy logic?:
Next question: What happens if Enqueue and Dequeue run in parallel? How is this synchronized?
The Engine's Enqueue function is synced like so:
I'm assuming it's working fine?
-
You should easily be able to safely signal the thread to end with something like std::atomic_flag, you can use std::memory_order_relaxed since you don't really care about order of operations and just want the thread to get the message and exit at its next convenience.
About static data, there is something called the static initialization order fiasco, and if you have several static (global) variables referring to each other, that could be your problem. You may want to move some inside functions so that the order is guaranteed to be the order of the function calls. You may also want to mark some as thread_local so that they aren't accidentally shared between threads.
Either way, manually calling destructors when you're not implementing a library type like std::vector is a big code smell. My guess is that the static variable destructor runs too late, when everything else has already been destroyed, and that the code is thus trying to reference destructed stuff at that point. Perhaps you should find a better way to control the lifetime of those objects than making them static. Simply wrapping them inside a std::optional that you can nullify when you want should be a quick fix if you need to keep them static.
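A sketch of the std::atomic_flag idea, under the assumption that only C++11 is available (std::atomic_flag has no plain read before C++20, so the usual trick is a "keep running" flag polled with test_and_set). CountUntilCleared is a made-up name:

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// The flag is "set" while the worker should keep going; clearing it asks the
// worker to exit at its next convenience. Returns the worker's loop count.
int CountUntilCleared() {
    std::atomic_flag keepRunning = ATOMIC_FLAG_INIT;
    keepRunning.test_and_set(std::memory_order_relaxed);  // start in "run" state

    int iterations = 0;
    std::thread worker([&] {
        // test_and_set returns the previous value: true until someone clears it.
        while (keepRunning.test_and_set(std::memory_order_relaxed)) {
            ++iterations;  // stand-in for real work
            std::this_thread::yield();
        }
    });

    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    keepRunning.clear(std::memory_order_relaxed);  // request shutdown
    worker.join();  // join() makes `iterations` safe to read afterwards
    assert(!worker.joinable());
    return iterations;
}
```

Relaxed ordering is enough here because the flag itself is the only thing being communicated; anything else the two threads share still needs its own synchronisation (in this sketch, join() covers `iterations`).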
-
@lb_ said in Am I the dumb? Why are we rolling our own Destroy logic?:
std::
Hehe, the only code that even uses std is:
And I'm 98.3 percent sure it's not used at all...
-
@tsaukpaetra If you're stuck in C++98 land without standard library support, I wish you luck, because you're gonna be reinventing a lot of wheels no matter how you fix this.
-
@lb_ said in Am I the dumb? Why are we rolling our own Destroy logic?:
@tsaukpaetra If you're stuck in C++98 land without standard library support, I wish you luck, because you're gonna be reinventing a lot of wheels no matter how you fix this.
I'm not sure if "stuck" is the right word. More like, nobody knows there's more/better ways to do things.
I have no idea what version we're using...
-
@tsaukpaetra easy way: print out or display the value of __cplusplus or _MSVC_LANG, and try also seeing what's in the gcc macros if you're using that.
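Something along these lines would do it; DescribeLanguageVersion is a made-up helper name. One caveat worth keeping in mind when reading the output: as far as I know, MSVC reports __cplusplus as 199711 unless the /Zc:__cplusplus switch is used (only available in newer releases), and _MSVC_LANG only exists from VS2015 Update 3 onwards, so _MSC_VER is often the more telling value there:

```cpp
#include <cassert>
#include <iostream>
#include <string>

// Collects the compiler's language-version macros into one printable string.
std::string DescribeLanguageVersion() {
    std::string text = "__cplusplus=" + std::to_string(__cplusplus);
#ifdef _MSVC_LANG
    text += " _MSVC_LANG=" + std::to_string(_MSVC_LANG);
#endif
#ifdef _MSC_VER
    text += " _MSC_VER=" + std::to_string(_MSC_VER);  // 1800 corresponds to VS2013
#endif
#ifdef __GNUC__
    text += " __GNUC__=" + std::to_string(__GNUC__);
#endif
    assert(!text.empty());
    return text;
}

// Prints the collected values to standard output.
void PrintLanguageVersion() {
    std::cout << DescribeLanguageVersion() << '\n';
}
```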
-
@lb_ said in Am I the dumb? Why are we rolling our own Destroy logic?:
__cplusplus
_MSVC_LANG is apparently not defined.
-
@tsaukpaetra So we know your IDE doesn't support newer than C++98 (or doesn't have the support turned on) so that's not looking good. Best bet is to figure out what compiler is being used and see if you can convince it to use a newer standard, otherwise you're probably just stuck for now.
For reference, Visual Studio 2017 actually only supports C++14 and above, it no longer has options to compile as anything older than that, and current versions of GCC use C++11 by default. So whatever you're using is at least a few years out of date, in which case I'd look at old versions of boost and see what you can salvage from there or recreate with reference.
-
@lb_ said in Am I the dumb? Why are we rolling our own Destroy logic?:
@tsaukpaetra So we know your IDE doesn't support newer than C++98 (or doesn't have the support turned on) so that's not looking good. Best bet is to figure out what compiler is being used and see if you can convince it to use a newer standard, otherwise you're probably just stuck for now.
For reference, Visual Studio 2017 actually only supports C++14 and above, it no longer has options to compile as anything older than that, and current versions of GCC use C++11 by default. So whatever you're using is at least a few years out of date, in which case I'd look at old versions of boost and see what you can salvage from there or recreate with reference.
In theory UE4 compiles on VS 2017 now, so if/when we all get upgraded to that I suppose it will come with it.
I doubt we'd be able to use boost here, that sounds... unwise.
-
@tsaukpaetra Right, using for-real boost is not very fun (and apparently very discouraged in some circles), but if you were stuck in C++98 land you could recreate some stuff using its source as a reference (better than doing it from scratch and making other people's mistakes again). Anyway, I'm sure Unreal Engine has some substitutes for the things you'll want/need from modern C++, e.g. it probably has some threading primitives and such.
-
@lb_ said in Am I the dumb? Why are we rolling our own Destroy logic?:
@tsaukpaetra Right, using for-real boost is not very fun (and apparently very discouraged in some circles), but if you were stuck in C++98 land you could recreate some stuff using its source as a reference (better than doing it from scratch and making other people's mistakes again). Anyway, I'm sure Unreal Engine has some substitutes for the things you'll want/need from modern C++, e.g. it probably has some threading primitives and such.
Oh, yes we're using the threading things. Just... not well, apparently.
-
@tsaukpaetra I would definitely recommend looking into the various static/global variables and seeing whether any are referencing other static/global data and if any seem to be getting used from multiple threads. That's the most likely culprit of 'we were letting the destructors run normally before and getting crashes', especially if a thread is still running when global variables it is using get destructed on the main thread.
-
@lb_ said in Am I the dumb? Why are we rolling our own Destroy logic?:
@tsaukpaetra I would definitely recommend looking into the various static/global variables and seeing whether any are referencing other static/global data and if any seem to be getting used from multiple threads. That's the most likely culprit of 'we were letting the destructors run normally before and getting crashes', especially if a thread is still running when global variables it is using get destructed on the main thread.
That's what's being done. The parent object supposedly charged with keeping the thread is getting destroyed "normally" but because it was poorly designed it doesn't tell the thread that, and when the thread happily chirps "Hey I have data MainThread guy!" it breaks because it's already gone.
-
@lb_ said in Am I the dumb? Why are we rolling our own Destroy logic?:
@tsaukpaetra If you're stuck in C++98 land without standard library support, I wish you luck, because you're gonna be reinventing a lot of wheels no matter how you fix this.
This is more so a problem of Unreal having its own version of std for historical reasons.
@tsaukpaetra said in Am I the dumb? Why are we rolling our own Destroy logic?:
I have no idea what version we're using...
I presume you use Visual Studio? What version?
-
@deadfast said in Am I the dumb? Why are we rolling our own Destroy logic?:
I presume you use Visual Studio? What version?
2013.3
-
@tsaukpaetra said in Am I the dumb? Why are we rolling our own Destroy logic?:
@deadfast said in Am I the dumb? Why are we rolling our own Destroy logic?:
I presume you use Visual Studio? What version?
2013.3
OK, that's not too bad.
- IIRC std::atomic is present.
- std::optional doesn't exist until VS2017 and even then you have to use the /std:c++17 switch. You can substitude with a std::unique_ptr.
- I don't think thread_local is available but it can be substituted with __declspec(thread).
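A rough sketch of the std::unique_ptr substitution for controlling a static's lifetime. FakeClient, ClientHolder, StartClient and ShutdownClient are hypothetical names, and this ignores UE4's own object management (a real UObject would not be owned this way). It also assumes start and shutdown are called from one thread:

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the client; tracks how many instances are alive.
struct FakeClient {
    static int liveInstances;
    FakeClient() { ++liveInstances; }
    ~FakeClient() { --liveInstances; }
};
int FakeClient::liveInstances = 0;

// Function-local static holder: the holder is created on first use, and the
// client inside it can be destroyed at a moment of our choosing.
std::unique_ptr<FakeClient>& ClientHolder() {
    static std::unique_ptr<FakeClient> holder;
    return holder;
}

void StartClient() {
    if (!ClientHolder()) {
        ClientHolder().reset(new FakeClient());
    }
    assert(ClientHolder() != nullptr);
}

// Destroys the client now, instead of whenever static destructors happen to
// run at process exit. Safe to call more than once.
void ShutdownClient() {
    ClientHolder().reset();
}
```

Calling ShutdownClient() from the normal shutdown path (after the worker thread has been joined) means the static destructor later finds an empty holder and has nothing left to tear down.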
-
@tsaukpaetra said in Am I the dumb? Why are we rolling our own Destroy logic?:
Apparently we're manhandling the destruction process, even so far as to call the destructors directly:
Eww. So you have a pointer (clientTCPThread), call the destructor of the pointed thread manually, then null the pointer. So you either have two owners (one that manages the memory the thread lives in, and another that calls the destructor and then forgets about the memory), which means you may be double deleting, or you have a leak there. Splitting the availability of the memory from the lifetime of the object that lives there is ugly and error prone.
@dkf said in Am I the dumb? Why are we rolling our own Destroy logic?:
You might find that it is enough to declare the variable to be volatile, or you might need to use some sort of signalling condition variable.
Volatile is not a synchronization mechanism. Need to signal some other way, make it atomic or similar.
@tsaukpaetra said in Am I the dumb? Why are we rolling our own Destroy logic?:
We don't. To my knowledge, nobody that came before me (or after) knows about locks, and this code was written before me.
People that don't know about locks wrote threaded code in C++?
-
@kian said in Am I the dumb? Why are we rolling our own Destroy logic?:
People that don't know about locks wrote threaded code in C++?
And C# on the other side!
-
@deadfast said in Am I the dumb? Why are we rolling our own Destroy logic?:
substitude
That's one hell of a substitude you got there, buddy. Maybe you need to rein it in a little.
-
@tsaukpaetra said in Am I the dumb? Why are we rolling our own Destroy logic?:
@kian said in Am I the dumb? Why are we rolling our own Destroy logic?:
People that don't know about locks wrote threaded code in C++?
And C# on the other side!
That explains a lot. This does look like "I wish I was writing C#" kind of C++ code :D
-
@deadfast said in Am I the dumb? Why are we rolling our own Destroy logic?:
@tsaukpaetra said in Am I the dumb? Why are we rolling our own Destroy logic?:
@kian said in Am I the dumb? Why are we rolling our own Destroy logic?:
People that don't know about locks wrote threaded code in C++?
And C# on the other side!
That explains a lot. This does look like "I wish I was writing C#" kind of C++ code :D
It's just as bad on the other side....
-
@kian said in Am I the dumb? Why are we rolling our own Destroy logic?:
Volatile is not a synchronization mechanism.
It'd guarantee that the reads and writes actually happen as written. With only one thread writing and just once, it'd give a fairly cheap way for the reader thread to see when the write had occurred. Eventually. There's not a notion of that being timely. (Calling a suitable synchronisation function is better.)
Of course, the right way to fix this is to message the other thread to stop (by whatever primitive you want) and then to join that thread, blocking until things are torn down.
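A minimal sketch of "signal, then join", assuming standard C++11 std::thread and std::atomic rather than UE4's FRunnable machinery. The Worker class is hypothetical and only counts loop iterations in place of real network processing:

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical worker (not the UE4 thread API): owns its thread and refuses to
// be destroyed while that thread is still running.
class Worker {
public:
    Worker() : shouldRun_(true), processed_(0), thread_(&Worker::Run, this) {}

    ~Worker() { Stop(); }

    // Asks the thread to exit and blocks until it has. Safe to call twice.
    void Stop() {
        shouldRun_.store(false);
        if (thread_.joinable()) {
            thread_.join();
        }
        assert(!thread_.joinable());
    }

    int Processed() const { return processed_.load(); }

private:
    void Run() {
        while (shouldRun_.load()) {
            ++processed_;  // stand-in for "handle one packet"
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }

    // Declared before thread_ so they are initialised before the thread starts.
    std::atomic<bool> shouldRun_;
    std::atomic<int> processed_;
    std::thread thread_;
};
```

Because the destructor joins, whatever the thread was writing into is guaranteed to outlive the thread, which is exactly the property the crash-on-exit is missing.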
-
@dkf said in Am I the dumb? Why are we rolling our own Destroy logic?:
@kian said in Am I the dumb? Why are we rolling our own Destroy logic?:
Volatile is not a synchronization mechanism.
It'd guarantee that the reads and writes actually happen as written. With only one thread writing and just once, it'd give a fairly cheap way for the reader thread to see when the write had occurred. Eventually. There's not a notion of that being timely. (Calling a suitable synchronisation function is better.)
As far as I know, all that volatile guarantees is that the compiler will not optimise away loads and stores that it thinks are redundant.
while ( flagvar ) { do_not = call_any + functions; }
If flagvar is not declared volatile, the compiler won't necessarily generate code to reload the value.
If it is declared volatile but with no memory barriers or similar, you'll end up with strange behaviour depending on ... stuff.
-
@kian said in Am I the dumb? Why are we rolling our own Destroy logic?:
People that don't know about locks wrote threaded code in C++?
It doesn't matter which language they're using. And ultimately, any in the code is just a consequence of TR, which is a bunch of people who don't know about locking and other synchronisation funtimes, but are writing threaded code anyway.
-
@dkf said in Am I the dumb? Why are we rolling our own Destroy logic?:
@kian said in Am I the dumb? Why are we rolling our own Destroy logic?:
Volatile is not a synchronization mechanism.
It'd guarantee that the reads and writes actually happen as written. With only one thread writing and just once, it'd give a fairly cheap way for the reader thread to see when the write had occurred. Eventually. There's not a notion of that being timely. (Calling a suitable synchronisation function is better.)
There's no guarantee that the reader thread will see it. If, by mischance(1), the scheduler leaves it on the same vCPU (crudely speaking, the same hyperthread of the same core of the same CPU chip, although I think that "same core of same chip" suffices), and that vCPU doesn't do enough stuff to invalidate its own local L1 cache, it won't ever see the write.
(1) Maybe the reader pinned itself there, who knows.
-
@tsaukpaetra said in Am I the dumb? Why are we rolling our own Destroy logic?:
I'm assuming it's working fine?
Surprisingly, this actually seems OK ... on x86 and machines with similar memory coherency guarantees (wouldn't be convinced on e.g. ARM, but I don't know it well enough by heart to tell either way.), and assuming that the TNode constructor does the necessary things (set NextNode to nullptr) . :-)
The MoveTemp() is a bit fishy (is it a workaround for pre-C++11?), as is initializing Tail->Item with a fresh ItemType. But again, hard to tell without seeing what MoveTemp() does.
For post-C++11 (if you ever get to go there), you could just use std::atomic<> in all the places. IIRC loading/storing a std::atomic with e.g. std::memory_order_relaxed will compile to a simple mov on x86, much like the current code, but still do the right thing on platforms that need more explicit synchronization (but that doesn't seem like a high priority ;-) ).
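For illustration, a single-producer/single-consumer linked queue written with std::atomic might look roughly like this. It is a sketch, not the Engine's TQueue: SpscQueue and its members are made-up names, it assumes exactly one enqueuing thread and one dequeuing thread, and it uses release/acquire rather than relaxed on the link pointer so that the item's contents are published along with it (on x86 these should still be ordinary mov instructions):

```cpp
#include <atomic>
#include <cassert>
#include <utility>

// Hypothetical SPSC queue; T must be default-constructible and movable.
template <typename T>
class SpscQueue {
    struct Node {
        T item;
        std::atomic<Node*> next;
        Node() : item(), next(nullptr) {}
        explicit Node(T value) : item(std::move(value)), next(nullptr) {}
    };

public:
    SpscQueue() : head_(new Node()), tail_(head_) {}  // starts with a dummy node

    ~SpscQueue() {
        while (head_ != nullptr) {
            Node* next = head_->next.load(std::memory_order_relaxed);
            delete head_;
            head_ = next;
        }
    }

    SpscQueue(const SpscQueue&) = delete;
    SpscQueue& operator=(const SpscQueue&) = delete;

    // Producer thread only.
    void Enqueue(T value) {
        Node* node = new Node(std::move(value));
        // release: the node's contents become visible before the link does.
        tail_->next.store(node, std::memory_order_release);
        tail_ = node;
    }

    // Consumer thread only. Returns false and leaves `out` untouched if empty.
    bool Dequeue(T& out) {
        Node* next = head_->next.load(std::memory_order_acquire);
        if (next == nullptr) {
            return false;
        }
        out = std::move(next->item);
        delete head_;  // retire the old dummy node
        head_ = next;  // the dequeued node becomes the new dummy
        assert(head_ != nullptr);
        return true;
    }

private:
    Node* head_;  // touched only by the consumer
    Node* tail_;  // touched only by the producer
};
```

Note that this does nothing for the case raised earlier, where two threads both call Dequeue during cleanup; that still needs the worker to be stopped and joined first.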
-
@steve_the_cynic said in Am I the dumb? Why are we rolling our own Destroy logic?:
There's no guarantee that the reader thread will see it. If, by mischance(1), the scheduler leaves it on the same vCPU (crudely speaking, the same hyperthread of the same core of the same CPU chip, although I think that "same core of same chip" suffices), and that vCPU doesn't do enough stuff to invalidate its own local L1 cache, it won't ever see the write.
Is that true on x86 as well, though? From what I read, the x86 has somewhat over-the-top cache-coherency protocols that would make sure that this doesn't occur. Which is why one (IIRC) can implement atomics with just mov on x86 (or xchg + mov if you want sequential consistency), but most other platforms require additional/special instructions (fences or special loads/stores).
Edit: I guess one could end up in a situation where the L1$ line with the data is "never" flushed, thus preventing the data from becoming visible elsewhere in a reasonable time (but what if the reader is trying to access the same L1$ line? That should kick in cache coherency on the x86 again and have the cache line flushed.). Also this doesn't quite seem to match up with the vCPU/HT/same chip stuff that STC mentions above, so something else would have to be going on?
-
@cvi said in Am I the dumb? Why are we rolling our own Destroy logic?:
@steve_the_cynic said in Am I the dumb? Why are we rolling our own Destroy logic?:
There's no guarantee that the reader thread will see it. If, by mischance(1), the scheduler leaves it on the same vCPU (crudely speaking, the same hyperthread of the same core of the same CPU chip, although I think that "same core of same chip" suffices), and that vCPU doesn't do enough stuff to invalidate its own local L1 cache, it won't ever see the write.
Is that true on x86 as well, though? From what I read, the x86 has somewhat over-the-top cache-coherency protocols that would make sure that this doesn't occur. Which is why one (IIRC) can implement atomics with just mov on x86 (or xchg + mov if you want sequential consistency), but most other platforms require additional/special instructions (fences or special loads/stores).
Edit: I guess one could end up in a situation where the L1$ line with the data is "never" flushed, thus preventing the data from becoming visible elsewhere in a reasonable time (but what if the reader is trying to access the same L1$ line? That should kick in cache coherency on the x86 again and have the cache line flushed.). Also this doesn't quite seem to match up with the vCPU/HT/same chip stuff that STC mentions above, so something else would have to be going on?
Terminology (sorry, perhaps I'm not totally clear):
- vCPU = single "virtual" CPU - that is, a single hyperthreaded instance of a single core in a single socket. A core will have 1, 2, or in some of the top-end i9s, 4 vCPUs. Noteworthy: if you break down Task Manager's CPU utilisation graphs, it shows vCPUs, not cores.
- Core = group of vCPUs that share L1 and L2 cache.
- Chip / socket = a group of cores all located on a single die, and the socket in which that die's package is placed.
Coherency between cores/vCPUs in different sockets is significantly harder than between different cores of the same socket, and there are substantial performance hits when locks are contested between threads on different sockets.
And it's entirely possible that I'm talking 90% nonsense.