My web app died from performance bankruptcy



  • Kind of related. Gary Bernhardt (from Destroy All Software) had a little Twitter argument with one of the people behind AMP.

    Apparently their limited CSS implementation has broken in-page search in Safari on iOS, and they don't intend to fix it on their end.

    [screenshot]

    Sounds like there WAS a change of mind in Google regarding backwards compatibility, and they ARE going to be using their monopoly to push the web "forward" (in the direction they like).



  • @cartman82

    It probably saved you more lifetime this month waiting for pages to load than it cost you to write all these tweets.

    Wow, they think they need praise for coming up with the idea of "what if websites weren't horribly bloated turds".


  • kills Dumbledore

    @cartman82 said in My web app died from performance bankruptcy:

    they don't intend to fix it on their end.

    [screenshot]

    :question:


  • Discourse touched me in a no-no place

    @el_heffe said in My web app died from performance bankruptcy:

    What good is performance if the features are broken and don't work properly?

    With performance, the failures can be described via interpretative dance.



  • @julianlam said in My web app died from performance bankruptcy:

    As a user, I certainly do not care about “being part of moving the web forward aggressively”. Why should I? I like my stuff working, not broken. Nobody ever wants it the other way around.

    This is exactly how I feel as a user and a developer. Nicely said!



  • @topspin said in My web app died from performance bankruptcy:

    Wow, I thought only Google corporate were assholes, but their engineers are really smart.

    I started having my doubts about the quality of their engineers back when I saw the train wreck of an IPC API (Binder) they created for Android, which quite rightfully failed to pass the quality standards of Linux. And their coding guidelines made it a certainty: their engineers are nothing special.



  • @bulb
    I once read their coding standards because I wanted to find something modern and not plagued by legacy cruft, like most coding guides you find lying around somewhere. I thought they'd certainly not be as old-fashioned as, say, whatever MS has, with all the backwards compatibility they need to care about.
    Well, turns out I was wrong; they were pretty much horrible.
    I can't remember any details other than "don't use exceptions". And while there can certainly be good reasons not to*, theirs essentially was "we've not done it before and your coworkers don't understand it".

    *IIRC, the LLVM team also does that, but their reasons are "performance", and I assume they know what they're talking about.



  • @topspin said in My web app died from performance bankruptcy:

    I can't remember any details other than "don't use exceptions"

    Don't use exceptions, don't use streams, don't use boost, don't use … well, essentially anything that makes C++ worth bothering with. And yes, mostly because “your coworkers wouldn't understand it”.

    Also, there are many, many places in the standard library where the only way to report errors is via exceptions, both by the standard library itself and by functors passed to algorithms. So avoiding them means you can't use large portions of that.



  • @bulb I usually argue that you can't use the standard library at all (or very little of it) unless you're prepared to at least handle bad_alloc. But then I'm reminded that only works on Windows anyway.
    On Linux you get: "I see you're trying to write robust software? Sure, I'll just go ahead and OOM kill that process for you. After I killed your browser. :trollface:"




  • Discourse touched me in a no-no place

    @bulb said in My web app died from performance bankruptcy:

    So avoiding them means you can't use large portions of that.

    But there's a lot of overhead from them. Alas. It doesn't matter too much in some contexts, but in others it is really painful; embedded environments really don't have much in the way of provided runtime, and that which they have is often pretty weird. The core issue is that there's a need for codebursts for generating exceptions, for managing the memory associated with them, and dealing with all the places where exceptions have to be held up en route so that resources can be cleaned up. (Just because the compiler can do the bookkeeping for it all for you doesn't mean that the machine code isn't required.)

    But Google's rules sound to me more like someone senior just doesn't like exceptions and everyone else has to follow along (in part because everyone else is following along). A story we've heard before elsewhere too…


  • Discourse touched me in a no-no place

    @topspin said in My web app died from performance bankruptcy:

    handle bad_alloc

    You can get it on Linux… but it is much more likely that the process will get a signal instead. Good luck handling that if you've just exceeded your stack allocation.

    Because fuck you, that's why.



  • @dkf
    Hmm, interesting. Can you elaborate how this works and if it's at least a bit helpful?

    Assume I got this scenario:
    My application is for a very narrow audience of "expert" users, and the amount of memory it will use depends directly on settings the user enters. So the user can always create a situation that'll be the moral equivalent of malloc(10TB), just not necessarily in one chunk. I can't exactly compute the amount beforehand, and even if I could, portably checking how much free memory is left seems to be a path to insanity (I haven't looked into it very deeply though, if anyone has some hints).
    So the simple way to deal with it is also the theoretically pure one: just try and catch any errors. But oh well, this gives me a bad_alloc on Windows, which I handle with a nice message to the user, and a disappearing application that gets OOM-killed on Linux. And if I'm unlucky it kills Firefox or X, too.

    So, if I don't get the exception, do I get a signal other than SIGKILL? I should probably not be out of stack space, so maybe there's some reasonable action based on that.


  • Impossible Mission Players - A

    @bulb said in My web app died from performance bankruptcy:

    when I saw the train wreck of an IPC API (Binder) they created for Android, which quite rightfully failed to pass the quality standards of Linux.

    ow. What does it say when you fail the quality standard of freaking Linux????



  • @topspin said in My web app died from performance bankruptcy:

    disappearing application that got OOM-killed on Linux

    Hmmm. If it's for a small group of expert users and a global solution is feasible you could just disable the OOM killer with overcommit_memory=2 (never overcommit memory, just return NULL if you can't do it).

    Otherwise, I would imagine there's some sort of "try to preallocate" function in the C STL or a syscall, but I can't recall one off the top of my head.


  • Impossible Mission Players - A

    @heterodox said in My web app died from performance bankruptcy:

    @topspin said in My web app died from performance bankruptcy:

    disappearing application that got OOM-killed on Linux

    Hmmm. If it's for a small group of expert users and a global solution is feasible you could just disable the OOM killer with overcommit_memory=2 (never overcommit memory, just return NULL if you can't do it).

    Otherwise, I would imagine there's some sort of "try to preallocate" function in the C STL or a syscall, but I can't recall one off the top of my head.

    I thought that was literally what malloc did? And if it failed, you don't get your memory?

    Or am I remembering wrong?



  • @bulb said in My web app died from performance bankruptcy:

    Don't use exceptions, don't use streams, don't use boost, don't use … well, essentially anything that makes C++ worth bothering with.

    To be fair, C++ streams are the worst API I've ever seen, incorporating every single bad idea in existence to build the most useless I/O library ever.



  • @heterodox said in My web app died from performance bankruptcy:

    Otherwise, I would imagine there's some sort of "try to preallocate" function in the C STL or a syscall, but I can't recall one off the top of my head.

    The usefulness of such a function would be severely limited by the existence of more than one process in the system at once.


  • Discourse touched me in a no-no place

    @topspin said in My web app died from performance bankruptcy:

    So the simple way to deal with it is the also theoretically pure one: just try and catch any errors. But oh well, this gives me a bad_alloc on windows, which I handle with a nice message to the user, and a disappearing application that got OOM-killed on Linux. And if I'm unlucky it kills firefox or X, too.

    The problem is that if you've got a situation where the OOM killer steps in, you've reached the point where system stability is otherwise compromised. You've not hit a process limit on memory so much as the whole-system limit. That's a pretty bad place to be in, and the OOM killer is just trying to prevent a full-system crash. You're not supposed to hit that situation at all.

    The correct thing to do is to set a per-process memory limit bearing in mind what else is on the system via ulimit (there's a few limits; the overall virtual memory limit is the key one here). That allows the C and C++ memory allocation systems to fail more gracefully, since it makes the “give me more memory” system calls fail in a defined way. (The gracefulness of your application in that situation is up to you.)
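    For anyone wanting the programmatic version of `ulimit -v`, a minimal sketch using setrlimit(RLIMIT_AS); the 256 MiB cap and the function name are made up for illustration:

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>

/* Cap this process's virtual address space so that huge allocations
 * fail in a defined way instead of invoking the OOM killer later. */
int try_alloc_under_limit(void)
{
    struct rlimit lim;
    lim.rlim_cur = 256ul * 1024 * 1024;   /* soft limit: 256 MiB */
    lim.rlim_max = 256ul * 1024 * 1024;   /* hard limit */
    if (setrlimit(RLIMIT_AS, &lim) != 0) {
        perror("setrlimit");
        return -1;
    }

    /* 1 GiB exceeds the cap, so malloc reports failure cleanly... */
    void *p = malloc(1024ul * 1024 * 1024);
    if (p == NULL) {
        fputs("malloc failed cleanly: hit the rlimit\n", stderr);
        return 0;   /* the graceful path */
    }
    free(p);
    return 1;       /* unexpectedly succeeded */
}
```

    With the cap in place, the "give me more memory" system calls fail in a defined way and malloc simply returns NULL, which the application can report however it likes.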



  • @dkf said in My web app died from performance bankruptcy:

    The problem is that if you've got a situation where the OOM killer steps in, you've reached the point where system stability is otherwise compromised.

    This is absolutely true. On any system (Windows or Linux) with (large amounts of) swap space configured, by the time malloc() would return NULL, or new would throw std::bad_alloc, your system is thrashing so badly that killing the runaway process with extreme prejudice is essentially the only way to restore normal service.

    Ironically, good programming practices are bad for you here: if a process is continually allocating memory and causing the thrashing, and it handles NULL/bad_alloc competently, it'll just sit there thrashing once it's taken all the memory. If it fails to handle it, it'll just segfault or abort(), which is probably what you want if you ever require the system to be responsive again. :wtf:

    For systems without swap, then absolutely you want to turn the OOM killer off.



  • @dkf said in My web app died from performance bankruptcy:

    embedded environments really don't have much in the way of provided runtime, and that which they have is often pretty weird.

    Yes, non-hosted embedded environments often have good reasons to avoid exceptions. But Google is not doing anything embedded.

    @tsaukpaetra said in My web app died from performance bankruptcy:

    What does it say when you fail the quality standard of freaking Linux?

    That you can't maintain proper layering, can't design an interface with simple, well-defined semantics, or both. The Linux code might be a bit weird, but it does have quite strict separation of concerns and the APIs are designed to be orthogonal.

    @gąska said in My web app died from performance bankruptcy:

    To be fair, C++ streams are the worst API I've ever seen, incorporating every single bad idea in existence to build the most useless I/O library ever.

    Streams are a train wreck, but they are the standard way of defining output format for custom types and new output sinks. You could write a better library—after all the standard library only has a handful of formatters that would be easy to reimplement—but that would not work well for integrating with third-party libraries. Better just wrap the stream train-wreck under some saner utilities and let it do its job.



  • @heterodox said in My web app died from performance bankruptcy:

    @topspin said in My web app died from performance bankruptcy:

    disappearing application that got OOM-killed on Linux

    Hmmm. If it's for a small group of expert users and a global solution is feasible you could just disable the OOM killer with overcommit_memory=2 (never overcommit memory, just return NULL if you can't do it).

    Otherwise, I would imagine there's some sort of "try to preallocate" function in the C STL or a syscall, but I can't recall one off the top of my head.

    mmap with MAP_POPULATE seems promising. But as others said you probably don't want to be there.
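    A rough sketch of that idea (Linux-specific; the size and function name are mine). One caveat: the man page doesn't promise mmap() fails whenever population is incomplete, so this narrows the window rather than closing it:

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>

/* Ask the kernel to prefault all pages at mmap() time, so the cost of
 * actually backing the memory is paid up front instead of at the
 * first touch of each page. */
int populate_demo(void)
{
    size_t len = 64ul * 1024 * 1024;   /* 64 MiB */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    memset(p, '-', len);   /* pages were prefaulted; no faults here */
    return munmap(p, len);
}
```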



  • @heterodox said in My web app died from performance bankruptcy:

    @topspin said in My web app died from performance bankruptcy:

    disappearing application that got OOM-killed on Linux

    Hmmm. If it's for a small group of expert users and a global solution is feasible you could just disable the OOM killer with overcommit_memory=2 (never overcommit memory, just return NULL if you can't do it).

    I've not actually heard any complaints from them about this, so either they've never run into this problem or they're smart enough not to do something which would use a ridiculous amount of memory. But if they did, it would probably be easier to tell them "don't do that" than to have them try to convince their IT to turn off overcommit (if they have it on anyway). After all, you read everywhere that it's a bad idea, and IT guys certainly don't take advice from users telling them some external developer said "deactivate this weird feature on your system".
    It just bothers me that I know it's there, and that it gets handled gracefully on platforms that don't lie to me.

    @tsaukpaetra said in My web app died from performance bankruptcy:

    I thought that was literally what malloc did? And if it failed, you don't get your memory?

    That's what its API contract says it's supposed to do. It's not what actually happens with overcommit on.

    @gwowen said in My web app died from performance bankruptcy:

    The problem is that if you've got a situation where the OOM killer steps in, you've reached the point where system stability is otherwise compromised.

    This is absolutely true. On any system (Windows or Linux) with (large amounts of) swap space configured, by the time malloc() would return NULL, or new would throw std::bad_alloc, your system is thrashing so badly that killing the runaway process with extreme prejudice is essentially the only way to restore normal service.

    It's not absolutely true. If I ask for a huge amount of memory on windows, it tells me no and that's it. Since in this case I don't ask for all of it up front but in large chunks, I get maybe half a minute of grinding, then I get bad_alloc and things return to normal. That's pretty sensible from a user point of view. Killing other shit around me isn't.

    @dkf said in My web app died from performance bankruptcy:

    The correct thing to do is to set a per-process memory limit bearing in mind what else is on the system via ulimit (there's a few limits; the overall virtual memory limit is the key one here). That allows the C and C++ memory allocation systems to fail more gracefully, since it makes the “give me more memory” system calls fail in a defined way. (The gracefulness of your application in that situation is up to you.)

    That sounds like a solution. Unfortunately, one that I have to do manually before starting, instead of having a reasonable default of, say, no more memory per process than is installed in the whole system.

    I still think that the whole concept of overcommit is TR :wtf: here, getting yourself into a situation where you promised a process a ton of memory and then struggling to figure out who the culprit is once the process actually checks in on your promise.
    But hey, the gods of Unix say it's the only reasonable way to do things, so I guess I'm :doing_it_wrong: .


  • Discourse touched me in a no-no place

    @topspin said in My web app died from performance bankruptcy:

    That sounds like a solution. Unfortunately, one that I have to do manually before starting, instead of having a reasonable default of, say, no more memory per process than is installed in the whole system.

    But that depends hugely on system policy. All you're really doing there is proving that you don't actually understand the issues. A process can't do this for itself precisely because it depends on knowledge of what is going on elsewhere; it's a global vs local problem. It's not just “no larger than physical memory”, as there are usually other processes as well, and it depends on how much swap is present, etc. On the flip side, memory-mapped files can result in less memory actually being needed than the virtual size calculation would naïvely assume.

    If you think it is simple, try making it work with applications like browsers that have very large numbers of processes under the covers. ;)



  • @topspin said in My web app died from performance bankruptcy:

    I still think that the whole concept of overcommit is TR here, getting yourself into a situation where you promised a process a ton of memory and then struggling to figure out who the culprit is once the process actually checks in on your promise.

    It'd be more sensible if there were a way to signal the OOM killer that you're prepared to free up memory on short notice (or perhaps you tell upfront what memory can be reclaimed from out under you). I.e., instead of immediately going off and randomly murdering processes, (some) processes would get the chance to free up memory.

    Might be difficult to pull off at the point where you actually run into the condition where you're that low on memory. Then again, on a modern system, setting aside a few pages of emergency memory isn't that unreasonable either (IMnon-expertO).
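    Something close to this wish does exist at page granularity: madvise(MADV_FREE) (Linux ≥ 4.5; MADV_DONTNEED is the older, blunter cousin that drops pages immediately) marks a range as "reclaim these pages under pressure, I can regenerate them". A hedged sketch, with names and sizes made up for illustration:

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>

/* Fill a regenerable cache, then tell the kernel it may take the
 * pages back if memory gets tight; the contents stay valid until it
 * actually does. */
int mark_reclaimable(void)
{
    size_t len = 16ul * 1024 * 1024;
    char *cache = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (cache == MAP_FAILED)
        return -1;
    memset(cache, 'x', len);                  /* populate the cache */
    if (madvise(cache, len, MADV_FREE) != 0)  /* reclaimable hint */
        return -1;
    return munmap(cache, len);
}
```

    It's not the full "ask before murdering" protocol, since you never get told which pages were dropped, but it does let the kernel shrink cooperative processes before reaching for the OOM killer.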



  • @dkf
    "No larger than physical memory" (plus swap) was just a simple suggestion that surely should be an upper limit. You can't use more memory than that in the whole system, so why should you give a single process more than that? If I ask for 1PB of RAM, surely telling me that's not available is not too much to ask.


  • Discourse touched me in a no-no place

    @topspin said in My web app died from performance bankruptcy:

    You can't use more memory than that in the whole system

    Unless your system is configured as a single-process monster, you'll be OOMed considerably before you hit that. :stuck_out_tongue_winking_eye:


  • Discourse touched me in a no-no place

    @cvi said in My web app died from performance bankruptcy:

    Might be difficult to pull off at the point where you actually run into the condition where you're that low on memory.

    That's the real problem; by the point you detect the possibility of failure, you're in severe danger of being unable to recover from it.



  • Can we get back on topic now? I want to talk more about Chrome and passive event handlers.


    Just kidding :trolleybus:



  • Frankly, I'm surprised that trusting apps to gracefully handle situations with limited memory is a workable solution at all. In my experience, it would be all too easy for an app to :middle_finger: and continue to consume memory, hoping the OOM killer would pick something else to terminate.

    Then again I work in web, where rules fly out the window, so I suppose it's not a surprise.



  • @cvi said in My web app died from performance bankruptcy:

    or perhaps you tell upfront what memory can be reclaimed from out under you

    Memory-mapped files can be reclaimed without notice. Allowing mappings to actually go invalid without notice would require the using app to handle SIGSEGV, and I don't think mmap has a flag for it.



  • @dkf said in My web app died from performance bankruptcy:

    @topspin said in My web app died from performance bankruptcy:

    You can't use more memory than that in the whole system

    Unless your system is configured as a single-process monster, you'll be OOMed considerably before you hit that. :stuck_out_tongue_winking_eye:

    I may be missing your point or you are missing mine.
    Surely the amount of memory I can use depends on system-global things like other processes, I agree, which is why I said trying to detect that probably leads me to a path to insanity. But, say, I got a box with 8GB of RAM and 1GB of swap installed. If I now go ahead and ask (without checking because, well, I said I didn't query the system for details) for 40GB of memory, why don't I immediately get an OOM condition (malloc returns 0) instead of "sure, go ahead, I'll kill you when I'm ready"?
    Because that's my point: I don't. Linux happily acts like it can provide those 40GB, then kills me for using them.



  • @pleegwat Yeah, as far as I'm aware, there's no way to flag this currently. In theory there could be a different signal from SIGSEGV that informs the program that some of the flagged areas were dropped (and perhaps which).

    I have quite a bit of memory that's double-buffered (because of asynchronous stuff). If the OS dropped the double-buffered bits, I'd still have my data in the primary buffers (mostly, anyway).



  • I keep reading this thread as "My wife died from..." until I look closer...



  • @magus said in My web app died from performance bankruptcy:

    I keep reading this thread as "My wife died from..." until I look closer...

    My wife died from performance bankruptcy :giggity:



  • @topspin said in My web app died from performance bankruptcy:

    But, say, I got a box with 8GB of RAM and 1GB of swap installed. If I now go ahead and ask (without checking because, well, I said I didn't query the system for details) for 40GB of memory, why don't I immediately get an OOM condition (malloc returns 0)

    And that's precisely what will happen. Malloc will return 0! At least under the default setting of vm.overcommit_memory=0:

    documentation:

    When this flag is 0, the kernel attempts to estimate the amount
    of free memory left when userspace requests more memory.
    […]
    The default value is 0.

    And if you think the documentation is lying, no, it isn't:

    bulb:/tmp$ cat test.c
    #include <stdlib.h>
    #include <stdio.h>
    
    int main()
    {
        void *p = malloc(40ll*1024*1024*1024);
        printf("Allocated pointer: %p\n", p);
        return p == NULL;
    }
    bulb:/tmp$ gcc test.c
    bulb:/tmp$ ./a.out 
    Allocated pointer: (nil)
    bulb:/tmp$ echo $?
    1
    

    Yes, the default used to be to always overcommit, but that was, like, some time before the Flood.

    @topspin said in My web app died from performance bankruptcy:

    instead of "sure, go ahead, I'll kill you when I'm ready"?

    Well, it does.

    @topspin said in My web app died from performance bankruptcy:

    Linux happily acts like it can provide those 40GB, then kills me for using them.

    Apparently there are, or used to be, applications that happily requested precisely that and then happily proceeded to never actually access the memory!

    Also, this is still just an estimate, because with memory mapping and sharing it is nearly impossible to calculate precisely. Copy-on-write upon fork() wreaks particularly bad havoc on it: strictly speaking it should reserve all the space, but that reservation is almost never actually needed in practice.



  • @bulb Then again, I once locked up my dev VM simply by allocating memory.

    In that case, I was allocating an rx_ring buffer (used for bulk receipt of network packets without system calls). Unswappable kernel memory. And while testing stuff I accidentally requested 8GB of it, on a VM with 6GB of RAM.



  • @bulb
    Interesting, I was pretty sure I'd tried exactly that before.
    The closest Linux box I can ssh into has 130GB of RAM, so I replaced the 40 with 400 and actually got a null pointer returned, as you said. However, to make sure I'm not completely hallucinating, I replaced it with a for loop of 40GB chunks and got this still-disappointing result:

    topspin@RH7:/tmp$ cat test.c
    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>
    
    int main()
    {
      void* p[10] = {0};
      long long len = 40ll*1024*1024*1024; 
      for (int i = 0; i < 10; i++)
      {
        p[i] = malloc(len);
        printf("#%d Allocated pointer: %p\n", i, p[i]);
        fflush(stdout);
      }
      for (int i = 0; i < 10; i++)
      {
        printf("#%d touching memory at: %p\n", i, p[i]);
        fflush(stdout);
        memset(p[i], '-', len);
      }
      return 0;
    }
    topspin@RH7:/tmp$ gcc -std=c99 test.c
    topspin@RH7:/tmp$ ./a.out
    #0 Allocated pointer: 0x7fe46e330010
    #1 Allocated pointer: 0x7fda6e32f010
    #2 Allocated pointer: 0x7fd06e32e010
    #3 Allocated pointer: 0x7fc66e32d010
    #4 Allocated pointer: 0x7fbc6e32c010
    #5 Allocated pointer: 0x7fb26e32b010
    #6 Allocated pointer: 0x7fa86e32a010
    #7 Allocated pointer: 0x7f9e6e329010
    #8 Allocated pointer: 0x7f946e328010
    #9 Allocated pointer: 0x7f8a6e327010
    #0 touching memory at: 0x7fe46e330010
    #1 touching memory at: 0x7fda6e32f010
    #2 touching memory at: 0x7fd06e32e010
    Killed
    

    And the best part is: I ran your original program on macOS (8GB Air) and got this: :man_facepalming:

    topspin@Air:/tmp$ ./a.out ; echo "return code $?"
    Allocated pointer: 0x112d91000
    return code 0
    

    So at least the Linux situation is better than mac. :trolleybus:


  • Impossible Mission Players - A

    @pleegwat said in My web app died from performance bankruptcy:

    @bulb Then again, I once locked up my dev VM simply by allocating memory.

    In that case, I was allocating an rx_ring buffer (used for bulk receipt of network packets without system calls). Unswappable kernel memory. And while testing stuff I accidentally requested 8GB of it, on a VM with 6GB of ram.

    That reminds me, I did a similar thing by trying to boot up a VM without enough memory available. Apparently the hypervisor version I was using allocated unswappable kernel memory (or something), but there wasn't enough, I guess, and the kernel panicked.



  • @tsaukpaetra I keep meaning to try if the rx_ring thing works from a non-privileged user. I don't think the rx_ring itself has special privilege requirements on top of the underlying socket.


  • Notification Spam Recipient

    @dkf said in My web app died from performance bankruptcy:

    @el_heffe said in My web app died from performance bankruptcy:

    What good is performance if the features are broken and don't work properly?

    With performance, the failures can be described via interpretative dance.

    And it'd still be less convoluted than most website code.


  • Notification Spam Recipient

    @robfreundlich said in My web app died from performance bankruptcy:

    @julianlam said in My web app died from performance bankruptcy:

    As a user, I certainly do not care about “being part of moving the web forward aggressively”. Why should I? I like my stuff working, not broken. Nobody ever wants it the other way around.

    This is exactly how I feel as a user and a developer. Nicely said!

    I like aggressive moving and breaking code, just not breaking the ability to use the code. Like C#. Java moves forward and breaks classloading; people scream at them. C# can break anything they want to, because you can have all the .NET versions installed at once. Here, Chrome only has the latest version of its implementations, and all code runs on them. If that's your model, (a) that's painful because you can't break anything, and (b) DON'T BREAK THINGS.


  • Discourse touched me in a no-no place

    @pie_flavor said in My web app died from performance bankruptcy:

    C# can break anything they want to, because you can have all the .NET versions installed at once.

    I seem to have all the patchlevels of Java installed on this machine! :trophy:


  • Notification Spam Recipient

    @dkf Yeah, but they don't actually run based on program version; it just selects the latest one.



  • @pie_flavor You can run the java binary from whichever installation you want to use. You may also have to set JAVA_HOME to the same jre; I'm not sure.


  • Notification Spam Recipient

    @pleegwat Oh, you can… But do you? Not unless it's manual. There is one PATH'd java.exe and one JAVA_HOME. You can change these whenever you want, configure JAVA_HOME per process, manually call JVM binaries by explicit path, but to any regular user it's pretty damn complicated, and requires hefty finagling on either their end or yours.



  • @pie_flavor That's why Sun invented Java Web Start :)


  • Notification Spam Recipient

    @quijibo said in My web app died from performance bankruptcy:

    @pie_flavor That's why Sun invented Java Web Start :)

    oh god OH GOD GET IT AWAY FROM MEEEEEEEEEE


  • Impossible Mission Players - A

    @quijibo said in My web app died from performance bankruptcy:

    @pie_flavor That's why Sun invented Java Web Start :)

    A week ago I learned it's called "ice" on Linux.

