Bad Gateway *SPANK SPANK*



  • OK, I'm genuinely curious :wtf: is going on here. It looks like every other day I try to read this forum, just for it to randomly shit itself with a "502 Bad Gateway".

    0_1468716666599_502.png

    @ben_lubar ?



  • @Deadfast said in Bad Gateway *SPANK SPANK*:

    OK, I'm genuinely curious :wtf: is going on here. I

    Me too. Fuck. None of it makes sense.



  • @Deadfast said in Bad Gateway *SPANK SPANK*:

    OK, I'm genuinely curious is going on here.

    If you look at the cooties thread, you'll see some downloads of tgz files. They include stack trace analyses. There's no fucking pattern. Sometimes, node just gets stuck, AFAICT. There's some lower level thing going on here, but damned if I know what it is.



  • A man with parkinssons was holding the mains leads to the relevant systems into a loose socket.



  • @boomzilla we do seem to have an awful lot of manual labor involved in restarting node. Every 188 hrs is just too damn often.
    Edit:I just checked the thread, it's closer to 3 times a day,Shirley it can't be for sirius?



  • @ben_lubar isn't it a oneliner to restart the dockers? are you having trouble to detect cooties?

    maybe if you guys explain it we can think on something



  • @fbmac i don't think detecting cooties is a problem (@accalia has that nailed down) , but they (the admins) have no clue as to what the root cause is, that's worrying.



  • @fbmac said in Bad Gateway *SPANK SPANK*:

    isn't it a oneliner to restart the dockers?

    It's just docker restart wtdwtf-nodebb

    But then you have to wait for the software to boot back up.

    Alternatively, if only one of the two instances is stuck, you can sudo kill -9 it and hope that the manager instance will notice that one of the two server instances died.

    But if the manager instance is killed, the docker container restarts after a timeout. The timeout starts at 100ms and doubles every restart until the container has been running continuously for 10 seconds.



  • @swayde said in Bad Gateway *SPANK SPANK*:

    @boomzilla we do seem to have an awful lot of manual labor involved in restarting node. Every 188 hrs is just too damn often.
    Edit:I just checked the thread, it's closer to 3 times a day,Shirley it can't be for sirius?

    What the fuck, I just saw that too (somehow I haven't seen that thread before). This is ridiculous...



  • @boomzilla said in Bad Gateway *SPANK SPANK*:

    If you look at the cooties thread, you'll see some downloads of tgz files. They include stack trace analyses.

    SadlyLuckily I don't do web so none of it means anything to me.



  • @ben_lubar you could use vmstat to detect when the cpu is spending 0% time on idle, and restart automatically.



  • OH GOD WHAT HAVE I CREATED

    gdb in production?!


  • Impossible Mission Players - A

    @ben_lubar said in Bad Gateway *SPANK SPANK*:

    OH GOD WHAT HAVE I CREATED

    gdb in production?!

    Nothing wrong with watching your brain work, well unless you're watching your brain watching your brain watching your brain.......


  • Discourse touched me in a no-no place

    @swayde said in Bad Gateway *SPANK SPANK*:

    Every 8 hrs is just too damn often.

    If we switch to a product that uses Rails it won't be too often, or so I understand.



  • @Deadfast said in Bad Gateway *SPANK SPANK*:

    OK, I'm genuinely curious is going on here

    We're using shit software. We used to use shit and unsupported but stable software, then we migrated to utterly shit and utterly unstable software which was supported as long as you were running the beta version, and now we're running nodeBB. What's to be curious about?


  • Winner of the 2016 Presidential Election

    On Topic:

    0_1468745885900_Screenshot 2016-07-17 10.35.57.png



  • @ben_lubar said in Bad Gateway *SPANK SPANK*:

    OH GOD WHAT HAVE I CREATED

    gdb in production?!

    Is this thing deployed? Because if it is, it doesn't seem to be effective :(.



  • @Deadfast said in Bad Gateway *SPANK SPANK*:

    Is this thing deployed?

    Nope. I didn't want to deploy it while I was tired.



  • @ben_lubar said in Bad Gateway *SPANK SPANK*:

    OH GOD WHAT HAVE I CREATED
    ...

    gdb in production?!

    @Deadfast said in Bad Gateway *SPANK SPANK*:

    Is this thing deployed?

    @ben_lubar said in Bad Gateway *SPANK SPANK*:

    Nope. I didn't want to deploy it while I was tiredsober.

    Seems like the appropriate attitude; it's not the sort of thing a sober person would do. ;)



  • @HardwareGeek said in Bad Gateway *SPANK SPANK*:

    @ben_lubar said in Bad Gateway *SPANK SPANK*:

    Nope. I didn't want to deploy it while I was tiredsober.

    Seems like the appropriate attitude; it's not the sort of thing a sober person would do. ;)

    You're going to be waiting a long time on that one. I've never been drunk in my life.


  • Winner of the 2016 Presidential Election

    @ben_lubar said in Bad Gateway *SPANK SPANK*:

    I've never been drunk in my life.

    Denial is a well-known symptom of alcoholism. :trolleybus:



  • @asdf said in Bad Gateway *SPANK SPANK*:

    @ben_lubar said in Bad Gateway *SPANK SPANK*:

    I've never been drunk in my life.

    Denial is a well-known symptom of alcoholism. :trolleybus:

    s/my life/real life/

    I'm a member of the Eldvin Monastery Brew of the Month Club and I've played Dwarf Fortress, and a lot of the other games I've played (especially MMOs) have had alcohol, but no alcohol in real life for me.


  • area_can

    @ben_lubar is it possible to ferment plump helmets?



  • @ben_lubar said in Bad Gateway *SPANK SPANK*:

    gdb in production?!

    Huh...so it will give us a stack dump in the log and then kill the process?



  • @boomzilla better than what we were asking for then

    good job @ben_lubar



  • @boomzilla but these bugs are in the javascript code or nodejs itself? would gdb stack be really useful?



  • @fbmac We've been getting stack traces that include the javascript portions (using linux perf tools) and there isn't any obvious pattern there that I can see. I am nearly certain that there's something more low level going on here. There are multiple file downloads in the server cooties tracking log (the pinned topic in Meta) where you can take a look at some of the diagnostics we've run.

    Personally, I was having problems with vmware occasionally freezing and using 100% CPU. I eventually tracked it down to needing to turn off transparent hugepages in the kernel. I suspect it's something similar here, though definitely some interaction with node itself, because it's only ever the node processes (i.e., not mongo or redis or anything like that) that becomes pegged at 100% CPU.

    But we know that, for instance, netflix runs node on AWS, so I'd think a problem like this would have been noticed by someone like that if it was specific to node / AWS combinations, though maybe they run a different enough class of server so they're not affected. At this point, giving up and putting in an auto killer is sounding pretty good to me, because I'm out of ideas.


  • Winner of the 2016 Presidential Election

    *insert Kermit the Frog sipping tea meme saying "I've been saying this for months but that's none of my business"*


  • :belt_onion:

    @boomzilla said in Bad Gateway *SPANK SPANK*:

    @fbmac We've been getting stack traces that include the javascript portions (using linux perf tools) and there isn't any obvious pattern there that I can see. I am nearly certain that there's something more low level going on here. There are multiple file downloads in the server cooties tracking log (the pinned topic in Meta) where you can take a look at some of the diagnostics we've run.

    Personally, I was having problems with vmware occasionally freezing and using 100% CPU. I eventually tracked it down to needing to turn off transparent hugepages in the kernel. I suspect it's something similar here, though definitely some interaction with node itself, because it's only ever the node processes (i.e., not mongo or redis or anything like that) that becomes pegged at 100% CPU.

    But we know that, for instance, netflix runs node on AWS, so I'd think a problem like this would have been noticed by someone like that if it was specific to node / AWS combinations, though maybe they run a different enough class of server so they're not affected. At this point, giving up and putting in an auto killer is sounding pretty good to me, because I'm out of ideas.

    Are these stack traces somewhere public we can read too?



  • @dse said in Bad Gateway *SPANK SPANK*:

    Are these stack traces somewhere public we can read too?

    there appear to be links to stuff at the tail end of the first post at https://what.thedailywtf.com/topic/19990/the-official-server-cooties-tracking-thread


  • :belt_onion:

    @boomzilla said in Bad Gateway *SPANK SPANK*:

    different enough class of server

    Or different enough set of plugins. Is it common to have so many Go extensions in node?


  • Winner of the 2016 Presidential Election

    @dse said in Bad Gateway *SPANK SPANK*:

    Is it common to have so many Go extensions in node?

    :wtf:



  • @error said in Bad Gateway *SPANK SPANK*:

    @dse said in Bad Gateway *SPANK SPANK*:

    Is it common to have so many Go extensions in node?

    :wtf:

    Some of my plugins are compiled with GopherJS, like htmlcleaner, which replaces the htmlsanitizer plugin that used regular expressions all over.



  • So...first automated restart that I noticed. Everything with symbols seems to be waiting on thread planets to align. Thread 1 has no symbols. I'm guessing that's JITed javascript, and would look like any of many samples we've taken.

    0_1468805317169_cooties.txt



  • @boomzilla said in Bad Gateway *SPANK SPANK*:

    So...first automated restart that I noticed.

    There were three by my count:

    46/4567 timed out
    2016-07-17T21:06:15,295403776+0000
    [New LWP 63]
    [New LWP 62]
    [New LWP 61]
    [New LWP 60]
    [New LWP 50]
    [New LWP 49]
    [New LWP 48]
    [New LWP 47]
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
    0x00003a6f69fbbf60 in ?? ()
    
    Thread 9 (Thread 0x7f1eb725c700 (LWP 47)):
    #0  sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
    #1  0x0000000000fc8188 in v8::base::Semaphore::Wait() ()
    #2  0x0000000000e65949 in v8::platform::TaskQueue::GetNext() ()
    #3  0x0000000000e65a9c in v8::platform::WorkerThread::Run() ()
    #4  0x0000000000fc9140 in v8::base::ThreadEntry(void*) ()
    #5  0x00007f1eb76100a4 in start_thread (arg=0x7f1eb725c700) at pthread_create.c:309
    #6  0x00007f1eb734587d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
    
    Thread 8 (Thread 0x7f1eb6a5b700 (LWP 48)):
    #0  sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
    #1  0x0000000000fc8188 in v8::base::Semaphore::Wait() ()
    #2  0x0000000000e65949 in v8::platform::TaskQueue::GetNext() ()
    #3  0x0000000000e65a9c in v8::platform::WorkerThread::Run() ()
    #4  0x0000000000fc9140 in v8::base::ThreadEntry(void*) ()
    #5  0x00007f1eb76100a4 in start_thread (arg=0x7f1eb6a5b700) at pthread_create.c:309
    #6  0x00007f1eb734587d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
    
    Thread 7 (Thread 0x7f1eb625a700 (LWP 49)):
    #0  sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
    #1  0x0000000000fc8188 in v8::base::Semaphore::Wait() ()
    #2  0x0000000000e65949 in v8::platform::TaskQueue::GetNext() ()
    #3  0x0000000000e65a9c in v8::platform::WorkerThread::Run() ()
    #4  0x0000000000fc9140 in v8::base::ThreadEntry(void*) ()
    #5  0x00007f1eb76100a4 in start_thread (arg=0x7f1eb625a700) at pthread_create.c:309
    #6  0x00007f1eb734587d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
    
    Thread 6 (Thread 0x7f1eb5a59700 (LWP 50)):
    #0  sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
    #1  0x0000000000fc8188 in v8::base::Semaphore::Wait() ()
    #2  0x0000000000e65949 in v8::platform::TaskQueue::GetNext() ()
    #3  0x0000000000e65a9c in v8::platform::WorkerThread::Run() ()
    #4  0x0000000000fc9140 in v8::base::ThreadEntry(void*) ()
    #5  0x00007f1eb76100a4 in start_thread (arg=0x7f1eb5a59700) at pthread_create.c:309
    #6  0x00007f1eb734587d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
    
    Thread 5 (Thread 0x7f1ea7fff700 (LWP 60)):
    #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
    #1  0x0000000000fbe709 in uv_cond_wait (cond=cond@entry=0x19aa020 <cond>, mutex=mutex@entry=0x19a9fe0 <mutex>) at ../deps/uv/src/unix/thread.c:380
    #2  0x0000000000faf5d8 in worker (arg=arg@entry=0x0) at ../deps/uv/src/threadpool.c:75
    #3  0x0000000000fbe269 in uv__thread_start (arg=<optimized out>) at ../deps/uv/src/unix/thread.c:49
    #4  0x00007f1eb76100a4 in start_thread (arg=0x7f1ea7fff700) at pthread_create.c:309
    #5  0x00007f1eb734587d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
    
    Thread 4 (Thread 0x7f1ea77fe700 (LWP 61)):
    #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
    #1  0x0000000000fbe709 in uv_cond_wait (cond=cond@entry=0x19aa020 <cond>, mutex=mutex@entry=0x19a9fe0 <mutex>) at ../deps/uv/src/unix/thread.c:380
    #2  0x0000000000faf5d8 in worker (arg=arg@entry=0x0) at ../deps/uv/src/threadpool.c:75
    #3  0x0000000000fbe269 in uv__thread_start (arg=<optimized out>) at ../deps/uv/src/unix/thread.c:49
    #4  0x00007f1eb76100a4 in start_thread (arg=0x7f1ea77fe700) at pthread_create.c:309
    #5  0x00007f1eb734587d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
    
    Thread 3 (Thread 0x7f1ea6ffd700 (LWP 62)):
    #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
    #1  0x0000000000fbe709 in uv_cond_wait (cond=cond@entry=0x19aa020 <cond>, mutex=mutex@entry=0x19a9fe0 <mutex>) at ../deps/uv/src/unix/thread.c:380
    #2  0x0000000000faf5d8 in worker (arg=arg@entry=0x0) at ../deps/uv/src/threadpool.c:75
    #3  0x0000000000fbe269 in uv__thread_start (arg=<optimized out>) at ../deps/uv/src/unix/thread.c:49
    #4  0x00007f1eb76100a4 in start_thread (arg=0x7f1ea6ffd700) at pthread_create.c:309
    #5  0x00007f1eb734587d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
    
    Thread 2 (Thread 0x7f1ea67fc700 (LWP 63)):
    #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
    #1  0x0000000000fbe709 in uv_cond_wait (cond=cond@entry=0x19aa020 <cond>, mutex=mutex@entry=0x19a9fe0 <mutex>) at ../deps/uv/src/unix/thread.c:380
    #2  0x0000000000faf5d8 in worker (arg=arg@entry=0x0) at ../deps/uv/src/threadpool.c:75
    #3  0x0000000000fbe269 in uv__thread_start (arg=<optimized out>) at ../deps/uv/src/unix/thread.c:49
    #4  0x00007f1eb76100a4 in start_thread (arg=0x7f1ea67fc700) at pthread_create.c:309
    #5  0x00007f1eb734587d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
    
    Thread 1 (Thread 0x7f1eb8664740 (LWP 46)):
    #0  0x00003a6f69fbbf60 in ?? ()
    #1  0xffffffffffff4bb0 in ?? ()
    #2  0xffffffffffffe55d in ?? ()
    #3  0xffffffffffff7fe9 in ?? ()
    #4  0xffffffffffff4bd7 in ?? ()
    #5  0xffffffffffff4bb0 in ?? ()
    #6  0xffffffffffff4bb1 in ?? ()
    #7  0xffffffffffff4bb0 in ?? ()
    #8  0x0000000000000000 in ?? ()
    
    51/4568 timed out
    2016-07-17T21:06:22,042203489+0000
    [New LWP 59]
    [New LWP 58]
    [New LWP 57]
    [New LWP 56]
    [New LWP 55]
    [New LWP 54]
    [New LWP 53]
    [New LWP 52]
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
    0x00000705232cb4b4 in ?? ()
    
    Thread 9 (Thread 0x7fb9df4de700 (LWP 52)):
    #0  sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
    #1  0x0000000000fc8188 in v8::base::Semaphore::Wait() ()
    #2  0x0000000000e65949 in v8::platform::TaskQueue::GetNext() ()
    #3  0x0000000000e65a9c in v8::platform::WorkerThread::Run() ()
    #4  0x0000000000fc9140 in v8::base::ThreadEntry(void*) ()
    #5  0x00007fb9df8920a4 in start_thread (arg=0x7fb9df4de700) at pthread_create.c:309
    #6  0x00007fb9df5c787d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
    
    Thread 8 (Thread 0x7fb9decdd700 (LWP 53)):
    #0  sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
    #1  0x0000000000fc8188 in v8::base::Semaphore::Wait() ()
    #2  0x0000000000e65949 in v8::platform::TaskQueue::GetNext() ()
    #3  0x0000000000e65a9c in v8::platform::WorkerThread::Run() ()
    #4  0x0000000000fc9140 in v8::base::ThreadEntry(void*) ()
    #5  0x00007fb9df8920a4 in start_thread (arg=0x7fb9decdd700) at pthread_create.c:309
    #6  0x00007fb9df5c787d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
    
    Thread 7 (Thread 0x7fb9de4dc700 (LWP 54)):
    #0  sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
    #1  0x0000000000fc8188 in v8::base::Semaphore::Wait() ()
    #2  0x0000000000e65949 in v8::platform::TaskQueue::GetNext() ()
    #3  0x0000000000e65a9c in v8::platform::WorkerThread::Run() ()
    #4  0x0000000000fc9140 in v8::base::ThreadEntry(void*) ()
    #5  0x00007fb9df8920a4 in start_thread (arg=0x7fb9de4dc700) at pthread_create.c:309
    #6  0x00007fb9df5c787d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
    
    Thread 6 (Thread 0x7fb9ddcdb700 (LWP 55)):
    #0  sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
    #1  0x0000000000fc8188 in v8::base::Semaphore::Wait() ()
    #2  0x0000000000e65949 in v8::platform::TaskQueue::GetNext() ()
    #3  0x0000000000e65a9c in v8::platform::WorkerThread::Run() ()
    #4  0x0000000000fc9140 in v8::base::ThreadEntry(void*) ()
    #5  0x00007fb9df8920a4 in start_thread (arg=0x7fb9ddcdb700) at pthread_create.c:309
    #6  0x00007fb9df5c787d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
    
    Thread 5 (Thread 0x7fb9cffff700 (LWP 56)):
    #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
    #1  0x0000000000fbe709 in uv_cond_wait (cond=cond@entry=0x19aa020 <cond>, mutex=mutex@entry=0x19a9fe0 <mutex>) at ../deps/uv/src/unix/thread.c:380
    #2  0x0000000000faf5d8 in worker (arg=arg@entry=0x0) at ../deps/uv/src/threadpool.c:75
    #3  0x0000000000fbe269 in uv__thread_start (arg=<optimized out>) at ../deps/uv/src/unix/thread.c:49
    #4  0x00007fb9df8920a4 in start_thread (arg=0x7fb9cffff700) at pthread_create.c:309
    #5  0x00007fb9df5c787d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
    
    Thread 4 (Thread 0x7fb9cf7fe700 (LWP 57)):
    #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
    #1  0x0000000000fbe709 in uv_cond_wait (cond=cond@entry=0x19aa020 <cond>, mutex=mutex@entry=0x19a9fe0 <mutex>) at ../deps/uv/src/unix/thread.c:380
    #2  0x0000000000faf5d8 in worker (arg=arg@entry=0x0) at ../deps/uv/src/threadpool.c:75
    #3  0x0000000000fbe269 in uv__thread_start (arg=<optimized out>) at ../deps/uv/src/unix/thread.c:49
    #4  0x00007fb9df8920a4 in start_thread (arg=0x7fb9cf7fe700) at pthread_create.c:309
    #5  0x00007fb9df5c787d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
    
    Thread 3 (Thread 0x7fb9ceffd700 (LWP 58)):
    #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
    #1  0x0000000000fbe709 in uv_cond_wait (cond=cond@entry=0x19aa020 <cond>, mutex=mutex@entry=0x19a9fe0 <mutex>) at ../deps/uv/src/unix/thread.c:380
    #2  0x0000000000faf5d8 in worker (arg=arg@entry=0x0) at ../deps/uv/src/threadpool.c:75
    #3  0x0000000000fbe269 in uv__thread_start (arg=<optimized out>) at ../deps/uv/src/unix/thread.c:49
    #4  0x00007fb9df8920a4 in start_thread (arg=0x7fb9ceffd700) at pthread_create.c:309
    #5  0x00007fb9df5c787d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
    
    Thread 2 (Thread 0x7fb9ce7fc700 (LWP 59)):
    #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
    #1  0x0000000000fbe709 in uv_cond_wait (cond=cond@entry=0x19aa020 <cond>, mutex=mutex@entry=0x19a9fe0 <mutex>) at ../deps/uv/src/unix/thread.c:380
    #2  0x0000000000faf5d8 in worker (arg=arg@entry=0x0) at ../deps/uv/src/threadpool.c:75
    #3  0x0000000000fbe269 in uv__thread_start (arg=<optimized out>) at ../deps/uv/src/unix/thread.c:49
    #4  0x00007fb9df8920a4 in start_thread (arg=0x7fb9ce7fc700) at pthread_create.c:309
    #5  0x00007fb9df5c787d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
    
    Thread 1 (Thread 0x7fb9e08e6740 (LWP 51)):
    #0  0x00000705232cb4b4 in ?? ()
    #1  0x000000000000001b in ?? ()
    #2  0x0000000000000b4a in ?? ()
    #3  0x0000000000000000 in ?? ()
    
    3233/4567 timed out
    2016-07-18T01:18:55,322306504+0000
    [New LWP 3241]
    [New LWP 3240]
    [New LWP 3239]
    [New LWP 3238]
    [New LWP 3237]
    [New LWP 3236]
    [New LWP 3235]
    [New LWP 3234]
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
    0x00001ca7c9b0718c in ?? ()
    
    Thread 9 (Thread 0x7f4de836b700 (LWP 3234)):
    #0  sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
    #1  0x0000000000fc8188 in v8::base::Semaphore::Wait() ()
    #2  0x0000000000e65949 in v8::platform::TaskQueue::GetNext() ()
    #3  0x0000000000e65a9c in v8::platform::WorkerThread::Run() ()
    #4  0x0000000000fc9140 in v8::base::ThreadEntry(void*) ()
    #5  0x00007f4de871f0a4 in start_thread (arg=0x7f4de836b700) at pthread_create.c:309
    #6  0x00007f4de845487d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
    
    Thread 8 (Thread 0x7f4de7b6a700 (LWP 3235)):
    #0  sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
    #1  0x0000000000fc8188 in v8::base::Semaphore::Wait() ()
    #2  0x0000000000e65949 in v8::platform::TaskQueue::GetNext() ()
    #3  0x0000000000e65a9c in v8::platform::WorkerThread::Run() ()
    #4  0x0000000000fc9140 in v8::base::ThreadEntry(void*) ()
    #5  0x00007f4de871f0a4 in start_thread (arg=0x7f4de7b6a700) at pthread_create.c:309
    #6  0x00007f4de845487d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
    
    Thread 7 (Thread 0x7f4de7369700 (LWP 3236)):
    #0  sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
    #1  0x0000000000fc8188 in v8::base::Semaphore::Wait() ()
    #2  0x0000000000e65949 in v8::platform::TaskQueue::GetNext() ()
    #3  0x0000000000e65a9c in v8::platform::WorkerThread::Run() ()
    #4  0x0000000000fc9140 in v8::base::ThreadEntry(void*) ()
    #5  0x00007f4de871f0a4 in start_thread (arg=0x7f4de7369700) at pthread_create.c:309
    #6  0x00007f4de845487d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
    
    Thread 6 (Thread 0x7f4de6b68700 (LWP 3237)):
    #0  sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
    #1  0x0000000000fc8188 in v8::base::Semaphore::Wait() ()
    #2  0x0000000000e65949 in v8::platform::TaskQueue::GetNext() ()
    #3  0x0000000000e65a9c in v8::platform::WorkerThread::Run() ()
    #4  0x0000000000fc9140 in v8::base::ThreadEntry(void*) ()
    #5  0x00007f4de871f0a4 in start_thread (arg=0x7f4de6b68700) at pthread_create.c:309
    #6  0x00007f4de845487d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
    
    Thread 5 (Thread 0x7f4de5266700 (LWP 3238)):
    #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
    #1  0x0000000000fbe709 in uv_cond_wait (cond=cond@entry=0x19aa020 <cond>, mutex=mutex@entry=0x19a9fe0 <mutex>) at ../deps/uv/src/unix/thread.c:380
    #2  0x0000000000faf5d8 in worker (arg=arg@entry=0x0) at ../deps/uv/src/threadpool.c:75
    #3  0x0000000000fbe269 in uv__thread_start (arg=<optimized out>) at ../deps/uv/src/unix/thread.c:49
    #4  0x00007f4de871f0a4 in start_thread (arg=0x7f4de5266700) at pthread_create.c:309
    #5  0x00007f4de845487d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
    
    Thread 4 (Thread 0x7f4de4a65700 (LWP 3239)):
    #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
    #1  0x0000000000fbe709 in uv_cond_wait (cond=cond@entry=0x19aa020 <cond>, mutex=mutex@entry=0x19a9fe0 <mutex>) at ../deps/uv/src/unix/thread.c:380
    #2  0x0000000000faf5d8 in worker (arg=arg@entry=0x0) at ../deps/uv/src/threadpool.c:75
    #3  0x0000000000fbe269 in uv__thread_start (arg=<optimized out>) at ../deps/uv/src/unix/thread.c:49
    #4  0x00007f4de871f0a4 in start_thread (arg=0x7f4de4a65700) at pthread_create.c:309
    #5  0x00007f4de845487d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
    
    Thread 3 (Thread 0x7f4dd7fff700 (LWP 3240)):
    #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
    #1  0x0000000000fbe709 in uv_cond_wait (cond=cond@entry=0x19aa020 <cond>, mutex=mutex@entry=0x19a9fe0 <mutex>) at ../deps/uv/src/unix/thread.c:380
    #2  0x0000000000faf5d8 in worker (arg=arg@entry=0x0) at ../deps/uv/src/threadpool.c:75
    #3  0x0000000000fbe269 in uv__thread_start (arg=<optimized out>) at ../deps/uv/src/unix/thread.c:49
    #4  0x00007f4de871f0a4 in start_thread (arg=0x7f4dd7fff700) at pthread_create.c:309
    #5  0x00007f4de845487d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
    
    Thread 2 (Thread 0x7f4dd77fe700 (LWP 3241)):
    #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
    #1  0x0000000000fbe709 in uv_cond_wait (cond=cond@entry=0x19aa020 <cond>, mutex=mutex@entry=0x19a9fe0 <mutex>) at ../deps/uv/src/unix/thread.c:380
    #2  0x0000000000faf5d8 in worker (arg=arg@entry=0x0) at ../deps/uv/src/threadpool.c:75
    #3  0x0000000000fbe269 in uv__thread_start (arg=<optimized out>) at ../deps/uv/src/unix/thread.c:49
    #4  0x00007f4de871f0a4 in start_thread (arg=0x7f4dd77fe700) at pthread_create.c:309
    #5  0x00007f4de845487d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
    
    Thread 1 (Thread 0x7f4de9773740 (LWP 3233)):
    #0  0x00001ca7c9b0718c in ?? ()
    #1  0xffffffffffffba4a in ?? ()
    #2  0xfffffffffffff733 in ?? ()
    #3  0xffffffffffffc3b0 in ?? ()
    #4  0xffffffffffffbd72 in ?? ()
    #5  0xffffffffffffba4a in ?? ()
    #6  0xffffffffffffba4b in ?? ()
    #7  0xffffffffffffba4a in ?? ()
    #8  0x0000000000000000 in ?? ()
    


  • @ben_lubar said in Bad Gateway *SPANK SPANK*:

    There were three by my count:

    They all look the same to me.



  • @boomzilla So if I'm looking at this right, it's getting deadlocked?


  • Impossible Mission Players - A

    @Deadfast said in Bad Gateway *SPANK SPANK*:

    @boomzilla So if I'm looking at this right, it's getting deadlocked?

    Yes, that's the gist of it, but it's not being very informative what it's deadlocked on.



  • @Deadfast said in Bad Gateway *SPANK SPANK*:

    So if I'm looking at this right, it's getting deadlocked?

    I don't think so. Would that result in 100% CPU usage? When you wait for a semaphore or whatever, that doesn't use CPU, does it?



  • @boomzilla said in Bad Gateway *SPANK SPANK*:

    I'm guessing that's JITed javascript

    As long as we're installing GDB on a production server, we may as well compile our own NodeJS that has the --gdb flag applied at compile time so V8 outputs symbols for JITed code.

    Wheeeeeeeee


  • Impossible Mission Players - A

    @boomzilla said in Bad Gateway *SPANK SPANK*:

    that doesn't use CPU, does it?

    It does if you're not actually waiting, but I'm not a Linux dev, I can only cite the general difference between polling and event listening...



  • @boomzilla said in Bad Gateway *SPANK SPANK*:

    @Deadfast said in Bad Gateway *SPANK SPANK*:

    So if I'm looking at this right, it's getting deadlocked?

    I don't think so. Would that result in 100% CPU usage? When you wait for a semaphore or whatever, that doesn't use CPU, does it?

    Hm, I don't think it would. I'm guessing the remaining threads are waiting on thread #1 to finish whatever the hell it's doing. I guess you could check htop to confirm.



  • 0_1468812523885_upload-537b9acb-e547-4f41-bd1a-1bdb7f6735fb

    Only 40 minutes to build nodejs. Nice.


  • :belt_onion:



  • root@004cf0a2d661:/usr/src/app# node --gdbjit app.js
    18/7 04:52 [1692] - info: NodeBB v1.1.0 Copyright (C) 2013-2014 NodeBB Inc.
    18/7 04:52 [1692] - info: This program comes with ABSOLUTELY NO WARRANTY.
    18/7 04:52 [1692] - info: This is free software, and you are welcome to redistribute it under certain conditions.
    18/7 04:52 [1692] - info: 
    18/7 04:52 [1692] - info: Time: Mon Jul 18 2016 04:52:50 GMT+0000 (UTC)
    18/7 04:52 [1692] - info: Initializing NodeBB v1.1.0
    18/7 04:52 [1692] - warn: You have no mongo password setup!
    18/7 04:52 [1692] - info: [database] Checking database indices.
    Segmentation fault (core dumped)
    

    ugh



  • @Deadfast that sucks, the whole point of node is to avoid crap like that.

    That also tells me the bug is probably in one of the native modules.


  • BINNED

    @boomzilla said in Bad Gateway *SPANK SPANK*:

    Would that result in 100% CPU usage?

    That's more of a livelock issue. If there's some incredibly bad code it could be doing something like:

    while (thing.isLocked)
    {
    //sleep is for the weak
    }
    
    thing.DoStuff()
    

  • Discourse touched me in a no-no place

    @boomzilla said in Bad Gateway *SPANK SPANK*:

    Would that result in 100% CPU usage? When you wait for a semaphore or whatever, that doesn't use CPU, does it?

    For a normal semaphore, the thread is indeed shut down while waiting. For a spinlock though…



  • @ben_lubar is it possible to disable jit?
    if it isn't using much cpu it may be doing more harm than good, in a buggy nodejs



  • I'd guess that this is not a deadlock, and would rather guess that the JS code is stuck somewhere. The threads that the back traces are visible for seem to be part of thread pools (either of V8 or of libuv), and I'd guess waiting for their next task.

    Which makes sense, since @ben_lubar's script kills the server after waiting for 5s, so it makes sense for any queued asynch tasks to have completed by then.

    tl;dr: problem is most likely in the thread for which there is no stack trace.


Log in to reply
 

Looks like your connection to What the Daily WTF? was lost, please wait while we try to reconnect.