502 Gateway Errors Are Getting OLD
-
Seriously, I DO NOT remember Discourse being down as often as the NodeBB forums are.
How is this so difficult? If NodeBB can't handle the load (huh, it's almost as if JS wasn't fucking meant to be server-side, go figure), then we need to switch to something STABLE. It is getting ... tiresome ... that about 1 out of every 3 attempts I make to the forums, timeout with a bad gateway.
Could we get this shit sorted out? It is embarrassing for a community of supposed professionals. If it is a technical problem, put it out to the community, and we can probably solve it. If it is a resources problem ... we really should have looked at the resource requirements of the chosen platform before deploying to it, but maybe we can throw some money at it? If it is just NodeBB shitting the bed, then I think we are fast approaching a vote of no confidence time.
-
@Vaire said in 502 Gateway Errors Are Getting OLD:
If it is a technical problem, put it out to the community, and we can probably solve it.
I think the main issue is that it's freaking JS and debugging NodeJS code is something of a PITA. That's the impression I got from @ben_lubar's responses on IRC at least.
-
@Vaire You don't remember the daily Discourse cootiestorm that would happen around lunchtime in the US?
-
@Vaire said in 502 Gateway Errors Are Getting OLD:
It is getting ... tiresome ... that about 1 out of every 3 attempts I make to the forums, timeout with a bad gateway.
I think the problem is you. I almost never have cooties when I try to use the site.
What's "interesting" about the NodeBB issues is that it's almost completely good or totally down. Discourse had its definite down moments plus it just always sucked.
-
@Onyx said in 502 Gateway Errors Are Getting OLD:
@Vaire said in 502 Gateway Errors Are Getting OLD:
If it is a technical problem, put it out to the community, and we can probably solve it.
I think the main issue is that it's freaking JS and debugging NodeJS code is something of a PITA. That's the impression I got from @ben_lubar's responses on IRC at least.
Why are we debugging it at all? Why wasn't it fully stable before we switched to it? We should, at the very least, have been running a full Test environment of it, and only switched from our old (shitty, but stable) forum when the new solution was stable.
-
@Yamikuronue the servercooties were usually shorter lived than these bouts of 502 Gateway Errors.
-
@Yamikuronue said in 502 Gateway Errors Are Getting OLD:
@Vaire You don't remember the daily Discourse cootiestorm that would happen around lunchtime in the US?
No. I tend to log in at the start of the day, read a few, post to a few, and then the day fully starts and I am out. Sometimes I pop back in near the end of the day, but I'm pretty much never on between 11AM-1PM US West Coast time.
-
@Vaire we did. There was plenty of content on it (partial import), but only about 10ish (active) users.
It worked fine; whatever is killing it now, we didn't manage to do it there.
-
@Onyx said in 502 Gateway Errors Are Getting OLD:
@Vaire we did. There was plenty of content on it (partial import), but only about 10ish (active) users.
It worked fine; whatever is killing it now, we didn't manage to do it there.
So ... no stress testing was done, then?
-
@Vaire said in 502 Gateway Errors Are Getting OLD:
So ... no stress testing was done, then?
It would have stressed MilwaukeePC's bandwidth before it would have stressed the test server.
-
Eh, still better than Discourse. NodeBB is down a few minutes at a time for whatever reason and then comes back pretty quickly. Discourse was a superposition of online and offline, 100% of the time, and never achieved a discrete eigenstate when observed or used.
CTRL-F5 to reply is still pretty annoying.
-
It obviously wasn't tested very hard, because otherwise the utterly horrible UI retardathon we're suffering now should have been instantly obvious.
Obviously, I count myself among those who failed to test, but on the other hand, I'll also point out I've been suggesting using phpBB or another "mature" forum system since the early days of Discourse.
@boomzilla said in 502 Gateway Errors Are Getting OLD:
I think the problem is you. I almost never have cooties when I try to use the site.
WORKS_FOR_ME | WONT_FIX | DOING_IT_WRONG | CLOSE_AND_BAN
-
@mott555 said in 502 Gateway Errors Are Getting OLD:
CTRL-F5 to reply is still pretty annoying.
That's a separate issue, unrelated to server-side stuff. Basically the composer fails to load for whatever reason and never retries or shows an error message.
-
@mott555 said in 502 Gateway Errors Are Getting OLD:
Eh, still better than Discourse. NodeBB is down a few minutes at a time for whatever reason and then comes back pretty quickly.
Or for a couple hours randomly. Twice last week, in the same day.
-
@Magus said in 502 Gateway Errors Are Getting OLD:
@mott555 said in 502 Gateway Errors Are Getting OLD:
Eh, still better than Discourse. NodeBB is down a few minutes at a time for whatever reason and then comes back pretty quickly.
Or for a couple hours randomly. Twice last week, in the same day.
True story.
-
@boomzilla said in 502 Gateway Errors Are Getting OLD:
What's "interesting" about the NodeBB issues is that it's almost completely good or totally down.
The other problem is when it's down, none of the AJAX requests on the damned site has a goddamned timeout so there's no way to TELL it's down short of opening up a new tab and trying to load it.
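For what it's worth, what's being asked for here is just an explicit deadline on every request, plus reporting the failure instead of hanging. A minimal sketch of the idea, in Python rather than the browser-side JS the forum actually runs; `probe`, its return strings, and the 5-second default are made up for illustration and are not anything from NodeBB:

```python
import urllib.error
import urllib.request


def probe(url, timeout_s=5.0, opener=urllib.request.urlopen):
    """Make one request with a hard deadline and say what happened.

    Returns "up", "http <code>" (e.g. "http 502"), or "unreachable".
    `opener` is injectable only so the function can be exercised offline.
    """
    try:
        with opener(url, timeout=timeout_s) as resp:
            if 200 <= resp.status < 400:
                return "up"
            return f"http {resp.status}"
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 502.
        return f"http {err.code}"
    except OSError:
        # Covers timeouts, refused connections, and DNS failures.
        return "unreachable"
```

The point is the `timeout=` argument: without it a request against a dead backend can sit there indefinitely, which is exactly the "can't tell it's down" behaviour.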
-
@Vaire said in 502 Gateway Errors Are Getting OLD:
Seriously, I DO NOT remember Discourse being down as often as the NodeBB forums are.
absence makes the heart grow fonder.
discourse may not have gone 502 as often, but errors and generally crappy performance were a daily, if not constant, thing.
-
@accalia Agreed, for all its problems, this thing is faster and is more consistent in general somehow.
-
@accalia Yeah well I don't have a heart and I hate this.
-
I haven't been experiencing NodeBB being down much, but the front page of TDWTF has been down over 50% of the time for me nowadays.
-
@LB_ said in 502 Gateway Errors Are Getting OLD:
I haven't been experiencing NodeBB being down much, but the front page of TDWTF has been down over 50% of the time for me nowadays.
That's not something we can blame on either NodeBB or Discourse. They use their own y software…
-
Discourse, when it worked, was still like trying to run uphill through mud.
Like @boomzilla said, it's weird that NodeBB is so bipolar.
-
@anotherusername said in 502 Gateway Errors Are Getting OLD:
it's weird that NodeBB is so bipolar
I suspect that something is blowing the cache (or even memory) from time to time, and that's making NBB blow chunks as stuff that it assumes to be fast becomes glacial. Once you start hitting timeouts, the problem spirals out of control as the effects spread.
What we need is something that tracks memory usage of each key server process through one of these outages (CPU too and disk usage, if people want, but I think those won't matter), and correlates it with what external traffic is happening (including in the period leading up to the collapse). My pet theory is that it is a spider of some kind that is using a non-standard access pattern, but I've no real evidence either way on that; it's just a hunch…
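A rough sketch of the kind of tracker described above, assuming a Linux host where `/proc/<pid>/status` is readable. The function names, the 10-second interval, and the `mem.log` path are all placeholders, and which PIDs count as "key server processes" is left to whoever runs it:

```python
import time


def parse_rss_kb(status_text):
    """Pull VmRSS (resident memory, in kB) out of /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None  # e.g. kernel threads have no VmRSS line


def sample(pids):
    """Return {pid: rss_kb or None} for the given process IDs."""
    out = {}
    for pid in pids:
        try:
            with open(f"/proc/{pid}/status") as fh:
                out[pid] = parse_rss_kb(fh.read())
        except OSError:
            out[pid] = None  # process has exited or is not readable
    return out


def log_forever(pids, interval_s=10, path="mem.log"):
    """Append one timestamped line per sample so it can be lined up
    against web server access logs afterwards."""
    with open(path, "a") as log:
        while True:
            readings = sample(pids)
            fields = " ".join(f"{pid}={rss}" for pid, rss in readings.items())
            log.write(f"{int(time.time())} {fields}\n")
            log.flush()
            time.sleep(interval_s)
```

Because every line starts with a Unix timestamp, the memory log can be joined against the traffic logs for the minutes leading up to a collapse, which is the correlation that would confirm or kill the spider theory.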
-
@dkf it might just be confirmation bias, but it seems like the downages and uppages happen at times that aren't exactly random. Like... down at 10 minutes before the top of the hour, up at the top of the hour. Or it's down for exactly 5 minutes. Or something.
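One way to check this rather than eyeball it, sketched under the assumption that someone has a list of outage start and end times (the `outages` input format here is hypothetical, not the actual servercooties data format):

```python
from collections import Counter
from datetime import datetime


def timing_summary(outages):
    """outages: iterable of (start, end) datetime pairs.

    Returns two Counters: how often an outage started at each minute of
    the hour, and how often each duration (rounded to whole minutes)
    occurred.
    """
    outages = list(outages)
    start_minutes = Counter(start.minute for start, _ in outages)
    durations = Counter(
        round((end - start).total_seconds() / 60) for start, end in outages
    )
    return start_minutes, durations
```

If the outages really were random, both counters should come out fairly flat. A big spike at one start minute, or at a duration of exactly 5, would suggest something scheduled (a cron job, a health check, a restart timer) rather than confirmation bias.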
-
@mott555 said in 502 Gateway Errors Are Getting OLD:
NodeBB is down a few minutes at a time for whatever reason and then comes back pretty quickly.
Hours if benlubar isn't awake at that time ...
-
@anotherusername said in 502 Gateway Errors Are Getting OLD:
it might just be confirmation bias
I ought to tell my nagios to keep an eye on things. ;) Except other people have access to that, which could make for a somewhat awkward few minutes of explanation…
-
@dkf said in 502 Gateway Errors Are Getting OLD:
I suspect that something is blowing the cache (or even memory) from time to time, and that's making NBB blow chunks as stuff that it assumes to be fast becomes glacial. Once you start hitting timeouts, the problem spirals out of control as the effects spread.
Earlier in the week, it seems like it was some sort of feedback loop when iframely would try to onebox a post from here. This site has been blacklisted from that for now, so the current short bouts of cooties haven't been that.
-
@Magus said in 502 Gateway Errors Are Getting OLD:
@mott555 said in 502 Gateway Errors Are Getting OLD:
Eh, still better than Discourse. NodeBB is down a few minutes at a time for whatever reason and then comes back pretty quickly.
Or for a couple hours randomly. Twice last week, in the same day.
If you're referring to the day when @PJH and I were debugging the downed forum to see if we could figure out why, then you are correct that intentionally keeping the forum down means the forum is down.
-
@dkf said in 502 Gateway Errors Are Getting OLD:
What we need is something that tracks memory usage of each key server process through one of these outages (CPU too and disk usage, if people want, but I think those won't matter)
Agreed. Can we cram monit or something similar on it?
-
@ben_lubar Whaaa? How were users notified of this down-time? I sure as fuck didn't see anything about it.
-
@anotherusername said in 502 Gateway Errors Are Getting OLD:
Or it's down for exactly 5 minutes.
If I get a desktop notification from ServerCooties, I usually restart the docker container right away. This morning I didn't because I saw @PJH was logged into the machine so I emailed him to see if he was debugging it.
-
I'm also disappointed in nodebb, and no I didn't test either.
As @accalia points out, we've just forgotten the horribleness of disco. I quit using TDWTF for the last few weeks of our time there, not out of protest, but because it literally took me 10 minutes to post. It didn't work on mobile, and there was so much jelly potato it was unusable 98% of the time.
@ben_lubar i distinctly remember you having some monitoring set up (with graphs and everything). I think we would be calmed by some more information, both generally, and specifically when you kick the server over. Please?
-
@swayde said in 502 Gateway Errors Are Getting OLD:
i distinctly remember you having some monitoring set up
I got rid of that when I was trying to keep the server from falling over. I can re-add it, but not right now.
-
@ben_lubar fair enough. How about fancier hardware? The forum members have shown some willingness to donate/pay if Alex doesn't want to. It'd be nice, even if it was only a bigger instance temporarily.
Edit: found the thread: https://what.thedailywtf.com/topic/19550/can-we-throw-money-at-the-problem
Edit^2: actually relevant post: https://what.thedailywtf.com/topic/19550/can-we-throw-money-at-the-problem/34
-
@swayde I think we need to at least have some idea of where the problem lies before thinking about spending money on hardware. Does it really make sense to throw money at it without having any idea of what to spend it on, let alone if it will actually help the problem?
-
So are we not yet at the point that a cron job is set up to bounce the server daily?
AFAICT, there seems to have been a "recent" update that has made shit go side-ways. Do we have a time-stamp on when the cooties returned with a vengeance?
-
@Erufael yes, I think it does. If it doesn't work, we're that much more knowledgeable.
-
@MathNerdCNU you can get the raw JSON from servercooties.com (historical data, that is)
-
Discourse is shitty
NodeBB is shitty
phpbb is good
xenforo is good
Unpopular things that look nice are always traps.
-
@swayde said in 502 Gateway Errors Are Getting OLD:
@MathNerdCNU you can get the raw JSON from servercooties.com (historical data, that is)
i can also pull the sqlite db if you want it. last i checked it was about 80 meg.
-
@ben_lubar, have you spoken to @julianlam about getting to the bottom of why node is using 100% CPU?