Moar Cooties
-
@darkmatter And I had a 3rd window open in that topic that I was using to actually read the replies.
-
@darkmatter I think it was your script I used to catch up myself; your script is well-throttled (averaging, what, one or two requests a second), so that won't have caused an issue.
-
@RaceProUK yea - I toned it down when I posted it, so it would only:
- load a few posts at once
- load a new batch after a few scrolls each time, giving the server some time to deal with each blast of likes.
When I used it myself, I had it much more aggressively jumping entire pages at once, so every 3 seconds it was basically loading a full new set of posts to like.
tbh, I was surprised at how well nodeBB was handling it. I monitored the network traffic the whole time to see if it was starting to struggle at all, but the responses were pretty quick.
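For anyone curious what "toned down" means in practice, here's a minimal Python sketch of batch-and-pause throttling. It is not the actual script: `like_post`, `batch_size` and `pause_seconds` are hypothetical stand-ins, and it sleeps between batches instead of waiting for scrolls.

```python
import time
from typing import Callable, Iterable, Iterator, List


def chunks(items: List[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def like_in_batches(
    post_ids: Iterable[int],
    like_post: Callable[[int], None],  # hypothetical: sends one "like" request
    batch_size: int = 5,               # assumption: "a few posts at once"
    pause_seconds: float = 3.0,        # assumption: breathing room between batches
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send likes in small batches, pausing between batches so the server
    can absorb each burst. Returns the number of likes sent."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    sent = 0
    for index, batch in enumerate(chunks(list(post_ids), batch_size)):
        if index > 0:
            sleep(pause_seconds)  # pause between batches, never before the first
        for post_id in batch:
            like_post(post_id)
            sent += 1
    return sent
```

Passing a fake `sleep` (e.g. `pauses.append`) makes it testable without actually waiting: 12 posts in batches of 5 gives three batches and therefore two pauses.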
-
@darkmatter said in Moar Cooties:
@boomzilla seems like that alone can't be the problem - I "liked" around 3000 posts in 2 browser windows in a span of 10 - 15 minutes the other night and it wasn't a problem for the site at all.
Maybe not your script. But the fact that severe cooties started within seconds of moving it back out, though, something was pounding that topic hard.
-
@boomzilla said in Moar Cooties:
severe cooties started within seconds of moving it back out
I suppose if someone had a script going right now, it might have instantly caught on and gone to town with it.
Wonder if someone was trying to go through and downvote everything starting from the top... that might hurt?
-
@darkmatter I checked the OP and the first few posts. No downvoters.
-
Hey @ben_lubar is there any way to see a log of voting activity?
Though I kind of suspect it was submitting redundant events, which probably wouldn't end up in the activity (assuming there's even a way to reconstruct that).
-
Who got notificationspammed during the outage? Anyone? Speak up! >.>
ETA: @DogsB said he was doing something in the downvote thread...
-
@Yamikuronue I didn't get anything except notifications for recent stuff, as you'd expect.
-
@boomzilla said in Moar Cooties:
submitting redundant events
If whoever's script for upvoting isn't retarded, there's no redundancy. Once the upvote button is clicked, the script won't trigger another click on the same button because its component name changes.
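The "no redundancy" argument boils down to checking state before acting. A minimal sketch, assuming a hypothetical `send_upvote` callback and using a set of already-voted post IDs in place of the button's changed component name:

```python
from typing import Callable, Iterable, Optional, Set


def upvote_once(
    post_ids: Iterable[int],
    send_upvote: Callable[[int], None],        # hypothetical: one upvote request
    already_voted: Optional[Set[int]] = None,  # posts whose button is already "voted"
) -> Set[int]:
    """Upvote each post at most once; returns the set of voted post IDs."""
    voted = set(already_voted or ())  # copy, so the caller's set is untouched
    for post_id in post_ids:
        if post_id in voted:
            continue  # state already changed, so no redundant event is sent
        send_upvote(post_id)
        voted.add(post_id)
    return voted
```

Even if the same post shows up several times (say, after a re-render), only the first sighting sends a request.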
-
@Yamikuronue It's been over 6 hours since I got any upvotes in the likes topic, according to notifications.
-
I had it running as I mentioned over there. Stopped it a good hour before the cooties storm, so unless likes were in a queue somewhere...
Edit: note, I ran it from post 1 just in case and reached about 7k; I doubt I even gave out any likes in that range.
-
Did we just pull the likes topic back out? Because the site is pooping itself again.
-
@darkmatter also, the forum should notify you when you get downvoted, even if it doesn't tell you who did it. People need to know when they say something stupid, dammit. Too much positive reinforcement might make them think their ideas are ok.
-
@darkmatter you're afraid we're going to produce more Jeffs?
-
This graph is sub-optimal:
-
@boomzilla said in Moar Cooties:
Maybe not your script. But the fact that severe cooties started within seconds of moving it back out, though, something was pounding that topic hard.
Some crawler?
-
@PleegWat Bet it's Jeff trying to prove that Big Topics™ are Doing It Wrong™.
-
and the cooties return despite the lack of likes.
-
@darkmatter Notifications have now been disabled to see if that helps; from the discussion over on IRC, they did seem to be the culprit
-
@RaceProUK notifications only seem to be partly neutered. it keeps toastering me when someone responds. the only thing not showing notifications for me is the notification icon itself :)
-
@darkmatter Yeah, it's a bit of a half-assed job. Still, it seems the bit that's disabled is the bit that was causing the issues, so there's that. Plus the 'Mark all as read' still works.
-
@RaceProUK said in Moar Cooties:
Still, it seems the bit that's disabled is the bit that was causing the issues, so there's that.
My experience this morning says otherwise.
-
It's a bit fucked.
-
@loopback0
Just a little
-
-
@Onyx
I think YouTube's ad is mocking us.
-
-
In every part of IT except modern web development, performance tuning ≠ turning shit off randomly in the vain hope it will make things better
-
@tufty Ben didn't randomly turn something off.
-
And yet the cooties are still here.
In the business world, one is occasionally required to face up to the fact that continuing to throw time and money at something fundamentally broken in the hope it will eventually work acceptably, especially when a functional alternative exists, is not a viable proposition. In the open source world, of course, there's no money involved and it's other people's time we're wasting, so who cares, amirite?
We just got (thrown) off Discourse's 3-triangular-wheeled wagon, why didn't we learn our lesson there instead of jumping straight onto a wagon equipped with 2 square wheels and a pogo stick?
-
@tufty something changed in the past couple of days resulting in these cooties. Prior to that, performance seemed to be far better than Discourse. We are probably one of the largest groups using nodebb at the moment, and are bound to find a few edge cases. Give people in a position to fix things some time, and things should improve.
-
@tufty We do seem to have migrated from a forum that was borderline completely unusable due to performance issues, to a forum that is borderline completely unusable due to performance issues. I haven't been able to do anything on the forum for going on a day now.
-
@Nocha said in Moar Cooties:
@tufty something changed in the past couple of days resulting in these cooties.
Things don't randomly change. People change things, usually to fix bugs or add features.
How that's done is in one of two ways.
- Developer makes changes. Then they test those changes. Then a number of changes are packaged up as a release. A separate test team then tests that. Once that's all passed, the release is pushed out to production.
- Developer makes changes. Testing? Hah! Changes are committed to the repository. Production (who are pulling the bleeding edge of git direct to their servers) pick up bugs, performance issues and regressions.
We've seen this before.
Prior to that, performance seemed to be far better than discourse.
It did indeed. Usability was on a par or worse, however.
We are probably one of the largest groups using nodebb at the moment, and are bound to find a few edge cases.
- Being one of the largest users of X is never a good thing unless you're really big, a valuable showcase, or paying big money.
- These are not "edge cases"
- That's what we said about Discourse.
Give people in a position to fix things some time, and things should improve.
-
-
Do we know if the storm affected all 4 NodeBB instances, or just specific ones? If it affected all 4 instances simultaneously, that would tend to suggest a single point of failure, like MongoDB.
If the slowdown was in the NodeBB backend, I'd expect individual instances to suffer while other instances are still OK, especially for a DoS attack.
Do we have any way of profiling the database and the node instances to see what's going on?
It'd be useful to profile the hosts to see if CPU, Memory, Disk access or Network bandwidth is being saturated.
Edit: I've commented on Ticket 66 to ask the above questions.
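The reasoning above (all instances slow at once points to a shared dependency; only some slow points to those instances) can be sketched as a small check over per-instance latency samples. The data layout, instance names and `threshold_ms` are all hypothetical; real numbers would have to come from whatever monitoring is available, and a shared slowdown could still be something other than the database.

```python
from typing import Dict, List


def classify_slowdown(latency_ms: Dict[str, List[float]], threshold_ms: float) -> str:
    """latency_ms maps instance name -> latency samples taken over the same
    intervals (hypothetical data). Returns "shared", "per-instance" or "healthy"."""
    series = list(latency_ms.values())
    if not series or len({len(s) for s in series}) != 1:
        raise ValueError("need at least one instance, all sampled over the same intervals")
    saw_partial = False
    for interval in zip(*series):  # one tuple per interval, one value per instance
        slow = [value > threshold_ms for value in interval]
        if all(slow):
            return "shared"  # every instance slow at once: suspect a shared dependency
        if any(slow):
            saw_partial = True  # only some instances slow in this interval
    return "per-instance" if saw_partial else "healthy"
```

For example, `classify_slowdown({"node1": [50, 900], "node2": [60, 950]}, 500)` returns "shared", because both instances cross the threshold in the same interval.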
-
A colleague of mine has suggested that we might want to consider using http://pm2.keymetrics.io/. It's open source and free to use.
It has a built-in cluster and load-balancing mode http://pm2.keymetrics.io/docs/usage/cluster-mode/
It has inbuilt monitoring http://pm2.keymetrics.io/docs/usage/monitoring/
and a modules system, so we can monitor external modules like MongoDB http://pm2.keymetrics.io/docs/advanced/pm2-module-system/
It might help us diagnose what's causing the cooties.
-
@tufty Insanity is posting that quote to a bunch of people and expecting things to change; you can't beat idiocy.
-
@tufty said in Moar Cooties:
turning shit off randomly
Point of clarification: he turned off a single route that was known to cause the cooties; the fact there have been more since implies there's another cause.
@Polygeekery said in Moar Cooties:
We do seem to have migrated from a forum that was borderline completely unusable due to performance issues, to a forum that is borderline completely unusable due to performance issues.
That's what happens when you try to make software too clever; it just shits itself. Keep things simple and dumb, and they're more likely to work properly.
-
@JBert said in Moar Cooties:
@tufty Insanity is posting that quote to a bunch of people and expecting things to change; you can't beat idiocy.
Tru dat.
@boomzilla said in Moar Cooties:
@tufty Ben didn't randomly turn something off.
Comments in the ticket thread indicating he'd turned them off due to a misreading of a bug report on the nodeBB bug tracker, along with "seem to have stopped" and your own "Search may also be a factor", indicate, to me at least, wild stabs in the dark.
-
@tufty said in Moar Cooties:
indicate, to me at least, wild stabs in the dark.
I guess with incomplete information that makes sense. When he turned off that bit of notifications things improved dramatically for quite a while. It was certainly a major cause.
-
@tufty said in Moar Cooties:
Comments in the ticket thread indicating he'd turned them off due to a misreading of a bug report on the nodeBB bug tracker
The 1,000-second response times on the endpoint might have had something to do with it too.
-
@RaceProUK said in Moar Cooties:
Point of clarification: he turned off a single route that was known to cause the cooties; the fact there have been more since implies there's another cause.
My reading of it is that he "bodged" off a route that was known to cause problems, whilst simultaneously applying the fix to that bug. It may be that the fix itself caused other issues. It may be that it didn't, in fact, fix the bug. We'll probably never know because there's zero fucking test coverage, nothing put in place to detect regressions; in short, the kind of development process this site was intended to mock, not use.
I can't be fucked to find the actual commit @ben_lubar updated to, but here's an example:
This is a commit with the title "fixes crash NodeBB/nodebb-theme-persona#250", and here's the bug it purportedly fixes.
You'll note that the commit changes one file. No tests. Fucking amateur hour, people.
-
@tufty said in Moar Cooties:
No tests
No evidence of tests.
Absence of evidence is not evidence of absence.
-
@RaceProUK said in Moar Cooties:
@tufty said in Moar Cooties:
No tests
No evidence of tests.
Absence of evidence is not evidence of absence.
Absence of tests committed with a bug fix is absence of proof you've tested.
see also
git commit --no-tests-i-am-a-fucking-moron-sack-me
-
@tufty said in Moar Cooties:
Absence of tests committed with a bug fix is absence of proof you've tested.
- Fixing a bug doesn't always mean a test has changed
- There are forms of testing that don't involve code
-
@RaceProUK said in Moar Cooties:
No evidence of tests.
I got an email from a Jenkins server somewhere for a failed NodeBB build. No clue who it belongs to or what the failure was. It seemed to have been sent to GitHub accounts with recent commits.
There is a tests dir in the NodeBB repo. I've not looked at it in depth.
-
@tufty it's complicated to test performance without users, load tests are hard
And tdwtf isn't a forum developer, we just have admins trying to help, you can't ask them to ITIL all the things or whatever.
TDWTF is its owner's hobby, iirc; don't expect him to get all enterprizey on his free awesome forum.
-
@boomzilla I saw those tests; I was looking to see what the coverage was, but they don't report that. Anyway, the commit @tufty linked to wasn't in the core repo, it was in the Persona theme repo; haven't looked for tests there yet.
-
@fbmac said in Moar Cooties:
it's complicated to test performance without users, load tests are hard
It's not just load tests that are hard, it's all integration testing (as opposed to unit testing), especially for stuff that has a UI aspect. The fact it's hard, however, is no excuse for simply not doing it. Toolkits for doing this exist, are relatively easy to use, and are free.
Back when I was doing web dev stuff, I had a (roughly) 4:1 ratio of tests to code in terms of number of functions. Unit tests were around 1:1, and the other 3:1 was integration / functional testing. And then around another 2:1 volumetrically in terms of fixtures. Those ratios were more or less solid for WebObjects, Java stuff, and Rails.
Go on. Delete your post. You know you want to. I can feel your finger hovering over the "delete" button right now.
@boomzilla said in Moar Cooties:
There is a tests dir in the NodeBB repo. I've not looked at it in depth.
Tests for "topics" runs to 195 lines of code, and appears to test (a massive) 13 separate use cases. I would suggest that "making, editing, viewing and deleting topics", being a forum's main "raison d'etre", might warrant a few more tests than that.