So... I am no longer blind about performance here ...
-
see:
Snippets from the report here (after running for about 40 minutes):
Total Requests: 62578 (MessageBus: 31590)

Top 30 users by Server Load

Username       Duration
--------       --------
VinDuv           552.75
[Anonymous]      548.53
PaulaBean        137.83
HardwareGeek      50.38
aliceif           38.43
...

Top 30 routes by Server Load

Route                Duration
-----                --------
post_actions/create    583.70
topics/show            514.38
user_avatars/show       91.74
topics/timings          86.24
list/latest             42.07
...

Top 30 urls by Server Load

Url                                                      Duration
---                                                      --------
POST /post_actions HTTP/1.1                                583.70
POST /topics/timings HTTP/1.1                               91.66
GET /session HTTP/1.1                                       36.00
GET /t/2/last HTTP/1.1                                      28.06
GET / HTTP/1.1                                              26.30
POST /posts HTTP/1.1                                        25.75
GET /notifications HTTP/1.1                                 15.42
GET /t/category-definition-for-meta/2/99999999 HTTP/1.1     14.29
...

Top 30 not found urls (404s)

Url                                      Count
---                                      -----
GET /session HTTP/1.1                     2010
POST /notifications/reset-new HTTP/1.1     131
GET /rules.abe HTTP/1.1                      8
What I learned from this...
-
Liking on t/1000 is brutal, each like is costing us 1-3 seconds of server time. I will look at tuning this some but really, this game is messing you up big time, @VinDuv's recent like spree cost us about 9 minutes of server time.
-
We are returning 404s sometimes inappropriately. Will clean that up.
-
t/2 is being accessed a lot
Will see what happens when I run the report again in say 10 or so hours. Will be very interesting to see the results cc @boomzilla @PJH
-
-
t/2 is being accessed a lot
http://isitjustmeorservercooti.es/
Liking on t/1000 is brutal, each like is costing us 1-3 seconds of server time. I will look at tuning this some but really, this game is messing you up big time, @VinDuv's recent like spree cost us about 9 minutes of server time.
Why does liking on t/1000 cost anything more than liking outside of t/1000? "Broken architecture" is the only thing I can think of.
-
2 things
- We broadcast the fact a post was liked when the notification is created
- We count every god damn like on the topic so we can display "1.5M" on the front page
I can and will improve stuff here, but it's VERY frustrating fixing bugs that will show up nowhere else.
Unlike the "we have a gigantic topic" issue, which does show up elsewhere (and is slotted to be improved) .... "we have a giant topic and are maximising likes" is not something that has shown up anywhere else and I doubt it ever will.
-
fixing bugs that will show up nowhere else.
for now. Not to rankle you; I'm already happy at least somebody is paying attention to the issues.
Yes /t/1000 is stretching it, but what has been done inside the topic, specifically the number of posts, is not impossible for other fora to achieve. Maybe not in the < 1 year time frame, but every other forum I've seen has at least one long, long-running topic.
-
The number of posts bug is definitely on the roadmap to fix for the next major ... you are not really winning here quite yet... for example
-
Now you are just implying we haven't been pushing hard enough.
-
To be honest, it's a losing battle :)
https://community.muselive.com/?order=posts
Filed under: please dont click that topic
-
Paging @accalia @RaceProUK @Onyx
Shifting to /t/2 solves the 'kill discourse' problem but it seems the monitoring is still rather hard. It's monitoring. Not stress testing.
-
Another spike was just detected
https://what.thedailywtf.com/users/vault_dweller/activity/likes-given
-
Filed under: please dont click that topic
I did. It keeled over.
I can see how that should probably take priority over our likes shenanigans.
-
Shifting to /t/2 solves the 'kill discourse' problem but it seems the monitoring is still rather hard. It's monitoring. Not stress testing.
Since cooties are a rather low-traffic site, maybe it should only ping Discourse when the page is requested, optionally caching the last 5 seconds or so? We'd lose out on the nice graphs, but it should reduce the load.
-
Honestly ... the "fast pace" liking on t/1000 and a recent @PaulaBean bot rampage are much more severe.
EG
/t/2?last = 43 seconds of work in an hour or so
@PaulaBean rampage = 258 seconds of work
@Vault_Dweller liking spree = 359 seconds
@VinDuv liking spree = 552 seconds
so better focus on highest problems first ... server cooties is not killing anything here.
-
I guess the likes triggering badges & notifications also don't help.
-
If we didn't request t/2, how would we know whether threads load?
-
This is one of the reasons liking was slow
counting all the likes on the topic though is a huge PITA to do fast.
-
counting all the likes on the topic though is a huge PITA to do fast
SELECT COUNT(*) FROM Likes WHERE topic_id = 1000
?
-
SELECT SUM("posts"."like_count") AS sum_id FROM "posts" WHERE ("posts"."deleted_at" IS NULL) AND "posts"."topic_id" = 1000
Forced index scan on all the posts in the topic... there are 50k of them now. Index is already as good as it can be.
create index idxTemp on posts(topic_id, like_count) where deleted_at is null
does not help.
I guess technically I could just add 1 and queue a proper refresh in 15 minutes, but I need to design a backend to support this (queue a job unless already queued)
I am going to have to fix this one because the number of likes doesn't matter for this query. It's just the number of rows.
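The "add 1 and queue a proper refresh" idea is essentially a debounced recount. Here is a minimal in-memory sketch of that "queue a job unless already queued" backend; `DeferredRecount`, `like!`, and `drain` are made-up names for illustration, not Discourse code, and a real version would use a delayed job queue:

```ruby
require 'set'

# Sketch: bump a cached counter immediately on each like, but schedule
# at most one exact recount per topic, no matter how many likes arrive.
class DeferredRecount
  def initialize
    @pending = Set.new # topic ids with a recount already queued
    @queue   = []      # stand-in for a real delayed-job queue
  end

  # Called on every like: cheap approximate update now,
  # exact recount deferred (and deduplicated).
  def like!(topic)
    topic[:like_count] += 1
    return if @pending.include?(topic[:id])
    @pending.add(topic[:id])
    @queue << topic[:id] # in production: enqueue a job to run in ~15 min
  end

  # Runs later: one exact recount per dirty topic.
  def drain
    @queue.each do |topic_id|
      yield topic_id
      @pending.delete(topic_id)
    end
    @queue.clear
  end
end
```

The key property is that a like spree of thousands of likes still triggers only one deferred SUM query per topic per window.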
-
-
SELECT COUNT(*) FROM Likes WHERE topic_id = 1000?
Postgres blows goats when it comes to counting things
-
Is there a reason you don't seem to have a corresponding topics.like_count ?
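For illustration, a `topics.like_count` counter cache would turn each like into two O(1) increments at write time, instead of a SUM over every post row at read time. A toy sketch of the idea; `Topic`, `Post`, and `like!` are hypothetical stand-ins, not the actual schema:

```ruby
# Counter-cache sketch: the topic carries a running total that is
# maintained on write, so the front page never scans the posts.
Topic = Struct.new(:id, :like_count)
Post  = Struct.new(:topic, :like_count)

def like!(post)
  post.like_count += 1       # per-post count, as today
  post.topic.like_count += 1 # O(1) instead of SUM over 50k rows
end
```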
-
In other news:
@NetBot just chewed up 260 seconds.
Based on its profile, it only issued 41 Likes; that's about 6.4 seconds a Like. Methinks there's something else adding to the time there…
-
t/2 is being accessed a lot
that would be servercooties.com.
one of the monitoring endpoints is a topic load time, we needed a topic to load.
initially we were loading /t/1000 but that turned out to be a very bad idea indeed
-
https://commnity.museIive.cnm/t/count-to-a-million/53669/61550
This thread has gotten so big that it has taken me 4 attempts to even open this page. I kept on getting an error message
--
EDIT: link broken to prevent breaking someone else's forum - bz
-
That topic is actually really interesting because as far as I can tell (after trying to load the topic for a while), they don't have any likes in that topic.
Filed Under: They are counting "links", though.... 734 in that topic so far.
-
-
Their topic is literally just people counting up.... I am not even sure WHY you would link in such a topic.
Filed Under: But yeah, sure, we have the bigger link-topic, I guess. Does that inflate our e-peen by any significant amount?
-
I think it's actually outgoing links
-
Maybe the posters linked to relevant things?
The 2048 game comes to mind ...
-
I pressed Pos1 (Home) on the topic ~20 minutes ago to find out where the links go to.
Filed Under: :faspin:
-
/t/2?last = 43 seconds of work in an hour or so
@PaulaBean rampage = 258 seconds of work
@Vault_Dweller liking spree 359 seconds
@VinDuv liking spree 552 seconds
so better focus on highest problems first ... server cooties is not killing anything here.
Yeah! Let's ban @VinDuv!
-
Top 30 users by Server Load

Username        Duration
--------        --------
[Anonymous]      3241.51
PaulaBean         766.18
NetBot            269.35
RaceProUK         239.82
boomzilla         197.19
obeselymorbid     150.00
Luhmann           146.73
accalia           138.75
Maciejasjmj       123.92
Zecc              123.32
Kuro              101.24
flabdablet         92.46
TwelveBaud         83.78
Jaloopa            76.76
Dlareg             72.56
Vault_Dweller      70.86
Scarlet_Manuka     68.97
xaos               68.86
Boner              65.53
cartman82          65.12
sam                64.93
@apapadimoulis what is Paula up to?
-
Since cooties are a rather low-traffic site, maybe it should only ping Discourse when the page is requested, optionally caching the last 5 seconds or so? We'd lose out on the nice graphs, but it should reduce the load.
You'd also have to define what you mean by "traffic". It's not an uncommon thing for me to have it in a background tab so I can get desktop notifications from it.
That said, while I'm open to the possibility of not hammering the server with requests, DoSing the site by refreshing a single topic shouldn't be doable in the first place. It's something I used to do on forums in an active thread back in the day, when "Refresh every x seconds" was a feature built into my browser. OK, I didn't set it to 5 seconds (because there was no real benefit in doing that, really; 30 was still good enough), but still...
@sam, do you have any solid data on likes themselves using server time vs. loading batches being the culprit? We know all IDs are loaded on each batch load (presumably to keep the scrollbar thing synced?) for one. Also, postgres doesn't really like OFFSET, so how is Active Record doing that? The usual advice on doing offsets in large tables in postgres is creating a function that takes the offset and calculates the IDs that need to be fetched, and then there's an index on the function rather than just the column.
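For context, the common alternative to large OFFSETs is keyset ("seek") pagination: remember the last key seen and filter past it, which an index can satisfy directly instead of scanning and discarding N rows. A toy sketch of the idea over an in-memory array; the names are illustrative, not Discourse's actual schema or queries:

```ruby
# Keyset pagination sketch. The SQL equivalent is roughly:
#   SELECT * FROM posts
#   WHERE topic_id = 1000 AND post_number > $last_seen
#   ORDER BY post_number LIMIT 20
# so an index on (topic_id, post_number) seeks straight to the page.
def next_page(posts, after_post_number, page_size)
  posts
    .select { |p| p[:post_number] > after_post_number } # the "seek" predicate
    .sort_by { |p| p[:post_number] }
    .first(page_size)
end
```

Unlike OFFSET, the cost stays constant no matter how deep into the 50k-post topic the reader is.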
-
We have an immune system for this but it's disabled, seeing as you are all allowed 750 likes a day here :)
-
It's not an uncommon thing for me to have it in a background tab so I can get desktop notifications from it.
So that you know when you can't use the site even when you don't intend to use the site!
...wait.
-
@apapadimoulis what is Paula up to?
There was a recent article that was all-caps, I think? Maybe it's still trying to create a topic for that?
-
Dammit ... I just didn't make top 5
-
do you have any solid data on likes themselves using server time vs. loading batches being the culprit?
I sure do... the route being hit is
post_actions/create
so that is how I was able to isolate it, nobody has been catching up on likes in the last 3 or so hours so it's low. Regardless, the next beta makes "liking" a lot faster because of the new index, which is awesome and will help everywhere, even on short topics. Deferring the sum query will make it the same as liking any other topic (going to add that as well).
-
@NetBot third, myself fourth… not great
I'll see if I can reduce that load, but I can't make any promises
-
Very curious to see the report on 24 hours worth of data ... will post more numbers tomorrow morning
-
I'm 6th and 100% pure human
-
Very curious to see the report on 24 hours worth of data ... will post more numbers tomorrow morning
In that case, I'll wait, see what those figures are first; don't want to spoil the data ;)
@Luhmann said:I'm 6th and 100% pure human
And there are two other 100% humans between me and you; I'm not overly worried about the load I'm creating, but if I can reduce it, then I think it's only fair I do so
-
boomzilla 197.19
This is likely strongly related to this:
@apapadimoulis what is Paula up to?
I know that there was a lot of testing going on yesterday trying to get her to correctly link front page stuff to topics here, plus allowing the possibility of manual edits. I wouldn't have thought that stuff would have caused that much server load, though.
-
was just about to test deleting a post in a huge topic and I pretty much ran out of day.
yeah ... deleting from large topics is brutal... will fix
-
accalia 138.75
hmm not surprising here.
what's curious is why @NetBot is the only true bot on the top users list. @RaceProUK and I both run "cyberparts" to play the /t/1E3 game, but given that Zoidberg and SockBot are also active and play that game too, it would seem the bots are not causing much server load, as they do not appear in the top 20 (30? the list says 30 but only has 21 rows)
what period is that data from? are we talking most of a day? more than a day? or just an hour at most?
-
what's curious is why is @NetBot the only true bot on the top users list?
Based on the timings of Like binges and the lack of @RPBot, I'd say that it's data from a few hours at most; that period happened to include @NetBot's binge but not @RPBot's, which was four hours earlier
-
That said, while I'm open to possibility of not hammering the server with requests,
i've certainly worked hard to keep ServerCooties.com at a reasonable polling rate. Currently we poll each endpoint with at least a 15-second delay before the endpoint is polled again, and the requests are set up so that the next poll is not queued until the previous one clears. Given that i as a human user am more than capable of producing requests at a rate far exceeding that, i think we're reasonably good there.
DoSing the site by refreshing a single topic should not be something that should be doable.
we thought so one time.... then we started polling /t/1E3.... that did not end well. that's why we're polling /t/2 now.
It's not an uncommon thing for me to have it in a background tab so I can get desktop notifications from it.
yep. i have that too. it's nice to get the notifications. the notifications for global notice when @PJH is announcing an impending upgrade are also nice.
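The polling discipline described above (at least 15 seconds between polls of an endpoint, and no new poll queued until the previous one completes) can be sketched roughly like this; `Poller` is a hypothetical name, not the actual ServerCooties.com code, and the clock is injectable only so the sketch is testable:

```ruby
# Per-endpoint polling throttle: the 15-second timer starts only after
# the previous request *completes*, so polls never overlap or pile up.
class Poller
  MIN_INTERVAL = 15 # seconds between the end of one poll and the next

  def initialize(clock: -> { Time.now.to_f })
    @clock = clock
    @next_allowed = 0.0
  end

  # Runs the request block if the spacing allows it; returns nil otherwise.
  def poll
    now = @clock.call
    return nil if now < @next_allowed
    result = yield # the actual HTTP request would happen here
    @next_allowed = @clock.call + MIN_INTERVAL # measured from completion
    result
  end
end
```

A slow response therefore stretches the interval rather than letting requests queue up behind it.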
-
we thought so one time.... then we started polling /t/1E3.... that did not end well. that's why we're polling /t/2 now.
It is doable, but it really shouldn't be
-
Based on the timings of Like binges and the lack of @RPBot,
hmm all those bots are caught up now though right? so the like binge basically amounts to a topic scan looking for unliked posts to like.... most of the posts will never get liked.
I should make version 2.0 have some persistent storage so it knows the posts it's already liked and skips asking for their JSON entirely.
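A minimal sketch of that persistent "already liked" store (hypothetical names and a naive JSON file for simplicity; a real bot would likely batch writes or use SQLite):

```ruby
require 'set'
require 'json'

# Remembers which post ids the bot has already liked, across restarts,
# so their JSON never needs to be fetched again.
class LikedStore
  def initialize(path)
    @path = path
    @ids = File.exist?(path) ? Set.new(JSON.parse(File.read(path))) : Set.new
  end

  def liked?(post_id)
    @ids.include?(post_id)
  end

  def record(post_id)
    @ids.add(post_id)
    File.write(@path, JSON.generate(@ids.to_a)) # full rewrite; fine for a bot
  end
end
```

The topic-scan loop then becomes "skip anything in the store" instead of "ask the server about every post".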
-
-
Look. You dumbasses with your stupid gamification bullshit are ruining the experience for me and the rest of the normal users who don't base our sense of self-worth on a number from a website.