So... I am no longer blind about performance here ...


  • Banned

    see:

    Snippets from the report here (after running for about 40 minutes):

    
    Total Requests: 62578 ( MessageBus: 31590 )
    
    Top 30 users by Server Load
    
    Username          Duration
    --------          --------
    VinDuv              552.75
    [Anonymous]         548.53
    PaulaBean           137.83
    HardwareGeek         50.38
    aliceif              38.43
    ...
    
    
    Top 30 routes by Server Load
    
    Route                    Duration
    -----                    --------
    post_actions/create        583.70
    topics/show                514.38
    user_avatars/show           91.74
    topics/timings              86.24
    list/latest                 42.07
    ...
    
    
    Top 30 urls by Server Load
    
    Url                                                                                                                                        Duration
    ---                                                                                                                                        --------
    POST /post_actions HTTP/1.1                                                                                                                  583.70
    POST /topics/timings HTTP/1.1                                                                                                                 91.66
    GET /session HTTP/1.1                                                                                                                         36.00
    GET /t/2/last HTTP/1.1                                                                                                                        28.06
    GET / HTTP/1.1                                                                                                                                26.30
    POST /posts HTTP/1.1                                                                                                                          25.75
    GET /notifications HTTP/1.1                                                                                                                   15.42
    GET /t/category-definition-for-meta/2/99999999 HTTP/1.1                                                                                       14.29
    ...
    
    Top 30 not found urls (404s)
    
    Url                                                                                                                                              Count
    ---                                                                                                                                              -----
    GET /session HTTP/1.1                                                                                                                             2010
    POST /notifications/reset-new HTTP/1.1                                                                                                             131
    GET /rules.abe HTTP/1.1                                                                                                                              8
    
    

    What I learned from this...

    1. Liking on t/1000 is brutal, each like is costing us 1-3 seconds of server time. I will look at tuning this some but really, this game is messing you up big time, @VinDuv's recent like spree cost us about 9 minutes of server time.

    2. We are returning 404s sometimes inappropriately. Will clean that up.

    3. t/2 is being accessed a lot

    Will see what happens when I run the report again in say 10 or so hours. Will be very interesting to see the results cc @boomzilla @PJH



  • @sam said:

    t/2 is being accessed a lot

    http://isitjustmeorservercooti.es/

    @sam said:

    Liking on t/1000 is brutal, each like is costing us 1-3 seconds of server time. I will look at tuning this some but really, this game is messing you up big time, @VinDuv's recent like spree cost us about 9 minutes of server time.

    Why does liking on t/1000 cost anything more than liking outside of t/1000? "Broken architecture" is the only thing I can think of.


  • Banned

    2 things

    1. We broadcast the fact a post was liked when the notification is created
    2. We count every god damn like on the topic so we can display "1.5M" on the front page

    I can and will improve stuff here, but it's VERY frustrating fixing bugs that will show up nowhere else.

    Unlike the "we have a gigantic topic" which does show up elsewhere (and is slotted to be improved) .... "we have a giant topic and are maximising likes" is not something that has shown up anywhere else and I doubt it ever will.



  • @sam said:

    fixing bugs that will show up nowhere else.

    for now. Not to rankle you, I'm already happy at least somebody is paying attention to the issues.
    Yes /t/1000 is stretching it but what has been done inside the topic, specifically the number of posts, is not impossible for other fora to achieve. Maybe not in the < 1 year time frame but every other forum I've seen has at least one long, long running topic.


  • Banned

    The number of posts bug is definitely on roadmap to fix for next major ... you are not really winning here quite yet... for example

    http://new.vr-hell.com/?order=posts



  • Now you are just implying we haven't been pushing hard enough.

    :trollface:


  • Banned

    To be honest, it's a losing battle :)

    https://community.muselive.com/?order=posts

    Filed under: please don't click that topic



  • Paging @accalia @RaceProUK @Onyx

    Shifting to /t/2 solves the 'kill discourse' problem but it seems the monitoring is still rather hard. It's monitoring. Not stress testing.


  • Banned



  • @sam said:

    Filed under: please don't click that topic

    I did. It keeled over.

    I can see how that should probably take priority over our likes shenanigans.



  • @Luhmann said:

    Shifting to /t/2 solves the 'kill discourse' problem but it seems the monitoring is still rather hard. It's monitoring. Not stress testing.

    Since cooties are a rather low-traffic site, maybe it should only ping Discourse when the page is requested, optionally caching the last 5 seconds or so? We'd lose out on the nice graphs, but it should reduce the load.
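
A minimal sketch of the on-demand idea above: probe only when somebody asks, and reuse any result younger than five seconds. The class `CachedProbe`, the `fetch_status` callable and the `ttl_seconds` parameter are all hypothetical names for illustration, not part of servercooties.

```python
import time
from typing import Callable, Optional


class CachedProbe:
    """Run a probe only when asked, reusing a result younger than ttl_seconds."""

    def __init__(self, fetch_status: Callable[[], float], ttl_seconds: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch_status      # hypothetical: measures one topic load
        self._ttl = ttl_seconds
        self._clock = clock             # injectable so the sketch can be tested
        self._value: Optional[float] = None
        self._fetched_at: Optional[float] = None

    def get(self) -> Optional[float]:
        now = self._clock()
        stale = self._fetched_at is None or now - self._fetched_at >= self._ttl
        if stale:
            # Only a stale cache costs the monitored server a request.
            self._value = self._fetch()
            self._fetched_at = now
        return self._value
```

With this shape an idle monitoring page generates no requests at all, which is also why the continuous graphs would be lost.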


  • Banned

    Honestly ... the "fast pace" liking on t/1000 and a recent @paulabean bot rampaging are much more severe.

    EG

    /t/2?last = 43 seconds of work in an hour or so
    @PaulaBean rampage = 258 seconds of work
    @Vault_Dweller liking spree 359 seconds
    @VinDuv liking spree 552 seconds

    so better focus on highest problems first ... server cooties is not killing anything here.



  • I guess the likes triggering badges & notifications also don't help.


  • sockdevs

    If we didn't request t/2, how would we know whether threads load? :stuck_out_tongue:


  • Banned

    This is one of the reasons liking was slow

    counting all the likes on the topic though is a huge PITA to do fast.


  • BINNED

    @sam said:

    counting all the likes on the topic though is a huge PITA to do fast

    SELECT COUNT(*) FROM Likes WHERE topic_id = 1000?


  • Banned

    SELECT SUM("posts"."like_count") AS sum_id FROM "posts"  WHERE ("posts"."deleted_at" IS NULL) AND "posts"."topic_id" = 1000  
    

    Forced index scan on all the posts in the topic... there are 50k of them now. Index is already as good as it can be.

    create index idxTemp on posts(topic_id, like_count) where deleted_at is null 
    

    does not help.

    I guess technically I could just add 1 and queue a proper refresh in 15 minutes, but I need to design a backend to support this (queue a job unless already queued)

    I am going to have to fix this one cause the number of likes doesn't matter for this query. It's just the number of rows.
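
The "add 1 now, recount later, queue a job unless already queued" idea can be sketched roughly as below. This is only an in-memory illustration under stated assumptions: `DeferredLikeCounter` and the `recount` callable are hypothetical names, the 15 minute delay is left to whatever job scheduler runs `run_refresh`, and none of this is Discourse's actual implementation.

```python
from typing import Callable, Dict, Set


class DeferredLikeCounter:
    """Bump a cached per-topic total now; schedule one exact recount later."""

    def __init__(self, recount: Callable[[int], int]) -> None:
        self._recount = recount            # hypothetical: runs the slow SUM query
        self.cached: Dict[int, int] = {}   # topic_id -> displayed like total
        self.pending: Set[int] = set()     # topics with a refresh already queued

    def on_like(self, topic_id: int) -> bool:
        """Record one like. Returns True if a new refresh job was queued."""
        self.cached[topic_id] = self.cached.get(topic_id, 0) + 1
        if topic_id in self.pending:
            return False                   # already queued: do not queue again
        self.pending.add(topic_id)
        return True

    def run_refresh(self, topic_id: int) -> None:
        """What the delayed job would do: replace the estimate with the truth."""
        self.pending.discard(topic_id)
        self.cached[topic_id] = self._recount(topic_id)
```

However many likes land inside one delay window, the expensive recount runs once per topic, so a like spree costs one scan rather than one scan per like.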


  • Banned

    In other news:

    NetBot           260.71
    

    @NetBot just chewed up 260 seconds.



  • @Jaloopa said:

    SELECT COUNT(*) FROM Likes WHERE topic_id = 1000?

    Postgres blows goats when it comes to counting things



  • Is there a reason you don't seem to have a corresponding topics.like_count ?
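
For illustration, a denormalized `topics.like_count` could be maintained along these lines. The sketch uses SQLite in place of Postgres and a made-up two-table schema (only the `posts` column names are taken from the query quoted in the thread), so treat it as an assumption-laden toy rather than Discourse's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE topics (id INTEGER PRIMARY KEY, like_count INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        topic_id INTEGER NOT NULL,
        like_count INTEGER NOT NULL DEFAULT 0,
        deleted_at TEXT
    );
""")


def like_post(post_id: int) -> None:
    """Bump the post and its topic together so reads never need a SUM."""
    with conn:  # one transaction: both counters move or neither does
        conn.execute("UPDATE posts SET like_count = like_count + 1 WHERE id = ?", (post_id,))
        conn.execute(
            "UPDATE topics SET like_count = like_count + 1 "
            "WHERE id = (SELECT topic_id FROM posts WHERE id = ?)",
            (post_id,),
        )


def topic_likes(topic_id: int) -> int:
    """Single-row lookup instead of scanning every post in the topic."""
    return conn.execute("SELECT like_count FROM topics WHERE id = ?", (topic_id,)).fetchone()[0]
```

The trade-off is that every path that changes a post's likes (unliking, deleting or restoring a post) has to adjust the topic counter too, or the two numbers drift apart.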


  • sockdevs

    @sam said:

    In other news:
    @NetBot just chewed up 260 seconds.

    Based on its profile, it only issued 41 Likes; that's about 6.4 seconds a Like.

    Methinks there's something else adding to the time there…


  • sockdevs

    @sam said:

    t/2 is being accessed a lot

    that would be servercooties.com.

    one of the monitoring endpoints is a topic load time, we needed a topic to load.

    initially we were loading /t/1000 but that turned out to be a very bad idea indeed


  • area_deu

    https://commnity.museIive.cnm/t/count-to-a-million/53669/61550

    This thread has gotten so big that it has taken me 4 attempts to even open this page. I kept on getting an error message :stuck_out_tongue:

    --

    EDIT: link broken to prevent breaking someone else's forum - bz


  • Winner of the 2016 Presidential Election

    That topic is actually really interesting because as far as I can tell (after trying to load the topic for a while), they don't have any likes in that topic.

    Filed Under: They are counting "links", though.... 734 in that topic so far.


  • sockdevs

    @Kuro said:

    734 in that topic so far

    Pfft.


  • Winner of the 2016 Presidential Election

    Their topic is literally just people counting up.... I am not even sure WHY you would link in such a topic.

    Filed Under: But yeah, sure, we have the bigger link-topic, I guess. Does that inflate our e-peen by any significant amount?


  • sockdevs

    I think it's actually outgoing links


  • area_deu

    Maybe the posters linked to relevant things?
    The 2048 game comes to mind ...


  • Winner of the 2016 Presidential Election

    I pressed Pos1 on the topic ~20 minutes ago to find out where the links go to.

    Filed Under: :faspin:



  • @sam said:

    /t/2?last = 43 seconds of work in an hour or so
    @PaulaBean rampage = 258 seconds of work
    @Vault_Dweller liking spree 359 seconds
    @VinDuv liking spree 552 seconds

    so better focus on highest problems first ... server cooties is not killing anything here.

    Yeah! Let's ban @VinDuv!


  • Banned

    Top 30 users by Server Load
    
    Username       Duration
    --------       --------
    [Anonymous]     3241.51
    PaulaBean        766.18
    NetBot           269.35
    RaceProUK        239.82
    boomzilla        197.19
    obeselymorbid    150.00
    Luhmann          146.73
    accalia          138.75
    Maciejasjmj      123.92
    Zecc             123.32
    Kuro             101.24
    flabdablet        92.46
    TwelveBaud        83.78
    Jaloopa           76.76
    Dlareg            72.56
    Vault_Dweller     70.86
    Scarlet_Manuka    68.97
    xaos              68.86
    Boner             65.53
    cartman82         65.12
    sam               64.93
    
    

    @apapadimoulis what is Paula up to?


  • :belt_onion:

    @Maciejasjmj said:

    Since cooties are a rather low-traffic site, maybe it should only ping Discourse when the page is requested, optionally caching the last 5 seconds or so? We'd lose out on the nice graphs, but it should reduce the load.

    You'd also have to define what you mean by "traffic". It's not an uncommon thing for me to have it in a background tab so I can get desktop notifications from it.

    That said, while I'm open to possibility of not hammering the server with requests, DoSing the site by refreshing a single topic should not be something that should be doable. It's something I used to do on forums in an active thread back in the day when "Refresh every x seconds" was a feature I had built-in into my browser. Ok, I didn't set it to 5 seconds (because there was no real benefit in doing that, really, 30 was still good enough), but still...

    @sam, do you have any solid data on likes themselves using server time vs. loading batches being the culprit? We know all IDs are loaded on each batch load (presumably to keep the scrollbar thing synced?) for one. Also, postgres doesn't really like OFFSET, how is Active Record doing that? The usual advice on doing offsets in large tables in postgres is creating a function that takes the offset and calculates the IDs that need to be fetched, and then there's an index on the function rather than just the column.
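
As a rough illustration of the OFFSET concern, here is a sketch comparing OFFSET paging with keyset ("seek") paging. Note that keyset paging is a different technique from the function-index approach described above, swapped in because it is easy to show; the sketch uses SQLite rather than Postgres, and the table, index name and `post_number` column are assumptions for the example.

```python
import sqlite3
from typing import List

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, topic_id INTEGER, post_number INTEGER)")
conn.execute("CREATE INDEX idx_posts_topic_number ON posts (topic_id, post_number)")
conn.executemany(
    "INSERT INTO posts (topic_id, post_number) VALUES (1000, ?)",
    [(n,) for n in range(1, 101)],
)


def page_by_offset(topic_id: int, offset: int, limit: int) -> List[int]:
    """OFFSET paging: the database still walks past `offset` rows first."""
    rows = conn.execute(
        "SELECT post_number FROM posts WHERE topic_id = ? "
        "ORDER BY post_number LIMIT ? OFFSET ?",
        (topic_id, limit, offset),
    )
    return [r[0] for r in rows]


def page_after(topic_id: int, last_seen: int, limit: int) -> List[int]:
    """Keyset paging: seek straight to the rows after the last one seen."""
    rows = conn.execute(
        "SELECT post_number FROM posts WHERE topic_id = ? AND post_number > ? "
        "ORDER BY post_number LIMIT ?",
        (topic_id, last_seen, limit),
    )
    return [r[0] for r in rows]
```

Both functions return the same page here; the difference is that the keyset version can use the `(topic_id, post_number)` index to start at the right row, which matters more the deeper into a 50k-post topic the reader scrolls.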


  • Banned

    We have an immune system for this but it's disabled seeing as you are all allowed 750 likes a day here :)



  • @Onyx said:

    It's not an uncommon thing for me to have it in a background tab so I can get desktop notifications from it.

    So that you know when you can't use the site even when you don't intend to use the site!

    ...wait.


  • :belt_onion:

    @sam said:

    @apapadimoulis what is Paula up to?

    There was a recent article that was all-caps, I think? Maybe it's still trying to create a topic for that?



  • Dammit ... I just didn't make top 5


  • Banned

    @Onyx said:

    do you have any solid data on likes themselves using server time vs. loading batches being the culprit?

    I sure do... the route being hit is post_actions/create so that is how I was able to isolate it, nobody has been catching up on likes in the last 3 or so hours so it's low.

    Regardless next beta makes "liking" a lot faster cause of the new index, which is awesome and will help everywhere, even on short topics. Deferring the sum query will make it same as liking any other topic (going to add that as well).


  • sockdevs

    @NetBot third, myself fourth… not great :laughing:

    I'll see if I can reduce that load, but I can't make any promises


  • Banned

    Very curious to see the report on 24 hours worth of data ... will post more numbers tomorrow morning



  • I'm 6th and 100% pure human


  • sockdevs

    @sam said:

    Very curious to see the report on 24 hours worth of data ... will post more numbers tomorrow morning

    In that case, I'll wait, see what those figures are first; don't want to spoil the data ;)

    @Luhmann said:

    I'm 6th and 100% pure human

    And there are two other 100% humans between me and you; I'm not overly worried about the load I'm creating, but if I can reduce it, then I think it's only fair I do so



  • @sam said:

    boomzilla 197.19

    This is likely strongly related to this:

    @sam said:

    @apapadimoulis what is Paula up to?

    I know that there was a lot of testing going on yesterday trying to get her to correctly link front page stuff to topics here, plus allowing the possibility of manual edits. I wouldn't have thought that stuff would have caused that much server load, though.


  • Banned

    was just about to test deleting a post in a huge topic and I pretty much ran out of day.

    yeah ... deleting from large topics is brutal... will fix


  • sockdevs

    @sam said:

    accalia 138.75

    hmm not surprising here.

    what's curious is why is @NetBot the only true bot on the top users list? @RaceProUK and I both run "cyberparts" to play the /t/1E3 game, but given that Zoidberg and SockBot are also active and play that game too it would seem that the bots are not causing much server load as they do not appear in the top 20 (30? the list says 30 but only has 21 rows)

    what period is that data from? are we talking most of a day? more than a day? or just an hour at most?


  • sockdevs

    @accalia said:

    what's curious is why is @NetBot the only true bot on the top users list?

    Based on the timings of Like binges and the lack of @RPBot, I'd say that it's data from a few hours at most; that period happened to include @NetBot's binge but not @RPBot's, which was four hours earlier


  • sockdevs

    @Onyx said:

    That said, while I'm open to possibility of not hammering the server with requests,

    i've certainly worked hard to keep ServerCooties.com at a reasonable polling rate. Currently we poll each endpoint with at least 15 seconds delay before the endpoint is polled again, and the requests are set up so that the next poll is not queued until the previous one clears. Given that I as a human user am more than capable of producing requests at a rate far exceeding that i think we're reasonably good there.

    @Onyx said:

    DoSing the site by refreshing a single topic should not be something that should be doable.

    we thought so one time.... then we started polling /t/1E3.... that did not end well. that's why we're polling /t/2 now.

    @Onyx said:

    It's not an uncommon thing for me to have it in a background tab so I can get desktop notifications from it.

    yep. i have that too. it's nice to get the notifications. the notifications for global notice when @PJH is announcing an impending upgrade are also nice.
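
The polling rule described in this post (at least 15 seconds between polls of an endpoint, and no new poll until the previous one has cleared) might look something like the sketch below. `poll_serially`, `probe` and the injectable `sleep` are hypothetical names for illustration; this is not the actual ServerCooties code.

```python
import time
from typing import Callable


def poll_serially(probe: Callable[[], None], rounds: int, min_delay: float = 15.0,
                  sleep: Callable[[float], None] = time.sleep) -> None:
    """Poll one endpoint `rounds` times, never overlapping requests.

    The next poll is not started until the previous one has returned, and
    at least `min_delay` seconds of idle time separate consecutive polls.
    """
    for i in range(rounds):
        probe()                 # blocks until the request clears
        if i < rounds - 1:
            sleep(min_delay)    # wait only between polls, not after the last
```

Because the delay starts after the response arrives, a slow server automatically gets polled less often, so the monitor backs off exactly when the site is struggling.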


  • BINNED

    @accalia said:

    we thought so one time.... then we started polling /t/1E3.... that did not end well. that's why we're polling /t/2 now.

    It is doable, but it really shouldn't be


  • sockdevs

    @RaceProUK said:

    Based on the timings of Like binges and the lack of @RPBot,

    hmm all those bots are caught up now though right? so the like binge basically amounts to a topic scan looking for unliked posts to like.... most of the posts will never get liked.

    I should make version 2.0 have some persistent storage so it knows the posts it's already liked and skips asking for their JSON entirely.
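
One way the persistent-storage idea could work, sketched with a JSON file of already-liked post IDs. `LikedStore`, the file layout and the `send_like` callable are hypothetical; the real bots and the Discourse API calls they make are not shown here.

```python
import json
from pathlib import Path
from typing import Callable, Iterable, Set


class LikedStore:
    """Remember liked post IDs on disk so later scans can skip them."""

    def __init__(self, path: Path) -> None:
        self._path = path
        self._liked: Set[int] = set()
        if path.exists():
            self._liked = set(json.loads(path.read_text()))

    def like_new(self, post_ids: Iterable[int], send_like: Callable[[int], None]) -> int:
        """Like only unseen posts; returns how many requests were sent."""
        sent = 0
        for post_id in post_ids:
            if post_id in self._liked:
                continue            # already liked: no request at all
            send_like(post_id)      # hypothetical: performs the real like call
            self._liked.add(post_id)
            sent += 1
        # Persist after each scan so a restart does not repeat old work.
        self._path.write_text(json.dumps(sorted(self._liked)))
        return sent
```

A catch-up scan after a restart then only touches posts the bot has not seen before, instead of re-requesting the whole topic.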


  • sockdevs

    @Jaloopa said:

    It is doable, but it really shouldn't be

    that was rather my point.



  • Look. You dumbasses with your stupid gamification bullshit are ruining the experience for me and the rest of the normal users who don't base our sense of self-worth on a number from a website.

