Which badger shall we give the fox?


  • SockDev

    Continuing the discussion from 🎂The cupcake thread of celebrations.:

    @accalia said:

    i've done that enough with my bots i think i deserve a badge of shame.

    ... anyone agree enough to nominate me for one?

    Would this be an existing badger, or a new one? Ideas and suggestions welcome ;)

    For context, it's for breaking Discourse by having servercooties.com query t/1000 regularly.


  • SockDev

    as the one that this topic is about i feel i need to recuse myself from the discussion..

    ;-)


  • mod

    I'm not sure we have an existing badge that qualifies.


  • SockDev

    Suggestions for a new badger are welcome ;)



  • All of them.
    There isn't an existing badge for... whatever this is.


  • BINNED

    Forum murder?

    Can we share? :P



  • @Onyx said:

    Forum murder?

    Probably wise that isn't gamified :laughing:


  • SockDev

    The Piko Hammer Badge

    Because we hit Discourse until it broke :smile:



  • - Finders of site-breaking exploits, awarded at admins' discretion.

    And I guess "refreshing a page at an elevated, but still really low, rate" could theoretically be considered an exploit.


  • :belt_onion:

    should just award the entire forum that badge, simply having the forum open at all is aiding in ddos-ing the site. I seem to recall complaining about this back when Jeff decided to rate limit all of the things, which caused it to be impossible to open more than 5 tabs at a time or even scroll too fast without getting 429 errors for causing too much site traffic.

    Dischorse, the greatest forum-based ddos software of all time.



  • Can anyone explain why loading the first page of a 50000 page topic is harder for the forum than loading the first page of a 2 page topic?

    Surely the database query will return the same number of rows and need to binary search through the same size of index.


  • mod

    Because Discourse doesn't just return the first page of posts. It also returns all post IDs for the entire topic. Don't ask why, it just does.
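
    If you want to see it for yourself, something like this shows the size of that ID list (untested sketch; assumes you have jq, and the post_stream.stream field name is a guess on my part, so check the raw JSON first):

    $ curl --silent http://what.thedailywtf.com/t/1000.json | jq '.post_stream.stream | length'

    Compare that number against the handful of posts you actually get to read.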



  • See, on a forum with goddamn pages, it would be enough to just know the number of posts and the posts on the current page. Why does the client even care about post IDs? It doesn't expose them to the user, so the server shouldn't expose them to the client.


  • :belt_onion:

    @ben_lubar said:

    Can anyone explain why loading the first page of a 50000 page topic is harder for the forum than loading the first page of a 2 page topic?

    Because there are no pages.




  • :belt_onion:

    @ben_lubar said:

    Why does the client even care about post IDs?

    Not to mention that we've clearly shown that the message bus doesn't give a fuck about any of that shit because it breaks and can never recover itself....... speaking of which....

    why the hell does it need the list of every single post id in the topic when the message bus can't even manage to properly maintain the list after the topic has been loaded anyway???

    OHHHHHHHHHHHHHHHHH. Because ROOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOFL:
    scrolling through a topic doesn't result in it asking for "Give me x # of posts prior to this last post I have".... it actually SENDS THE LIST OF POST IDS BACK TO THE FUCKING SERVER.

    *http://what.thedailywtf.com/t/1000/posts.json?post_ids[]=391612&post_ids[]=391611&post_ids[]=391605&post_ids[]=391603&post_ids[]=391549&post_ids[]=391545&post_ids[]=391529&post_ids[]=391519&post_ids[]=391453&post_ids[]=391451&post_ids[]=391448&post_ids[]=391447&post_ids[]=391444&post_ids[]=391438&post_ids[]=391435&post_ids[]=391433&post_ids[]=391432&post_ids[]=391429&post_ids[]=391422&post_ids[]=391421&_=1429419968557

    whyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy

    Well that explains why it can never recover once it fucks itself in the ass and misses a post, even if you scroll up until your post is out of sight. Just... why. What the.
    W... I........ don't even.

    How had none of us noticed (at least not in any topic I've followed) this retardedness before anyway?

    Seriously, someone give one good reason why the fuck that request isn't this (for scrolling up):
    *http://what.thedailywtf.com/t/1000/posts.json?post_id=391612&read_posts=-20&_=1429419968557
    or (for scrolling down):
    *http://what.thedailywtf.com/t/1000/posts.json?post_id=391612&read_posts=20&_=1429419968557

    Would that not be about 10 billion times easier to fucking maintain?

    I suppose it might make the query a bit harder because of having to sort and pull in order rather than by an id list? Except... oh god.......... when the client asks for the posts by id, the server responds with not only the posts in question, it also feeds back THE ENTIRE FUCKING 40,000 ID LIST AGAIN. WHEN THE CLIENT CLEARLY ALREADY KNEW THE FUCKING IDS BECAUSE IT JUST ASKED FOR ALL THE POSTS BY ID.................. so wait... if the client-side gets back all the ids again, why the fuck can't it recover from the messagebus new post retrieval fail.... i mean, it not only uselessly gets all the ids it already had, it then assumes it already had them all and throws them out anyway?
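
    For the record, here are the two request shapes side by side as shell commands (sketch only: the second one is hypothetical, read_posts is just my suggestion above and not a real Discourse parameter, and the -g keeps curl from globbing the brackets):

    $ # what the client actually sends when scrolling: every wanted post spelled out by id
    $ curl -g --silent 'http://what.thedailywtf.com/t/1000/posts.json?post_ids[]=391612&post_ids[]=391611'
    $ # what a cursor-style request could look like instead
    $ curl --silent 'http://what.thedailywtf.com/t/1000/posts.json?post_id=391612&read_posts=-20'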



  • I could feel your head slowly imploding as I read that.

    10/8


  • SockDev

    Can we get back to badger discussion now? ;)


  • BINNED

    @darkmatter said:

    How had none of us noticed (at least not in any topic I've followed) this retardedness before anyway?

    Because we expected... sanity? Competence?

    You know, I feel sorry for suggesting the servercooties.com site. I feel sorry for not realizing this as well. I feel sorry for assisting in wasting all your people's time with constant fucking errors caused by attempting to measure performance by polling /t/1000.

    And then, you know, what, no. I am not sorry for some of it. Inconveniencing all of you? Yes. But I am not sorry for breaking Discourse. Because all we did, all we did is an equivalent of hitting F5 over and over again. Something people (including me) used to do on "toxic hellstew" forums in lively discussions because we didn't have notifications. I never saw one of them break due to that. Not a single one.

    Maybe I'm just making excuses for myself. But you know what? This is bullshit. I am at least partially right, no matter how you slice it. This shit needs to be fixed. Because if we did it here, we now pretty much know we can DDoS any Discourse instance by using what? 5-10 bots (not every forum is going to have /t/1000, we'd need to distribute it a bit) on a few longer topics? That is not a good design.


  • Discourse touched me in a no-no place

    @Onyx said:

    This is bullshit. I am at least partially right, no matter how you slice it. This shit needs to be fixed.

    Agreed. Sounds like their database is missing critical indices, they're passing the wrong info in messages, and maybe their ORM layer is somewhat off too (since that's ActiveRecord, it wouldn't surprise me at all).


  • SockDev

    I thought this category was for flag/badge discussion… :rolleyes:

    <hehe I don't really care!>


  • BINNED

    @RaceProUK said:

    I thought this category was for flag/badge discussion… :rolleyes:

    Sorry, the outrage is strong in this one.

    @RaceProUK said:

    <hehe I don't really care!>

    Why you little...



  • @darkmatter said:

    it actually SENDS THE LIST OF POST IDS BACK TO THE FUCKING SERVER.

    *http://what.thedailywtf.com/t/1000/posts.json?post_ids[]=391612&post_ids[]=391611&post_ids[]=391605&post_ids[]=391603&post_ids[]=391549&post_ids[]=391545&post_ids[]=391529&post_ids[]=391519&post_ids[]=391453&post_ids[]=391451&post_ids[]=391448&post_ids[]=391447&post_ids[]=391444&post_ids[]=391438&post_ids[]=391435&post_ids[]=391433&post_ids[]=391432&post_ids[]=391429&post_ids[]=391422&post_ids[]=391421&_=1429419968557

    whyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy

    Community Server Tag Cloud emulation feature. Only dicksauce does it better: it's automated! You don't even need to add the tags by hand.



  • @dkf said:

    Sounds like their database is missing critical indices,

    table | index                                   | columns
    ======+=========================================+======================
    posts | idx_posts_created_at_topic_id           | created_at, topic_id
    posts | idx_posts_user_id_deleted_at            | user_id
    posts | index_posts_on_reply_to_post_number     | reply_to_post_number
    posts | index_posts_on_topic_id_and_post_number | topic_id, post_number
    posts | index_posts_on_user_id_and_created_at   | user_id, created_at
    posts | posts_pkey                              | id
    

    Hmm, who can guess which index is missing? Anyone? Anyone?



  • :laughing: :facepalm:



  • @darkmatter said:

    it also feeds back THE ENTIRE FUCKING 40,000 ID LIST AGAIN. WHEN THE CLIENT CLEARLY ALREADY KNEW THE FUCKING IDS BECAUSE IT JUST ASKED FOR ALL THE POSTS BY ID..................

    They're RESTful! sometimesmaybe

    It's actually kinda useful for automation. For actual using of the forum, well... who'd want a topic with so many posts anyway? </jeff>


  • Grade A Premium Asshole

    @RaceProUK said:

    I thought this category was for flag/badge discussion…

    You should flag them as off topic.


  • Discourse touched me in a no-no place

    @TwelveBaud said:

    Hmm, who can guess which index is missing? Anyone? Anyone?



  • I think I'm missing something here, and am too lazy to figure it out on my own:

    This list of all the posts in the topic -- are they sent back to the client, or is this something internal in the server?


  • BINNED

    The client, apparently.

    Also, I must now admit I don't see a critical index missing, not related to the current discussion at least. Post ID is there, so is topic ID + post number. I'm assuming they are not looking up topic ID + post ID. Or is that where I'm failing, since they are?



  • @Onyx said:

    Also, I must now admit I don't see a critical index missing, not related to the current discussion at least.

    Took me a while too.

    Think of T1000...
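
    If anyone wants to test that hint against an actual query plan, something like this would do it (sketch only: it assumes a Postgres backend, a database actually named discourse, and the "all undeleted post IDs for the topic" query shape being blamed in this thread, not Discourse's literal SQL):

    $ psql discourse -c "EXPLAIN SELECT id FROM posts WHERE topic_id = 1000 AND deleted_at IS NULL ORDER BY post_number;"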



  • @Onyx said:

    The client, apparently.

    But... If the intention of the servercooties' poll is merely to check whether the site is up or down, why is it necessary to parse the javascript?

    bash-3.2$ curl http://what.thedailywtf.com/t/the-thread-of/1000/last --silent | wc -l
         594
    bash-3.2$ 
    

    Or is it important to make the server run through all the hoops to test it?


  • BINNED

    @Mikael_Svahnberg said:

    But... If the intention of the servercooties' poll is merely to check whether the site is up or down, why is it necessary to parse the javascript?

    It's not, on servercooties side.

    From what I gathered without subjecting myself to reading the source code, asking for /t/1000/latest will go on and grab all 40k post IDs from the DB, which then get filtered either in ORM, or in JS (I'm still hoping @riking was joking there).

    The end result is the last batch of posts, and that's all we were getting on servercooties. We didn't even mind what's in there, we just timed how long it took.

    Which is, apparently, around 2 seconds ATM. So if you have polling every few seconds from servercooties, and possibly other people's test instances as they were working on the site...
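
    For reference, the polling never amounted to much more than this, just with the timing recorded somewhere (sketch: the 5-second sleep is a guess, and the URL is the one @Mikael_Svahnberg used above):

    $ while true; do curl --silent -o /dev/null -w '%{time_total}s\n' http://what.thedailywtf.com/t/the-thread-of/1000/last; sleep 5; done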



  • $ curl http://what.thedailywtf.com/t/the-thread-of/1000/last.json --silent | wc -c
    860625
    

    @Onyx said:

    filtered either in ORM, or in JS (I'm still hoping @riking was joking there).

    Nope, as has been demonstrated elsewhere, the JS actually uses all of the numbers. But it only really needs them all once and diffs afterwards.


  • I survived the hour long Uno hand

    $ time wget http://what.thedailywtf.com/t/the-thread-of/1000/last.json -q --output-document=/dev/null
    
    real    0m3.088s
    user    0m0.000s
    sys     0m0.016s
    

    It's up to 3 seconds now...



  • I'm still not sure I am getting it, but that's ok -- I have no desire to learn enough to start discodeveloping. I was just wondering whether you had to execute some javascript on the client side in order for The Big Query to be executed, and that sounded like a :wtf: in order to, essentially, ping the server.

    I will, for the sake of sloppy CCP's, leave a nicer time command below, just so we don't re-do the mistake of hitting /t/1000.

    bash-3.2$ time curl http://what.thedailywtf.com/t/2/last -IsL | grep Status | tail -1
    Status: 200 OK
    
    real	0m0.651s
    user	0m0.010s
    sys	0m0.009s
    bash-3.2$ 
    

  • BINNED

    That is, indeed, what servercooties.com is hitting now. So yeah, any further hitting of /t/1000 will be by the inconsiderate users themselves.


  • SockDev

    If only that damn hedgehog hadn't posted in t/1000 an instruction to stop posting in t/1000… seriously, she needs to think before posting…


  • I survived the hour long Uno hand

    @Mikael_Svahnberg said:

    leave a nicer time command below

    I was explicitly trying to demonstrate how long it takes to hit t/1000 now that nothing automatically hits it :)

    Also I originally was going to produce a super-nice breakdown using curl but the machine I grabbed not only doesn't have curl, but the package manager tells me I'm fucked and somehow have removed a dependency one of my installed packages needs which has to be resolved before I can install anything new, so fuck it, wget with time works.
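
    For the curious, the breakdown would have been something along these lines (untested here for obvious reasons, but these --write-out variables are standard curl):

    $ curl --silent -o /dev/null -w 'dns: %{time_namelookup}\nconnect: %{time_connect}\nfirst byte: %{time_starttransfer}\ntotal: %{time_total}\n' http://what.thedailywtf.com/t/the-thread-of/1000/last.json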


  • BINNED

    @Yamikuronue said:

    the package manager tells me I'm fucked and somehow have removed a dependency one of my installed packages needs which has to be resolved before I can install anything new,

    apt-get -f install if Debian (derivative)?

    If it's something RPM-based, I have NFC :stuck_out_tongue:


  • I survived the hour long Uno hand

    @Onyx said:

    apt-get -f install

    Yes, that's what it told me to do.


  • :belt_onion:

    @Mikael_Svahnberg said:

    whether you had to execute some javascript on the client side in order for The Big Query to be executed

    no. and yes. that's the problem. simply trying to open t/1000 at all gets 40,000 ids sent to the client. And then if you bother to interact with the site at all, gets you 40,000 more ids every single time it tries to inline new posts. If you tried to read the entirety of t/1000, since it only loads 20 posts at a time, you'd get sent.... 80,000,000 post ids from server to client (back-of-the-envelope math at the end of this post) :trollface:

    @RaceProUK said:

    Can we get back to badger discussion now?

    I debated whether my post would catch enough attention to warrant making a new topic, but was too lazy to bother in the end, figuring that if it blows up enough then a mod can jeff it over to the meta or bugs category.
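
    And the promised back-of-the-envelope math for that 80,000,000 figure (rough numbers, assuming roughly 40,000 posts and 20 posts per batch as above):

    $ echo $(( 40000 / 20 * 40000 ))
    80000000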



  • @darkmatter said:

    no. and yes. that's the problem. simply trying to open t/1000 at all gets 40,000 ids sent to the client. And then if you bother to interact with the site at all, gets you 40,000 more ids every single time it tries to inline new posts. If you tried to read the entirety of t/1000, since it only loads 20 posts at a time, you'd get sent.... 80,000,000 post ids from server to client

    But they are nowhere to be seen in the curl/wget queries posted upthread -- unless you hit last.json. That's what made me curious...


  • :belt_onion:

    It looks like, as long as you aren't scrolling to new posts, the server won't resend the full topic id list: when it inlines new posts it doesn't load the full id list again. If you leave the topic and come back, it will load that full listing.

    All the bots/polling are turned off but we're still making the server cry for mercy by posting in t/1000, because I think sam said, posting to the topic causes it to query through the full topic's post listings too, which is contributing to the server pain.



  • First the hot spot was it loading all the undeleted post IDs for the topic every time there's a client-initiated request. We stopped the automated ones; now only people actually visiting and/or scrolling trigger it.

    Next the hot spot was counting ALL THE THINGS that go on a user profile page. We were hitting it via @Onyx's inline-postcount-and-badgecount userscript, being run by @aliceif and hitting some unexpected Ember silliness. We stopped that too.

    The next hot spot after that is whenever anyone posts to a topic, it has to recalculate all the statistics for it, specifically having issues tabulating the Top 5 posters. We can't fix that. (A rough idea of what that tally amounts to is at the end of this post.)

    The next hot spot ... will have to wait until after RailsConf to get discovered.
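
    (For a sense of why that Top 5 tally hurts on a 40,000-post topic: picture re-running some variation of the aggregate below on every new post. Purely illustrative, and not Discourse's actual query; the discourse database name is a guess too.)

    $ psql discourse -c "SELECT user_id, COUNT(*) AS posts FROM posts WHERE topic_id = 1000 AND deleted_at IS NULL GROUP BY user_id ORDER BY posts DESC LIMIT 5;"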


  • BINNED

    @TwelveBaud said:

    Next the hot spot was counting ALL THE THINGS that go on a user profile page. We were hitting it via @Onyx's inline-postcount-and-badgecount userscript

    Meaning I completely unintentionally broke (or assisted in breaking) the site twice now.

    Fucking hell. Can it be more fragile? Is it even possible?


  • :belt_onion:

    @TwelveBaud said:

    The next hot spot after that is whenever anyone posts to a topic, it has to recalculate all the statistics for it, specifically having issues tabulating the Top 5 posters. We can't fix that.

    Sure we can. We can stop posting to dischorse.



  • @TwelveBaud said:

    The next hot spot after that is whenever anyone posts to a topic, it has to recalculate all the statistics for it, specifically having issues tabulating the Top 5 posters. We can't fix that

    So.... Posting is a barrier to posting :question:


  • SockDev

    @darkmatter said:

    All the bots/polling are turned off but we're still making the server cry for mercy by posting in t/1000, because I think sam said, posting to the topic causes it to query through the full topic's post listings too, which is contributing to the server pain.

    Loads full post ids: http://what.thedailywtf.com/t/1000/posts.json?post_ids[]=

    Also (sadly) loads the full post ids.... http://what.thedailywtf.com/t/1000/posts.json?post_ids[]=12788&post_ids[]=12791&post_ids[]=12796

    i was hoping i could say the bots only load the ids once per topic, but alas they do not. they load them once per 200 (currently) posts in the topic.
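
    rough numbers for a full read of t/1000, assuming ~40,000 posts and assuming the 20-posts-per-batch behaviour described upthread still holds:

    $ echo "bots: $(( 40000 / 200 )) full id-list loads, regular clients: $(( 40000 / 20 ))"
    bots: 200 full id-list loads, regular clients: 2000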


  • BINNED

    @accalia said:

    i was hoping i could say the bots only load the ids once per topic, but alas they do not. they load them once per 200 (currently) posts in the topic.

    Does that mean the users load them once per 20 posts or not?

    Yes, I am utterly confused by this. It's not hard to figure out, I imagine. My brain just refuses to.

