Please ease up with the bots
-
The NGINX logs are flooded with bot activity. I just engaged some rate limiting to stop some of the bleeding and keep the site running a bit better.
That said, some bots (SockAdept 1.0.0) are flooding the site at DoS rates. If you are writing a bot, please have some etiquette about the number of requests you make; 10 a minute, as opposed to 10 a second, would be a good start.
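For anyone curious what "engaged some rate limiting" looks like, it's usually a couple of lines of nginx config. The zone name, rate, and burst values below are illustrative, not the site's actual settings:

```nginx
# Illustrative only -- not this forum's actual config.
# limit_req_zone goes in the http {} block: track clients by IP,
# allow roughly 10 requests per minute per client.
limit_req_zone $binary_remote_addr zone=bots:10m rate=10r/m;

server {
    location / {
        # Allow short bursts, reject anything beyond that.
        limit_req zone=bots burst=20 nodelay;
        # Over-limit requests get 429 instead of the default 503.
        limit_req_status 429;
    }
}
```

The 429s the bot authors report hitting further down the thread are consistent with a setup along these lines.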
-
All my instances of sockbot (uses SockAdept as user agent) have been killed. I'll look through the code and make sure they are tuned way back before i turn them back on.
i know others are using Sockbot base code as well. I'll bump the version number in the user agent so we can tell those apart and filter/kill as appropriate.
-
Both of mine are supposed to poll every 10 seconds; I need to see if that's actually being honoured.
-
Ok, that should have made sockbot much less of a resource hog. I increased the polling delay 20x, so it should poll about once every 10 seconds now, +/- the processing time of notifications (plus an additional 5 seconds if it attempts a post as a result of the notification).
I gave the two instances of sockbot different user agents so they can be told apart if they cause more problems (SockAdept 1.1.0 and SockAdept 1.2.0; if you are getting any more SockAdept 1.0.0, that's not an instance run by me). Should sockbot continue to cause problems, please send me a PM or email (i'm in the database); my intention with sockbot is to have fun, not to DoS the site.
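The timing rule described above (a ~10 second base poll, plus an extra 5 seconds after attempting a post) can be sketched as a tiny helper. The names and constants here are mine for illustration, not sockbot's actual internals:

```javascript
// Hypothetical sketch of the polling schedule described above.
// Constants and names are illustrative, not sockbot's real code.
const BASE_POLL_MS = 10 * 1000;   // poll roughly every 10 seconds
const POST_PENALTY_MS = 5 * 1000; // wait an extra 5 seconds after posting

// How long to sleep before the next notification poll.
function nextPollDelay(attemptedPost) {
  return BASE_POLL_MS + (attemptedPost ? POST_PENALTY_MS : 0);
}

// A polling loop would then look something like this:
async function pollLoop(checkNotifications) {
  for (;;) {
    // checkNotifications() returns true if it attempted a post.
    const attemptedPost = await checkNotifications();
    await new Promise((resolve) => setTimeout(resolve, nextPollDelay(attemptedPost)));
  }
}
```

At ~6 requests a minute per instance this is a big improvement over the 10-per-second rates mentioned at the top of the thread.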
-
They're dead for now - can't even get them to log in without hitting a 429 error, despite it being only the second request and 10 seconds after the first.
-
hmm... an excellent point. I'll be back. I'mma going to bake into sockbot that a 429 response shall result in a delay of 60 seconds before processing continues.
not sure why he's not getting 429s and you are though. i think @sam would have to answer that one?
-
I'm leaving them dead for now, in case it's x requests over y time and maybe their previous activity puts them over the limit for now.
I tried adding a 30 second wait after a 429 before a retry but it just gets in a loop.
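One way to avoid that retry loop is to cap the number of retries and back off harder each time instead of retrying on a fixed 30-second delay. This is a generic sketch of that idea, not the actual sockbot or discoursebot code; the constants are illustrative:

```javascript
// Illustrative sketch: bounded, increasing delays on HTTP 429.
// Returns the delay (ms) before retry `attempt` (1-based), or null to give up.
function delayFor429(attempt, { baseMs = 30000, maxRetries = 3 } = {}) {
  if (attempt > maxRetries) return null;      // stop: don't loop forever
  return baseMs * Math.pow(2, attempt - 1);   // 30s, 60s, 120s, ...
}

// Wrapper that retries a request function until success or give-up.
// `sleep` is injectable so the logic can be tested without real waiting.
async function withBackoff(doRequest, sleep = (ms) => new Promise((r) => setTimeout(r, ms))) {
  for (let attempt = 1; ; attempt++) {
    const res = await doRequest();
    if (res.status !== 429) return res;
    const delay = delayFor429(attempt);
    if (delay === null) throw new Error("rate limited: giving up after retries");
    await sleep(delay);
  }
}
```

The key point is the `null` branch: a fixed wait-and-retry can spin forever if the limit is "x requests over y time", while a bounded backoff eventually gives up and leaves the window to reset.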
-
hmm.... well, it's not user-agent based. I tested switching sockbot back to using SockAdept 1.0.0 as a user agent: no 429s. He's back to 1.1.0 where he should be now. Also, a delay of 60 seconds after a 429 is now enforced; should be out to Git later tonight.
-
I've swapped discoursebot back to a static message too, as calculating the last bug logged adds more requests.
Might just swap to your SockBot code (but change the user agent); looking at it, I've got a better idea of what it's doing than I have of the current code, which is making dealing with the 429s more difficult.
-
you are welcome to fork me. but if you can wait till Wednesday i should have the new format of sock_modules finished and some documentation done that will help! ;-)
specifically, i'm hoping to sort out this issue, or at least the new sock_module format i'll need to implement it, over lunch tomorrow.
-
It'll be Weds before I have chance to tackle this again anyway.
-
Then i'll poke you when i make that commit and when i finish that documentation i wanted to write as well.
-
Just so we are clear on what is going on: due to the 10 req/sec bot activity, the nginx logs are 5 GB per day, on a cloud instance with 60 GB of total disk space. Only the last 2 days are kept uncompressed, and I think 7 days before log rotation, but it is a lot.
And daily site db backups are 1.2 GB including images.
-
Stress testing (both the software, and its author).
-
Or, you know, disable logs? That's what sane people do on high-traffic sites. I have nginx serving 100+ req a second for a PHP websocket application just fine with logs turned off.
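For reference, turning access logging off in nginx is a one-liner, and there's a middle ground of logging only errors. Both variants below are illustrative, with a hypothetical log path:

```nginx
# Illustrative: drop access logging entirely (error_log still works).
access_log off;

# Middle ground: only log requests that didn't return 2xx/3xx.
# The map goes in the http {} block; the log path is hypothetical.
map $status $loggable {
    ~^[23]  0;
    default 1;
}
access_log /var/log/nginx/access.log combined if=$loggable;
```

The conditional form would keep the 429s visible for debugging the bot situation while cutting the bulk of the 5 GB/day.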
-
But Metrics!
-
What useless metrics could you possibly get from an access log? Your application should handle any real metrics internally. Worst case is you try and recreate the Google Analytics wheel and roll your own IP access lookup with user agents.
-
suckers... my "bot" doesn't count and is unaffected by the limiting ;) course, I'm not polling anything, just reading the site.
-
I think i've finished what i wanted to do to sockbot.
You'll probably want to look at the notifyprint.js and summon.js sock_modules to tweak for your bots.
i need to write docs for sockbot. i'll do that after i update reader.js tomorrow.
shout if you have questions. i'll be happy to fix issues or explain what i didn't make clear (and then write or fix the docs).
-
you are welcome to fork me.
Giggity.
@codinghorror, I thought that bots were an impossibility. So how is this happening in the first place?
-
he said it couldn't be done in a room full of geeks, nerds, and CompSci professionals.
what did you think was going to happen? ;-P
Here's my github. You can see how i do what i do, if you are willing to read the slightly idiosyncratic way that Node.js does async programming (and my total lack of comments in like 90% of the code. I need to fix that very soon; hopefully it's self-documenting until then).
-
I said spam was much harder to achieve in Discourse due to the all JS nature of the site. I never said bots attached to registered accounts were impossible.
When was the last time you saw spam on this site, versus the old forums?
-
Bots = Spam
Funny spam, but spam. We have them posting. Self-registering seems rather trivial now.
-
When was the last time you saw spam on this site, versus the old forums?
Pretty much every day now, constantly.
-
Hot, sexy MILFs looking for you here
-
It's only now that I've posted it that I realise I inadvertently created "emotse"
-
due to the all JS nature of the site.
It doesn't matter how many jewels you encrust your site in; it is no harder to read the basic network POST & GET commands that are logged in every half-decent browser available. That's the only part needed; it's not like the bot needs to replicate the full UI. Bonus points because the calls almost all return JSON results, so the old page-scraping techniques can be ditched for simply reading pre-created objects directly in the bot code. Botmakers don't need to know or care about how the actual dicsource site javascript works at all.
4 steps to spam bot
- craft a post in the UI, hit "Reply", read network request header
- mimic network request header in language of choice
- ???
- Profit
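The "mimic the network request" step really is that mechanical. Here's a hedged sketch of rebuilding a captured reply request; the endpoint path, field names, and CSRF header are guesses for illustration, not the real Discourse API:

```javascript
// Illustrative only: rebuild a captured "create post" request.
// The path, body fields, and CSRF header below are hypothetical,
// copied from whatever the browser's network tab showed.
function buildReplyRequest(topicId, raw, csrfToken) {
  return {
    method: "POST",
    path: "/posts", // whatever path the captured request used
    headers: {
      "Content-Type": "application/json",
      "X-CSRF-Token": csrfToken,        // lifted from the captured header
      "User-Agent": "SockAdept 1.1.0",  // so admins can tell the bot apart
    },
    body: JSON.stringify({ topic_id: topicId, raw: raw }),
  };
}

// A bot would feed this straight into http.request / fetch and read
// the JSON response back -- no page scraping, no UI automation.
```

Which is the whole point: the "all JS" nature of the site is irrelevant to a bot that speaks HTTP directly.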
-
Sure, someone could "spam" using APIs too. But when you think of spammers, do you think of people writing apps that call APIs? That is an issue of proper API design, and a combination of native and upstream (nginx, haproxy) rate limits.
-
If a target (and/or its market share) is lucrative enough, it will be. I'd also think there'd be some who build tools/scripts and then sell access to the tool to others (if not just share amongst a tight group of fellow spammers).
-
Sure, someone could "spam" using APIs too. But when you think of spammers, do you think of people writing apps that call APIs? That is an issue of proper API design, and a combination of native and upstream (nginx, haproxy) rate limits.
Proper API design eh? And yet this forum is having issues with requests?
-
Proper API design eh? And yet this forum is having issues with requests?
You have been rate limited for Doing It Wrong™. Pray you are not rate limited further.
-
I realise I tested switching sockbot is an access to set their priority both the last bug was much harder to registered accounts like the 10 seconds before I just gets in the new format of people writing apps that are using the community to evaluate! faoileag So yes if they are getting any real metrics internally. Worst case is having issues or fix the attach to be ditched for now.
I tried adding a combination of people do to the bot please send me a good idea! We have made sockbot continue to sort of spammers do what my previous bug logged in every half-decent browser available. That's a lot. And RegexBuddy is unaffected by me. Giggity. https://github.com/AccaliaDeElementia the last time of geeks nerds and recreate the software to the attach to sort of spammers. I think of create gmail spam on this site versus the full of reqs you make. @sam would have wanted to set priority
-
Mostly nonsense post with a link.
Yup, spambots seem possible.
-
Mostly nonsense post with a link.
We already have a Markov chain running around. There's absolutely nothing that mitigates the spammers, maybe except for the fact that the API is so terrible nobody sane would want to figure it out even for a million dollars.
-
Proper API design eh? And yet this forum is having issues with requests?
yeah.... I was doing it wrong. I've fixed that now. i think.....
-
We already have a Markov chain running around. There's absolutely nothing that mitigates the spammers,
To be fair, MottBott is pretty easy to recognize as a Markov chain.
Though if someone were particularly evil and pointed a couple dozen instances at the same Discourse instance simultaneously, I wonder what kind of admin effort would be required to resolve that.
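For the curious, a word-level Markov chain of the MottBott sort fits in a few lines. This is a generic sketch, not MottBott's actual code; the injectable `rng` parameter is mine, so the output can be made deterministic for testing:

```javascript
// Generic word-level Markov chain sketch (not MottBott's real code).
// buildChain maps each word to the list of words that followed it.
function buildChain(text) {
  const chain = {};
  const words = text.split(/\s+/);
  for (let i = 0; i < words.length - 1; i++) {
    (chain[words[i]] = chain[words[i]] || []).push(words[i + 1]);
  }
  return chain;
}

// Walk the chain from `start`, picking a random successor each step.
function generate(chain, start, maxWords, rng = Math.random) {
  const out = [start];
  let word = start;
  for (let i = 1; i < maxWords && chain[word]; i++) {
    const nexts = chain[word];
    word = nexts[Math.floor(rng() * nexts.length)];
    out.push(word);
  }
  return out.join(" ");
}
```

Feed it a corpus of existing forum posts and it produces exactly the kind of almost-grammatical nonsense quoted above, which is why a dozen instances at once would be painful to moderate.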
-
To be fair, MottBott is pretty easy to recognize as a Markov chain.
Spammers, OTOH, are rather hard to discern from Markov chains.
-
Though if someone were particularly evil and pointed a couple dozen instances at the same Discourse instance simultaneously, I wonder what kind of admin effort would be required to resolve that.
Considering that you cannot ban IPs? It might be a bit difficult.
-
can't ban IPs? you've never met my friend iptables, have you?
granted @PJH couldn't do it but some people do have access to the server firewall (no, i don't have that access. I don't want it either)
-
-
oh. right. that would work too....
-
Delete?
It is a delete in a GUI and not a CLI, but does it use a recycle bin? And what if you shift-click it?
-
That button relates to the backup on the left of that line. No idea what happens on the host when clicked - it gets permanently removed from the GUI however (after an 'Are You Sure?' dialog.)
Shift-click does bugger all.
-
-
If a target (and/or its market share) is lucrative enough, it will be. I'd also think there'd be some who build tools/scripts and then sell access to the tool to others (if not just share amongst a tight group of fellow spammers).
Exactly it. The more Discourse takes off (and, frankly, the figures we've heard touted thus far are not that significant), the more lucrative it becomes for someone to add the necessary GET/POST requests to things like Xrumer.
In fact, I'd even go as far as suggesting the primary reason Discourse hasn't had a shitton of spam bots is that, with the limited userbase, it's simply not worth the effort yet.
-
The software [xrumer] is also capable of avoiding detection by making posts in off-topic, spam and overflow sections of forums thus attempting to keep its activities in high activity low content areas of the targeted forum.
seriously though, the Likes Thread will be the #1 target of all time!
perhaps that's the reason for the poor search engine optimization of dicsourse: it discourages xrumer-style spammers, because they aren't going to get as big a boost from spamming forums that don't translate well into google pageranks?
-
Nah, the reason for poor SEO is simple: incompetence.
-
seriously though, the Likes Thread will be the #1 target of all time!
I don't see a problem with this.
For fame! For glory! For 30k/100k/1000k/beyond!
-
i want to see that topic get to 1 megapost!
-