So Baidu/et al. think one of my clients' IPs belongs to a Russian porn site...
-
Being relatively new to the web hosting game, I can't say I've encountered this before. We were trying to figure out why one of our client's NodeBB instances kept going down.... when brought back up it would last a couple days... then a day... then less than a day... to the point that one time I brought it back up, and it immediately went down again (after I'd stepped out for lunch).
Turns out a bunch of web crawlers think the IP belongs to a certain Russian porn site with a focus on... backdoor... action. Now, a couple bots, I am pretty sure the site would handle, but it's frickin' BaiduBot, with its ~10 requests per second, that caused the 404 log to blow up. I believe the database size ballooned to over 1GB by the time I got around to clearing it, since we log every single 404 request to NodeBB.
As an aside, I should probably not log every 404 that hits NodeBB... maybe a rolling log of the past week would be sufficient.
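For what it's worth, the rolling-log idea is simple enough to sketch. This is Python for illustration only, not NodeBB code: the `Rolling404Log` class, its method names, and the one-week default are all assumptions, and a real fix would live wherever NodeBB writes its 404 entries.

```python
import time
from collections import deque

WEEK_SECONDS = 7 * 24 * 60 * 60


class Rolling404Log:
    """Hypothetical in-memory 404 log that keeps only recent entries."""

    def __init__(self, max_age=WEEK_SECONDS):
        self.max_age = max_age
        self.entries = deque()  # (timestamp, path) pairs, oldest first

    def record(self, path, now=None):
        """Store one 404 hit, then drop anything older than max_age."""
        now = time.time() if now is None else now
        self.entries.append((now, path))
        self.prune(now)

    def prune(self, now=None):
        now = time.time() if now is None else now
        cutoff = now - self.max_age
        # Entries arrive in time order, so expired ones sit at the left end.
        while self.entries and self.entries[0][0] < cutoff:
            self.entries.popleft()
```

If the entries live in MongoDB, a TTL index can expire old documents on the database side instead, though whether that fits the way NodeBB stores its 404 log is an open question here.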
Amused, I started using ufw to block a bunch of /24 blocks:
    # ufw status
    Status: active
    To                         Action      From
    --                         ------      ----
    Anywhere                   DENY        100.43.90.0/24
    Anywhere                   DENY        100.43.85.0/24
    Anywhere                   DENY        63.249.73.0/24
    Anywhere                   DENY        141.8.143.0/24
    Anywhere                   DENY        100.43.91.0/24
    Anywhere                   DENY        63.243.252.0/24
    Anywhere                   DENY        180.76.15.0/24
(Additional rules truncated)
Which eliminated 95% of the requests, but I was wondering what the best course of action was if this ever happened again?
... besides the obvious of course, which is to change IP addresses...
I also recall crawlers (in particular, BaiduBot) went apeshit indexing your new forum too, how did that ever get resolved...
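A side note on the /24 rules listed above: adjacent blocks can be merged into fewer, wider rules before they go into the firewall. A minimal sketch using Python's standard `ipaddress` module, with the seven blocks from the `ufw status` output as input; the `collapse` helper name is made up for illustration.

```python
import ipaddress

# The seven /24 blocks shown in the ufw output.
BLOCKS = [
    "100.43.90.0/24",
    "100.43.85.0/24",
    "63.249.73.0/24",
    "141.8.143.0/24",
    "100.43.91.0/24",
    "63.243.252.0/24",
    "180.76.15.0/24",
]


def collapse(cidrs):
    """Merge adjacent or overlapping networks into the smallest equivalent set."""
    networks = [ipaddress.ip_network(c) for c in cidrs]
    return [str(n) for n in ipaddress.collapse_addresses(networks)]


if __name__ == "__main__":
    for cidr in collapse(BLOCKS):
        print(cidr)
```

Here 100.43.90.0/24 and 100.43.91.0/24 merge into 100.43.90.0/23, so seven rules become six. This only merges ranges that are already blocked; it never widens the ban.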
-
@julianlam said in So Baidu/et al. think one of my clients' IPs belongs to a Russian porn site...:
Now, a couple bots, I am pretty sure the site would handle, but it's frickin' BaiduBot, with its ~10 requests per second, that caused the 404 log to blow up.
@boomzilla had to blackhole a huge list of IPs to keep our site up too. Anything you can do to help prevent this problem would be great. He can get you a longer list, though.
We also had to put in scripts to restart instances that get clogged with requests, and report who was connected to them, for fine tuning.
-
@julianlam said in So Baidu/et al. think one of my clients' IPs belongs to a Russian porn site...:
BaiduBot
is famously badly behaved... at this point it's probably best to either severely rate-limit that bot's useragent via firewall to something like one request a second, or block it outright.
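To make "one request a second" concrete, here is a token-bucket sketch keyed on the user agent. It is application-level Python, not a firewall rule, so it swaps in a different mechanism from the one suggested here; `TokenBucket`, `allow_request`, and the substring match on `baiduspider` are illustrative assumptions.

```python
import time


class TokenBucket:
    """Allows `rate` requests per second on average, bursting up to `capacity`."""

    def __init__(self, rate=1.0, capacity=1.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = None  # time of the previous call, if any

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        if self.updated is not None:
            # Refill in proportion to the time elapsed since the last call.
            elapsed = now - self.updated
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


_buckets = {}


def allow_request(user_agent, now=None):
    """Throttle requests whose user agent mentions Baiduspider; pass the rest."""
    if "baiduspider" not in user_agent.lower():
        return True
    bucket = _buckets.setdefault("baiduspider", TokenBucket(rate=1.0, capacity=1.0))
    return bucket.allow(now)
```

The same bucket could be keyed on a source IP block instead if the user agent turns out not to be trustworthy.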
-
@accalia said in So Baidu/et al. think one of my clients' IPs belongs to a Russian porn site...:
that bot's useragent
I seem to recall the useragent not playing nicely either. We IP-banned it.
-
@Yamikuronue said in So Baidu/et al. think one of my clients' IPs belongs to a Russian porn site...:
@accalia said in So Baidu/et al. think one of my clients' IPs belongs to a Russian porn site...:
that bot's useragent
I seem to recall the useragent not playing nicely either. We IP-banned it.
..... because OF COURSE it doesn't actually properly announce itself so webmasters can handle it correctly.
/me stomps off to the local glassblowers so she can spend an hour or so smashing rejects into cullet.
-
Yandex needs to learn to limit its crawlers to a specific IP block so I can block it once and be done with them
-
@julianlam said in So Baidu/et al. think one of my clients' IPs belongs to a Russian porn site...:
Yandex needs to learn to limit its crawlers to a specific IP block so I can block it once and be done with them
if they did that they wouldn't be so famously misbehaved though.
-
@accalia said in So Baidu/et al. think one of my clients' IPs belongs to a Russian porn site...:
is famously badly behaved... at this point it's probably best to either severely rate-limit that bot's useragent via firewall to something like one request a second, or block it outright.
I think the BaiduSpider bots are often reasonably well behaved. The real problem was from stuff that used FF useragents (or maybe some other browser-looking thing) whose patterns looked very bottish coming from Baidu IP blocks.
-
123.125.71.45 - - [31/Mar/2017:07:05:53 +0000] "GET /homemade/ HTTP/1.1" 404 9 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
-
@Yamikuronue said in So Baidu/et al. think one of my clients' IPs belongs to a Russian porn site...:
@accalia said in So Baidu/et al. think one of my clients' IPs belongs to a Russian porn site...:
that bot's useragent
I seem to recall the useragent not playing nicely either. We IP-banned it.
Hmm...
/me is considering obtaining that IP list to preemptively block BaiduBot from hitting her company's websites and bringing them down
-
@RaceProUK said in So Baidu/et al. think one of my clients' IPs belongs to a Russian porn site...:
/me is considering obtaining that IP list to preemptively block BaiduBot from hitting her company's websites and bringing them down
But what about that sweet sweet legitimate traffic from China?
-
@RaceProUK These are the Baidu IP blocks on our blacklist:
    # Baidu bots
    deny 180.76.0.0/16;
    deny 202.46.32.0/19;
    deny 131.161.8.0/22;
    deny 220.181.0.0/16;
One problem is that there are so many of the spiders coming from various IP addresses.
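As a quick way to test an address against that list before (or instead of) touching the server config, here is a minimal sketch with Python's standard `ipaddress` module. The four CIDR ranges are the ones from the blacklist in this post; the `is_blocked` name is hypothetical.

```python
import ipaddress

# The four Baidu ranges from the blacklist.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "180.76.0.0/16",
        "202.46.32.0/19",
        "131.161.8.0/22",
        "220.181.0.0/16",
    )
]


def is_blocked(ip):
    """Return True if `ip` falls inside any blacklisted network."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in BLOCKED_NETWORKS)
```

An address such as 180.76.15.20 is caught, but 123.125.71.45, the source of the Baiduspider log line posted earlier in the thread, is not in any of these four ranges, which illustrates the "so many IP addresses" problem.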
-
@julianlam said in So Baidu/et al. think one of my clients' IPs belongs to a Russian porn site...:
But what about that sweet sweet legitimate traffic from China?
We don't get any, not that I've seen anyway.
@boomzilla said in So Baidu/et al. think one of my clients' IPs belongs to a Russian porn site...:
@RaceProUK These are the Baidu IP blocks on our blacklist:
    # Baidu bots
    deny 180.76.0.0/16;
    deny 202.46.32.0/19;
    deny 131.161.8.0/22;
    deny 220.181.0.0/16;
One problem is that there are so many of the spiders coming from various IP addresses.
Thanks :D
-
Perhaps someone should create a BaiduBot block list GitHub project?
-
@julianlam said in So Baidu/et al. think one of my clients' IPs belongs to a Russian porn site...:
We were trying to figure out why one of our client's NodeBB instances kept going down.... when brought back up it would last a couple days... then a day... then less than a day... to the point that one time I brought it back up, and it immediately went down again
The real solution would be to not build anything on NodeJS
FileUnder: if only I was joking
-
@TimeBandit said in So Baidu/et al. think one of my clients' IPs belongs to a Russian porn site...:
@julianlam said in So Baidu/et al. think one of my clients' IPs belongs to a Russian porn site...:
We were trying to figure out why one of our client's NodeBB instances kept going down.... when brought back up it would last a couple days... then a day... then less than a day... to the point that one time I brought it back up, and it immediately went down again
The real solution would be to not build anything on NodeJS
FileUnder: if only I was joking
so ruby is superior?!
fuck that !
-
@TimeBandit I know you're only trolling, but it really doesn't matter what the server tech is: if BaiduBot decides to flood it with a million requests a second, it's going down no matter what.
-
@TimeBandit Hey, it would've worked fine except Mongo was trying to load the set into memory... the set whose size exceeded both physical memory and swap size. That would happen with any database, I think, so it's more a resource issue than anything
-
@julianlam said in So Baidu/et al. think one of my clients' IPs belongs to a Russian porn site...:
Mongo
-
@julianlam said in So Baidu/et al. think one of my clients' IPs belongs to a Russian porn site...:
@TimeBandit Hey, it would've worked fine except Mongo was trying to load the set into memory... the set whose size exceeded both physical memory and swap size. That would happen with any database, I think, so it's more a resource issue than anything
Not for an insert it won't.
-
I don't think it's really Baidu, just bad bots using Baidu's useragent.
(Email scavengers, SEO analysers, etc.)
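One way to tell the two apart is a forward-confirmed reverse DNS lookup: resolve the IP to a hostname, check the domain, then resolve that hostname back and make sure it returns the same IP. The sketch below rests on an assumption that should be checked against Baidu's current webmaster documentation, namely that genuine Baiduspider hosts reverse-resolve to names under `baidu.com` or `baidu.jp`. The function name and the injectable `reverse`/`forward` parameters are hypothetical, added so the logic can be exercised without network access.

```python
import socket

# Assumption: real Baiduspider hosts resolve to names under these domains.
BAIDU_SUFFIXES = (".baidu.com", ".baidu.jp")


def is_genuine_baiduspider(ip, reverse=None, forward=None):
    """Forward-confirmed reverse DNS check for a claimed Baiduspider IP."""
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        hostname = reverse(ip)
    except OSError:  # no PTR record, or the lookup failed
        return False
    if not hostname.lower().endswith(BAIDU_SUFFIXES):
        return False
    try:
        # The hostname must resolve back to the address we started from.
        return ip in forward(hostname)
    except OSError:
        return False
```

DNS lookups are slow, so in practice the result would need caching per IP rather than being computed on every request.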
-
@PleegWat said in So Baidu/et al. think one of my clients' IPs belongs to a Russian porn site...:
Not for an insert it won't.
Get out of here with your logic.
-
If you have an Apache or nginx reverse proxy set up, you shouldn't let requests where the hostname doesn't match the website reach Node.
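The check itself is small. Normally it would be done in the proxy's own configuration (in nginx, for example, with a catch-all default server), but the logic can be sketched in Python. The `ALLOWED_HOSTS` value and the `host_allowed` name are placeholders, not anything from the actual setup.

```python
# Hypothetical: the hostnames the forum is actually served under.
ALLOWED_HOSTS = {"forum.example.com"}


def host_allowed(host_header, allowed=ALLOWED_HOSTS):
    """Return True only if the Host header names one of our own sites."""
    if not host_header:
        return False
    host = host_header.strip().lower()
    if host.startswith("["):
        # Bracketed IPv6 literal, possibly followed by a port.
        host = host.split("]", 1)[0] + "]"
    else:
        # Drop an optional ":port" suffix.
        host = host.split(":", 1)[0]
    return host.rstrip(".") in allowed
```

A crawler asking this IP for some other site's hostname, or for the bare IP address, would be turned away before the request ever reaches the application.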
-
@accalia said in So Baidu/et al. think one of my clients' IPs belongs to a Russian porn site...:
@TimeBandit said in So Baidu/et al. think one of my clients' IPs belongs to a Russian porn site...:
The real solution would be to not build anything on NodeJS
FileUnder: if only I was joking
so ruby is superior?!
fuck that !
How exactly does the one follow from the other?
-
@julianlam said in So Baidu/et al. think one of my clients' IPs belongs to a Russian porn site...:
Mongo
To this day I cannot understand why they chose that name; in quite a few languages, such as German, that is the informal name for Down syndrome. Not exactly what I would want people to think of first when they hear the name of a really complex system...
-
@Quwertzuiopp said in So Baidu/et al. think one of my clients' IPs belongs to a Russian porn site...:
in quite a few languages, such as German, that is the informal name for Down syndrome
So, in retrospect, it was well chosen
-
@masonwheeler said in So Baidu/et al. think one of my clients' IPs belongs to a Russian porn site...:
@accalia said in So Baidu/et al. think one of my clients' IPs belongs to a Russian porn site...:
@TimeBandit said in So Baidu/et al. think one of my clients' IPs belongs to a Russian porn site...:
The real solution would be to not build anything on NodeJS
FileUnder: if only I was joking
so ruby is superior?!
fuck that !
How exactly does the one follow from the other?
you read the mouseover text, yes?
-
@accalia I did. Still, what you say only makes sense if there are no other alternatives at all.
-
@masonwheeler said in So Baidu/et al. think one of my clients' IPs belongs to a Russian porn site...:
@accalia I did. Still, what you say only makes sense if there are no other alternatives at all.
HYPERBOLEEEEEEEEEEEEEEEEE
:-P
-
@masonwheeler said in So Baidu/et al. think one of my clients' IPs belongs to a Russian porn site...:
@accalia I did. Still, what you say only makes sense if there are no other alternatives at all.
You advocate PHP hellstew?
-
@PleegWat said in So Baidu/et al. think one of my clients' IPs belongs to a Russian porn site...:
@masonwheeler said in So Baidu/et al. think one of my clients' IPs belongs to a Russian porn site...:
@accalia I did. Still, what you say only makes sense if there are no other alternatives at all.
You advocate PHP hellstew?
hellstew gets the job done..... at least until you taste heavensoup. after that nothing is quite the same ever again.
-
@PleegWat XenForo is good, and PHP-based
-
@PleegWat What part of what I said implies that I'm advocating anything in particular?
-
@TimeBandit said in So Baidu/et al. think one of my clients' IPs belongs to a Russian porn site...:
@julianlam said in So Baidu/et al. think one of my clients' IPs belongs to a Russian porn site...:
We were trying to figure out why one of our client's NodeBB instances kept going down.... when brought back up it would last a couple days... then a day... then less than a day... to the point that one time I brought it back up, and it immediately went down again
The real solution would be to not build anything on NodeJS
FileUnder: ~~if only I was joking~~ Stating the obvious
-
@boomzilla said in So Baidu/et al. think one of my clients' IPs belongs to a Russian porn site...:
The real problem was from stuff that used FF
It normally is
-
@Quwertzuiopp said in So Baidu/et al. think one of my clients' IPs belongs to a Russian porn site...:
@julianlam said in So Baidu/et al. think one of my clients' IPs belongs to a Russian porn site...:
Mongo
To this day I cannot understand why they chose that name; in quite a few languages, such as German, that is the informal name for Down syndrome. Not exactly what I would want people to think of first when they hear the name of a really complex system...
humongous.