Justin Guitar Community

Tools of the Trade => Computer and OS => Topic started by: Scooter Trash on September 10, 2012, 07:30:06 am

Title: Chinese search engine spidering my site.
Post by: Scooter Trash on September 10, 2012, 07:30:06 am
I have several guests from this IP range on my SMF forums.
Apparently they're spiders from a Chinese search engine.
Should I block the IP range in my htaccess?

inetnum:        180.76.0.0 - 180.76.255.255
netname:        Baidu
descr:          Beijing Baidu Netcom Science and Technology Co., Ltd.
descr:          Baidu Plaza, No.10, Shangdi 10th street,Haidian District Beijing,100080
country:        CN
admin-c:        WN141-AP
tech-c:         JC2179-AP
mnt-by:         MAINT-CNNIC-AP
mnt-lower:      MAINT-CNNIC-AP
mnt-routes:     MAINT-CNNIC-AP
status:         ALLOCATED PORTABLE
changed:        [email protected] 20090715
source:         APNIC

person:         Nan Wang
address:        Baidu Plaza, No.10, Shangdi 10th street,Haidian District Beijing,100080
country:        CN
phone:          +8610-59927164
fax-no:         +8610-62684273
e-mail:         [email protected]
nic-hdl:        WN141-AP
mnt-by:         MAINT-CNNIC-AP
changed:        [email protected] 20100322
source:         APNIC

person:         Jacky Chang
nic-hdl:        JC2179-AP
address:        10th Floor No.6 2nd North Street Haidian District Beijing,100080
country:        CN
phone:          +8610-82602288-7280
fax-no:         +8610-62684273
e-mail:         [email protected]
mnt-by:         MAINT-CNNIC-AP
changed:        [email protected] 20071227
source:         APNIC
Title: Re: Chinese search engine spidering my site.
Post by: welly_59 on September 10, 2012, 08:06:56 am
Why block them?
Title: Re: Chinese search engine spidering my site.
Post by: Scooter Trash on September 10, 2012, 08:39:09 am
http://riskyinternet.com/what-is/ip/180.76.6.211/
Title: Re: Chinese search engine spidering my site.
Post by: Scooter Trash on September 10, 2012, 09:33:42 am
I just found this in the SMF forums:
http://www.simplemachines.org/community/index.php?topic=350439.0
I've blocked the IP range in my .htaccess
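
For anyone doing the same: the whois range in the first post is one contiguous block, so it can be written as a single CIDR entry instead of a list of addresses. Below is a rough Python sketch that converts an inetnum-style range to CIDR and prints Apache 2.2-style rules. The helper names are made up for illustration, and the "Order/Allow/Deny" form is an assumption about the server version (Apache 2.4 uses "Require not ip" instead).

```python
import ipaddress


def range_to_cidrs(first, last):
    """Convert an inclusive IPv4 range (as in a whois inetnum line) to CIDR blocks."""
    nets = ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last)
    )
    return [str(net) for net in nets]


def deny_lines(cidrs):
    """Build Apache 2.2-style access rules (assumed syntax; 2.4 differs)."""
    return ["Order Allow,Deny", "Allow from all"] + [
        "Deny from " + cidr for cidr in cidrs
    ]


# Range copied from the whois record in the first post.
cidrs = range_to_cidrs("180.76.0.0", "180.76.255.255")
print("\n".join(deny_lines(cidrs)))
```

The printed lines can be pasted into .htaccess; it is worth checking with the host which Apache version is in use before doing so.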


Title: Re: Chinese search engine spidering my site.
Post by: Dan Graves on September 10, 2012, 11:40:57 am
Well i was going to tell you that the IP range is wrong for the real Baidu spiders (currently, at least), and it may be wise to block it, but it looks like the folks from the SMF community already did a grand job at putting in the right warnings.
Personally, i block all coms from China, too much malicious traffic.
The ratio to genuine users is just too small.
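
On telling a real search engine spider from an impostor: one common approach is a forward-confirmed reverse DNS check (IP to hostname, then hostname back to IP). The Python sketch below is only an illustration. The function name is hypothetical, and the trusted suffixes are an assumption based on Baidu's published guidance that genuine Baiduspider hosts resolve to names under baidu.com or baidu.jp.

```python
import socket

# Assumption: genuine Baiduspider hosts reverse-resolve to these domains.
TRUSTED_SUFFIXES = (".baidu.com", ".baidu.jp")


def is_genuine_spider(ip, reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check.

    The resolver functions are parameters so the check can be tried
    out without touching the network.
    """
    try:
        hostname = reverse(ip)[0]
        if not hostname.lower().endswith(TRUSTED_SUFFIXES):
            return False
        # The hostname must resolve back to the address we started from.
        return ip in forward(hostname)[2]
    except OSError:
        # Covers socket.herror / socket.gaierror: no usable DNS answer.
        return False
```

Calling is_genuine_spider("180.76.6.211") would do live DNS lookups; a visitor that claims to be a spider but fails this check is a reasonable candidate for blocking.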
Title: Re: Chinese search engine spidering my site.
Post by: Scooter Trash on September 10, 2012, 12:00:57 pm
Thanks Dan,

Hopefully it'll keep 'em away... If I start seeing suspicious activity or multiple IPs from the same range again and they trace back to China I'll just block the region.
Do you block them in your .htaccess? Or is there a better way to block an entire country on an Apache server? (commercially hosted)
Title: Re: Chinese search engine spidering my site.
Post by: Dan Graves on September 10, 2012, 01:16:56 pm
I have a deal with my host; they block all traffic from China at a level i have no control over (since it's a dedicated server i suppose it's done at a router/gate level, not sure, but i can ask if you want), but if i had to block them manually i could have done it via several ways.
My server uses Nginx so i could use ngx_http_access_module, but that would require looking up all the IP ranges for China, so personally i'd install and use GeoIP to just block the entire country, like so :
Code:
if ($geoip_country_code ~ (CN)) { return 403; }
For Apache i'd also use GeoIP, or see if your host can do what mine did for me: block it at their level.
If you just want to block a few then i think .htaccess is the best way...
But you might want to consult old-and-in-the-way (http://justinguitarcommunity.com/index.php?action=profile;u=21458) about that, he might have some insights i lack.
Title: Re: Chinese search engine spidering my site.
Post by: Scooter Trash on September 10, 2012, 01:25:23 pm
Kewl.. Thanks, I'll talk to my host and check into GeoIP... No need to ask how they do it.. You're likely right about the router block, and because I don't have access to that it would only satisfy curiosity.. 

This site has IP ranges for different countries: http://www.proxyserverprivacy.com/ipaddress_range.php I might just copy them and paste 'em into the .htaccess..
Title: Re: Chinese search engine spidering my site.
Post by: Dan Graves on September 10, 2012, 02:12:37 pm
Be careful with doing that scooter, unlike GeoIP that site may not have 100% accurate ranges in its lists.
Title: Re: Chinese search engine spidering my site.
Post by: Scooter Trash on September 10, 2012, 02:24:41 pm
Good point! I'd hate to block legit IP ranges and I don't have any desire to verify them all..
.. Sent a PM to old-and-in-the-way. Going to check out GeoIP now. If you're interested I'll post an update when I figure out what to do..
Title: Re: Chinese search engine spidering my site.
Post by: Dan Graves on September 10, 2012, 02:44:33 pm
Well i am interested, but i'm also trying to process that piece you linked to in response to my question about pickups and perceived compression, so i might take a while to respond  ;)
Title: Re: Chinese search engine spidering my site.
Post by: Scooter Trash on September 11, 2012, 09:51:17 am
LOL Dan,

I seem to have taken care of the Chinese data scrapers, and the Ukrainian pill spammers seem to be under control for now. All I have left is legit users and Google bots that I allow for SEO, so I'm just going to keep an eye on it and if I see more suspicious activity I'll dig in a bit deeper and bother old-and-in-the-way, or a friend who I just remembered is an IT professional.

Thanks again
Title: Re: Chinese search engine spidering my site.
Post by: Dan Graves on September 11, 2012, 10:48:54 am
Don't forget to also keep allows for Bing and Yahoo in robots.txt, they do represent a rather significant slice of marketshare (when combined of course, on their own they are all rather negligible).
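
A quick way to sanity-check which crawlers a robots.txt lets in is Python's standard robotparser. The rules below are a made-up example (not anyone's actual file), and the user-agent tokens "bingbot" and "Slurp" are the ones Bing and Yahoo are generally known to use. Keep in mind that robots.txt only steers well-behaved crawlers; scrapers simply ignore it.

```python
from urllib import robotparser

# Hypothetical robots.txt: turn Baiduspider away, allow everyone else.
ROBOTS_TXT = """\
User-agent: Baiduspider
Disallow: /

User-agent: *
Disallow:
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Report whether each crawler may fetch a sample forum URL.
for agent in ("Googlebot", "bingbot", "Slurp", "Baiduspider"):
    print(agent, parser.can_fetch(agent, "/index.php"))
```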
Title: Re: Chinese search engine spidering my site.
Post by: Scooter Trash on September 11, 2012, 11:19:32 am
Thanks, I have them allowed and use the Webmaster tools that each of them provides, as well as some other SEO tools. I'm pretty easy to find in the search engines.. but that seems to be both a blessing and a curse.
Title: Re: Chinese search engine spidering my site.
Post by: Dan Graves on September 11, 2012, 11:38:23 am
Yeah, SEO is always a mixed blessing.
It brings wanted attention... And a LOT of unwanted attention included, free of charge  :-X
Speaking of SEO stuff, you know about http://www.selfseo.com/ ?
Has some handy tools.
Title: Re: Chinese search engine spidering my site.
Post by: Scooter Trash on September 11, 2012, 12:12:37 pm
Wow! That looks like a "one stop shopping" place for SEO. I bookmarked it. Thanks!
I have an old application called "Traffic Seeker" that I bought years ago and just upgraded.. It's not great, but it helps. I also use an SEO plugin for WordPress, and the Webmaster tools that I mentioned earlier. I still need to work on linkage, keywords, and content, but I don't have a sales-based Website so I don't worry a whole lot about it. My ranking isn't very high on Google, but most of my commonly used  keywords don't have a lot of competition so I come up at or near the top of the first page in most of the search engines... Anyhow, thanks for the link! When I get motivated to do some more SEO, I'll definitely use it!
Title: Re: Chinese search engine spidering my site.
Post by: Dan Graves on September 11, 2012, 12:32:54 pm
This is the sort of thing where my time spent as a scriptkiddie comes in handy  ;D
Well, this and cleaning up after malware infections  8)
Title: Re: Chinese search engine spidering my site.
Post by: Scooter Trash on September 11, 2012, 01:27:12 pm
I'm hoping that the people I'm blocking don't take it too personally... Most of them are bots, so I don't think they will..
Title: Re: Chinese search engine spidering my site.
Post by: Dan Graves on February 17, 2015, 10:28:59 am
I seem to be digging up threads like it's going out of fashion, but what the heck...
Ever since i started running a blog on one of my Raspberry Pis i've been running several anti-spam tools, and at some point last week the amount of crap that was flooding the poor little Pi was so bad i had to take it offline.
Now, since this blog is run off my second home connection, i obviously don't have a hosting provider who can blackhole all the bad traffic...
So i decided upon an experiment : locally blackholing China (including Hong Kong) and Russia via mod_geoip...
Spam incidence down to 1% of its original volume.
Then looked up the IPs for the remaining spam traffic : all from Turkey and/or Tor.
Blackholed those as well : no more spam.

It's rather sad that one has to go to such measures to keep a clean blog, but it beats nuking the spammers from orbit, eh ?
Title: Re: Chinese search engine spidering my site.
Post by: m_c on February 18, 2015, 12:43:44 am
I've had spammer issues in the past, and blacklisting the IP addresses provided at the link below cured 99% of the problem. The list is regularly updated with known problem IPs, and minimises the resources needed to block questionable IPs.

http://www.wizcrafts.net/chinese-blocklist.html
Title: Re: Chinese search engine spidering my site.
Post by: Dan Graves on February 18, 2015, 12:04:36 pm
I'm running Nginx, so strictly speaking it's Nginx's own GeoIP module (ngx_http_geoip_module) rather than Apache's mod_geoip, and it really doesn't slow things down much at all (loadtime difference of about half a second), but thanks for the suggestion, i'll have a peek at that list.
Title: Re: Chinese search engine spidering my site.
Post by: Scooter Trash on February 18, 2015, 12:19:23 pm
Ideas for best option on shared server? (Go Daddy)
Title: Re: Chinese search engine spidering my site.
Post by: Dan Graves on February 18, 2015, 11:34:29 pm
I'd still say Nginx with its GeoIP module, scooter, although from what i hear the best idea is to first drop Godaddy like a ton of bricks...
YMMV.

@ m_c : I had a peek at the link, but if there's one thing i've learned from working with the .htaccess file, it's that if i add a load of IP addresses in there, it'll do worse things to site performance than mod_geoip, especially under heavy load.
Mind you, that's on a low-power system like my Raspberry Pi, i've never quite messed with it on proper servers, but that's because i'm used to using Nginx and PHP-FPM on systems where i have no need for further resource saving (dedicated servers).
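
If a long address list does end up in .htaccess, one way to keep the rule count down is to merge overlapping and adjacent entries into the fewest CIDR blocks first. This is just a sketch with Python's standard ipaddress module; the function name is hypothetical, the sample entries are invented (only 180.76.0.0/16 comes from this thread), and i haven't measured how much it actually helps Apache.

```python
import ipaddress


def collapse(entries):
    """Merge overlapping or adjacent IPv4 entries into the fewest CIDR blocks."""
    nets = [ipaddress.ip_network(entry, strict=False) for entry in entries]
    return [str(net) for net in ipaddress.collapse_addresses(nets)]


# Invented sample: a /24 inside a /16, two adjacent /25s, and a single host.
sample = [
    "180.76.5.0/24",
    "180.76.0.0/16",
    "192.0.2.0/25",
    "192.0.2.128/25",
    "192.0.2.7",
]
print(collapse(sample))
```

Five entries shrink to two here; a real blocklist with many neighbouring ranges may or may not shrink as much.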
Title: Re: Chinese search engine spidering my site.
Post by: m_c on February 19, 2015, 12:09:24 am
When I implemented the .htaccess option, it was the simplest workable option.
At the time, the server didn't natively support anything like mod_geoip, and implementing it would involve plugins and extra server resources. A list of blocked IPs in the .htaccess was an easy fix only needing occasional manual updates.

It's not something I've had to deal with lately, as I've not really been doing much webserver work, although I really need to get my own business website done at some point, when I finally decide what I'm going to use for an online shop.
Title: Re: Chinese search engine spidering my site.
Post by: Scooter Trash on February 19, 2015, 05:38:28 am
Thanks Dan.

Like m_c, I've just been editing the .htaccess file and have been able to cut down on the data scrapers and pill spammers quite a bit. If I wasn't on a shared server, I'd block Russia, China, and Ukraine. I've had a decent experience with GoDaddy for about 4 years. I'll look into Nginx and the mod_geoip. Thanks again :)
Title: Re: Chinese search engine spidering my site.
Post by: Dan Graves on February 19, 2015, 11:17:38 am
Not sure what you're running exactly scooter, but if it's WordPress, you can locally stop the spammers from visiting your WP frontend (and backend) with the IQBlock plugin.
As far as i know it relies on the same kind of GeoIP country database that mod_geoip uses, but unlike the more serverwide Nginx implementation, this runs just for the WP installation itself.
Title: Re: Chinese search engine spidering my site.
Post by: Scooter Trash on February 19, 2015, 06:00:16 pm
I have WP and SMF and a few HTML pages.
I have plugins installed that seem to do a decent job of blocking comment spammers, etc. It's the data scrapers that I'm concerned about.. I don't really have sensitive info in the forums other than a Birthday thread for users that would allow people to get birth dates.. I'm not sure how secure the SMF database is on GoDaddy. I use SiteLock Secure, which did find some script in my theme which was developed by someone in Japan and was available on the WP theme site/page. I removed the malicious script, and the theme works fine, but now Google wants me to make it more mobile device friendly.. It's always something lol..
Title: Re: Chinese search engine spidering my site.
Post by: Dan Graves on February 19, 2015, 10:04:38 pm
Well if you want to keep scrapers out...
The carpetbombing approach is the only way to be sure.
It sucks, but the only way to keep them away with any sort of reliable results is to blacklist the countries where the worst offenders originate, and to block all known proxies/VPNs that are friendly to that sort of thing.
Title: Re: Chinese search engine spidering my site.
Post by: Scooter Trash on February 19, 2015, 10:06:30 pm
Kewl.. Thanks again Dan.  8)