Author Topic: Chinese search engine spidering my site.  (Read 3407 times)

Offline Scooter Trash

  • Stadium Superstar
  • ******
  • Posts: 3041
  • Good Vibes 80
Chinese search engine spidering my site.
« on: September 10, 2012, 07:30:06 am »
I have several guests from this IP range on my SMF forums.
Apparently they're spiders from a Chinese search engine.
Should I block the IP range in my htaccess?

inetnum:        180.76.0.0 - 180.76.255.255
netname:        Baidu
descr:          Beijing Baidu Netcom Science and Technology Co., Ltd.
descr:          Baidu Plaza, No.10, Shangdi 10th street,Haidian District Beijing,100080
country:        CN
admin-c:        WN141-AP
tech-c:         JC2179-AP
mnt-by:         MAINT-CNNIC-AP
mnt-lower:      MAINT-CNNIC-AP
mnt-routes:     MAINT-CNNIC-AP
status:         ALLOCATED PORTABLE
changed:        [email protected] 20090715
source:         APNIC

person:         Nan Wang
address:        Baidu Plaza, No.10, Shangdi 10th street,Haidian District Beijing,100080
country:        CN
phone:          +8610-59927164
fax-no:         +8610-62684273
e-mail:         [email protected]
nic-hdl:        WN141-AP
mnt-by:         MAINT-CNNIC-AP
changed:        [email protected] 20100322
source:         APNIC

person:         Jacky Chang
nic-hdl:        JC2179-AP
address:        10th Floor No.6 2nd North Street Haidian District Beijing,100080
country:        CN
phone:          +8610-82602288-7280
fax-no:         +8610-62684273
e-mail:         [email protected]
mnt-by:         MAINT-CNNIC-AP
changed:        [email protected] 20071227
source:         APNIC
I dream of a better tomorrow where chickens can cross roads without their motives being questioned.
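For anyone following along: most block rules want CIDR notation rather than a first-last range. Here is a minimal Python sketch, assuming nothing beyond the standard library, that converts the inetnum range from the whois record above and tests whether an address falls inside it. The helper names are made up for illustration.

```python
import ipaddress

def range_to_cidrs(first, last):
    """Convert an inclusive first-last IPv4 range into a minimal list of CIDR strings."""
    start = ipaddress.ip_address(first)
    end = ipaddress.ip_address(last)
    return [str(net) for net in ipaddress.summarize_address_range(start, end)]

def in_range(ip, cidrs):
    """Return True if ip falls inside any of the given CIDR blocks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(c) for c in cidrs)

# The inetnum range quoted in the whois record.
baidu_cidrs = range_to_cidrs("180.76.0.0", "180.76.255.255")
print(baidu_cidrs)                            # ['180.76.0.0/16']
print(in_range("180.76.5.10", baidu_cidrs))   # True
print(in_range("180.77.0.1", baidu_cidrs))    # False
```

So the whole allocation collapses to a single /16, which is the form a deny rule would normally take.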

welly_59

  • Guest
Re: Chinese search engine spidering my site.
« Reply #1 on: September 10, 2012, 08:06:56 am »
Why block them?

Offline Scooter Trash

Re: Chinese search engine spidering my site.
« Reply #3 on: September 10, 2012, 09:33:42 am »
I just found this in the SMF forums:
http://www.simplemachines.org/community/index.php?topic=350439.0
I've blocked the IP range in my .htaccess
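The post doesn't show the actual rules, so this is only a guess at their shape: a small Python sketch that prints .htaccess deny rules for a list of CIDR blocks. It assumes Apache 2.2-style `Order`/`Allow`/`Deny` directives by default, with an optional Apache 2.4 `Require` form; `htaccess_block` is a hypothetical helper, and whether the host permits these directives in .htaccess depends on its AllowOverride settings.

```python
def htaccess_block(cidrs, apache_24=False):
    """Build .htaccess text that denies the given CIDR blocks and allows everyone else."""
    if apache_24:
        # Apache 2.4 style (mod_authz_core).
        lines = ["<RequireAll>", "    Require all granted"]
        lines += ["    Require not ip " + c for c in cidrs]
        lines.append("</RequireAll>")
    else:
        # Apache 2.2 style (mod_authz_host): with Order Allow,Deny a Deny match wins.
        lines = ["Order Allow,Deny", "Allow from all"]
        lines += ["Deny from " + c for c in cidrs]
    return "\n".join(lines) + "\n"

# The Baidu allocation from the whois record, written as a CIDR block.
print(htaccess_block(["180.76.0.0/16"]))
print(htaccess_block(["180.76.0.0/16"], apache_24=True))
```

Pick whichever form matches the Apache version the host runs; the two styles should not be mixed in the same file.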


Offline Dan Graves

  • All Time Legend
  • *******
  • Posts: 6583
  • Good Vibes 170
  • Is on the Outside, looking in
Re: Chinese search engine spidering my site.
« Reply #4 on: September 10, 2012, 11:40:57 am »
Well, I was going to tell you that the IP range is wrong for the real Baidu spiders (currently, at least), so it may be wise to block it, but it looks like the folks in the SMF community already did a grand job of putting up the right warnings.
Personally, I block all comms from China: too much malicious traffic.
The ratio of genuine users to malicious traffic is just too small.
"You need a little bit of insanity to do great things"
--Henry Rollins

(If you need me for something, PM ME FOR FSM'S SAKE ! I'm not around a lot, and I do NOT have thread notifications on!)
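On telling real Baidu spiders from impostors: one common check is a reverse DNS lookup followed by a forward confirmation. The Python sketch below assumes that genuine Baiduspider hosts reverse-resolve to names under baidu.com or baidu.jp, as Baidu's webmaster documentation has described; treat that suffix list as an assumption and check it against their current docs. The function names are hypothetical, and the resolver arguments exist only so the logic can be exercised without a network.

```python
import socket

# Assumption: genuine Baiduspider hosts reverse-resolve under these domains.
BAIDU_SUFFIXES = (".baidu.com", ".baidu.jp")

def hostname_is_baidu(hostname):
    """Name check only: True if hostname ends with one of the Baidu crawler domains."""
    return hostname.lower().rstrip(".").endswith(BAIDU_SUFFIXES)

def is_real_baiduspider(ip, reverse=socket.gethostbyaddr,
                        forward=socket.gethostbyname_ex):
    """Reverse-resolve ip, check the domain, then confirm the name maps back to ip."""
    try:
        hostname = reverse(ip)[0]
        if not hostname_is_baidu(hostname):
            return False
        # Forward-confirm: a spoofed PTR record will not resolve back to the same IP.
        return ip in forward(hostname)[2]
    except OSError:
        # No PTR record or a failed lookup: treat the visitor as unverified.
        return False
```

Calling `is_real_baiduspider("180.76.5.10")` does live DNS lookups, so the answer depends on whatever the records say at the time.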

Offline Scooter Trash

Re: Chinese search engine spidering my site.
« Reply #5 on: September 10, 2012, 12:00:57 pm »
Thanks Dan,

Hopefully it'll keep 'em away... If I start seeing suspicious activity or multiple IPs from the same range again, and they trace back to China, I'll just block the whole region.
Do you block them in your .htaccess? Or is there a better way to block an entire country on an Apache server? (commercially hosted)

Offline Dan Graves

Re: Chinese search engine spidering my site.
« Reply #6 on: September 10, 2012, 01:16:56 pm »
I have a deal with my host: they block all traffic from China at a level I have no control over. Since it's a dedicated server, I suppose it's done at the router/gateway level; I'm not sure, but I can ask if you want. If I had to block them manually, I could do it in several ways.
My server uses Nginx, so I could use ngx_http_access_module, but that would require looking up all the IP ranges for China. Personally, I'd install GeoIP and use it to block the entire country, like so:
Code:
    # Needs ngx_http_geoip_module, with a geoip_country database set in the http block.
    if ($geoip_country_code = CN) { return 403; }
For Apache I'd also use GeoIP, or see if your host can do what mine did for me and block it at their level.
If you just want to block a few ranges, then I think .htaccess is the best way...
But you might want to consult old-and-in-the-way about that; he might have some insights I lack.
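To make the GeoIP idea a bit more concrete, here is a rough Python sketch of what a country lookup boils down to: a sorted table of address ranges, a binary search, and a 403 for blocked countries. The table is a hypothetical two-row stand-in (only the first row comes from the whois record in this thread), the function names are invented, and it assumes the ranges never overlap. A real GeoIP database is far larger and should be used through its own tooling.

```python
import bisect
import ipaddress

# Hypothetical mini database of (CIDR, country code). Illustration only.
SAMPLE_DB = [
    ("180.76.0.0/16", "CN"),   # from the whois record in this thread
    ("192.0.2.0/24", "ZZ"),    # documentation range with a placeholder country
]

def build_index(db):
    """Return (sorted range starts, rows of (start, end, country)) using integer addresses."""
    rows = []
    for cidr, country in db:
        net = ipaddress.ip_network(cidr)
        rows.append((int(net.network_address), int(net.broadcast_address), country))
    rows.sort()
    return [row[0] for row in rows], rows

def country_of(ip, index):
    """Binary-search for the range containing ip; return its country code or None."""
    starts, rows = index
    value = int(ipaddress.ip_address(ip))
    pos = bisect.bisect_right(starts, value) - 1
    if pos >= 0 and rows[pos][0] <= value <= rows[pos][1]:
        return rows[pos][2]
    return None

def status_for(ip, index, blocked=frozenset({"CN"})):
    """Mirror the nginx rule: 403 for blocked countries, 200 for everyone else."""
    return 403 if country_of(ip, index) in blocked else 200

index = build_index(SAMPLE_DB)
print(status_for("180.76.5.10", index))   # 403
print(status_for("192.0.2.7", index))     # 200
```

Addresses missing from the table come back as None and are let through, which is a design choice: failing open avoids locking out visitors the database simply doesn't know about.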

Offline Scooter Trash

Re: Chinese search engine spidering my site.
« Reply #7 on: September 10, 2012, 01:25:23 pm »
Kewl. Thanks, I'll talk to my host and check into GeoIP. No need to ask how they do it: you're likely right about the router block, and since I don't have access to that, knowing would only satisfy my curiosity.

This site has IP ranges for different countries: http://www.proxyserverprivacy.com/ipaddress_range.php I might just copy them and paste 'em into the .htaccess..
« Last Edit: September 10, 2012, 01:53:44 pm by Scooter Trash »

Offline Dan Graves

Re: Chinese search engine spidering my site.
« Reply #8 on: September 10, 2012, 02:12:37 pm »
Be careful with that, Scooter: unlike GeoIP, that site may not have 100% accurate ranges in its lists.
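If a pasted list does get used, it's worth sanity-checking it first. A minimal Python sketch, assuming the list is one IPv4 CIDR block per line: it rejects lines that don't parse (including entries with stray host bits) and merges overlapping or adjacent blocks. It can't tell you whether a range really belongs to a given country; it only catches malformed entries. `clean_ranges` is a made-up name.

```python
import ipaddress

def clean_ranges(lines):
    """Parse pasted CIDR lines; return (collapsed CIDR strings, rejected lines)."""
    nets, rejected = [], []
    for raw in lines:
        text = raw.strip()
        if not text or text.startswith("#"):
            continue  # skip blank lines and comments
        try:
            # Strict parsing: "180.76.1.1/16" is rejected because host bits are set.
            nets.append(ipaddress.IPv4Network(text))
        except ValueError:
            rejected.append(text)
    merged = [str(net) for net in ipaddress.collapse_addresses(nets)]
    return merged, rejected

pasted = [
    "180.76.0.0/17",
    "180.76.128.0/17",
    "180.76.5.0/24",
    "180.76.1.1/16",
    "not-an-ip",
]
good, bad = clean_ranges(pasted)
print(good)   # ['180.76.0.0/16']
print(bad)    # ['180.76.1.1/16', 'not-an-ip']
```

Anything that lands in the rejected list deserves a look by hand before it goes anywhere near a deny rule.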

Offline Scooter Trash

Re: Chinese search engine spidering my site.
« Reply #9 on: September 10, 2012, 02:24:41 pm »
Good point! I'd hate to block legit IP ranges, and I have no desire to verify them all.
I've sent a PM to old-and-in-the-way and I'm going to check out GeoIP now. If you're interested, I'll post an update when I figure out what to do.

Offline Dan Graves

Re: Chinese search engine spidering my site.
« Reply #10 on: September 10, 2012, 02:44:33 pm »
Well, I am interested, but I'm also trying to process that piece you linked to in response to my question about pickups and perceived compression, so I might take a while to respond ;)

Offline Scooter Trash

Re: Chinese search engine spidering my site.
« Reply #11 on: September 11, 2012, 09:51:17 am »
LOL Dan,

I seem to have taken care of the Chinese data scrapers, and the Ukrainian pill spammers seem to be under control for now. All I have left is legit users and the Google bots that I allow for SEO, so I'm just going to keep an eye on it. If I see more suspicious activity, I'll dig in a bit deeper and bother old-and-in-the-way, or a friend who I just remembered is an IT professional.

Thanks again

Offline Dan Graves

Re: Chinese search engine spidering my site.
« Reply #12 on: September 11, 2012, 10:48:54 am »
Don't forget to also keep the allows for Bing and Yahoo in robots.txt; they do represent a rather significant slice of market share (when combined, of course; on their own they are each rather negligible).
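A quick way to double-check that robots.txt says what you think it says is Python's standard `urllib.robotparser`. The sketch below uses a hypothetical robots.txt that shuts out Baiduspider and leaves everyone else alone; the file contents, the `allowed_agents` helper and the test URL are all assumptions for illustration. Keep in mind robots.txt is only advisory, so scrapers that ignore it still need an IP-level block.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block Baiduspider, allow every other crawler.
ROBOTS_TXT = """\
User-agent: Baiduspider
Disallow: /

User-agent: *
Disallow:
"""

def allowed_agents(robots_txt, agents, url="/index.php"):
    """Return {agent: bool} saying whether each crawler may fetch url under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}

# bingbot and Slurp are the crawler names used by Bing and Yahoo.
print(allowed_agents(ROBOTS_TXT, ["Googlebot", "bingbot", "Slurp", "Baiduspider"]))
```

With the sample file above, every agent except Baiduspider comes back True.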

Offline Scooter Trash

Re: Chinese search engine spidering my site.
« Reply #13 on: September 11, 2012, 11:19:32 am »
Thanks, I have them allowed, and I use the webmaster tools that each of them provides as well as some other SEO tools. I'm pretty easy to find in the search engines, but that seems to be both a blessing and a curse.

Offline Dan Graves

Re: Chinese search engine spidering my site.
« Reply #14 on: September 11, 2012, 11:38:23 am »
Yeah, SEO is always a mixed blessing.
It brings wanted attention... and a LOT of unwanted attention, free of charge :-X
Speaking of SEO stuff, do you know about http://www.selfseo.com/ ?
It has some handy tools.

 
