dongwen5870 2014-07-01 09:44

Exclude crawlers from a subdomain using .htaccess

I want to stop crawlers from crawling the subdomain tools.subdomain.com. I found a snippet on the internet that shows the following rewrite rule:

RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|Baiduspider) [NC]
RewriteRule .* - [R=403,L]

How can I block those crawlers on this subdomain, or only allow current, up-to-date browsers to visit the subdomain? I want to manage this through .htaccess, because not every crawler respects robots.txt. For robots.txt I have the following rewrite condition:

RewriteCond %{HTTP_HOST} =testing.subdomain.com
RewriteRule ^robots\.txt$ /robots_testing.txt [L]
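
For context, the robots_testing.txt served by that rule is just a blanket disallow; a minimal sketch of such a file (the actual contents here are an assumption) could look like:

# Ask all compliant crawlers to stay out of the testing subdomain entirely
User-agent: *
Disallow: /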

Cheers

Sven

1 answer

  • dongtao4890 2014-07-01 10:13

    It depends on your server layout.

    Segregated subdomain

    If the subdomain has its own document root, it's enough to place an .htaccess file in the subdomain's document root and add the directives you specified:

    RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|Baiduspider) [NC]
    RewriteRule .* - [R=403,L]
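
    For completeness, a minimal sketch of the whole .htaccess file for that case, assuming mod_rewrite is available and rewriting is not already enabled elsewhere:

    # Enable mod_rewrite processing for this directory
    RewriteEngine On

    # Send a 403 to the listed crawlers for every request on this subdomain
    RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|Baiduspider) [NC]
    RewriteRule .* - [R=403,L]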
    

    Shared subdomain

    If the subdomain shares the same document root as the top-level domain, it's enough to add a RewriteCond for the host name to the rules above:

    RewriteCond %{HTTP_HOST} ^tools\.subdomain\.com$
    RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|Baiduspider) [NC]
    RewriteRule .* - [R=403,L]
    

    Please note (1): the pattern ^tools\.subdomain\.com$ is needed to match the entire host name exactly; and since it is a regular expression, the dots must be escaped with a backslash.

    Please note (2): the syntax of the last RewriteCond may vary according to the bots you want to exclude.
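
    For example, to cover a few more common bots, the user-agent condition could be widened like this (the list is only an illustration; match it to the crawlers you actually see in your access logs):

    RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|Baiduspider|yandex|duckduckbot) [NC]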

    This answer was accepted by the asker.
