dongwen5870 2014-07-01 09:44
Views: 42
Answer accepted

Excluding crawlers from a subdomain using .htaccess

I want to stop crawlers from crawling the subdomain tools.subdomain.com. I found a snippet on the Internet that shows the following rewrite rule:

RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|Baiduspider) [NC]
RewriteRule .* - [R=403,L]

How can I block those crawlers on this subdomain, or allow only current, up-to-date browsers to visit it? I want to manage this through .htaccess, because not every crawler respects robots.txt. For robots.txt I have the following rewrite condition:

RewriteCond %{HTTP_HOST} =testing.subdomain.com
RewriteRule ^robots\.txt$ /robots_testing.txt [L]

Cheers

Sven

1 answer

  • dongtao4890 2014-07-01 10:13

    It depends on your server layout.

    Segregated subdomain

    If the subdomain has its own document root, it is enough to place an .htaccess file in that document root containing the directives you specified:

    RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|Baiduspider) [NC]
    RewriteRule .* - [R=403,L]
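
    As a minimal sketch of what the complete file might look like, assuming mod_rewrite is available and rewriting has not already been switched on elsewhere for this directory: the RewriteEngine line and the use of the [F] flag are additions not present in the original snippet. [F] is the dedicated "Forbidden" flag and returns the same 403 status as [R=403,L].

    ```apache
    # .htaccess in the subdomain's own document root (sketch, not a drop-in file)

    # Assumption: rewriting is not yet enabled for this directory.
    RewriteEngine On

    # Match the crawler user agents from the question, case-insensitively.
    RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|Baiduspider) [NC]

    # Refuse the request with 403 Forbidden; [F] also stops further rewriting.
    RewriteRule ^ - [F]
    ```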
    

    Shared subdomain

    If the subdomain shares its document root with the top-level domain, add a RewriteCond on the host name to the rules above:

    RewriteCond %{HTTP_HOST} ^tools\.subdomain\.com$
    RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|Baiduspider) [NC]
    RewriteRule .* - [R=403,L]
    

    Please note (1): the ^ and $ anchors in ^tools\.subdomain\.com$ are needed to match the entire host name exactly; and since the pattern is a regular expression, the dots must be escaped with a backslash.

    Please note (2): the pattern in the last RewriteCond (the one testing HTTP_USER_AGENT) may vary according to the bots you want to exclude.
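
    To illustrate note (2), here is a hedged sketch of the shared-document-root variant with a longer bot list. The extra names (YandexBot, AhrefsBot) are only illustrative assumptions, not part of the original question; substitute whichever user-agent tokens you actually see in your access log. The [NC] flag on the host condition is likewise an addition, since host names are case-insensitive.

    ```apache
    # .htaccess in the shared document root (sketch under the assumptions above)
    RewriteEngine On

    # Apply the rule only to requests for the tools subdomain.
    RewriteCond %{HTTP_HOST} ^tools\.subdomain\.com$ [NC]

    # Successive RewriteCond lines are ANDed: the host must match
    # AND the user agent must contain one of these tokens.
    RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|Baiduspider|YandexBot|AhrefsBot) [NC]

    # Answer matching requests with 403 Forbidden.
    RewriteRule ^ - [F]
    ```

    Keep in mind that this only filters on the User-Agent header, so a crawler that sends a browser-like user agent will not be caught by it.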
