doutan2111 2013-07-30 08:41
62 views

How do I allow only crawlers to access part of my website?

I've got an AJAX-rich website with extensive _escaped_fragment_ portions for AJAX indexing. All my _escaped_fragment_ URLs 301-redirect to a special module that outputs the HTML snapshots the crawlers need (i.e. mysite.com/#!/content maps to mysite.com/?_escaped_fragment_=/content, which in turn 301s to mysite.com/raw/content). Still, I'm somewhat afraid of users stumbling on those "raw" URLs themselves and making them appear in search engines.
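The #! → _escaped_fragment_ mapping described above follows Google's AJAX crawling scheme. A minimal sketch of that URL translation (in Python for illustration; the asker's actual redirects happen server-side in PHP):

```python
from urllib.parse import quote

def hashbang_to_escaped_fragment(url: str) -> str:
    """Translate a #! URL into the ?_escaped_fragment_= form a crawler requests.

    Per the AJAX crawling scheme, the fragment value is percent-encoded.
    """
    if "#!" not in url:
        return url  # no hash-bang fragment; nothing to translate
    base, fragment = url.split("#!", 1)
    sep = "&" if "?" in base else "?"
    return f"{base}{sep}_escaped_fragment_={quote(fragment, safe='')}"

print(hashbang_to_escaped_fragment("http://mysite.com/#!/content"))
# http://mysite.com/?_escaped_fragment_=%2Fcontent
```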

In PHP, how do I make sure only robots can access this part of the website? (much like how Stack Overflow hides its sitemap from normal users and only lets robots access it)


1 answer

  • dongyi2889 2013-07-30 08:55

    You can't, at least not reliably.

    robots.txt asks spiders to keep out of parts of a site, but there is no equivalent for regular user agents.
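Note that robots.txt only works in the opposite direction from what the question wants: assuming the snapshot pages live under /raw/, a rule like the following would keep compliant crawlers out, while doing nothing at all to human visitors:

```
# robots.txt — excludes well-behaved crawlers from /raw/;
# has no effect on regular browsers
User-agent: *
Disallow: /raw/
```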

    The closest you could come would be to maintain a whitelist of acceptable IP addresses or user agents and serve different content based on that … but that risks false positives.
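The user-agent approach warned about above could be sketched like this (Python for illustration; a PHP version would inspect $_SERVER['HTTP_USER_AGENT'] the same way). The bot-name list is an assumed example, and, as noted, user agents are trivially spoofed:

```python
# Hypothetical allow-list check: serve the raw snapshot only when the
# User-Agent header matches a known crawler substring. This is easily
# spoofed and will break when a crawler changes its UA string.
KNOWN_BOTS = ("googlebot", "bingbot", "yandexbot")  # assumed, not exhaustive

def looks_like_crawler(user_agent: str) -> bool:
    """Return True when the UA string contains a known crawler name."""
    ua = user_agent.lower()
    return any(bot in ua for bot in KNOWN_BOTS)

print(looks_like_crawler("Mozilla/5.0 (compatible; Googlebot/2.1)"))   # True
print(looks_like_crawler("Mozilla/5.0 (Windows NT 10.0) Chrome/115"))  # False
```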

    Personally, I'd stop catering for old IE, scrap the #! URIs and the _escaped_fragment_ hack, switch to using pushState and friends, and have the server build the initial view for any given page.

