doujia2090 2014-04-15 13:07
81 views

How can I block crawlers such as spyder/Nutch-2 from accessing a specific page?

I have a Windows client application that consumes a PHP page hosted on a shared commercial web server.

In this PHP page I return an encrypted JSON response. I also have a piece of code in this page that keeps track of which IPs are visiting it, and I have noticed that a spyder/Nutch-2 crawler is visiting this page.
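
For context, the tracking is essentially along these lines (a minimal sketch only; the log file name and the logged fields here are illustrative, not the actual code):

    <?php
    // Minimal sketch of per-request IP logging (illustrative only).
    $logFile   = __DIR__ . '/visitor_ips.log';
    $ip        = isset($_SERVER['REMOTE_ADDR']) ? $_SERVER['REMOTE_ADDR'] : 'unknown';
    $userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : 'unknown';

    // Append "timestamp <tab> IP <tab> user agent" to the log; this is where
    // the spyder/Nutch-2 user agent shows up.
    file_put_contents($logFile, date('c') . "\t" . $ip . "\t" . $userAgent . "\n", FILE_APPEND | LOCK_EX);

    // ... the rest of the page builds and returns the encrypted JSON ...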

I am wondering how it is possible for a crawler to find a page that is not published in any search engine. Is there a way to block crawlers from visiting this specific page?

Should I use a .htaccess file to configure this?


5 answers

  • douzhongqiu5032 2014-04-15 13:13

    You can indeed use a .htaccess file. robots.txt is another option, but some crawlers will ignore it. You can also block specific user-agent strings, which differ from crawler to crawler (see the .htaccess sketch at the end of this answer).

    robots.txt:

    User-agent: *
    Disallow: /
    

    This example tells all robots to stay out of the entire website. You can also block just a specific directory:

    User-agent: *
    Disallow: /demo/
    

    More information about robots.txt
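
    Since the question is about one specific page, .htaccess can also block by user agent for just that script. The following is a minimal sketch, assuming Apache with mod_rewrite enabled, that the page is called api.php (a made-up name, replace it with the real one), and that matching "nutch" in the User-Agent header is enough to identify the crawler:

    # Minimal sketch: deny a Nutch-based crawler access to one script.
    # Assumptions: Apache with mod_rewrite; the protected page is api.php.
    <IfModule mod_rewrite.c>
        RewriteEngine On
        # Case-insensitive match on the User-Agent header
        RewriteCond %{HTTP_USER_AGENT} nutch [NC]
        # Return 403 Forbidden for that one page only
        RewriteRule ^api\.php$ - [F,L]
    </IfModule>

    Unlike robots.txt, which only polite crawlers honour, this rule is enforced by the server itself, so it also stops crawlers that ignore robots.txt. Keep in mind that the User-Agent header can be spoofed, so this is a best-effort block rather than real access control.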
