doujia5863 2015-05-12 07:41
26 views

Preventing bots from scraping our content / overloading our server

We are planning to offer a job platform service for some firms. We already have a few thousand jobs that we could offer to all our guests/visitors.

Since yesterday we have noticed that our server load is extremely high. When we checked the logs, we saw multiple page requests per second from different IP addresses. However, the order in which the pages were requested indicates that it was the same user/bot.

We want to be available to the public, but if bots are massively slowing down our server or forcing us to buy new hardware, then we are in trouble.

We are currently displaying all our job content in iframes. Would an encoder like http://www.tareeinternet.com/scripts/iframe-encoder/ help to solve our problem?

Or what other options do we have? It's especially annoying since we don't have user sessions or recurring IP addresses (I think they are using proxies that switch regularly).

1 answer

  • douxianxing5712 2015-05-12 10:57

    Have you checked the headers for recurring data? If, for example, they send a recurring user-agent, you can block those requests:

    • Apache:

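    # Flag requests whose User-Agent starts with one of these names (case-insensitive)
    SetEnvIfNoCase User-Agent "^Wget" bad_bot
    SetEnvIfNoCase User-Agent "^EmailSiphon" bad_bot
    SetEnvIfNoCase User-Agent "^EmailWolf" bad_bot
    <Directory "/var/www">
            # Apache 2.2 access-control syntax; on Apache 2.4 it needs mod_access_compat
            Order Allow,Deny
            Allow from all
            # Refuse any request that was flagged above
            Deny from env=bad_bot
    </Directory>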

    • Code: You can check each request for that specific header and redirect it somewhere else.
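
    The "Code" option could look roughly like the sketch below. It is only an illustration under assumptions: the application is taken to be a Python WSGI app, the pattern list simply mirrors the Apache example above, and the `BlockBadBots` class, the `is_bad_bot` helper and the `/blocked` redirect target are hypothetical names, not anything from the question.

```python
import re

# Assumed block list, mirroring the user-agents from the Apache example.
BAD_AGENT = re.compile(r"^(Wget|EmailSiphon|EmailWolf)", re.IGNORECASE)


def is_bad_bot(user_agent):
    """Return True if the User-Agent starts with a blocked name."""
    return bool(BAD_AGENT.match(user_agent or ""))


class BlockBadBots:
    """WSGI middleware that redirects flagged requests elsewhere."""

    def __init__(self, app, redirect_to="/blocked"):
        self.app = app                  # the wrapped WSGI application
        self.redirect_to = redirect_to  # hypothetical target for blocked bots

    def __call__(self, environ, start_response):
        # WSGI exposes the User-Agent request header as HTTP_USER_AGENT.
        if is_bad_bot(environ.get("HTTP_USER_AGENT")):
            headers = [("Location", self.redirect_to), ("Content-Length", "0")]
            start_response("302 Found", headers)
            return [b""]
        # Everything else is passed through to the real application.
        return self.app(environ, start_response)
```

    Wrapping an existing app would then be `application = BlockBadBots(application)`. Keep in mind that a client can send any User-Agent it likes, so this only helps while the bot keeps sending the same one.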
