doudao1369 2013-02-15 11:11

HTTP_USER_AGENT is not set - is this normal, or could it be a bot?

I'm asking for your opinions and experiences on this.

Our CMS reads information from the HTTP_USER_AGENT string. We recently discovered a bug in the code: we forgot to check whether HTTP_USER_AGENT is present at all. (Its absence is possible, but honestly we simply skipped the check because we didn't expect it to happen.) Such requests resulted in an error. We have since fixed the bug and added tracking: if HTTP_USER_AGENT is not set, an alert is sent to our tracking system.
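To illustrate the kind of check described above, here is a minimal sketch. It assumes a Python WSGI-style `environ` mapping, because the CMS's actual language is not stated in this post. The names `get_user_agent` and `on_missing` are hypothetical, and `on_missing` merely stands in for whatever sends the alert to the tracking system.

```python
from typing import Callable, Mapping, Optional


def get_user_agent(
    environ: Mapping[str, str],
    on_missing: Optional[Callable[[Mapping[str, str]], None]] = None,
) -> Optional[str]:
    """Return the User-Agent string, or None if it is absent or blank.

    `on_missing` is a hypothetical hook (e.g. "send an alert to the
    tracking system"); it is called once when no usable value is found.
    """
    ua = environ.get("HTTP_USER_AGENT")
    # Treat a missing key and an empty/whitespace-only value the same way.
    if ua is None or not ua.strip():
        if on_missing is not None:
            on_missing(environ)
        return None
    return ua.strip()
```

Callers then branch on `None` instead of assuming the key exists, which is what avoids the original error.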

We now have statistics from many websites covering the past few months, and they show that this is really rare: roughly 0.05-0.1%.

Another interesting observation: these requests are always isolated. We didn't find a single case where such a "user" had multiple pageviews in the same session.

This got us thinking: should we treat these requests as robots and simply block them? Or would that be a serious mistake?
Googlebot and other "good robots" always send HTTP_USER_AGENT info.

I know that firewalls or proxy servers MAY alter (or remove) the user-agent info, but I cannot confirm this from our stats.

What are your experiences? Has anyone else here researched this topic?

Other posts I found on Stack Overflow simply accept that "it is possible this info is not sent". But why don't we question that for a moment? Is it really normal?


  • doupafu6980 2013-02-15 11:35

    I would consider a missing user-agent abnormal for genuine users. It is still a rare possibility, though, and may be caused by a firewall, proxy or privacy software stripping the user-agent.

    A request without a user-agent most likely comes from a bot or script (not necessarily a search engine crawler), although of course you can't say for sure.

    Other factors that may indicate a bot/script:

    • Requesting only the page itself and never the resources on it, such as images, CSS and JavaScript.
    • A very short interval between requests from page to page (such as within the same second).
    • Failing to send cookies or session IDs on subsequent requests where a cookie should have been set. Keep in mind, though, that genuine users may have cookies disabled.
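The indicators above can be combined into a rough check. The following is only a sketch: the `VisitSummary` fields, the `bot_signals` function and the one-second threshold are illustrative assumptions, and it presumes the per-visitor numbers have already been aggregated from the access logs. None of the signals is conclusive on its own.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VisitSummary:
    """Hypothetical per-visitor aggregate built from access logs."""
    user_agent: Optional[str]          # None if the header was never sent
    page_requests: int                 # number of page (HTML) requests
    asset_requests: int                # images, CSS, JavaScript, ...
    min_gap_seconds: Optional[float]   # shortest gap between page requests
    cookie_returned: bool              # a previously set cookie came back


def bot_signals(visit: VisitSummary) -> List[str]:
    """List which bot/script indicators apply to a visit."""
    signals: List[str] = []
    if not visit.user_agent:
        signals.append("missing_user_agent")
    if visit.page_requests > 0 and visit.asset_requests == 0:
        signals.append("no_asset_requests")
    # Assumed threshold: two page requests within the same second.
    if visit.min_gap_seconds is not None and visit.min_gap_seconds < 1.0:
        signals.append("rapid_page_requests")
    # Only meaningful once there was a later request that could carry
    # the cookie; genuine users may also have cookies disabled.
    if visit.page_requests > 1 and not visit.cookie_returned:
        signals.append("no_cookie_returned")
    return signals
```

Returning the list of matched signals rather than a yes/no verdict leaves the blocking decision to the caller, for example to require several signals before acting.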