HTTP_USER_AGENT not set - is this normal? Or could it be a robot?
Tags: php | Asked: 2013-02-15 11:11

I'm asking for your opinions / experiences on this.

Our CMS reads information from the HTTP_USER_AGENT string. Recently we discovered a bug in the code: we forgot to check whether HTTP_USER_AGENT is present at all. That is possible, but honestly, we simply skipped the check because we didn't expect it to happen, and those requests resulted in an error. We have fixed it and added tracking: if HTTP_USER_AGENT is not set, an alert is sent to our tracking system.
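The fix described above (check for the header before using it, and raise an alert when it is missing) can be sketched as follows. This is a minimal illustration in Python, not the CMS's actual code: it assumes a WSGI application, whose `environ` dict exposes request headers as `HTTP_*` keys just like PHP's `$_SERVER`, and it uses a standard `logging` warning as a stand-in for the tracking system. Treating an empty header the same as a missing one is an assumption.

```python
import logging

logger = logging.getLogger("ua_tracking")


def get_user_agent(environ):
    """Return the request's User-Agent, or None if the header is missing.

    `environ` is a WSGI environ dict (PEP 3333). Like PHP's $_SERVER, it
    exposes request headers as HTTP_* keys, so a client that sends no
    User-Agent header simply has no HTTP_USER_AGENT key.
    """
    ua = environ.get("HTTP_USER_AGENT")  # .get() avoids a KeyError
    if not ua:  # assumption: an empty header counts as missing
        # Stand-in for the alert sent to the tracking system.
        logger.warning(
            "Request without User-Agent: %s %s from %s",
            environ.get("REQUEST_METHOD"),
            environ.get("PATH_INFO"),
            environ.get("REMOTE_ADDR"),
        )
        return None
    return ua
```

Callers then branch on `None` instead of assuming the string exists, which is exactly the check the original code skipped.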

We now have data from many websites over the past months, and it shows this is really rare: about 0.05-0.1% of requests.

Another interesting observation: these requests are isolated. We didn't find a single case where such a "user" had multiple page views in the same session.
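Both statistics above (the share of requests without a user-agent, and whether those clients ever make a second request) can be computed from an access log along these lines. This is a sketch under stated assumptions: the log format of `(client_key, user_agent)` pairs, with `client_key` being a session ID or IP address and `user_agent` being `None` when the header was absent, is hypothetical, as is the function name `summarize`.

```python
from collections import Counter


def summarize(log):
    """Summarize missing-User-Agent traffic in an access log.

    `log` is an iterable of (client_key, user_agent) pairs (hypothetical
    format): client_key is a session ID or IP address, and user_agent is
    None when the header was absent.

    Returns (percent of requests without a User-Agent,
             sorted list of no-UA clients that made more than one request).
    """
    requests_per_client = Counter()
    no_ua_clients = set()
    missing = 0
    for client_key, user_agent in log:
        requests_per_client[client_key] += 1
        if not user_agent:
            missing += 1
            no_ua_clients.add(client_key)
    total = sum(requests_per_client.values())
    rate = 100.0 * missing / total if total else 0.0
    # Count all requests by a no-UA client, with or without the header.
    repeat_visitors = sorted(c for c in no_ua_clients if requests_per_client[c] > 1)
    return rate, repeat_visitors
```

For example, 1,998 normal requests plus two single requests from different sessions without a user-agent give a rate of 0.1 (percent) and an empty list of repeat visitors, matching the pattern described in the question.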

This got us thinking. Should we treat these requests as robots and simply block them? Or would that be a serious mistake?
Googlebot and other "good robots" always send HTTP_USER_AGENT.

I know firewalls or proxy servers MAY alter (or remove) the user-agent, but our stats alone don't let us confirm whether that is what is happening here.

What are your experiences? Has anyone else done any research on this topic?

Other posts I found on Stack Overflow simply accept the fact that "it is possible this info is not sent". But why not question that for a moment? Is it really normal?

doupafu6980 2013-02-15 11:35
I would consider a missing user-agent abnormal for genuine users. It is still a rare possibility, though, which may be caused by a firewall, proxy or privacy software stripping the user-agent.

A request missing a user-agent is most likely a bot or script (not necessarily a search engine crawler), although of course you can't say for sure.
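One reason scripts show up without a user-agent is that low-level HTTP client libraries do not add one unless the programmer does. The sketch below illustrates this with Python's `http.client`, which automatically adds only `Host` and `Accept-Encoding`; it sends one request to a throwaway local server that records the `User-Agent` it receives. The handler and variable names are illustrative. By contrast, the higher-level `urllib.request` does send a default `Python-urllib/x.y` user-agent.

```python
import http.client
import http.server
import threading

seen_user_agents = []  # User-Agent values observed by the test server


class RecordingHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # headers.get() returns None when the header was not sent.
        seen_user_agents.append(self.headers.get("User-Agent"))
        self.send_response(204)  # No Content
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


# Port 0 lets the OS pick a free port on localhost.
server = http.server.HTTPServer(("127.0.0.1", 0), RecordingHandler)
worker = threading.Thread(target=server.handle_request)  # serve one request
worker.start()

# A bare http.client request: only Host and Accept-Encoding are added
# automatically, so no User-Agent header goes out.
conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
conn.request("GET", "/")
conn.getresponse().read()
conn.close()

worker.join()
server.server_close()
print(seen_user_agents)  # [None]
```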

Other factors that may indicate a bot/script:

Requesting only the page itself and never the resources on it, such as images, CSS and JavaScript.

A very short interval between page-to-page requests (for example, within the same second).

Failing to send cookies or session IDs on subsequent requests where a cookie should have been set; keep in mind, though, that genuine users may have cookies disabled.
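The indicators above can be combined into a simple checklist. The sketch below is one possible way to do that, not a proven bot detector: the `ClientActivity` summary, its fields, and the one-second threshold are hypothetical choices. It returns the list of signals a client shows rather than a yes/no verdict, since, as noted above, none of these signals is conclusive on its own.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ClientActivity:
    """Hypothetical per-client summary assembled from access logs."""
    user_agent: Optional[str]
    page_times: List[float] = field(default_factory=list)  # seconds
    asset_requests: int = 0        # images, CSS, JavaScript fetched
    cookie_set: bool = False       # we sent a session cookie
    cookie_returned: bool = False  # the client sent it back later


def bot_signals(c, min_gap_seconds=1.0):
    """Return the bot/script indicators this client shows (hints, not proof)."""
    signals = []
    if not c.user_agent:
        signals.append("missing user-agent")
    if c.page_times and c.asset_requests == 0:
        signals.append("no page resources requested")
    times = sorted(c.page_times)
    if any(later - earlier < min_gap_seconds for earlier, later in zip(times, times[1:])):
        signals.append("pages requested very quickly")
    # Only meaningful once the client has made a follow-up request.
    if c.cookie_set and len(times) > 1 and not c.cookie_returned:
        signals.append("cookie not returned")  # could also be cookies disabled
    return signals
```

A site could then log clients that show several signals for review instead of blocking on a single one, such as a missing user-agent.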
The asker selected this answer as the best answer.
