dtkyayvldeaqhl7151 2019-02-19 19:21

How can I detect social-media bots and refine the user agent in PHP?

I am trying to build a script that captures the user agent of each visitor. That can easily be done using $_SERVER['HTTP_USER_AGENT'].
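
For reference, a minimal sketch of that capture step (the log-file name is illustrative, and note that the header is client-supplied, so it may be absent or spoofed):

<?php
// Append each visitor's user agent to a log file with a timestamp.
// HTTP_USER_AGENT comes from the client, so treat it as untrusted input.
$ua = $_SERVER['HTTP_USER_AGENT'] ?? 'unknown';
file_put_contents('agents.log', date('c') . ' ' . $ua . "\n", FILE_APPEND | LOCK_EX);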

Example: below are all the Twitter bots detected via $_SERVER['HTTP_USER_AGENT'].

I simply posted a link to the PHP script on Twitter and it detected the bots: here

Here are the bots captured via HTTP_USER_AGENT from the Twitter network.

1. Mozilla/5.0 (Windows; U; Windows NT 6.1; en-GB; rv:1.9.1.2) Gecko/20090729 Firefox/52.0
2. Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)
3. Mozilla/5.0 (compatible; AhrefsBot/6.1; News; +http://ahrefs.com/robot/)
4. Mozilla/5.0 (compatible; TrendsmapResolver/0.1)
5. Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36 (not sure whether this is a bot or a normal agent)
6. Twitterbot/1.0
7. Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Applebot/0.1; +http://www.apple.com/go/applebot)

Now I want to refine/filter the bot names out of the detected HTTP_USER_AGENT strings; one way to do that is sketched after the example list below.

Example:

rv:1.9.1.2
Trident/4.0
(compatible; AhrefsBot/6.1; News; +http://ahrefs.com/robot/)
(compatible; TrendsmapResolver/0.1)
Twitterbot/1.0
(Applebot/0.1; +http://www.apple.com/go/applebot)
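
A minimal sketch of one way to extract such tokens (the pattern list is an assumption drawn from the agents above, not a complete catalogue of crawler signatures, and extractBotToken is a hypothetical helper name):

<?php
// Try a list of regex patterns against the user agent and return the first
// matching token, or null if nothing matches.
function extractBotToken(string $ua): ?string
{
    $patterns = [
        '~([A-Za-z]*[Bb]ot/[\d.]+)~',      // Twitterbot/1.0, AhrefsBot/6.1, Applebot/0.1
        '~(TrendsmapResolver/[\d.]+)~',    // Twitter link-preview resolver
        '~(Trident/[\d.]+)~',              // old IE layout-engine token
        '~(rv:[\d.]+)~',                   // Gecko revision token
    ];
    foreach ($patterns as $pattern) {
        if (preg_match($pattern, $ua, $matches)) {
            return $matches[1];
        }
    }
    return null;
}

echo extractBotToken('Twitterbot/1.0');    // prints "Twitterbot/1.0"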

What I have tried so far:

$ua = $_SERVER['HTTP_USER_AGENT'] ?? '';

if (strpos($ua, 'Twitterbot/1.0') !== false ||
    strpos($ua, 'Applebot/0.1') !== false) {
    // Note: these are exact substring matches, so e.g. "Twitterbot/1.1"
    // would slip through.
    $file = fopen('crawl.txt', 'a');
    fwrite($file, "TW-bot detected.\n");
    fclose($file);
    echo "TW-bot detected.";
} else {
    $file = fopen('crawl.txt', 'a');
    fwrite($file, "Nothing found.\n");
    fclose($file);
    echo "Nothing";
}

But somehow the above code is not working: crawl.txt always shows "Nothing found." Let me know where I am going wrong and what the proper/better/best way to detect bots is; any direction or guidance is appreciated.


1 answer

  • dongtangxi1584 2019-02-19 19:38

    You might find that it's easy to spot the bots that capture simple website previews, but the user agents of bots that scrape for restricted content are a lot harder to identify.

    You'd have to do more than just parse the UA. Interrogating REMOTE_ADDR will also be necessary: you'd fire each request through something like http://ip-api.com to determine whether it's coming from a datacenter. Be careful of users behind proxies, as they will trigger false positives. You could go further and investigate browser capabilities with JavaScript, but be aware that this is a difficult problem and a constant arms race between a provider's detection tools and (usually) black-hat advertisers.
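
    A minimal sketch of that IP check (the field names follow ip-api.com's JSON API as I understand it, so verify them against the docs; the free endpoint is HTTP-only and rate-limited; looksLikeDatacenterIp is a hypothetical helper name):

    <?php
    // Ask ip-api.com whether an address belongs to a datacenter or known proxy.
    function looksLikeDatacenterIp(string $ip): bool
    {
        $url = 'http://ip-api.com/json/' . urlencode($ip) . '?fields=status,proxy,hosting';
        $json = @file_get_contents($url);   // suppress warnings; handle failure below
        if ($json === false) {
            return false;                   // lookup failed; don't block on missing data
        }
        $data = json_decode($json, true);
        if (!is_array($data) || ($data['status'] ?? '') !== 'success') {
            return false;
        }
        // "hosting" flags datacenter ranges; "proxy" flags known proxies/VPNs.
        // Remember that ordinary users behind proxies will trigger false positives.
        return !empty($data['hosting']) || !empty($data['proxy']);
    }

    if (looksLikeDatacenterIp($_SERVER['REMOTE_ADDR'] ?? '')) {
        echo "Request likely from a datacenter or proxy.\n";
    }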

