Why can't some websites be scraped?

I have just started to learn how to use regular expressions to extract data from websites. The first goal of mine is to extract the title of a website. Here is what my code is like:

<?php 
    $data = file_get_contents('http://bctia.org');
    $regex = '/<title>(.+?)<\/title>/';
    preg_match($regex,$data,$match);
    var_dump($match); 
?>

The result of var_dump is empty:

array(0) { }

At first I thought, "maybe bctia.org does not have a title?" However, this is not the case: I checked the source of bctia.org, and it does have content between <title> and </title>.

Then I thought, maybe my code does not work? However, this is not the case either: I substituted bctia.org with other websites, say, bing.com or apple.com, and both returned correct results. For example, with apple.com I get the correct result:

array(2) { [0]=> string(20) "<title>Apple</title>" [1]=> string(5) "Apple" }

So I have to come to the conclusion that bctia.org is a very special website that prevents me from extracting its title...

I am wondering if that is actually the case? Or maybe my code has some problems that I have not identified?

Thank you in advance!


doupang4126 2013-06-12 00:56

This specific website's server-side code assumes that the client sends a User-Agent header, and apparently, your PHP installation is not configured to send one. So a 500 Internal Server Error is returned, causing file_get_contents to return false.

Source Error:
Line 66: //LOAD: Compatibility Mode
Line 67: //<meta http-equiv="X-UA-Compatible" content="IE=7,IE=9" />
Line 68: string BrowserOS = Request.ServerVariables["HTTP_USER_AGENT"].ToString();
Line 69: HtmlMeta compMode = new HtmlMeta();
Line 70: compMode.Content = "IE=7,IE=9";


Source File: c:\inetpub\wwwroot\BCTIA\Website\bctia\layouts\Main Layout.aspx.cs   
Line: 68

Stack Trace:
[NullReferenceException: Object reference not set to an instance of an object.]
   Layouts.Main_Layout.Page_Load(Object sender, EventArgs e) in c:\inetpub\wwwroot\BCTIA\Website\bctia\layouts\Main Layout.aspx.cs:68
   System.Web.Util.CalliHelper.EventArgFunctionCaller(IntPtr fp, Object o, Object t, EventArgs e) +24
   System.Web.UI.Control.LoadRecursive() +70
   System.Web.UI.Page.ProcessRequestMain(Boolean includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint) +3063

To work around this issue, you can just set a user-agent string before making the request:

ini_set('user_agent', 'Mozilla/5.0 (compatible; Examplebot/0.1; +http://www.example.com/bot.html)');
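
If changing the global setting is not desirable, the same header can probably be sent for a single request through a stream context. The following is only a sketch under a few assumptions: the helper names fetch_html and extract_title are made up for illustration, the 10-second timeout is arbitrary, and the regex is slightly loosened (case-insensitive, multi-line titles) compared to the one in the question. It also checks for the false return value explicitly, which is what hid the real problem here.

```php
<?php
// Hypothetical helpers for illustration; not part of the original answer.

/** Return the text inside the first <title> element, or null if none is found. */
function extract_title(string $html): ?string
{
    // "i" tolerates <TITLE>, "s" lets "." match newlines inside the title,
    // and [^>]* allows attributes on the opening tag.
    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $match) === 1) {
        return trim($match[1]);
    }
    return null;
}

/** Fetch a URL with an explicit User-Agent; returns null when the request fails. */
function fetch_html(string $url, string $userAgent): ?string
{
    // The User-Agent applies to this request only (assumed 10 s timeout).
    $context = stream_context_create([
        'http' => ['user_agent' => $userAgent, 'timeout' => 10],
    ]);
    // "@" silences the warning; failure is reported through the false return value.
    $data = @file_get_contents($url, false, $context);
    return $data === false ? null : $data;
}

// Usage (performs a network request, so it is left commented out):
// $ua = 'Mozilla/5.0 (compatible; Examplebot/0.1; +http://www.example.com/bot.html)';
// $html = fetch_html('http://bctia.org', $ua);
// var_dump($html === null ? 'request failed' : extract_title($html));
```

With a check like this, a failed request shows up as "request failed" instead of an empty match array, which makes it easier to tell a server-side error apart from a regex that did not match.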

This answer was accepted by the asker as the best answer.