PHP file_get_contents returns only newline characters

I have a PHP script for HTML parsing that works on simple websites, but now I need to parse the cinema program from this website. I am using the file_get_contents function, which returns just 4 newline characters, and I can't figure out why. The website itself will be more difficult to parse with DOMDocument and XPath, because the program is shown in a pop-up window that does not seem to change the URL, but I will try to handle that problem after retrieving the HTML code of the site.

Here is the shortened version of my script:

<?php
      $url = "http://www.cinemacity.cz/";
      $content = file_get_contents($url);
      $dom = new DomDocument;
      // loadHTML() returns false on failure; the $dom object itself never equals FALSE
      if ($dom->loadHTML($content) === false) {
        echo "FAAAAIL\n";
      }

      $xpath = new DOMXPath($dom);

      $tags = $xpath->query("/html");

      foreach ($tags as $tag) {
        var_dump(trim($tag->nodeValue));
      }
?>
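
Before handing the response to DOMDocument, it can help to look at what file_get_contents actually returned. The following is only a minimal sketch: describe_body() is a hypothetical helper (not part of PHP), and the four-newline string stands in for the response described above.

```php
<?php
// Sketch: summarise a fetched body before parsing it.
// describe_body() is a made-up helper name used for illustration only.

function describe_body($body): string
{
    // file_get_contents() returns false when the request itself fails
    if ($body === false) {
        return 'request failed';
    }

    $length = strlen($body);

    // trim() strips spaces, tabs, newlines and similar whitespace
    if (trim($body) === '') {
        return "whitespace only ($length bytes)";
    }

    return "$length bytes of content";
}

// Four newline characters, as in the question
echo describe_body("\n\n\n\n"), "\n"; // prints: whitespace only (4 bytes)
```

For a real request, pass the return value of file_get_contents($url) instead of the literal string.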

EDIT:

So, following the advice from WBAR (thank you), I looked for a way to change the headers sent by the file_get_contents() function, and this is the answer I found elsewhere. Now I am able to obtain the HTML of the site; hopefully I will manage to parse this mess :D

<?php
    libxml_use_internal_errors(true);
    // Create a stream
    $opts = array(
      'http'=>array(
        'user_agent' => 'PHP libxml agent', //Wget 1.13.4
        'method'=>"GET",
        'header'=>"Accept-language: en\r\n" .
                  "Cookie: foo=bar\r\n"
      )
    );
    $context = stream_context_create($opts);

    // Open the file using the HTTP headers set above
    $content = file_get_contents('http://www.cinemacity.cz/', false, $context);

    $dom = new DomDocument;
    // loadHTML() returns false on failure; the $dom object itself never equals FALSE
    if ($dom->loadHTML($content) === false) {
        echo "FAAAAIL\n";
    }

    $xpath = new DOMXPath($dom);

    $tags = $xpath->query("/html");

    foreach ($tags as $tag) {
        var_dump(trim($tag->nodeValue));
    }
?>
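
Once the HTML arrives, the parsing step can be tried out offline on a small string first. This is a rough sketch under stated assumptions: xpath_texts() is a hypothetical helper, and the sample markup with its "film" class is invented for illustration and says nothing about the real structure of the cinema site.

```php
<?php
// Sketch: query an HTML string with DOMDocument and DOMXPath.
// The helper name and the sample markup are assumptions, not taken from the site.

function xpath_texts(string $html, string $query): array
{
    // A whitespace-only body (like the 4 newlines above) has nothing to parse
    if (trim($html) === '') {
        return [];
    }

    // Collect libxml parse warnings internally instead of printing them
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $loaded = $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($loaded === false) {
        return [];
    }

    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query($query);
    if ($nodes === false) {
        return []; // the XPath expression could not be evaluated
    }

    $texts = [];
    foreach ($nodes as $node) {
        $texts[] = trim($node->nodeValue);
    }
    return $texts;
}

$sample = '<html><body><ul><li class="film">Film A</li><li class="film">Film B</li></ul></body></html>';
print_r(xpath_texts($sample, '//li[@class="film"]'));
```

Returning an empty array for a whitespace-only body keeps the "server sent nothing useful" case separate from genuine parse problems.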

douxuan0698 2012-10-07 11:57
The problem is not in PHP but in the target host. It inspects the client's User-Agent header. Look at this:

wget http://www.cinemacity.cz/
2012-10-07 13:54:39 (1,44 MB/s) - saved `index.html.1' [234908]

but when the User-Agent header is removed:

wget --user-agent="" http://www.cinemacity.cz/
2012-10-07 13:55:41 (262 KB/s) - saved `index.html.2' [4/4]

Only 4 bytes were returned by the server.
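
The same idea can be carried over to PHP. As far as I know, PHP's HTTP stream wrapper sends no User-Agent header unless the user_agent ini setting or the user_agent context option is set, which would match the wget experiment with an empty agent. The sketch below is only an illustration: the helper names, the agent string, and the 10-second timeout are assumptions, and whether a particular agent string satisfies this host is not guaranteed.

```php
<?php
// Sketch: build an HTTP stream context that carries a User-Agent header.
// make_http_context() and fetch_page() are hypothetical helper names.

function make_http_context(string $userAgent)
{
    return stream_context_create([
        'http' => [
            'method'     => 'GET',
            'user_agent' => $userAgent, // sent as the User-Agent request header
            'timeout'    => 10,         // seconds; an arbitrary illustrative value
        ],
    ]);
}

function fetch_page(string $url, string $userAgent)
{
    // Returns the body as a string, or false if the request fails
    return file_get_contents($url, false, make_http_context($userAgent));
}

// Inspect the options without touching the network
$context = make_http_context('Mozilla/5.0 (compatible; ExampleFetcher/1.0)');
$options = stream_context_get_options($context);
echo $options['http']['user_agent'], "\n";
```

Calling fetch_page('http://www.cinemacity.cz/', ...) with and without a meaningful agent string would reproduce the comparison above from within PHP.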
This answer was selected as the best answer by the question's author.
