PHP file_get_contents returns only newline characters

I have a PHP script for HTML parsing that works on simple websites, but now I need to parse the cinema program from this website. I am using the file_get_contents function, which returns just 4 newline characters, and I can't figure out why. The website itself will be more difficult to parse with DOMDocument and XPath, because the program is shown in a pop-up window and does not seem to change the URL, but I will try to handle that problem after retrieving the HTML code of the site.

Here is the shortened version of my script:

<?php
      $url = "http://www.cinemacity.cz/";
      $content = file_get_contents($url);
      $dom = new DOMDocument;

      // loadHTML() returns false on failure; the DOMDocument object itself never equals FALSE
      if ($dom->loadHTML($content) === false) {
        echo "FAAAAIL\n";
      }

      $xpath = new DOMXPath($dom);

      $tags = $xpath->query("/html");

      foreach ($tags as $tag) {
        var_dump(trim($tag->nodeValue));
      }
?>

EDIT:

So, following the advice from WBAR (thank you), I looked for a way to change the headers sent by the file_get_contents() function, and this is the answer I found elsewhere. Now I am able to obtain the HTML of the site; hopefully I will manage to parse this mess :D

<?php
    libxml_use_internal_errors(true);
    // Create a stream
    $opts = array(
      'http'=>array(
        'user_agent' => 'PHP libxml agent', //Wget 1.13.4
        'method'=>"GET",
        'header'=>"Accept-language: en\r\n" .
                  "Cookie: foo=bar\r\n"
      )
    );
    $context = stream_context_create($opts);

    // Open the file using the HTTP headers set above
    $content = file_get_contents('http://www.cinemacity.cz/', false, $context);

    $dom = new DOMDocument;

    // loadHTML() returns false on failure; the DOMDocument object itself never equals FALSE
    if ($dom->loadHTML($content) === false) {
        echo "FAAAAIL\n";
    }

    $xpath = new DOMXPath($dom);

    $tags = $xpath->query("/html");

    foreach ($tags as $tag) {
        var_dump(trim($tag->nodeValue));
    }
?>

douxuan0698 2012-10-07 11:57
The problem is not in PHP but in the target host. It inspects the client's User-Agent header. Look at this:

wget http://www.cinemacity.cz/
2012-10-07 13:54:39 (1,44 MB/s) - saved `index.html.1' [234908]

but when the User-Agent header is removed:

wget --user-agent="" http://www.cinemacity.cz/
2012-10-07 13:55:41 (262 KB/s) - saved `index.html.2' [4/4]

Only 4 bytes were returned by the server.
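
To make this failure mode easier to spot from PHP, here is a minimal sketch (not from the original post) that sends an explicit User-Agent through a stream context and treats a whitespace-only body as a failure. The helper names make_context, is_blank_response and fetch_html, the 10-second timeout, and the example User-Agent string in the usage note are all assumptions chosen for illustration.

```php
<?php
// Hypothetical helper: build an HTTP stream context that sends an
// explicit User-Agent header with every request made through it.
function make_context(string $userAgent)
{
    return stream_context_create([
        'http' => [
            'method'     => 'GET',
            'user_agent' => $userAgent,
            'timeout'    => 10, // seconds; an arbitrary choice for this sketch
        ],
    ]);
}

// Hypothetical helper: a failed request (false) or a body that contains
// only whitespace, such as the 4 newlines described above, counts as blank.
function is_blank_response($body): bool
{
    return $body === false || trim($body) === '';
}

// Hypothetical helper: fetch a URL with the given User-Agent and return
// the body, or null when the request failed or the body was blank.
function fetch_html(string $url, string $userAgent): ?string
{
    $body = @file_get_contents($url, false, make_context($userAgent));
    return is_blank_response($body) ? null : $body;
}
```

Calling fetch_html('http://www.cinemacity.cz/', 'Mozilla/5.0 (compatible; ExampleFetcher/1.0)') should then give back either the page HTML or null, so the caller can stop before handing an empty document to DOMDocument. Whether a particular User-Agent string is accepted depends entirely on the server.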