duansha8115 2018-07-12 23:06

Accepted

CURLOPT_RETURNTRANSFER returns HTML as a string

I'm trying to parse HTML fetched with cURL using DOMDocument or XPath, but with CURLOPT_RETURNTRANSFER set, the page's HTML always comes back as a string, which seems to make it invalid HTML for parsing.

Returned output:

string(102736) "<!DOCTYPE html>


    <html itemscope itemtype="http://schema.org/QAPage" class="html__responsive">

    <head>

        <title>html - PHP outputting text WITHOUT echo/print? - Stack Overflow</title>
        <link rel="shortcut icon" href="https://cdn.sstatic.net/Sites/stackoverflow/img/favicon.ico?v=4f32ecc8f43d">
        <link rel="apple-touch-icon image_src" href="https://cdn.sstatic.net/Sites/stackoverflow/img/apple-touch-icon.png?v=c78bd457575a">
        <link rel="search" type="application/opensearchdescription+xml" title="Stack Overflow" href="/opensearch.xml">
        <meta name="viewport" content="width=device-width, height=device-height, initial-scale=1.0, minimum-scale=1.0">"

PHP snippet that produces this output:

$cc = $http->get($url);
var_dump($cc);

CURL library used: https://github.com/seikan/HTTP/blob/master/class.HTTP.php

When I remove CURLOPT_RETURNTRANSFER, I see the HTML without the string(102736) prefix, but the page is echoed even though I didn't ask for that (reference: curl_exec printing results when I don't want to).
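
For context, the string(102736) prefix is something var_dump() prints to describe the value; it is not stored in the string itself. A minimal sketch of the difference follows, using a short hypothetical $html value in place of a fetched page (the exact var_dump() layout can differ if an extension such as Xdebug overrides it):

```php
<?php
// Hypothetical stand-in for a fetched page body (not the real page).
$html = '<!DOCTYPE html><html><body><a href="/x">x</a></body></html>';

// Capture what var_dump() prints so it can be compared with the raw value.
ob_start();
var_dump($html);
$dumped = ob_get_clean();

// var_dump() output carries a type/length annotation and surrounding quotes.
echo $dumped;

// echo prints only the string's own bytes, with no annotation.
echo $html, "\n";
```

The value passed to loadHTML() is the same in both cases; only the debugging output differs.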

Here is the PHP snippet I used to parse the HTML:

  $cc = $http->get($url);
  $doc = new \DOMDocument();
  $doc->loadHTML($cc);

  // all links in document
  $links = [];
  $arr = $doc->getElementsByTagName("a"); // DOMNodeList Object
  foreach($arr as $item) { // DOMElement Object
    $href =  $item->getAttribute("href");
    $text = trim(preg_replace("/[\r\n]+/", " ", $item->nodeValue));
    $links[] = [
      'href' => $href,
      'text' => $text
    ];
  }

Any idea?


1 answer

duananyantan04633 2018-07-12 23:13


Check the return value -

print_r($cc);

You will probably find that the result is an array (if the request succeeded). From the library source, get() returns:

return [
    'header' => $headers,
    'body'   => substr($response, $size),
];
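
To illustrate what that substr() call does, here is a rough sketch that splits a simulated raw response into header and body. It assumes $size is the length of the header block (with a real cURL handle that would typically come from curl_getinfo($ch, CURLINFO_HEADER_SIZE)); the response text is made up, and the 'header' entry is kept as a raw string for simplicity, which may not match what the library stores there.

```php
<?php
// Simulated raw response, shaped like what curl_exec() returns when both
// CURLOPT_HEADER and CURLOPT_RETURNTRANSFER are enabled (no network needed).
$response = "HTTP/1.1 200 OK\r\n"
    . "Content-Type: text/html; charset=utf-8\r\n"
    . "\r\n"
    . "<!DOCTYPE html><html><body><p>Hello</p></body></html>";

// Assumption: the header block ends at the first blank line (CRLF CRLF).
// With a live handle, curl_getinfo($ch, CURLINFO_HEADER_SIZE) gives this size.
$size = strpos($response, "\r\n\r\n") + 4;

$result = [
    'header' => substr($response, 0, $size), // raw header text (simplified)
    'body'   => substr($response, $size),    // everything after the headers
];

echo $result['body'], "\n";
```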

So you will need to change the load line to be...

$doc->loadHTML($cc['body']);
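
If you are not sure which shape the return value has, a small guard can help. The sketch below uses a hypothetical helper, extractHtml(), that is not part of the library; it accepts either a plain string or an array with a 'body' key, and the $cc value is a hard-coded stand-in for a real response.

```php
<?php
/**
 * Hypothetical helper: return the HTML from either a raw string or an
 * array with a string 'body' key; return null if neither applies.
 */
function extractHtml($result): ?string
{
    if (is_string($result)) {
        return $result;
    }
    if (is_array($result) && isset($result['body']) && is_string($result['body'])) {
        return $result['body'];
    }
    return null;
}

// Stand-in for the array returned by $http->get($url).
$cc = ['header' => [], 'body' => '<html><body><a href="/a">A</a></body></html>'];
$html = extractHtml($cc);

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
if ($html !== null && $html !== '') {
    // loadHTML() rejects an empty string, so only call it with real content.
    $doc->loadHTML($html);
}
libxml_clear_errors();

$count = $doc->getElementsByTagName('a')->length;
echo $count, "\n";
```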

Update:

As an example of the above, using this question as the page to work with:

$cc = $http->get("https://stackoverflow.com/questions/51319473/curlopt-returntransfer-returns-html-in-string/51319585?noredirect=1#comment89619183_51319585");
$doc = new \DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($cc['body']);

// all links in document
$links = [];
$arr = $doc->getElementsByTagName("a"); // DOMNodeList Object
foreach($arr as $item) { // DOMElement Object
    $href =  $item->getAttribute("href");
    $text = trim(preg_replace("/[\r\n]+/", " ", $item->nodeValue));
    $links[] = [
        'href' => $href,
        'text' => $text
    ];
}

print_r($links);

Outputs...

Array
(
    [0] => Array
        (
            [href] => #
            [text] => 
        )

    [1] => Array
        (
            [href] => https://stackoverflow.com
            [text] => Stack Overflow
        )

    [2] => Array
        (
            [href] => #
            [text] => 
        )

    [3] => Array
        (
            [href] => https://stackexchange.com/users/?tab=inbox
...
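
Since the question also mentions XPath, the same extraction could be sketched with DOMXPath. This is only an illustration on a small inline HTML string (made up here, not the real page); the //a[@href] query skips anchors that have no href attribute, and \s+ collapses any run of whitespace rather than just line breaks.

```php
<?php
// Small inline document standing in for $cc['body'].
$html = '<html><body>'
    . '<a href="#"></a>'
    . '<a href="https://stackoverflow.com">Stack   Overflow</a>'
    . '<a name="top">no href</a>'
    . '</body></html>';

libxml_use_internal_errors(true); // real-world pages often trigger parser warnings
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$links = [];
// Select only <a> elements that actually carry an href attribute.
foreach ($xpath->query('//a[@href]') as $a) {
    $links[] = [
        'href' => $a->getAttribute('href'),
        'text' => trim(preg_replace('/\s+/', ' ', $a->textContent)),
    ];
}

print_r($links);
```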


The asker selected this answer as the best answer.
