dongyan3616 2017-08-23 02:20

Unable to parse links (href) in a page with PHP

Please see my script below:

<?php

    function getContent ()
    {
        $ch = curl_init();  
        curl_setopt($ch,CURLOPT_URL, 'http://localhost/test.php/test2.php');
        curl_setopt($ch,CURLOPT_RETURNTRANSFER,true);
        $output=curl_exec($ch);
        curl_close($ch);
        return $output;

    }

    function getHrefFromLinks ($cString){

        libxml_use_internal_errors(true);

        $dom = new DomDocument();
        $dom->loadHTML($cString);

        $xpath = new DOMXPath($dom);
        $nodes = $xpath->query('//a/@href');
        foreach($nodes as $href) {

            echo $href->nodeValue;   echo "<br />";                    // echo current attribute value
            $href->nodeValue = 'new value';              // set new attribute value
            $href->parentNode->removeAttribute('href');  // remove attribute
        }

        foreach (libxml_get_errors() as $error) {

        }

        libxml_clear_errors();

    }



echo getHrefFromLinks (getContent());

?>

The output of http://localhost/test.php/test2.php is:

<a href='/oncelink/index.html'><span class="lsbold">Luck</span> Lucky</a><a href='/oncelink-2/lucky'locki'><span class="lsbold">Luck</span>'s Locki</a>

When echo getHrefFromLinks (getContent()); runs, the output is:

/oncelink/index.html<br />/oncelink-2/lucky<br />

This is wrong; the output should be:

/oncelink/index.html<br />/oncelink-2/lucky'locki<br />

I understand that the href in the second link is malformed because it contains an extra apostrophe, but I can't change it: the page is pre-generated.

My other question: how can I get the text inside the span tag:

<span class="lsbold">

Thanks in advance!


1 answer

  • doulv1760 2017-08-23 03:18

    SOLVED :)

    Well, if it's stupid but it works, then it ain't stupid :D

    I just added the following code at the end:

    $fix = str_replace("href='", 'href="', getContent());
    $fix = str_replace("'>", '">', $fix);
    echo getHrefFromLinks ($fix);
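
The two str_replace calls above rewrite every `href='` and `'>` in the page, including any that happen to occur in ordinary text. A narrower variant might look like the sketch below. It assumes each href is the last attribute of its tag, as in the sample output; `normalizeHrefQuotes` and `extractHrefs` are hypothetical names, not part of the original script.

```php
<?php
// Hypothetical helper: turn href='...' into href="..." so that apostrophes
// inside the value no longer end the attribute early.
// ASSUMPTION: href is the last attribute before the closing '>' of the tag.
function normalizeHrefQuotes(string $html): string
{
    $fixed = preg_replace_callback(
        // Lazily consume the value up to the first apostrophe that is
        // followed by optional whitespace and '>'; never cross a tag boundary.
        "~href='([^<>]*?)'(?=\\s*>)~i",
        function (array $m): string {
            // Escape double quotes so the new delimiters stay balanced.
            return 'href="' . str_replace('"', '&quot;', $m[1]) . '"';
        },
        $html
    );
    // preg_replace_callback returns null on a regex error; fall back to the input.
    return $fixed ?? $html;
}

// Hypothetical helper: collect href values instead of echoing them.
function extractHrefs(string $html): array
{
    libxml_use_internal_errors(true);   // keep parser warnings out of the output
    $dom = new DOMDocument();
    $dom->loadHTML(normalizeHrefQuotes($html));
    libxml_clear_errors();

    $hrefs = [];
    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//a/@href') as $attr) {
        $hrefs[] = $attr->nodeValue;
    }
    return $hrefs;
}

$sample = <<<'HTML'
<a href='/oncelink/index.html'><span class="lsbold">Luck</span> Lucky</a><a href='/oncelink-2/lucky'locki'><span class="lsbold">Luck</span>'s Locki</a>
HTML;

print_r(extractHrefs($sample));
```

For the sample page this should list both hrefs, with the apostrophe in the second one preserved. If a tag carries further attributes after href, the assumption no longer holds and the pattern would need adjusting.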
    
    Selected by the asker as the best answer.
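
The second part of the question (the text of each `<span class="lsbold">`) can be handled with another XPath query. A minimal sketch, assuming the href quotes have already been repaired, for example with the str_replace fix in the answer; `extractBoldSpanText` is a hypothetical name.

```php
<?php
// Hypothetical helper: return the text of every <span class="lsbold">
// that is a direct child of an <a> element.
function extractBoldSpanText(string $html): array
{
    libxml_use_internal_errors(true);   // tolerate imperfect markup
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $texts = [];
    $xpath = new DOMXPath($dom);
    // Exact match on the class attribute; a span with several classes
    // would need contains(@class, "lsbold") instead.
    foreach ($xpath->query('//a/span[@class="lsbold"]') as $span) {
        $texts[] = $span->textContent;
    }
    return $texts;
}

$sample = '<a href="/oncelink/index.html"><span class="lsbold">Luck</span> Lucky</a>';
print_r(extractBoldSpanText($sample));
```

Returning an array rather than echoing inside the loop keeps the parsing separate from the output, so the caller decides how to print the values.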