dongyan3616
2017-08-23 10:20
Accepted

Unable to parse the links (href) in a page with PHP

Please see my script below:

<?php

    function getContent ()
    {
        $ch = curl_init();  
        curl_setopt($ch,CURLOPT_URL, 'http://localhost/test.php/test2.php');
        curl_setopt($ch,CURLOPT_RETURNTRANSFER,true);
        $output=curl_exec($ch);
        curl_close($ch);
        return $output;

    }

    function getHrefFromLinks ($cString){

        libxml_use_internal_errors(true);

        $dom = new DomDocument();
        $dom->loadHTML($cString);

        $xpath = new DOMXPath($dom);
        $nodes = $xpath->query('//a/@href');
        foreach($nodes as $href) {

            echo $href->nodeValue . "<br />";            // echo current attribute value
            $href->nodeValue = 'new value';              // set new attribute value
            $href->parentNode->removeAttribute('href');  // remove attribute
        }

        foreach (libxml_get_errors() as $error) {
            // parser errors caused by the malformed markup are collected here but currently ignored
        }

        libxml_clear_errors();

    }



echo getHrefFromLinks (getContent());

?>

The output of http://localhost/test.php/test2.php is:

<a href='/oncelink/index.html'><span class="lsbold">Luck</span> Lucky</a><a href='/oncelink-2/lucky'locki'><span class="lsbold">Luck</span>'s Locki</a>

When echo getHrefFromLinks (getContent()); runs, the output is:

/oncelink/index.html<br />/oncelink-2/lucky<br />

This is wrong; the output should be:

/oncelink/index.html<br />/oncelink-2/lucky'locki<br />

I understand that the href value of the second link is malformed because it contains an extra apostrophe, but I can't change it, as the markup is pre-generated.
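To reproduce this without the local test2.php page, the sample markup can be parsed straight from a string. This is a minimal sketch; extractHrefs() is a hypothetical helper, not part of the original script. It shows where the truncation comes from: the HTML parser ends a single-quoted attribute value at the next apostrophe, so the href of the second link stops at "lucky" and the leftover locki' is no longer part of the URL.

```php
<?php
// Minimal reproduction: parse the sample markup from a string instead of fetching it with curl.
// extractHrefs() is a hypothetical helper name used only for this example.
function extractHrefs(string $html): array
{
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them

    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $hrefs = [];
    foreach ((new DOMXPath($dom))->query('//a/@href') as $attr) {
        $hrefs[] = $attr->nodeValue; // value of each href attribute, in document order
    }
    return $hrefs;
}

// Same markup as the output of test2.php shown above.
$html = "<a href='/oncelink/index.html'><span class=\"lsbold\">Luck</span> Lucky</a>"
      . "<a href='/oncelink-2/lucky'locki'><span class=\"lsbold\">Luck</span>'s Locki</a>";

print_r(extractHrefs($html));
// The single-quoted value closes at the apostrophe after "lucky",
// so the second href comes back as "/oncelink-2/lucky".
```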

My other question: how can I get the text inside the span tag:

<span class="lsbold">

Thanks in advance!
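For the second question, a minimal sketch, assuming "value" means the text inside each <span class="lsbold"> that sits directly in a link. spanTexts() is a hypothetical helper name; the query and textContent are standard DOMXPath/DOMNode features. The broken href does not affect this, because the start tag of the link still closes at ">".

```php
<?php
// Sketch: collect the text of every <span class="lsbold"> that is a direct child of a link.
// spanTexts() is a hypothetical helper name, not something from the original script.
function spanTexts(string $html): array
{
    libxml_use_internal_errors(true); // keep parser warnings about the broken href quiet

    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $texts = [];
    // Exact match on class="lsbold"; switch to a contains() predicate if several classes can appear.
    foreach ($xpath->query('//a/span[@class="lsbold"]') as $span) {
        $texts[] = $span->textContent; // text inside the span, e.g. "Luck"
    }
    return $texts;
}

// Same markup as the output of test2.php in the question.
$html = "<a href='/oncelink/index.html'><span class=\"lsbold\">Luck</span> Lucky</a>"
      . "<a href='/oncelink-2/lucky'locki'><span class=\"lsbold\">Luck</span>'s Locki</a>";

print_r(spanTexts($html));
```

If the class attribute can hold more than one class, contains(concat(' ', normalize-space(@class), ' '), ' lsbold ') is the usual XPath idiom. For the whole link text instead of just the span, $span->parentNode->textContent returns e.g. "Luck Lucky".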


1 answer

  • doulv1760 2017-08-23 11:18
    Accepted

    SOLVED :)

    Well, if it's stupid but it works, it ain't stupid :D

    I just added the following code at the end:

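    // Turn the single-quoted href values into double-quoted ones, so the stray
    // apostrophe in lucky'locki no longer ends the attribute value.
    $fix = str_replace("href='", 'href="', getContent());
    $fix = str_replace("'>", '">', $fix);
    echo getHrefFromLinks ($fix);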
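The str_replace() fix above also rewrites any '> that happens to appear in ordinary text. A narrower variant (my own substitution, not the answerer's code) only touches single-quoted href values that close their tag. It is a sketch that assumes href is the last attribute in the tag and that the value contains no double quote or ">"; normalizeHrefQuotes() and extractHrefs() are hypothetical helper names.

```php
<?php
// Narrower variant of the accepted str_replace() fix (an assumption, not the answer's exact code):
// only rewrite a single-quoted href that is the last attribute of its tag.
// Assumes the href value contains no '"' and no '>'.
function normalizeHrefQuotes(string $html): string
{
    // [^>]* is greedy, so the match runs to the last apostrophe before '>',
    // keeping apostrophes inside the value (e.g. lucky'locki).
    return preg_replace("/href='([^>]*)'>/", 'href="$1">', $html);
}

// Hypothetical helper: return every href value, in document order.
function extractHrefs(string $html): array
{
    libxml_use_internal_errors(true); // silence warnings from malformed markup
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $hrefs = [];
    foreach ((new DOMXPath($dom))->query('//a/@href') as $attr) {
        $hrefs[] = $attr->nodeValue;
    }
    return $hrefs;
}

// Same markup as the output of test2.php in the question.
$html = "<a href='/oncelink/index.html'><span class=\"lsbold\">Luck</span> Lucky</a>"
      . "<a href='/oncelink-2/lucky'locki'><span class=\"lsbold\">Luck</span>'s Locki</a>";

print_r(extractHrefs(normalizeHrefQuotes($html)));
```

Like the accepted fix, this is still string surgery on HTML, so it only holds as long as the generated markup keeps this exact shape.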
    
