dongyou2279 2014-03-11 16:19

How to get the inner HTML of an element by class name or id using PHP

Hi, I am loading content from an external URL, something like this:

$html=get_data($external_url);

where get_data() is a function that fetches the content using cURL.

After this, I want to get the inner HTML of different HTML elements such as h1, div, p, and span, selected by their class or id.

For example, suppose the content from the external URL ($html) looks like this:

<html>
<title></title>
<body>
    <h1 class="title">I am title</h1>
    <div id="content">
        i am the content.
    </div>
</body>

Now I want to get the inner HTML of the tag with class="title". Similarly, I want to get the inner HTML of the tag with id="content".

How can I do this using PHP? I have no knowledge of DOM or XML. Please help.


  • doulingqiu4349 2014-03-11 17:36

    PHP has the method DOMDocument::saveHTML(). Since PHP 5.3.6 it accepts an optional node argument and serializes just that node as HTML. To get the inner HTML of a node, save each of its child nodes and concatenate the results.

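    // Serialize a list of nodes (for example an element's children) to an HTML string.
    function getHtml($nodes) {
      $result = '';
      foreach ($nodes as $node) {
        // saveHTML() with a node argument outputs only that node
        $result .= $node->ownerDocument->saveHTML($node);
      }
      return $result;
    }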
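
    To make the difference between "outer" and "inner" HTML concrete, here is a minimal sketch. The markup and the variable names are made up for this illustration and are not part of the question.

```php
<?php
// Hypothetical sample markup, used only for this illustration.
$dom = new DOMDocument();
$dom->loadHTML('<html><body><div id="box">Hello <b>world</b></div></body></html>');

$div = $dom->getElementsByTagName('div')->item(0);

// Outer HTML: serializing the node itself includes its own tag.
$outer = $dom->saveHTML($div);

// Inner HTML: serialize each child node and concatenate the pieces.
$inner = '';
foreach ($div->childNodes as $child) {
    $inner .= $dom->saveHTML($child);
}

echo $outer, PHP_EOL; // the <div> element including its tag
echo $inner, PHP_EOL; // only what is inside the <div>
```

    The loop is the same thing getHtml() does when it is handed the child nodes of an element.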
    

    To fetch the nodes, you can use XPath. Selecting by id is easy.

    Fetch all element nodes:

    //*

    that have the id attribute "content"

    //*[@id="content"]

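    Use only the first matching node, in case somebody added the same id multiple times. Note that [1] is applied per parent element here; (//*[@id="content"])[1] would select the first match in the whole document.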

    //*[@id="content"][1]

    Get the child nodes; node() matches element, text, and other node types such as comments.

    //*[@id="content"][1]/node()

    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXpath($dom);
    
    echo getHtml($xpath->evaluate('//*[@id="content"][1]/node()'));
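
    A side note on the [1] predicate: in standard XPath 1.0 it counts positions within each parent, so it only removes duplicates that are siblings. The sketch below uses made-up markup with a duplicated id in two different parents and compares it with the parenthesized form, which takes the first match in document order.

```php
<?php
// Hypothetical markup with a duplicated id in two different parents.
$html = '<html><body>'
      . '<div><p id="dup">first</p></div>'
      . '<div><p id="dup">second</p></div>'
      . '</body></html>';

// Duplicate ids can trigger parser warnings; collect them silently.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// [1] is evaluated per parent: each <p> is the first match inside its own <div>.
$perParent = $xpath->evaluate('//*[@id="dup"][1]');

// Parentheses apply [1] to the whole result set, in document order.
$firstOverall = $xpath->evaluate('(//*[@id="dup"])[1]');

echo $perParent->length, PHP_EOL;                  // 2
echo $firstOverall->length, PHP_EOL;               // 1
echo $firstOverall->item(0)->textContent, PHP_EOL; // first
```

    If the duplicates can appear anywhere in the page, the parenthesized form is the safer choice.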
    

    Matching by class is a little more complex. A class attribute is a token list, so it can contain several class names. Here is a trick for matching a single one. The XPath function normalize-space() strips leading and trailing whitespace and collapses each run of whitespace into a single space. Add a space at the front and at the end and you get a string like " one two three ". Now you can check whether " one " is part of that string. In XPath:

    Normalize the class attribute:

    normalize-space(@class)

    Add spaces to start and end:

    concat(" ", normalize-space(@class), " ")

    Check if it contains the substring

    contains(concat(" ", normalize-space(@class), " "), " title ")

    Use it to limit the nodes

    //*[contains(concat(" ", normalize-space(@class), " "), " title ")][1]/node()
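
    To show why the padding with spaces matters, here is a small sketch comparing it with a plain contains(@class, ...) test. The helper classPredicate() and the markup are hypothetical names for this illustration only; the helper assumes the class name contains no double quote.

```php
<?php
/**
 * Build an XPath predicate that matches one class token.
 * Hypothetical helper; assumes $class contains no double quote.
 */
function classPredicate(string $class): string
{
    return sprintf(
        'contains(concat(" ", normalize-space(@class), " "), " %s ")',
        $class
    );
}

// Made-up markup: one element has the token "title", one only contains it as a substring.
$html = '<html><body>'
      . '<h1 class="big  title">match</h1>'
      . '<h2 class="subtitle">no match</h2>'
      . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Token match: only elements whose class list contains the word "title".
$exact = $xpath->evaluate('//*[' . classPredicate('title') . ']');

// Substring match: also hits "subtitle".
$naive = $xpath->evaluate('//*[contains(@class, "title")]');

echo $exact->length, PHP_EOL; // 1, only the <h1>
echo $naive->length, PHP_EOL; // 2, the <h1> and the <h2>
```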

    Put together:

    $html = <<<'HTML'
    <html>
    <title></title>
    <body>
        <h1 class="title">I am title</h1>
        <div id="content">
            i am the <b>content</b>.
        </div>
    </body>
    HTML;
    
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXpath($dom);
    
    function getHtml($nodes) {
      $result = '';
      foreach ($nodes as $node) {
        $result .= $node->ownerDocument->saveHTML($node);
      }
      return $result;
    }
    
    // first node with the id
    var_dump(
      getHtml(
        $xpath->evaluate('//*[@id="content"][1]/node()')
      )
    );
    
    // first node with the class
    var_dump(
      getHtml(
        $xpath->evaluate(
          '//*[contains(concat(" ", normalize-space(@class), " "), " title ")][1]/node()'
        )
      )
    );
    
    // alternative - handling multiple nodes with the same class in a loop
    $nodes = $xpath->evaluate(
      '//*[contains(concat(" ", normalize-space(@class), " "), " title ")]'
    );
    foreach ($nodes as $node) {
      var_dump(getHtml($xpath->evaluate('node()', $node)));
    }
    

    Output: https://eval.in/118248

    string(40) "
            i am the <b>content</b>.
        "
    string(10) "I am title"
    string(10) "I am title"
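
    The sample above is tidy, but pages fetched from an external URL often are not, and loadHTML() then emits PHP warnings (depending on the libxml2 version, even for HTML5 elements such as section). A minimal sketch of collecting those messages instead, using the libxml error functions; the markup is a made-up stand-in for a fetched page.

```php
<?php
// Hypothetical input standing in for a fetched page.
$html = '<html><body><div id="content">'
      . '<section>i am the <b>content</b>.</section>'
      . '</div></body>';

// Collect parser errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

$errors = libxml_get_errors(); // array of LibXMLError objects, may be empty
libxml_clear_errors();
libxml_use_internal_errors($previous); // restore the earlier setting

$xpath = new DOMXPath($dom);

// Same inner-HTML loop as getHtml(), inlined to keep the sketch self-contained.
$inner = '';
foreach ($xpath->evaluate('//*[@id="content"][1]/node()') as $node) {
    $inner .= $dom->saveHTML($node);
}

echo $inner, PHP_EOL;
printf("%d parser message(s) collected\n", count($errors));
```

    Whether any messages are collected depends on the input and on the libxml2 version, so the count is only informational.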
    
    This answer was selected by the asker as the best answer.