dongyou2279 2014-03-11 16:19

How to get the innerHTML of an element by class name or id using PHP

Hi, I am loading content from an external URL, something like this:

$html=get_data($external_url);

where get_data() is a function that fetches the content using cURL.

After this, I want to get the inner HTML of different HTML elements such as h1, div, p, and span by using their class or id.

For example, suppose the content from the external URL ($html) looks like this:

<html>
<title></title>
<body>
    <h1 class="title">I am title</h1>
    <div id="content">
        i am the content.
    </div>
</body>

Now I want to get the inner HTML of the tag with class="title". Similarly, I want to get the inner HTML of the tag with id="content".

How can I do this using PHP? I have no knowledge of DOM or XML. Please help.

2 answers

  • doulingqiu4349 2014-03-11 17:36

    The method DOMDocument::saveHTML() can help here. Since PHP 5.3.6 it accepts an optional node argument and serializes just that node as HTML. To get the inner HTML of a node, serialize each of its child nodes and concatenate the results.

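    // Serialize every node in the list as HTML and concatenate the results.
    function getHtml($nodes) {
      $result = '';
      foreach ($nodes as $node) {
        // Each node is saved through the document that owns it.
        $result .= $node->ownerDocument->saveHTML($node);
      }
      return $result;
    }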
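
    As a variation on the function above, the same idea can be applied to a single element by walking its childNodes. This is only a minimal sketch: innerHtml() is a hypothetical helper name, not part of PHP or of the original answer, and the sample markup is made up for the illustration.

```php
<?php
// Hypothetical helper (an assumption for illustration): returns the inner
// HTML of one node by serializing each of its child nodes in order.
function innerHtml(DOMNode $node): string {
    $result = '';
    foreach ($node->childNodes as $child) {
        $result .= $node->ownerDocument->saveHTML($child);
    }
    return $result;
}

$dom = new DOMDocument();
$dom->loadHTML('<div id="content">i am the <b>content</b>.</div>');

// Take the first <div> in the parsed document.
$div = $dom->getElementsByTagName('div')->item(0);

echo innerHtml($div); // prints: i am the <b>content</b>.
```

    Passing the element itself instead of a node list keeps the call site short when exactly one element is expected.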
    

    To fetch the nodes, you can use XPath. Matching by id is the easy case.

    Fetch all element nodes:

    //*

    Restrict them to those whose id attribute is "content":

    //*[@id="content"]

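    Use only the first matching node, in case somebody added the same id multiple times. Note that the [1] predicate applies to the last location step, so it keeps the first match under each parent element. To select strictly the first match in the whole document, wrap the path in parentheses, as in (//*[@id="content"])[1].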

    //*[@id="content"][1]

    Get the child nodes. node() matches element, text, and several other node types:

    //*[@id="content"][1]/node()

    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXpath($dom);
    
    echo getHtml($xpath->evaluate('//*[@id="content"][1]/node()'));
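
    Content fetched from an external URL is often not well-formed, and libxml's HTML parser may then report problems as PHP warnings during loadHTML(). The following is a sketch, under that assumption, of how the warnings can be collected internally with libxml_use_internal_errors() instead of being printed. The sample markup is invented for the illustration.

```php
<?php
// Invented sample: an HTML5 element and unclosed tags, as external pages often have.
$html = '<body><section><div id="content">i am the <b>content</b>.</div>';

// Collect parser errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

// Discard whatever was collected and restore the previous setting.
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($dom);
$node = $xpath->evaluate('//*[@id="content"]')->item(0);

echo $node->textContent; // prints: i am the content.
```

    Restoring the previous setting keeps the change local, so other code that relies on libxml warnings is not affected.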
    

    The class attribute is a little more complex. A class attribute is a token list, so it can contain several class names separated by whitespace. Here is a trick for matching a single name. The XPath function normalize-space() strips leading and trailing whitespace and collapses each run of whitespace into a single space. Add a space to the front and to the end and you get a string like " one two three ". Now you can check whether " one " is part of that string. In XPath:

    Normalize the class attribute:

    normalize-space(@class)

    Add spaces to start and end:

    concat(" ", normalize-space(@class), " ")

    Check whether it contains the substring:

    contains(concat(" ", normalize-space(@class), " "), " title ")

    Use it to filter the nodes:

    //*[contains(concat(" ", normalize-space(@class), " "), " title ")][1]/node()
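
    To see why the padding spaces matter, here is a small sketch that matches "title" as a whole token but not as part of "subtitle". classCondition() is a hypothetical helper introduced only for this illustration, and it assumes the class name contains no double quotes or whitespace.

```php
<?php
// Hypothetical helper (an assumption for illustration): builds the XPath
// condition for one class token. $class must not contain quotes or whitespace.
function classCondition(string $class): string {
    return sprintf(
        'contains(concat(" ", normalize-space(@class), " "), " %s ")',
        $class
    );
}

$dom = new DOMDocument();
$dom->loadHTML(
    '<h1 class="big  title">match</h1>' .   // two tokens, extra whitespace
    '<h2 class="subtitle">no match</h2>'    // "title" only as a substring
);
$xpath = new DOMXPath($dom);

// Collect the element names of all nodes that carry the class token.
$matches = [];
foreach ($xpath->evaluate('//*[' . classCondition('title') . ']') as $node) {
    $matches[] = $node->nodeName;
}

print_r($matches); // only the h1 is matched
```

    A plain contains(@class, "title") would also match the h2, which is the case the padded comparison avoids.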

    Put together:

    $html = <<<'HTML'
    <html>
    <title></title>
    <body>
        <h1 class="title">I am title</h1>
        <div id="content">
            i am the <b>content</b>.
        </div>
    </body>
    HTML;
    
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXpath($dom);
    
    function getHtml($nodes) {
      $result = '';
      foreach ($nodes as $node) {
        $result .= $node->ownerDocument->saveHtml($node);
      }
      return $result;
    }
    
    // first node with the id
    var_dump(
      getHtml(
        $xpath->evaluate('//*[@id="content"][1]/node()')
      )
    );
    
    // first node with the class
    var_dump(
      getHtml(
        $xpath->evaluate(
          '//*[contains(concat(" ", normalize-space(@class), " "), " title ")][1]/node()'
        )
      )
    );
    
    // alternative - handling multiple nodes with the same class in a loop
    $nodes = $xpath->evaluate(
      '//*[contains(concat(" ", normalize-space(@class), " "), " title ")]'
    );
    foreach ($nodes as $node) {
      var_dump(getHtml($xpath->evaluate('node()', $node)));
    }
    

    Output: https://eval.in/118248

    string(40) "
            i am the <b>content</b>.
        "
    string(10) "I am title"
    string(10) "I am title"
    
    This answer was selected as the best answer by the asker.