doutenggu4070 2019-06-16 10:49

Convert HTML to a PHP array

I have a string that also contains HTML, stored in a $html variable:

'Here is some <a href="#">text</a> which I do not need to extract but then there are 
<figure class="class-one">
    <img src="/example.jpg" alt="example alt" class="some-image-class">
    <figcaption>example caption</figcaption>
</figure>

And another one (and many more)
<figure class="class-one some-other-class">
    <img src="/example2.jpg" alt="example2 alt">
</figure>'

I want to extract all <figure> elements together with everything they contain, including their attributes and any nested HTML elements, and put the result in a PHP array, so that I get something like:

    $figures = [
        0 => [
            "class" => "class-one",
            "img" => [
                "src" => "/example.jpg",
                "alt" => "example alt",
                "class" => "some-image-class"
            ],
            "figcaption" => "example caption"
        ],
        1 => [
            "class" => "class-one some-other-class",
            "img" => [
                "src" => "/example2.jpg",
                "alt" => "example2 alt",
                "class" => null
            ],
            "figcaption" => null
        ]];

So far I have tried:

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$figures = array();
foreach ($figures as $figure) {
    $figures['class'] = $figure->getAttribute('class');
    // here I tried to create the whole array but I can't seem to get the values from the HTML 
    // also I'm not sure how to get all html-elements within <figure>   
} 

Here is a Demo.

2 answers

  • donpb2823 2019-06-16 13:08

    Here is the code that should get you where you want to be. I have added comments where I felt they would be helpful:

    <?php

    $htmlString = 'Here is some <a href="#">text</a> which I do not need to extract but then there are <figure class="class-one"><img src="/example.jpg" alt="example alt" class="some-image-class"><figcaption>example caption</figcaption></figure>And another one (and many more)<figure class="class-one some-other-class"><img src="/example2.jpg" alt="example2 alt"></figure>';

    // Create a new DOM document
    $dom = new DOMDocument;

    // Parse the HTML. libxml warns about HTML5 tags such as <figure>,
    // so collect those warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $dom->loadHTML($htmlString);
    libxml_clear_errors();

    // Create an XPath object for querying the document
    $xp = new DOMXPath($dom);

    // Create an empty figures array that will hold all of our parsed HTML data
    $figures = array();

    // Get all <figure> elements
    $figureElements = $xp->query('//figure');

    // Loop through each <figure> element
    foreach ($figureElements as $figureElement) {
        // The leading "." makes the query relative to the current <figure>.
        // A bare "//img" would search the whole document, even with a context node.
        $img = $xp->query('.//img', $figureElement)->item(0);
        $figcaption = $xp->query('.//figcaption', $figureElement)->item(0);

        $figures[] = array(
            "class" => trim($figureElement->getAttribute('class')),
            "img" => array(
                "src" => $img ? $img->getAttribute('src') : null,
                "alt" => $img ? $img->getAttribute('alt') : null,
                // getAttribute() returns "" for a missing attribute, so check
                // hasAttribute() first to get null when there is no img class.
                "class" => $img && $img->hasAttribute('class') ? $img->getAttribute('class') : null,
            ),
            // Set the value to null when the <figure> has no <figcaption>
            "figcaption" => $figcaption ? $figcaption->nodeValue : null,
        );
    }

    print_r($figures);
    ?>
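
The code above reads only the img and figcaption elements by name. The question also asks how to get all HTML elements within each <figure>, and a recursive helper is one possible way to do that. What follows is a minimal sketch, not part of the original answer: elementToArray is a hypothetical name, and the sketch assumes each tag appears at most once per parent (a repeated tag would overwrite the earlier entry) and that no attribute shares its name with a child tag.

```php
<?php

// Hypothetical helper: converts an element into an array holding its
// attributes plus its child elements, keyed by attribute or tag name.
function elementToArray(DOMElement $element): array
{
    $result = [];

    // Copy every attribute, e.g. class, src, alt.
    foreach ($element->attributes as $attribute) {
        $result[$attribute->nodeName] = $attribute->nodeValue;
    }

    foreach ($element->childNodes as $child) {
        if (!($child instanceof DOMElement)) {
            continue; // skip text nodes and comments
        }

        // A child with no attributes and no nested elements is stored as
        // its trimmed text; anything else is converted recursively.
        $isLeaf = !$child->hasAttributes()
            && $child->getElementsByTagName('*')->length === 0;

        $result[$child->nodeName] = $isLeaf
            ? trim($child->textContent)
            : elementToArray($child);
    }

    return $result;
}

// Sample input taken from the question, without the surrounding text.
$html = '<figure class="class-one"><img src="/example.jpg" alt="example alt" class="some-image-class"><figcaption>example caption</figcaption></figure>'
      . '<figure class="class-one some-other-class"><img src="/example2.jpg" alt="example2 alt"></figure>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // <figure> triggers libxml warnings
$dom->loadHTML($html);
libxml_clear_errors();

$figures = [];
foreach ($dom->getElementsByTagName('figure') as $figure) {
    $figures[] = elementToArray($figure);
}

print_r($figures);
```

Unlike the structure requested in the question, a missing attribute or element is simply absent from the array rather than set to null, so it is safer to read values with the null coalescing operator, for example $figures[1]['figcaption'] ?? null.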
    
    This answer was selected by the asker as the best answer.