doutenggu4070 2019-06-16 10:49

Convert HTML to a PHP array

I have a string in a $html variable that also contains HTML:

'Here is some <a href="#">text</a> which I do not need to extract but then there are 
<figure class="class-one">
    <img src="/example.jpg" alt="example alt" class="some-image-class">
    <figcaption>example caption</figcaption>
</figure>

And another one (and many more)
<figure class="class-one some-other-class">
    <img src="/example2.jpg" alt="example2 alt">
</figure>'

I want to extract every <figure> element along with everything it contains, including its attributes and the nested HTML elements, and put the result into a PHP array, so that I get something like:

    $figures = [
        0 => [
            "class" => "class-one",
            "img" => [
                "src" => "/example.jpg",
                "alt" => "example alt",
                "class" => "some-image-class"
            ],
            "figcaption" => "example caption"
        ],
        1 => [
            "class" => "class-one some-other-class",
            "img" => [
                "src" => "/example2.jpg",
                "alt" => "example2 alt",
                "class" => null
            ],
            "figcaption" => null
        ]
    ];
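If the markup inside each <figure> varies, a fixed shape like the one above will miss whatever it does not name explicitly. One possible alternative (not part of the original question or answer) is to convert each element recursively. This is a minimal sketch: elementToArray() is a hypothetical helper, and the keys tag, attributes, text and children are naming choices made here, not anything DOMDocument prescribes.

```php
<?php
// Hypothetical helper: recursively converts a DOM element into a nested array.
// The keys 'tag', 'attributes', 'text' and 'children' are naming choices for this sketch.
function elementToArray(DOMElement $element): array
{
    // Collect every attribute as name => value
    $attributes = [];
    foreach ($element->attributes as $attribute) {
        $attributes[$attribute->name] = $attribute->value;
    }

    $children = [];
    $text = '';
    foreach ($element->childNodes as $child) {
        if ($child instanceof DOMElement) {
            // Nested element: convert it the same way
            $children[] = elementToArray($child);
        } elseif ($child instanceof DOMText) {
            // Text that sits directly inside this element
            $text .= $child->nodeValue;
        }
    }

    return [
        'tag' => $element->tagName,
        'attributes' => $attributes,
        'text' => trim($text) === '' ? null : trim($text),
        'children' => $children,
    ];
}

$html = 'Some <a href="#">text</a> <figure class="class-one">'
      . '<img src="/example.jpg" alt="example alt" class="some-image-class">'
      . '<figcaption>example caption</figcaption></figure>';

$dom = new DOMDocument();
// libxml's HTML parser does not know <figure>; collect its warnings instead of printing them
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$figures = [];
foreach ($dom->getElementsByTagName('figure') as $figure) {
    $figures[] = elementToArray($figure);
}

print_r($figures);
```

The trade-off is that the result mirrors the DOM rather than the flat img/figcaption keys above, so consumers have to walk children themselves.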

So far I have tried:

    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    
    $figures = array();
    foreach ($figures as $figure) {
        $figures['class'] = $figure->getAttribute('class');
        // here I tried to create the whole array but I can't seem to get the values from the HTML
        // also I'm not sure how to get all html-elements within <figure>
    }

Here is a Demo.
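For comparison, here is a minimal sketch of how the loop from the attempt above could be completed with DOMDocument alone, without XPath. It loops over the <figure> elements in the document rather than over the empty $figures array, and calls getElementsByTagName() on each figure so that the search stays inside it. It assumes a shortened copy of the sample HTML; attributeOrNull() is a hypothetical helper added here to produce null for missing attributes, as in the desired output.

```php
<?php
$html = 'Here is some <a href="#">text</a> <figure class="class-one">'
      . '<img src="/example.jpg" alt="example alt" class="some-image-class">'
      . '<figcaption>example caption</figcaption></figure>'
      . 'And another one <figure class="class-one some-other-class">'
      . '<img src="/example2.jpg" alt="example2 alt"></figure>';

// Hypothetical helper: returns the attribute value, or null when the element
// or the attribute is missing (getAttribute() alone returns "" in that case).
function attributeOrNull(?DOMElement $element, string $name): ?string
{
    return ($element !== null && $element->hasAttribute($name))
        ? $element->getAttribute($name)
        : null;
}

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$figures = [];
// Iterate over the <figure> elements in the document, not over the (empty) $figures array
foreach ($dom->getElementsByTagName('figure') as $figure) {
    // getElementsByTagName() on $figure searches only the descendants of this <figure>
    $img = $figure->getElementsByTagName('img')->item(0);
    $caption = $figure->getElementsByTagName('figcaption')->item(0);

    $figures[] = [
        'class' => attributeOrNull($figure, 'class'),
        'img' => [
            'src' => attributeOrNull($img, 'src'),
            'alt' => attributeOrNull($img, 'alt'),
            'class' => attributeOrNull($img, 'class'),
        ],
        'figcaption' => $caption !== null ? trim($caption->textContent) : null,
    ];
}

print_r($figures);
```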



  • donpb2823 2019-06-16 13:08

    Here is the code that should get you where you want to be. I have added comments where I felt they would be helpful:

    <?php
    
    $htmlString = 'Here is some <a href="#">text</a> which I do not need to extract but then there are <figure class="class-one"><img src="/example.jpg" alt="example alt" class="some-image-class"><figcaption>example caption</figcaption></figure>And another one (and many more)<figure class="class-one some-other-class"><img src="/example2.jpg" alt="example2 alt"></figure>';
    
    //Create a new DOM document
    $dom = new DOMDocument();
    
    //Parse the HTML. libxml does not know HTML5 tags such as <figure>,
    //so collect its warnings instead of printing them (rather than hiding everything with @)
    libxml_use_internal_errors(true);
    $dom->loadHTML($htmlString);
    libxml_clear_errors();
    
    //Create a new XPath object for querying the document
    $xp = new DOMXPath($dom);
    
    //Create empty figures array that will hold all of our parsed HTML data
    $figures = array();
    
    //Loop through each <figure> element
    foreach ($xp->query('//figure') as $figureElement) {
        //The leading dot makes each query relative to the current <figure>.
        //Without it, '//img' searches the whole document and ignores $figureElement.
        $img = $xp->query('.//img', $figureElement)->item(0);
        $figcaption = $xp->query('.//figcaption', $figureElement)->item(0);
    
        $figures[] = array(
            "class" => trim($figureElement->getAttribute('class')),
            "img" => array(
                //getAttribute() returns "" for a missing attribute, so check hasAttribute() to get null instead
                "src" => ($img && $img->hasAttribute('src')) ? $img->getAttribute('src') : null,
                "alt" => ($img && $img->hasAttribute('alt')) ? $img->getAttribute('alt') : null,
                "class" => ($img && $img->hasAttribute('class')) ? $img->getAttribute('class') : null,
            ),
            //Use the caption text if a <figcaption> element exists, otherwise null
            "figcaption" => $figcaption ? trim($figcaption->nodeValue) : null,
        );
    }
    
    print_r($figures);
    ?>
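An XPath expression that starts with // is evaluated from the document root even when a context node is passed to DOMXPath::query(); to search only inside one <figure>, the expression needs a leading dot (.//img). A minimal sketch with made-up two-figure HTML showing the difference:

```php
<?php
$html = '<figure><img src="/a.jpg"></figure><figure><img src="/b.jpg"></figure>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xp = new DOMXPath($dom);
$secondFigure = $xp->query('//figure')->item(1);

// "//img" starts at the document root, so the context node is ignored: both images match
$absolute = $xp->query('//img', $secondFigure);

// ".//img" starts at the context node: only the image inside the second <figure> matches
$relative = $xp->query('.//img', $secondFigure);

echo $absolute->length, "\n"; // 2
echo $relative->length, "\n"; // 1
echo $relative->item(0)->getAttribute('src'), "\n"; // /b.jpg
```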
    
    This answer was accepted by the asker as the best answer.