doudi2005 2018-01-04 21:24

Parsing HTML as XML

I'm working with a text/HTML diffing engine that uses XML at its core, but we're feeding it HTML5 data. How should I handle tags that don't need to be closed in HTML5 but must be closed in XML? For example:

<img alt="" height="239" src="http://example.com/image.png" width="272">

Do I need to convert every such tag manually (like the one in this example)?

Is there a tool that would do this for me and save me the headache of closing every void HTML tag by hand?

For example, xml_parse() reports an error on the following input, because the body is valid HTML but invalid XML:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html [<!ENTITY Aacute "&#193;">]>
<body>
    <div>
        <figure class="table ">
            <figcaption>
                <p class="table_number"></p>
                <p class="table_title" epub:type="title"></p>
            </figcaption>
            <table class="code ">
                <tr>
                    <td width="50">
                        <img alt="" height="239" src="http://example.com/image.png" width="272">
                    </td>
                </tr>
            </table>
        </figure>
    </div>
</body>

3 answers

  • dqch34769 2018-01-10 17:57

    In general, you can use PHP's built-in DOM handling routines to parse HTML and output XML:

    $html = <<< HEREDOC
    <!DOCTYPE html>
    <body>
        <div>
            <figure class="table ">
                <figcaption>
                    <p class="table_number"></p>
                    <p class="table_title" epub:type="title"></p>
                </figcaption>
                <table class="code ">
                    <tr>
                        <td width="50">
                            <img alt="" height="239" src="http://example.com/image.png" width="272">
                        </td>
                    </tr>
                </table>
            </figure>
        </div>
    </body>
    HEREDOC;
    
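    // libxml's HTML parser may warn about HTML5 tags such as <figure>;
    // collect those warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $dom = new DOMDocument;
    $dom->loadHTML($html);
    libxml_clear_errors();
    // Serialize as XML: void elements such as <img> come out self-closed.
    echo $dom->saveXML($dom), PHP_EOL;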
    

    Unfortunately, the XML prolog in your input, together with the attempt to extend the HTML5 doctype as if it were an XML/SGML doctype, prevents the DOM library from parsing it successfully.
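
One way around this is to strip the XML prolog and replace the extended doctype with a plain `<!DOCTYPE html>` before calling `loadHTML()`. The following is only a minimal sketch under that assumption: `htmlToXml()` is a hypothetical helper name, the regular expressions are tuned to input shaped like the question's sample, and any entities declared in the doctype's internal subset are discarded (`&Aacute;` is already a standard HTML entity, so the HTML parser should resolve it anyway).

```php
<?php
// Hypothetical helper: normalize HTML5-ish input, parse it with the DOM
// extension's HTML parser, and return it serialized as XML.
function htmlToXml(string $input): string
{
    // Drop a leading XML declaration, if present.
    $html = preg_replace('/^\s*<\?xml[^>]*\?>\s*/i', '', $input);

    // Replace the first DOCTYPE (including any internal subset in [...])
    // with the plain HTML5 doctype.
    $html = preg_replace(
        '/<!DOCTYPE\s+html\b[^\[>]*(\[.*?\])?\s*>/is',
        '<!DOCTYPE html>',
        $html,
        1
    );

    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Serialize only the root element, so no XML declaration is emitted.
    return $dom->saveXML($dom->documentElement);
}
```

Calling `htmlToXml()` on the document from the question should return markup in which `<img ...>` is written as `<img .../>`. Note that the `epub:type` attribute is likely carried over without a namespace declaration, so a namespace-aware XML consumer may still need `xmlns:epub` to be added separately.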

    Selected by the asker as the best answer.