doudi2005 2018-01-04 21:24
Accepted

Parsing HTML as XML

I'm working with a text/HTML diffing engine that uses XML at its core, but we're feeding it HTML5 data. How should I handle tags that don't need to be closed in HTML5 but must be closed in XML? For example:

<img alt="" height="239" src="http://example.com/image.png" width="272">

Do I need to convert every such tag manually (as in this example)?

Is there a tool that would do this for me and save me the headache of closing every self-closing (void) HTML tag by hand?

For example, xml_parse() reports an error on the following code, because the body contains valid HTML that is invalid XML:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html [<!ENTITY Aacute "&#193;">]>
<body>
    <div>
        <figure class="table ">
            <figcaption>
                <p class="table_number"></p>
                <p class="table_title" epub:type="title"></p>
            </figcaption>
            <table class="code ">
                <tr>
                    <td width="50">
                        <img alt="" height="239" src="http://example.com/image.png" width="272">
                    </td>
                </tr>
            </table>
        </figure>
    </div>
</body>
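To see exactly where xml_parse() gives up, here is a minimal sketch, assuming PHP's xml extension (which provides xml_parse()) is enabled. It feeds the unclosed `<img>` and its self-closed XML form to a fresh parser each time; the helper name `isWellFormedXml` is hypothetical.

```php
<?php
// Hypothetical helper: returns [true, ''] if $markup parses as XML,
// otherwise [false, "<parser error message> at line N"].
function isWellFormedXml(string $markup): array
{
    $parser = xml_parser_create('UTF-8');
    // xml_parse() returns 1 on success and 0 on failure.
    $ok = xml_parse($parser, $markup, true) === 1;
    $message = $ok ? '' : sprintf(
        '%s at line %d',
        xml_error_string(xml_get_error_code($parser)),
        xml_get_current_line_number($parser)
    );
    return [$ok, $message];
}

// Valid HTML5, but the void <img> element is never closed,
// so the parser hits </td> while <img> is still open.
[$ok, $msg] = isWellFormedXml('<td><img alt="" src="http://example.com/image.png"></td>');
var_dump($ok);      // bool(false)
echo $msg, PHP_EOL; // the parser's tag-mismatch message and line

// The same element in XML's self-closing form is accepted.
[$ok] = isWellFormedXml('<td><img alt="" src="http://example.com/image.png"/></td>');
var_dump($ok);      // bool(true)
```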

3 answers

  • dqch34769 2018-01-10 17:57

    In general, you can use PHP's built-in DOM extension (DOMDocument) to parse HTML and output it as XML:

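    <?php
    // libxml's HTML parser does not recognise HTML5 elements such as
    // <figure> and <figcaption>; collect its warnings instead of printing them.
    libxml_use_internal_errors(true);

    $html = <<<HEREDOC
    <!DOCTYPE html>
    <body>
        <div>
            <figure class="table ">
                <figcaption>
                    <p class="table_number"></p>
                    <p class="table_title" epub:type="title"></p>
                </figcaption>
                <table class="code ">
                    <tr>
                        <td width="50">
                            <img alt="" height="239" src="http://example.com/image.png" width="272">
                        </td>
                    </tr>
                </table>
            </figure>
        </div>
    </body>
    HEREDOC;

    $dom = new DOMDocument;
    // The HTML parser accepts void elements like <img> without a closing tag.
    $dom->loadHTML($html);
    // Serialise the tree as XML: empty elements come out as <img .../>.
    echo $dom->saveXML(), PHP_EOL;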
    

    Unfortunately, the XML prolog in your input, together with your attempt to extend the HTML5 doctype with an internal subset as if it were an XML/SGML doctype, prevents the DOM library from parsing it successfully; the example above therefore omits both.
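    The approach above can be wrapped in a reusable function. This is a minimal sketch, assuming the input may start with an XML declaration and carry an HTML doctype with an internal subset like the one in the question; the function name `htmlToXml` and the regex-based clean-up are illustrative assumptions, not part of the original answer:

```php
<?php
// Hypothetical helper: convert HTML5 markup to well-formed XML with DOMDocument.
// Assumes any leading XML declaration and any doctype internal subset can be
// discarded before the markup is handed to the HTML parser.
function htmlToXml(string $html): string
{
    // Drop a leading XML declaration (the prolog).
    $html = preg_replace('/^\s*<\?xml[^>]*\?>\s*/i', '', $html);
    // Replace the first doctype, including an internal subset [...], with the plain HTML5 one.
    $html = preg_replace('/<!DOCTYPE[^\[>]*(\[[^\]]*\])?\s*>/i', '<!DOCTYPE html>', $html, 1);

    // Silence libxml warnings about HTML5 elements, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Void elements such as <img> are written as empty-element tags (<img .../>).
    return $dom->saveXML();
}
```

    The regular expressions only cover the simple shapes in this question (one prolog, one internal subset without a nested `]`); they are not a general SGML parser. Also note that `loadHTML()` assumes ISO-8859-1 unless the markup declares its charset, so UTF-8 input with non-ASCII text may need a `<meta charset="utf-8">` first.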

    This answer was selected by the asker as the best answer.
