dongliao3742 2011-12-06 12:43

Parsing XML with HTML content in PHP

Is it possible in PHP, using the default XML classes, to parse an XML file so that only elements from one namespace are treated as XML? I want to parse XML files in which some elements contain HTML code, and preferably I don't want to wrap every such element in CDATA sections or escape all special characters. Since HTML has a syntax quite similar to XML, most parsers won't be able to parse this correctly.

Example:

<ns:root>
    <ns:date>
        06-12-2011
    </ns:date>
    <ns:content>
        <html>
        <head>
        <title>Sometitle</title>
        </head>
        <body>
        --a lot of stuff here
        </body>
        </html>
    </ns:content>
</ns:root>

In this example I want all the HTML inside <ns:content> to be the content of that element, and it shouldn't be parsed itself. Is this possible with the default parsers such as SimpleXML, or should I write my own parser?

Edit: Let me explain my situation a little better. I want to create a small personal PHP framework in which code is separated from the HTML (similar to MVC, but not quite the same). Much of the HTML will be the same across multiple pages, but not all of it, and some data (e.g. from a database) has to be inserted into some pages, just as on any normal website. So I came up with the idea of using separate HTML component files, which can be parsed by an html script. This would look something like this:

main.fw:

<html>
<head>
    <title>
        <fw:placeholder name="title" />
    </title>
</head>
<body>
    <div id="menubar">
        <ul>
            <li>page1</li>
            <li>page2</li>
        </ul>
    </div>
    <div id="content">
        <fw:placeholder name="maincontent" />
    </div>
</body>
</html>

page1.fw:

<fw:component file="main.fw">
    <fw:content name="title">
        page1
    </fw:content>
    <fw:content name="maincontent">
        some content with html
    </fw:content>
</fw:component>

Result after parsing: page1

  • page1
  • page2
some content with html

This question is mainly about that second type of file, in which html is nested inside xml elements.


  • douhuan7862 2011-12-07 00:01

    An XML file with some parts that are not XML is not an XML file. Thus you can't expect that an XML parser will be able to parse it. For a document to be XML the whole thing must be XML.

    What you are asking for is essentially "is there a parser that will parse my made-up angle-bracket language." Maybe DOMDocument->loadHTML() or html5lib will interpret it according to your expectations, but no guarantees.
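
To illustrate the difference between the strict and the lenient parser, here is a minimal sketch. The fragment is a made-up example (not taken from the question) containing an unclosed `<br>`, which is acceptable HTML but not well-formed XML. It only shows that `DOMDocument::loadXML()` rejects such input while `DOMDocument::loadHTML()` accepts it; it says nothing about how custom namespaced tags would be treated.

```php
<?php
// Hypothetical input: valid as HTML, but not well-formed XML (<br> is never closed).
$fragment = '<div id="content"><p>line one<br>line two</p></div>';

// Collect libxml errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Strict XML parser: a tag mismatch is a fatal well-formedness error.
$xmlDoc = new DOMDocument();
$xmlOk = $xmlDoc->loadXML($fragment);
$xmlErrorCount = count(libxml_get_errors());
libxml_clear_errors();

// Lenient HTML parser: repairs the markup and builds a tree anyway.
$htmlDoc = new DOMDocument();
$htmlOk = $htmlDoc->loadHTML($fragment);
libxml_clear_errors();

// Count the <br> elements that the HTML parser recovered.
$brCount = $htmlDoc->getElementsByTagName('br')->length;

var_dump($xmlOk, $htmlOk, $brCount);
```

The HTML parser also wraps the fragment in `html` and `body` elements, so the resulting tree is not a one-to-one image of the input.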

    Is it really a terrible burden for your included "html" bits to be valid XML? This is good HTML hygiene anyway, and if you are willing to do that, you can implement your entire view system with XSL templates very easily. Most of the benefit of a node-aware template system is precisely that you can manipulate nodes directly and have pretty good assurances that the final document will be valid. Why have the burden of node-awareness with none of the benefit? You might as well use a string-based system like every other template system out there. At least it will be faster.
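
If the templates are kept well-formed, the placeholder idea from the question can be handled with plain DOM calls. The following is only a sketch under several assumptions: the namespace URI `urn:example:fw` is invented (the question never declares one for the `fw:` prefix), the two templates are inlined as strings rather than read from `main.fw` and `page1.fw`, and the `file` attribute is ignored. It uses DOM manipulation rather than XSL, which would be an alternative route to the same result.

```php
<?php
// Hypothetical namespace URI for the fw: prefix (an assumption, not from the question).
$fwNs = 'urn:example:fw';

// Well-formed stand-ins for main.fw and page1.fw.
$mainTemplate = '<html xmlns:fw="' . $fwNs . '">'
    . '<head><title><fw:placeholder name="title"/></title></head>'
    . '<body><div id="content"><fw:placeholder name="maincontent"/></div></body>'
    . '</html>';
$pageTemplate = '<fw:component xmlns:fw="' . $fwNs . '" file="main.fw">'
    . '<fw:content name="title">page1</fw:content>'
    . '<fw:content name="maincontent"><p>some content with <b>html</b></p></fw:content>'
    . '</fw:component>';

$main = new DOMDocument();
$page = new DOMDocument();
if (!$main->loadXML($mainTemplate) || !$page->loadXML($pageTemplate)) {
    throw new RuntimeException('Templates must be well-formed XML');
}

// Index the page's fw:content elements by their name attribute.
$contents = [];
foreach ($page->getElementsByTagNameNS($fwNs, 'content') as $element) {
    $contents[$element->getAttribute('name')] = $element;
}

// The node list is live, so copy it before modifying the tree.
$placeholders = iterator_to_array($main->getElementsByTagNameNS($fwNs, 'placeholder'));
foreach ($placeholders as $placeholder) {
    $name = $placeholder->getAttribute('name');
    $fragment = $main->createDocumentFragment();
    if (isset($contents[$name])) {
        // Copy each child of the matching fw:content into the main document.
        foreach ($contents[$name]->childNodes as $child) {
            $fragment->appendChild($main->importNode($child, true));
        }
    }
    if ($fragment->hasChildNodes()) {
        $placeholder->parentNode->replaceChild($fragment, $placeholder);
    } else {
        // No content supplied for this placeholder: drop it.
        $placeholder->parentNode->removeChild($placeholder);
    }
}

$result = $main->saveXML($main->documentElement);
echo $result, "\n";
```

Because the substitution works on nodes, markup inside a `fw:content` element arrives in the final tree as elements rather than as escaped text.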

    Note that once you have constructed your final DOM, you can output it as something else, like HTML, so just because all your input templates are XML doesn't mean your output has to be.
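
As a small illustration of that last point, the sketch below loads a well-formed XML document (a made-up example) and serializes the same tree twice, once with `DOMDocument::saveXML()` and once with `DOMDocument::saveHTML()`. The exact whitespace of the output may vary with the libxml version, so only the handling of empty elements is highlighted.

```php
<?php
// Hypothetical well-formed input using XML-style empty elements.
$doc = new DOMDocument();
$doc->loadXML('<html><body><p>first line<br/>second line</p><div id="empty"/></body></html>');

// XML serialization keeps the self-closing form, e.g. <br/> and <div id="empty"/>.
$asXml = $doc->saveXML($doc->documentElement);

// HTML serialization emits <br> and an explicit <div id="empty"></div>.
$asHtml = $doc->saveHTML();

echo $asXml, "\n";
echo $asHtml, "\n";
```

The input templates can therefore stay strictly XML while browsers still receive ordinary HTML.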

    Accepted by the asker as the best answer.