douhuang3740 2009-12-12 21:34
Accepted

What is the best way to add missing <p> tags to text in HTML while ignoring other tags?

I'm currently writing a function that parses some HTML and adds tags where necessary. Basically, I have a piece of HTML like this:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Suspendisse feugiat, nunc at vestibulum egestas.

<script type="c">
    #include &lt;stdio.h&gt; 
    #define debug(var) printf(#var &quot; = %d\n&quot;, var)
    int main(void)
    {
        int x = 12;
        debug(x);
        return 0;
    }
</script>

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Suspendisse feugiat, nunc at vestibulum egestas.

<h3>Test Heading</h3>
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Cras ultricies luctus metus ut cursus.

<ol>
    <li>One</li>
    <li>Two</li>
    <li>Three</li>
</ol>

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Cras ultricies luctus metus ut cursus.

Notice that there are no <p> tags around the paragraphs. I would like to parse this HTML and wrap each paragraph of text in the correct tags. Whatever parser is used must not touch any of the other valid HTML. For example, the heading and the list should not be altered.

I've hacked together a solution in PHP, and although it works, it is neither fast nor pretty to look at.

What is the best way to accomplish this?
Is there a nice PHP- or JavaScript-based parser I could use for this?

I think I need to break the HTML down into elements, add the missing tags, and write the reassembled HTML back to the page.
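
To make the break-down-and-reassemble idea concrete, here is a minimal JavaScript sketch, not a real HTML parser. The names `BLOCK_TAGS`, `wrapBareText` and `addMissingParagraphs` are made up for illustration, and the list of tags to protect is an assumption. It copies whole block elements through untouched and wraps the bare text between them in <p> tags, treating blank lines as paragraph breaks. It does not handle nested elements of the same name (for example a <ul> inside a <ul>), and it normalises the spacing between blocks to one blank line.

```javascript
// Tags copied through verbatim. This list is an assumption for the sketch,
// not a complete set of HTML block-level elements.
const BLOCK_TAGS = ['script', 'style', 'pre', 'h[1-6]', 'ol', 'ul',
                    'table', 'blockquote', 'div', 'p'];

// Matches one whole element from its opening tag to the first closing tag
// of the same name. Nested elements of the same name are NOT handled.
const BLOCK_RE = new RegExp(
  `<(${BLOCK_TAGS.join('|')})\\b[^>]*>[\\s\\S]*?</\\1\\s*>`, 'gi');

// Split bare text on blank lines and wrap each non-empty chunk in <p>.
function wrapBareText(text) {
  return text
    .split(/\n\s*\n/)
    .map((chunk) => chunk.trim())
    .filter((chunk) => chunk.length > 0)
    .map((chunk) => `<p>${chunk}</p>`);
}

// Walk the input, alternating between bare text (wrapped) and protected
// block elements (kept exactly as written), then join the pieces.
function addMissingParagraphs(html) {
  const parts = [];
  let cursor = 0;
  for (const match of html.matchAll(BLOCK_RE)) {
    parts.push(...wrapBareText(html.slice(cursor, match.index)));
    parts.push(match[0]);
    cursor = match.index + match[0].length;
  }
  parts.push(...wrapBareText(html.slice(cursor)));
  return parts.join('\n\n');
}

// Usage: the heading and list pass through unchanged, bare text gets <p>.
const sample = [
  'Lorem ipsum dolor sit amet.',
  '',
  '<h3>Test Heading</h3>',
  'Cras ultricies luctus metus ut cursus.',
  '',
  '<ol>',
  '    <li>One</li>',
  '</ol>',
].join('\n');
console.log(addMissingParagraphs(sample));
```

A regex pass like this only suits input as regular as the sample above. For arbitrary markup, a real parser or a tool such as HTML Tidy is the safer choice.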


2 answers

  • dtpa98038 2009-12-12 23:01

    My suggestion is to use HTML Tidy instead of hacking it together yourself.

    $output = tidy_repair_string($input);
    

    See the HTML Tidy Configuration Options for the full list of options. For what you need, the default behaviour is probably fine.

    This answer was selected by the asker as the best answer.