douduan7295 2015-03-23 15:10

Check the headings in a string and the levels of their numbered list

I need to correct a string with wrong heading-tags and missing p-tags:

<h3>1. Title</h3>
Text
<h3>1.1 Subtitle</h3>
Text
<h3>1.2. Subtitle</h3>

The result should be:

<h2>1. Title</h2>
<p>Text</p>
<h3>1.1. Subtitle</h3>
<p>Text</p>
<h3>1.2. Subtitle</h3>

That means every first-level heading of the list should be an h2 tag. A second-level number can appear as 1.1. or 1.1; in the latter case the missing trailing dot should be added. If a line has no tag at all, it should be wrapped in a p tag.

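$output = '';
$lines = explode(PHP_EOL, $text);
foreach ($lines as $line) {
    // strpos() returns 0 for a match at the start of the line, so compare with false explicitly
    if (strpos($line, '<h') === false) {
        $line = '<p>' . $line . '</p>';
    }
    $output .= $line;
}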

This adds the missing p tags, but I don't know how to handle the heading tags and the optionally missing dot after the second-level number.
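
One detail worth checking in a loop like this: strpos() returns 0 when the match is at the very start of the string, and 0 is falsy, so a test such as !strpos($line, '<h') cannot tell "found at position 0" from "not found". A minimal sketch of the difference follows; the helper name hasHeadingTag is hypothetical and only used for illustration.

```php
<?php
// Hypothetical helper for illustration: true when the line contains a heading tag.
function hasHeadingTag(string $line): bool
{
    // strpos() returns the match offset (possibly 0) or false,
    // so the result is compared against false with a strict operator.
    return strpos($line, '<h') !== false;
}

var_dump(!strpos('<h3>1. Title</h3>', '<h')); // loose check: bool(true), the heading looks "missing"
var_dump(hasHeadingTag('<h3>1. Title</h3>')); // bool(true)
var_dump(hasHeadingTag('Text'));              // bool(false)
```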


  • douyue8364 2015-03-23 16:05

    This uses a regular expression to split each line into its parts and picks the heading level from the number (h2 for 1., h3 for 1.2., and so on). It only works if the HTML you are parsing is really as simple as in your example. If it is not, I strongly recommend looking at the DOMDocument parser instead.

    $html = <<<EOS
    <h3>1. Title</h3>
    Text
    <h3>1.1 Subtitle</h3>
    Text
    <h3>1.2. Subtitle</h3>
    Text
    EOS;
    
    $lines = explode(PHP_EOL, $html);
    
    foreach ($lines as $line) {
        if (preg_match('/^<(\w.*?)>([\d\.]*)(.*?)</', $line, $matches)) {
            $tag    = $matches[1]; // "h3"
            $number = $matches[2]; // "1.2"
            $title  = $matches[3]; // "Subtitle"
    
            if ($tag == 'h3') {
                $level = preg_match_all('/\d+/', $number) + 1;
                $tag = 'h' . $level;
                if (substr($number, -1, 1) != '.')
                    $number .= '.';
    
                $line = "<$tag>$number$title</$tag>";
            }
        }
        else {
            $line = "<p>$line</p>";
        }
        echo $line, PHP_EOL;
    }
    

    Output:

    <h2>1. Title</h2>
    <p>Text</p>
    <h3>1.1. Subtitle</h3>
    <p>Text</p>
    <h3>1.2. Subtitle</h3>
    <p>Text</p>
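
For input that is less regular than the example, the DOMDocument route mentioned above could look roughly like the sketch below. It is not part of the original answer. The function name normalizeHeadings is hypothetical, and the sketch assumes that headings contain plain text only (inline markup inside a heading would be dropped) and that the input is ASCII, since character-encoding handling is left out.

```php
<?php
// Hypothetical helper, not part of the original answer.
// Assumes plain-text headings and ASCII input (see the note above).
function normalizeHeadings(string $html): string
{
    $previous = libxml_use_internal_errors(true); // keep parser warnings quiet
    $doc = new DOMDocument();
    $doc->loadHTML('<div>' . $html . '</div>');   // wrapper makes the fragment easy to find
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $root = $doc->getElementsByTagName('div')->item(0);
    $out = [];

    foreach ($root->childNodes as $node) {
        if ($node->nodeType === XML_TEXT_NODE) {
            // Bare text: emit one <p> per non-empty line.
            foreach (preg_split('/\R/', $node->textContent) as $line) {
                if (trim($line) !== '') {
                    $p = $doc->createElement('p');
                    $p->appendChild($doc->createTextNode(trim($line)));
                    $out[] = trim($doc->saveHTML($p));
                }
            }
        } elseif (preg_match('/^h[1-6]$/', $node->nodeName)
            && preg_match('/^\s*(\d+(?:\.\d+)*)\.?\s*(.*)$/s', $node->textContent, $m)) {
            // "1" -> h2, "1.1" -> h3, and so on, capped at h6.
            $level = min(6, substr_count($m[1], '.') + 2);
            $h = $doc->createElement('h' . $level);
            // Always write the number with a trailing dot.
            $h->appendChild($doc->createTextNode($m[1] . '. ' . $m[2]));
            $out[] = trim($doc->saveHTML($h));
        } else {
            // Anything else (unnumbered headings, other elements) is kept as it is.
            $out[] = trim($doc->saveHTML($node));
        }
    }

    return implode("\n", $out);
}

echo normalizeHeadings("<h3>1. Title</h3>\nText\n<h3>1.1 Subtitle</h3>\nText\n<h3>1.2. Subtitle</h3>"), PHP_EOL;
```

The sketch builds new elements and serializes them one by one instead of editing the tree in place, which avoids modifying the node list while iterating over it.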
    
    The asker selected this answer as the best answer.