dongnai2804 2018-12-22 01:15

Extract text outside of HTML tags [closed]

I am trying to use preg_match() to extract text that is not contained in tags such as <p> or <img>. The text is retrieved from a database, and I am working in PHP.

This should be extracted <p>I do not want this</p> This should be extracted <a>This may appear after other tags and I do not want this</a>

I have tried the pattern (.*)(<p>|<a>|<\/p>|<\/a>)(.*), but because the leading (.*) is greedy it matches everything up to the last tag, so the earlier tags are captured together with the text outside of them.
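A minimal reproduction of that behaviour might look like the sketch below. The variable names ($html, $pattern, $m) are only for illustration, and the / delimiters are an assumption about how the pattern was passed to preg_match().

```php
<?php
$html = 'This should be extracted <p>I do not want this</p> This should be extracted <a>This may appear after other tags and I do not want this</a>';

// The pattern from the attempt above, wrapped in / delimiters.
$pattern = '/(.*)(<p>|<a>|<\/p>|<\/a>)(.*)/';

if (preg_match($pattern, $html, $m)) {
    // The greedy (.*) consumes as much as it can, so group 2 ends up being
    // the LAST tag in the string rather than the first one.
    var_dump($m[2]);
    // Everything before that last tag, earlier tags included, lands in group 1.
    var_dump(strpos($m[1], '<p>') !== false);
    // Group 3 holds whatever follows the last tag.
    var_dump($m[3]);
}
```

Here group 2 is the closing </a> at the very end of the string, group 1 still contains the <p> element, and group 3 is empty, which is why a single preg_match() call with this pattern cannot isolate the wanted fragments.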

I have also searched Stack Overflow and found "Match text outside of html tags", but the regex given there produces a pattern error when I paste it into regex101.com.

Would appreciate any help on this, thanks.

1 answer

  • dpje52239 2018-12-22 02:17

    You can use PHP's DOMDocument and DOMXPath to get the values that you want. The trick is to wrap the HTML from your database in a container element (for example, a <div>), load it into a DOMDocument, and then use DOMXPath with the text() node test to select the children of the <div> that are plain text nodes:

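    $html = 'This should be extracted <p>I do not want this</p> This should also be extracted <a>This may appear after other tags and I do not want this</a>';
    $doc = new DOMDocument();
    // Wrap the fragment in a <div> so it has a single root element; the flags
    // stop libxml from adding a doctype or implied <html>/<body> wrappers.
    $doc->loadHTML("<div>$html</div>", LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED);
    $xpath = new DOMXPath($doc);
    $texts = array();
    // text() matches only the text nodes that are direct children of the <div>.
    foreach ($xpath->query('/div/text()') as $text) {
        $texts[] = $text->nodeValue;
    }
    print_r($texts);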
    

    Output:

    Array ( 
        [0] => This should be extracted
        [1] =>  This should also be extracted 
    )
    

    Demo on 3v4l.org
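If the fragments are needed without the surrounding spaces, the same approach can be wrapped in a small function. This is only a sketch under a few assumptions: the function name extractTopLevelText is hypothetical, trimming and skipping whitespace-only nodes is an addition that the code above does not do, and parser warnings are silenced on the assumption that the stored HTML may be malformed.

```php
<?php
// Hypothetical helper: returns the trimmed text fragments that sit outside
// any tag at the top level of an HTML snippet.
function extractTopLevelText(string $html): array
{
    // Collect libxml parse warnings internally instead of emitting them,
    // since database content is not guaranteed to be valid HTML.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    // Same wrapping trick: the <div> becomes the single root element.
    $doc->loadHTML("<div>$html</div>", LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED);

    // Discard collected warnings and restore the previous error mode.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $texts = [];
    foreach ($xpath->query('/div/text()') as $node) {
        $text = trim($node->nodeValue);
        if ($text !== '') { // skip whitespace-only text nodes
            $texts[] = $text;
        }
    }
    return $texts;
}

$html = 'This should be extracted <p>I do not want this</p> This should also be extracted <a>This may appear after other tags and I do not want this</a>';
print_r(extractTopLevelText($html));
```

For the sample string this returns the same two fragments, just without the leading and trailing spaces. Text nested inside any tag is still ignored, because the XPath expression only looks at direct children of the wrapper <div>.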

    This answer was accepted by the asker as the best answer.