dongliang7545 2010-10-28 18:27

Parse content for the appearance of a keyword inside h1, h2, and h3 heading tags

Given a block of content, I'm looking to create a function in PHP that checks whether a keyword or keyword phrase appears inside h1-h3 header tags...

For example, if the keyword was "Blue Violin" and the block of text was...

You don't see many blue violins. Most violins have a natural finish. <h1>If you see a blue violin, it's really a rarity</h1>

I'd like my function to return:

  • The keyword phrase does appear in an h1 tag
  • The keyword phrase does not appear in an h2 tag
  • The keyword phrase does not appear in an h3 tag

2 answers

  • douwen5546 2010-10-28 18:49

    You can use DOM and the following XPath for this:

    /html/body//h1[contains(.,'Blue Violin')]
    

    This would match all h1 elements inside the body element that contain the phrase "Blue Violin", either directly or in a subnode. If the phrase should only occur in the direct TextNode, change the . to text(). The results are returned in a DOMNodeList.
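
    To make the difference between . and text() concrete, here is a minimal sketch. The inline markup is made up for illustration and is not from the question; it puts the phrase inside a nested em element so the two expressions give different counts.

```php
<?php
// Hypothetical sample document: the phrase sits in a child <em>, not in
// a text node that is a direct child of the <h1>.
$dom = new DOMDocument;
$dom->loadXML('<html><body><h1>A <em>Blue Violin</em></h1></body></html>');
$xpath = new DOMXPath($dom);

// "." is the string value of the whole h1, including descendants.
$anywhere = (int) $xpath->evaluate('count(/html/body//h1[contains(., "Blue Violin")])');

// "text()" selects only text nodes that are direct children of the h1.
$direct = (int) $xpath->evaluate('count(/html/body//h1[contains(text(), "Blue Violin")])');

echo "anywhere: $anywhere, direct: $direct\n"; // prints "anywhere: 1, direct: 0"
```

    One caveat: in XPath 1.0, contains(text(), ...) only looks at the first direct text node. If every direct text node should be checked, h1[text()[contains(., "Blue Violin")]] is the safer form.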

    Since you only want to know if the phrase appears, you can use the following code:

    $dom = new DOMDocument;
    $dom->load('NewFile.xml');
    $xPath = new DOMXPath($dom);
    echo $xPath->evaluate('count(/html/body//h1[contains(.,"Blue Violin")])');
    

    which will return the number of nodes matching this XPath. If your markup is not valid XHTML, you will not be able to use load or loadXML; use loadHTML or loadHTMLFile instead. In addition, the XPath will execute faster if you give it a direct path to the nodes. If you only have one h1, h2 and h3 anyway, substitute the //h1 with a direct path.
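
    For a block of content like the one in the question, which is an HTML fragment rather than a full XHTML document, the loadHTML route could look roughly like this. It is a sketch under two assumptions: the content arrives as a string, and parser warnings about the fragment can be ignored. The search phrase is lowercased here because contains is case-sensitive.

```php
<?php
// Sample text from the question, held in a string instead of a file.
$html = "You don't see many blue violins. Most violins have a natural finish. "
      . "<h1>If you see a blue violin, it's really a rarity</h1>";

$dom = new DOMDocument;
libxml_use_internal_errors(true);  // collect parser warnings instead of printing them
$dom->loadHTML($html);             // libxml adds the missing html/body wrappers
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// //h1 is used so the query does not depend on where the parser places the h1.
$count = (int) $xpath->evaluate('count(//h1[contains(., "blue violin")])');

echo $count, "\n";
```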

    Note that contains is case-sensitive, so the above will not match anything because of the Mixed Case used in the search phrase. Unfortunately, DOM (or rather the underlying libxml) only supports XPath 1.0. I am not sure if there is an XPath function to do a case-insensitive search, but as of PHP 5.3, you can also use PHP functions inside an XPath, e.g.

    $dom = new DOMDocument;
    $dom->load('NewFile.xml');
    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace("php", "http://php.net/xpath");
    $xpath->registerPHPFunctions();
    echo $xpath->evaluate('count(/html/body//h1[contains(php:functionString("strtolower", .),"blue violin")])');
    

    So if you need to match Mixed Case phrases or words, you can lowercase all text in the searched nodes before checking it with contains, or use any other PHP function you may find useful here.
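
    Putting the pieces together, the function the question asks for might be sketched as follows. The name keywordInHeadings and its return shape are hypothetical, and the case-insensitive check is done with stripos in PHP after selecting the nodes rather than inside the XPath, which sidesteps quoting the keyword into the expression. The scalar type hints require PHP 7 or later.

```php
<?php
/**
 * Hypothetical helper: reports, per heading level, whether $keyword
 * occurs (case-insensitively) in any h1, h2 or h3 of an HTML fragment.
 */
function keywordInHeadings(string $html, string $keyword): array
{
    $found = ['h1' => false, 'h2' => false, 'h3' => false];
    if (trim($html) === '') {
        return $found; // nothing to parse
    }

    $dom = new DOMDocument;
    $previous = libxml_use_internal_errors(true); // silence fragment warnings
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    foreach (array_keys($found) as $tag) {
        foreach ($xpath->query('//' . $tag) as $node) {
            // textContent includes text from nested elements, like "." in XPath.
            if (stripos($node->textContent, $keyword) !== false) {
                $found[$tag] = true;
                break;
            }
        }
    }
    return $found;
}

$content = "You don't see many blue violins. Most violins have a natural finish. "
         . "<h1>If you see a blue violin, it's really a rarity</h1>";

$result = keywordInHeadings($content, 'Blue Violin');
foreach ($result as $tag => $hit) {
    echo 'The keyword phrase ', $hit ? 'does' : 'does not', " appear in an $tag tag\n";
}
```

    If the check has to stay inside the XPath, XPath 1.0's translate() can map uppercase letters to lowercase before contains is applied, but it only covers the characters listed explicitly.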

    This answer was selected as the best answer by the asker.