Using the OR operator in XPath

I'm using the OR operator (more than once) in my XPath expression to extract the content that comes before a specific string such as 'Reference', 'For more information', etc. Any of these terms should produce the same result, but they may appear in any order or not at all: 'Reference' might not come first, or might be missing from the content entirely. One of the markers, 'About the data', sits inside a table. I want all the content before the first of these strings appears.

Any help would be appreciated.

$expression =
    "//p[
        starts-with(normalize-space(), 'Reference') or 
        starts-with(normalize-space(), 'For more')
    ]/preceding-sibling::p";

That would also need to take into account the table:

$expression =
    "//article/table/tbody/tr/td[
        starts-with(normalize-space(), 'About the data used')
]/preceding-sibling::p";

Here's an example:

<root>
    <main>
        <article>
            <p>
                The stunning increase in homelessness announced in Los Angeles
                this week — up 16% over last year citywide — was an almost an
                incomprehensible conundrum.
            </p>
            <p>
                "We cannot let a set of difficult numbers discourage us
                or weaken our resolve" Garcetti said.
            </p>
            <p>
                References
                By Jeremy Herb, Caroline Kelly and Manu Raju, CNN
            </p>
            <p>
                For more information: Maeve Reston, CNN
            </p>
            <p>Maeve Reston, CNN</p>
            <table>
                <tbody>
                    <tr>
                        <td>
                            <strong>About the data used</strong>
                        </td>
                    </tr>
                    <tr>
                        <td>From
                        </td>
                        <td>Washington, CNN</td>
                    </tr>
                </tbody>
            </table>
        </article>
    </main>
</root>

The result I'm looking for would be the following.

<p>
    The stunning increase in homelessness announced in Los Angeles
    this week — up 16% over last year citywide — was an almost  an
    incomprehensible conundrum.
</p>
<p>
    "We cannot let a set of difficult numbers discourage us
    or weaken our resolve" Garcetti said.
</p>


1 answer

dtrj21373 2019-06-11 20:21
"I want all content before any one of these strings appears."

That is, you want the content before the first paragraph that starts with one of these strings.

The paragraphs that start with one of these strings are:

p[starts-with(normalize-space(), 'References') or starts-with(....)]

The first such paragraph is

p[starts-with(normalize-space(), 'References') or starts-with(....)][1]

The paragraphs before that are:

p[starts-with(normalize-space(), 'References') or starts-with(....)][1] /preceding-sibling::p
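
As a rough illustration of why the [1] matters, here is a minimal PHP sketch, assuming the question's expression is run through DOMDocument and DOMXPath. The second starts-with() argument, 'For more', is taken from the question's own expression; the helper matchedText(), the shortened sample documents and the variable names are hypothetical. The last expression is one possible way to treat the table as an extra stop node, assuming the table is a sibling of the paragraphs as in the sample (the question's td[...]/preceding-sibling::p cannot match, because a td has no p siblings).

```php
<?php
// Hypothetical helper (not from the question): run an XPath 1.0 expression
// against an XML string and return the trimmed text of every matched node.
function matchedText(string $xml, string $expression): array
{
    $doc = new DOMDocument();
    $doc->loadXML($xml);
    $xpath = new DOMXPath($doc);
    $texts = [];
    foreach ($xpath->query($expression) as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

// Shortened version of the sample document from the question.
$xml = <<<'XML'
<article>
  <p>First body paragraph.</p>
  <p>Second body paragraph.</p>
  <p>
    References
    By Jeremy Herb, Caroline Kelly and Manu Raju, CNN
  </p>
  <p>For more information: Maeve Reston, CNN</p>
  <table><tbody><tr><td><strong>About the data used</strong></td></tr></tbody></table>
</article>
XML;

$stop = "starts-with(normalize-space(), 'References') or
         starts-with(normalize-space(), 'For more')";

// Without [1]: every stop paragraph contributes its preceding siblings,
// so the 'References' paragraph itself is returned as well.
$all = matchedText($xml, "//article/p[$stop]/preceding-sibling::p");

// With [1]: only the first stop paragraph is used.
$first = matchedText($xml, "//article/p[$stop][1]/preceding-sibling::p");

// Assumption: the table is a sibling of the paragraphs. Test every child
// of <article> so that the table counts as one more stop node.
$stopNode = "self::p[$stop] or
             self::table[.//td[starts-with(normalize-space(), 'About the data used')]]";
$tableOnly = '<article><p>Body.</p><table><tbody><tr>'
           . '<td>About the data used</td></tr></tbody></table></article>';
$beforeTable = matchedText($tableOnly, "//article/*[$stopNode][1]/preceding-sibling::p");

print_r($first);
```

In this sketch $all holds three paragraphs while $first holds only the two body paragraphs, which is the difference the [1] predicate makes.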

In XPath 2.0 I would probably use a regular expression:

p[matches(., '^\s*(References|For more information)')]

to avoid the repeated calls on normalize-space().
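
If the expression is run through PHP's DOMXPath (an assumption based on the $expression syntax in the question), matches() is not available, because DOMXPath is built on libxml2, which implements XPath 1.0. One possible workaround, sketched here rather than prescribed, is to expose PHP's preg_match to XPath with registerPhpFunctions(). The sample document and variable names are illustrative.

```php
<?php
// Small stand-in document; the leading spaces exercise the \s* part of the pattern.
$xml = '<article><p>Intro text.</p><p>  References By A</p>'
     . '<p>For more information: B</p></article>';

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Make PHP's preg_match callable from XPath via php:functionString(...).
$xpath->registerNamespace('php', 'http://php.net/xpath');
$xpath->registerPhpFunctions('preg_match');

// preg_match returns 1 on a match. Compare with "= 1" explicitly: a bare
// number in a predicate would be read as a position test.
$expression = '//article/p[
    php:functionString("preg_match", "/^\s*(References|For more information)/", .) = 1
][1]/preceding-sibling::p';

$texts = [];
foreach ($xpath->query($expression) as $p) {
    $texts[] = trim($p->textContent);
}
print_r($texts);
```

The trade-off is that each candidate paragraph now triggers a call back into PHP, so for a handful of fixed prefixes the plain starts-with() version may be just as practical.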