dongmei5168 2019-06-11 18:46

Using the OR operator in XPath

I'm using the OR operator (more than once) in my XPath expression to extract the content that comes before a specific marker string, such as 'Reference', 'For more information', etc. Any of these terms should produce the same result, but they may not appear in that order. For example, 'Reference' might not come first, or might not be in the content at all, and one of the markers, 'About the data', sits inside a table. I want all content before any one of these strings appears.

Any help would be appreciated.

$expression =
    "//p[
        starts-with(normalize-space(), 'Reference') or 
        starts-with(normalize-space(), 'For more')
    ]/preceding-sibling::p";

That would also need to take into account the table:

$expression =
    "//article/table/tbody/tr/td[
        starts-with(normalize-space(), 'About the data used')
    ]/preceding-sibling::p";

Here's an example:

<root>
    <main>
        <article>
            <p>
                The stunning increase in homelessness announced in Los Angeles
                this week — up 16% over last year citywide — was an almost an
                incomprehensible conundrum.
            </p>
            <p>
                "We cannot let a set of difficult numbers discourage us
                or weaken our resolve" Garcetti said.
            </p>
            <p>
                References
                By Jeremy Herb, Caroline Kelly and Manu Raju, CNN
            </p>
            <p>
                For more information: Maeve Reston, CNN
            </p>
            <p>Maeve Reston, CNN</p>
            <table>
                <tbody>
                    <tr>
                        <td>
                            <strong>About the data used</strong>
                        </td>
                    </tr>
                    <tr>
                        <td>From
                        </td>
                        <td>Washington, CNN</td>
                    </tr>
                </tbody>
            </table>
        </article>
    </main>
</root>

The result I'm looking for would be the following.

<p>
    The stunning increase in homelessness announced in Los Angeles
    this week — up 16% over last year citywide — was an almost an
    incomprehensible conundrum.
</p>
<p>
    "We cannot let a set of difficult numbers discourage us
    or weaken our resolve" Garcetti said.
</p>

1 answer

  • dtrj21373 2019-06-11 20:21

    "I want all content before any one of these strings appears."

    That is, you want the content before the first paragraph that starts with one of these strings.

    The paragraphs that start with one of these strings are:

    p[starts-with(normalize-space(), 'References') or starts-with(....)]
    

    The first such paragraph is

    p[starts-with(normalize-space(), 'References') or starts-with(....)][1]
    

    The paragraphs before that are:

    p[starts-with(normalize-space(), 'References') or starts-with(....)][1]
    /preceding-sibling::p
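
    The steps above can be tried end to end with a small harness. The following is a minimal sketch, not the asker's actual setup: it assumes Java and the XPath 1.0 engine bundled with the JDK (the host language is not stated in the question), a shortened stand-in for the sample document, and hypothetical names (LeadingParagraphs, select). The marker strings filling in for the elided starts-with(....) are taken from the question. It also treats the table as one more candidate marker, on the assumption that the table is a direct child of article, since normalize-space() on the table element yields text that starts with 'About the data used'.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class LeadingParagraphs {
    // Shortened stand-in for the sample document in the question (assumption).
    static final String SAMPLE =
        "<root><main><article>"
        + "<p> The stunning increase in homelessness </p>"
        + "<p> \"We cannot let a set of difficult numbers discourage us\" </p>"
        + "<p> References By Jeremy Herb </p>"
        + "<p> For more information: Maeve Reston, CNN </p>"
        + "<p>Maeve Reston, CNN</p>"
        + "<table><tbody><tr><td><strong>About the data used</strong></td></tr></tbody></table>"
        + "</article></main></root>";

    // Children of <article> that are <p> or <table> and start with a marker string.
    // [1] keeps the first such child; preceding-sibling::p selects the paragraphs before it.
    static final String EXPRESSION =
        "//article/*[self::p or self::table]["
        + "starts-with(normalize-space(), 'References') or "
        + "starts-with(normalize-space(), 'For more information') or "
        + "starts-with(normalize-space(), 'About the data used')"
        + "][1]/preceding-sibling::p";

    // Evaluates the expression and returns the trimmed text of each selected node.
    static List<String> select(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent().trim());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String text : select(SAMPLE, EXPRESSION)) {
            System.out.println(text);
        }
    }
}
```

    Run against the stand-in document, this should select only the two opening paragraphs. If no marker is present at all, the expression selects nothing, which may or may not be the behavior you want.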
    

    In XPath 2.0 I would probably use a regular expression:

    p[matches(., '^\s*(References|For more information)')]
    

    to avoid the repeated calls on normalize-space().
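
    Many engines implement only XPath 1.0, where matches() is not available. One possible workaround, sketched here under assumptions, is to select the paragraphs with a plain 1.0 expression and apply the same regular expression in the host language. The sketch uses Java and the JDK's built-in XPath engine; the class and method names (RegexCutoff, paragraphsBeforeMarker) are hypothetical, and it assumes a single article element per document.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RegexCutoff {
    // Same pattern as in the matches() call above.
    static final Pattern MARKER =
        Pattern.compile("^\\s*(References|For more information)");

    // Returns the trimmed text of each <p> child of <article> that comes
    // before the first paragraph whose text matches MARKER.
    static List<String> paragraphsBeforeMarker(String xml) {
        try {
            NodeList paragraphs = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                "//article/p", new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> kept = new ArrayList<>();
            for (int i = 0; i < paragraphs.getLength(); i++) {
                String text = paragraphs.item(i).getTextContent();
                if (MARKER.matcher(text).find()) {
                    break; // first marker paragraph reached, stop collecting
                }
                kept.add(text.trim());
            }
            return kept;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><article><p>Intro</p><p> References: A, B</p><p>After</p></article></root>";
        System.out.println(paragraphsBeforeMarker(xml));
    }
}
```

    Unlike the pure XPath version, this loop returns every paragraph when no marker is found, so the two approaches differ in that edge case.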

