Return an XPath regex capturing group as a string using an XPath query

CONTEXT

Suppose the following HTML:

....
<p>Whatever</p>
<div>Whatever DIV78232 Everwhat</div>
....

Question:

How can I return a plain-text string containing DIVnnnnn, where nnnnn stands for any sequence of digits?

My investigation so far:

The XPath replace() function will replace a pattern found inside the current DOM.

replace(.,'.*?(DIV\d+).*','$1') => DIV78232
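
For comparison, the same substitution can be sketched outside XPath. This is only a Python analogue using `re.sub`, not the XPath function itself, and the variable names are hypothetical. It shows the pattern applied to the text of the sample `<div>`, and that the call yields a new string while the input stays as it was:

```python
import re

# Text content of the sample <div> from the question.
div_text = "Whatever DIV78232 Everwhat"

# Analogue of replace(., '.*?(DIV\d+).*', '$1'): the whole match is
# replaced by the first capturing group, and the result is a NEW string.
result = re.sub(r".*?(DIV\d+).*", r"\1", div_text)

print(result)    # DIV78232
print(div_text)  # Whatever DIV78232 Everwhat (the input is unchanged)
```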

Why am I blocked?

Because I want the query to return "DIV78232" as a string without actually replacing anything in the DOM, just as the query /p/text() would return "Whatever". (I am trying all this in the FirePath Firefox extension.)

Note: according to the official docs:

"replace() Returns the value of the first argument with every substring matched by the regular expression that is the value of the second argument replaced by the replacement string that is the value of the third argument."

FINAL PURPOSE:

My final purpose is to get the image URL (as a string) captured by '.*?image:.*?"(.+?)".*' from the <script> block below, which is inside the HTML.

The query //*[matches(.,'.*?image:.*?"(.+?)".*','i')] returns the whole node, but I only want the first capturing group, which would be the image URL.

<script>...vp&output=xml_vast2&unviewed_position_start=1&
url='+encodeURIComponent(location.href)+'
description_url='+encodeURIComponent(location.href)+'&
image:   "https://domain.com/xxxxxxx/public_images/2015.12/article/56797be1c46188ac438b45c3.jpg", // stretching: 'fi..</script>

dongyan2469 2015-12-22 22:17
It took me a long while, but this is the result I got by combining replace() and tokenize():

tokenize(replace(.,'.*?image:.*?"(.+?)".*?',':@:$1:@:'),':@:')[2]

This returns the image URL from the snippet above.

Why/How does this work?

replace() matches the image line and wraps the capturing group in my own token separator ':@:' (it could be any string that does not otherwise occur in the document).

tokenize() then splits the replaced string into three parts, the second of which is the capturing group I was looking for. (There will be exactly three parts because it is highly improbable that the document contains ':@:' anywhere else.)
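
The two steps can be sketched in Python as a rough analogue, with `re.sub` standing in for replace() and `str.split` for tokenize(). This is not XPath, and `script_text`, `SEP`, `wrapped`, `parts` and `image_url` are hypothetical names. The input is an abbreviated copy of the `<script>` text from the question. As in XPath regular expressions without the `s` flag, `.` in Python does not match a newline by default, so the text on earlier lines is left in front of the first separator rather than consumed by the match:

```python
import re

# Abbreviated stand-in for the <script> text in the question.
script_text = (
    "url='+encodeURIComponent(location.href)+'\n"
    'image:   "https://domain.com/xxxxxxx/public_images/2015.12/article/'
    '56797be1c46188ac438b45c3.jpg", // stretching: \'fi..'
)

SEP = ":@:"  # arbitrary separator, assumed not to occur in the text

# Step 1 (replace): wrap the first capturing group in the separator.
wrapped = re.sub(r'.*?image:.*?"(.+?)".*?', SEP + r"\1" + SEP, script_text)

# Step 2 (tokenize): split on the separator. The parts are the text before
# the match, the captured URL, and the text after the match.
parts = wrapped.split(SEP)

# Python lists are 0-based, so index 1 here corresponds to [2] in XPath.
image_url = parts[1]
print(image_url)
```

If the separator did occur elsewhere in the text, the split would produce more than three parts and the index could point at the wrong piece, which is why an unusual separator is chosen.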

Is there any faster way to achieve this?

Thanks. All the best. Peace.