Looking for a robust HTML DOM method to correctly extract attribute text values that contain a single apostrophe

As part of a data migration task, I am using PHP to extract the values of the alt and title attributes of img elements from some HTML.

An example of the source HTML:

<img src='myimage.jpg' alt='Andy's garden vegetables' title='Andy's garden vegetables'/>

As you can see, the alt and title values in the source HTML are delimited by single apostrophes ('). But the text itself also uses an apostrophe as a possessive, to mean the vegetables belonging to Andy.

This is a problem for a simple parser, which would treat the apostrophe inside the text as the end of the value and return 'Andy' rather than 'Andy's garden vegetables'.
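The failure mode is easy to reproduce. The following is a minimal PHP sketch; the pattern is a hypothetical naive extractor written for illustration, not code from the actual migration:

```php
<?php
// Sample tag from the question; the values contain an unescaped apostrophe.
$html = "<img src='myimage.jpg' alt='Andy's garden vegetables' title='Andy's garden vegetables'/>";

// Hypothetical naive pattern: the value runs up to the first apostrophe
// after the opening delimiter.
preg_match("/alt='([^']*)'/", $html, $m);

echo $m[1], PHP_EOL; // prints "Andy"
```

The capture stops at the possessive apostrophe, so the rest of the value is lost.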

The solution I can think of is to include more of the surrounding text in a regex to pin down where the attribute value starts and finishes, such as the alt=' at the start and the ' at the end. However, this would not work if there were spaces around the = or if double quotes were used. I suspect that unescaped single apostrophes inside a single-quoted value are not valid HTML, but that is the data I have to work with.

Is there a more robust solution than regex, perhaps HTML DOM based, that can handle single apostrophes within the text and distinguish them from those used as delimiters?

2 answers

关注

码龄粉丝数原力等级 --

被采纳

被点赞

采纳率
dougehe2022 2013-11-18 08:50
I think this is what you're asking for:

(?<=alt='|title=').+(?='\s)

I used a positive lookbehind to anchor on the opening alt=' or title=', and a positive lookahead to require that the closing apostrophe is followed by whitespace.
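For reference, here is a sketch of how the pattern might be applied in PHP with preg_match_all, next to a looser variant. The variant is an assumption and not part of the answer: it makes the quantifier lazy and accepts a closing apostrophe that is followed by the end of the tag or by the next attribute name and an equals sign. Both are heuristics and do not amount to a real HTML parser.

```php
<?php
// Sample tag from the question.
$html = "<img src='myimage.jpg' alt='Andy's garden vegetables' title='Andy's garden vegetables'/>";

// Pattern from the answer, unchanged: greedy .+ and a closing
// apostrophe that must be followed by whitespace.
$original = "~(?<=alt='|title=').+(?='\s)~";
preg_match_all($original, $html, $a);

// Hypothetical variant (an assumption, not from the answer): lazy .+?
// and a closing apostrophe followed by "/>", ">", or "name=".
$variant = "~(?<=alt='|title=').+?(?='\s*(?:/?>|[\w:-]+\s*=))~";
preg_match_all($variant, $html, $b);

echo implode(' | ', $a[0]), PHP_EOL;
echo implode(' | ', $b[0]), PHP_EOL;
```

On the sample tag the unchanged pattern returns only the alt value, because the closing apostrophe of title is followed by /> and not by whitespace. The variant returns both values. It would still end a value early if the text itself contained an apostrophe followed by something that looks like name=.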