Make all absolute links relative

I am looking for a regex solution to this problem. It can be a multi-step solution if that makes things easier. Important: the test string is just a snippet of a complete HTML DOM; only images should be affected, and any other URL should be left alone.

Here's an image:

<img 
src="https://www.example.com/de/wp-content/uploads/sites/1/2017/03/image.jpg"
data-srcset="
 https://www.example.com/de/wp-content/uploads/sites/1/2017/03/img1.jpg 507w,
 https://www.example.com/de/wp-content/uploads/sites/1/2017/03/img2.jpg 780w,
 https://www.example.com/de/wp-content/uploads/sites/74/2017/03/img3.jpg 950w"
data-sizes="
 (min-width: 80em) calc(0.5 * (100vw - (100vw- 57em))),
 (min-width: 48em) calc(0.5 * (100vw - 5em)),
 calc(100vw - 1em)"
alt="image" class="lazyload">

As a one-liner:

<img src="https://www.example.com/de/wp-content/uploads/sites/1/2017/03/image.jpg" data-srcset="https://www.example.com/de/wp-content/uploads/sites/1/2017/03/img1.jpg 507w, https://www.example.com/de/wp-content/uploads/sites/1/2017/03/img2.jpg 780w, https://www.example.com/de/wp-content/uploads/sites/74/2017/03/img3.jpg 950w" data-sizes="(min-width: 80em) calc(0.5 * (100vw - (100vw- 57em))), (min-width: 48em) calc(0.5 * (100vw - 5em)), calc(100vw - 1em)" alt="image" class="lazyload">

The desired result is to get rid of the protocol, the domain, and the first directory, that is to say, everything in front of /wp-content. The language I am doing this in is PHP.

For the src part I have

 preg_replace("/(<img.*?src=\")(.*?)(\/wp-content.*?\")(.*>)/", '"$1$3$4"', $string);

The answer below is correct. Most HTML documents should load in the parser. Do yourself a favor and keep your HTML as valid as possible; that is a good thing anyway. If you don't produce the HTML in question yourself, try to clean it up before you consume it.

For the data-srcset problem just parse that argument separately.
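One way to parse the data-srcset value separately is sketched below. This is a minimal sketch, not a definitive implementation: the helper names stripToWpContent and relativizeSrcset are made up for illustration, and it assumes the candidates are comma-separated and that the URLs themselves contain no commas.

```php
<?php
// Hypothetical helper: drops everything in front of /wp-content/ in one URL.
// URLs without /wp-content/ are returned unchanged.
function stripToWpContent(string $url): string
{
    return preg_replace('~^.+?(?=/wp-content/)~', '', $url);
}

// Hypothetical helper: rewrites every candidate of a srcset-style value.
// Assumption: candidates are separated by commas and URLs contain no commas.
function relativizeSrcset(string $srcset): string
{
    $rewritten = [];
    foreach (array_map('trim', explode(',', $srcset)) as $candidate) {
        if ($candidate === '') {
            continue; // skip empty pieces, e.g. after a trailing comma
        }
        // Split "URL descriptor" at the first run of whitespace.
        $parts = preg_split('~\s+~', $candidate, 2);
        $parts[0] = stripToWpContent($parts[0]);
        $rewritten[] = implode(' ', $parts);
    }
    return implode(', ', $rewritten);
}

$srcset = "
 https://www.example.com/de/wp-content/uploads/sites/1/2017/03/img1.jpg 507w,
 https://www.example.com/de/wp-content/uploads/sites/74/2017/03/img3.jpg 950w";

echo relativizeSrcset($srcset), PHP_EOL;
```

The returned string can then be written back with setAttribute('data-srcset', ...) on the same img element that the src rewrite touches.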

Compare your DOM before and after thoroughly. The $dom->saveHTML() method rewrites self-closed void tags without the closing slash: <meta arg="yada"/> turns into <meta arg="yada"> (the trailing slash is missing). Also see "Are (non-void) self-closing tags valid in HTML5?"

dougu3591 2017-03-30 14:36
Don't. Use a parser to analyze the DOM and apply the regex on the DOM elements/attributes directly.

<?php
$dom = new DOMDocument();
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED);

$xpath = new DOMXPath($dom);
// Select only <img> elements whose src contains "wp-content".
$images = $xpath->query("//img[contains(@src, 'wp-content')]");

// Matches everything in front of /wp-content/.
$regex = '~^.+?(?=/wp-content/)~';

foreach ($images as $img) {
    // Pass '' instead of the domain below to get a root-relative path.
    $img->setAttribute(
        'src',
        preg_replace($regex, 'https://anotherdomain.com', $img->getAttribute('src'))
    );
}

echo $dom->saveHTML();

It has been answered a dozen times why it is not a good idea to parse HTML with regular expressions, one of the most favourite answers being this: RegEx match open tags except XHTML self-contained tags.

However, if your HTML is not valid, you could use the following regex (in verbose mode):
(?:\G(?!\A)|<img) (?s:.+?\bsrc=['"])\K https?://.+?(?=/wp-content/)

See it working on regex101.com.
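For reference, the pattern above could be applied from PHP roughly as follows. This is only a sketch: the sample markup in $html is invented for illustration, and the "x" modifier is what enables verbose mode so the spaces in the pattern are ignored.

```php
<?php
// The pattern from the answer; the "x" flag makes PCRE ignore the spaces.
$pattern = '~(?:\G(?!\A)|<img) (?s:.+?\bsrc=[\'"])\K https?://.+?(?=/wp-content/)~x';

// Hypothetical sample input: one ordinary link and one image.
$html = '<a href="https://www.example.com/de/wp-content/doc.pdf">doc</a>'
      . '<img src="https://www.example.com/de/wp-content/uploads/image.jpg" alt="image">';

// \K discards what was matched so far, so only the URL prefix is replaced.
$result = preg_replace($pattern, '', $html);
echo $result, PHP_EOL;
```

Because the pattern requires a literal src= after a word boundary, it leaves the href of the link alone, and it also does not touch data-srcset; that attribute still needs its own pass.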