I am looking for a regex solution to this problem. It can be a multi-step solution if that makes things easier. Important: the test string is just a snippet of a complete HTML DOM. Only images should be affected, and any other URL should be left alone.
Here's an image:
<img
src="https://www.example.com/de/wp-content/uploads/sites/1/2017/03/image.jpg"
data-srcset="
https://www.example.com/de/wp-content/uploads/sites/1/2017/03/img1.jpg 507w,
https://www.example.com/de/wp-content/uploads/sites/1/2017/03/img2.jpg 780w,
https://www.example.com/de/wp-content/uploads/sites/74/2017/03/img3.jpg 950w"
data-sizes="
(min-width: 80em) calc(0.5 * (100vw - (100vw- 57em))),
(min-width: 48em) calc(0.5 * (100vw - 5em)),
calc(100vw - 1em)"
alt="image" class="lazyload">
As a one-liner:
<img src="https://www.example.com/de/wp-content/uploads/sites/1/2017/03/image.jpg" data-srcset="https://www.example.com/de/wp-content/uploads/sites/1/2017/03/img1.jpg 507w, https://www.example.com/de/wp-content/uploads/sites/1/2017/03/img2.jpg 780w, https://www.example.com/de/wp-content/uploads/sites/74/2017/03/img3.jpg 950w" data-sizes="(min-width: 80em) calc(0.5 * (100vw - (100vw- 57em))), (min-width: 48em) calc(0.5 * (100vw - 5em)), calc(100vw - 1em)" alt="image" class="lazyload">
The desired result is to get rid of the protocol, the domain, and the first directory, that is to say: everything in front of /wp-content. The language I am doing this in is PHP.
For the src part I have
$string = preg_replace("/(<img.*?src=\")(.*?)(\/wp-content.*?\")(.*>)/", '$1$3$4', $string);
The answer below is correct. Most HTML documents should load without problems. Do yourself a favor and keep your HTML as valid as possible, which is a good thing anyway. If you don't produce the HTML in question yourself, try to pre-process it before you consume it.
For the data-srcset problem, just parse that attribute's value separately.
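Parsing the data-srcset value on its own could look roughly like the sketch below. The function names strip_to_wp_content and rewrite_srcset are made up for illustration, and the sketch assumes that candidates are separated by commas and that no URL contains a comma itself.

```php
<?php
// Hypothetical helper: drop everything in front of /wp-content in one URL.
// URLs that do not contain /wp-content are returned unchanged.
function strip_to_wp_content(string $url): string
{
    $pos = strpos($url, '/wp-content');
    return $pos === false ? $url : substr($url, $pos);
}

// Hypothetical helper: rewrite every candidate of a srcset-style value.
// Assumption: candidates are comma-separated and URLs contain no commas.
function rewrite_srcset(string $srcset): string
{
    $result = [];
    foreach (explode(',', $srcset) as $candidate) {
        $candidate = trim($candidate);
        if ($candidate === '') {
            continue; // skip empty pieces, e.g. after a trailing comma
        }
        // A candidate is "URL descriptor", e.g. ".../img1.jpg 507w".
        $parts = preg_split('/\s+/', $candidate, 2);
        $parts[0] = strip_to_wp_content($parts[0]);
        $result[] = implode(' ', $parts);
    }
    return implode(', ', $result);
}
```

The same strip_to_wp_content helper can be applied to the src value, so both attributes go through one code path. Note that the rewritten value is normalized to a single line, so the line breaks of the multi-line variant are not preserved.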
Compare your complete DOM before and after. The $dom->saveHTML() method drops the self-closing slash from tags that do not need to be closed. For example, <meta arg="yada"/> turns into <meta arg="yada"> (the closing slash is missing). Also see "Are (non-void) self-closing tags valid in HTML5?"
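A quick way to see this effect is to round-trip a small document through DOMDocument and compare the result with the input. This is only a sketch: the markup in $before is invented for illustration and reuses the meta tag from the example above.

```php
<?php
// Illustrative input only; the meta tag mirrors the example in the text.
$before = '<!DOCTYPE html><html><head><meta arg="yada"/><title>t</title></head>'
        . '<body><p>text</p></body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($before);
libxml_clear_errors();

$after = $dom->saveHTML();

// True if serialization changed anything, even without any manual edits.
$changed = ($before !== $after);

// True if the self-closing form of the meta tag is gone after the round trip.
$slashDropped = strpos($after, '<meta arg="yada"/>') === false
             && strpos($after, '<meta arg="yada">') !== false;
```

Diffing $before against $after before making any changes of your own shows which differences come from serialization alone.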