I have an HTML string. For the purposes of this question, let's say the string is:
<img id="Picture_x0020_1" src="cid:image001.jpg@01D05CBF.CF7A44B0" alt="Variety 008 (893 x 799) (223 x 200)" height="200" width="223">dflkjdslkjdsfldskfjdlfkjdlfksdjfflkdsjfdlkdfdjflkdfjdlkjfkdlfjdljfldjfldjflkdjjfkd<img id="Picture_x0020_1" src="cid:image001.jpg@01D05CBF.CF7A44B0" alt="Variety 008 (893 x 799) (223 x 200)" height="200" width="223">hkjhkhkhkhkhkjhjkhhkjhkjhkjhkjhjkhkjhkjhkhkjhkjhjkhkjhkjhkjhkjhkjhkjhkjhkjhkjhkjhkjhkjhkjh<img id="Picture_x0020_1" src="cid:image001.jpg@01D05CBF.CF7A44B0" alt="Variety 008 (893 x 799) (223 x 200)" height="200" width="223">dsjhfdsjfdjflsjflkjdflkjffldskjfdljdlfkjflkdjflkdjfdslkjfkds
Now let's look at the part I need to work on. This is what Gmail saves as the image name inside src="":
cid:image001.jpg@01D05CBF.CF7A44B0
The class I use downloads each attachment and saves it under a name built like this:
$cid = 'image001.jpg@01D05CBF.CF7A44B0'; // the src value without the "cid:" prefix
$filename = $mail_id . '_' . $cid . '_' . $image_id;
So the saved file name ends up as something like: 308907_image001.jpg@01D05CBF.CF7A44B0_image001.jpg
My aim is to replace every occurrence of:
cid:image001.jpg@01D05CBF.CF7A44B0
with
attachments/308907_image001.jpg@01D05CBF.CF7A44B0_image001.jpg
In other words: strip out the cid: prefix, prepend attachments/, $mail_id and _ to the start of the string, and append _image001.jpg to the end.
Keep in mind that the HTML string may contain many of these embedded cid: sources.
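For a single src value, the transformation described above can be sketched with plain string functions. This assumes (hypothetically) that the trailing image name, $image_id in the class, is the part of the cid before the @ sign, as it is in this example; the helper name cid_to_attachment_path is made up for illustration.

```php
<?php
// Hypothetical helper: turns one "cid:..." src value into the saved attachment path.
// Assumption: the class's $image_id equals the part of the cid before "@".
function cid_to_attachment_path(string $src, string $mail_id): string
{
    $prefix = 'cid:';
    // Leave anything that does not start with "cid:" untouched.
    if (strpos($src, $prefix) !== 0) {
        return $src;
    }
    $cid = substr($src, strlen($prefix));                     // image001.jpg@01D05CBF.CF7A44B0
    $at = strpos($cid, '@');
    $image_id = $at === false ? $cid : substr($cid, 0, $at);  // image001.jpg
    return 'attachments/' . $mail_id . '_' . $cid . '_' . $image_id;
}

echo cid_to_attachment_path('cid:image001.jpg@01D05CBF.CF7A44B0', '308907'), "\n";
// prints attachments/308907_image001.jpg@01D05CBF.CF7A44B0_image001.jpg
```

Calling this once per match is what the regex approaches below have to reproduce.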
Since I'm not very good with regex, I'm doing this in small steps. First I'm trying to figure out how to replace cid:image001.jpg@01D05CBF.CF7A44B0 with attachments/308907_image001.jpg@01D05CBF.CF7A44B0, and then I'll work out how to append _image001.jpg to the end.
I managed to build a regex that matches the whole image tag, and when I run it in http://www.regexr.com/ it does highlight the cid: value in element [1].
I tried something like the following, but it just returns an empty string, even though the same logic seems to work in the regex tool, so I can't figure out why. Maybe it's because the regex has three groups and I need to access element [1] to get the cid: value, but I'm not sure:
$string = preg_replace('/(<img\b\s+.*?src=\")(.*?cid:.*?)(\">)/g', 'attachments/'.$mail_id.'_', $html);
The other problem is that I only need to replace cid: with attachments/308907_; I don't want to replace the image001.jpg@01D05CBF.CF7A44B0 part.
I am also not sure of the best way to append _image001.jpg at the end. If there were just one replacement, I could do something like this:
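On the empty string: PHP's preg_* functions don't have a /g modifier (preg_replace already replaces every match unless you pass a limit), so the pattern above is most likely rejected with an "Unknown modifier 'g'" warning, and preg_replace returns null, which prints as an empty string. The replacement text also overwrites the whole match, tag included. A minimal sketch of the first baby step, assuming the src attribute is always double-quoted: capture everything up to src=" and write it back with ${1}, so only the literal cid: after it changes.

```php
<?php
$mail_id = '308907';
$html = '<img id="Picture_x0020_1" src="cid:image001.jpg@01D05CBF.CF7A44B0" height="200">text'
      . '<img id="Picture_x0020_1" src="cid:image001.jpg@01D05CBF.CF7A44B0" height="200">more';

// No /g flag: preg_replace replaces every match unless a limit is passed.
// Group 1 captures "<img ... src=\"" and is written back unchanged via ${1},
// so only the literal "cid:" that follows it is replaced.
$step1 = preg_replace(
    '/(<img\b[^>]*?\bsrc=")cid:/i',
    '${1}attachments/' . $mail_id . '_',
    $html
);

echo $step1, "\n";
```

The ${1} form (rather than $1) keeps the group number unambiguous if the text right after it ever starts with a digit.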
$current_image_name = 'attachments/308907_image001.jpg@01D05CBF.CF7A44B0';
$new_image_name = 'attachments/308907_image001.jpg@01D05CBF.CF7A44B0_image001.jpg';
$html = str_replace($current_image_name, $new_image_name, $html);
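For what it's worth, str_replace already replaces every occurrence of its search string, and it accepts arrays for search and replace, so each distinct cid: value only needs to be listed once and the whole email is handled in one call. A sketch, assuming double-quoted src attributes and (hypothetically) that $image_id is the part of the cid before the @; the quotes are kept in the search strings so one cid value can't accidentally match inside a longer one.

```php
<?php
$mail_id = '308907';
$html = '<img src="cid:image001.jpg@01D05CBF.CF7A44B0">a'
      . '<img src="cid:image002.png@01D05CBF.CF7A44B0">b'
      . '<img src="cid:image001.jpg@01D05CBF.CF7A44B0">c';

// Collect every cid value used in a double-quoted src attribute.
preg_match_all('/\bsrc="cid:([^"]+)"/', $html, $matches);
$cids = array_unique($matches[1]);

$search = [];
$replace = [];
foreach ($cids as $cid) {
    // Assumption: the class's $image_id is the part of the cid before "@".
    $image_id = explode('@', $cid, 2)[0];
    $search[]  = '"cid:' . $cid . '"';
    $replace[] = '"attachments/' . $mail_id . '_' . $cid . '_' . $image_id . '"';
}

// One call; str_replace replaces all occurrences of each search string.
$html = str_replace($search, $replace, $html);
echo $html, "\n";
```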
But because an email could contain many different images, I don't think that approach will work, and it might perform poorly since some emails can be large.
My worry is that making a separate call per image is inefficient when parsing a big email, so maybe there is a way to do this in the same preg_replace call.
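Both parts can be done in a single preg_replace pass by capturing the whole cid value and, separately, the part before the @, then reusing both in the replacement. A sketch, assuming double-quoted src attributes and (hypothetically) that the trailing _image001.jpg is always the part of the cid before the @.

```php
<?php
$mail_id = '308907';
$html = '<img id="Picture_x0020_1" src="cid:image001.jpg@01D05CBF.CF7A44B0" alt="Variety 008" height="200" width="223">dflkj'
      . '<img id="Picture_x0020_1" src="cid:image002.png@01D05CBF.CF7A44B0" height="200">hkjhk';

// Group 1: everything from "<img" up to and including src="
// Group 2: the full cid value without the "cid:" prefix
// Group 3: the part of that value before "@" (assumed to be the saved image name)
$pattern = '/(<img\b[^>]*?\bsrc=")cid:(([^"@]+)@[^"]*)"/i';

// No /g flag: preg_replace already replaces every match in one pass.
$result = preg_replace(
    $pattern,
    '${1}attachments/' . $mail_id . '_${2}_${3}"',
    $html
);

if ($result === null) {
    // preg_replace returns null on failure, e.g. an invalid pattern.
    throw new RuntimeException('preg_replace failed');
}
echo $result, "\n";
```

If the trailing name has to come from the class rather than from the cid itself, preg_replace_callback lets you build each replacement in PHP code while still making a single pass over the HTML.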
I'm happy to work out the actual code myself if someone can point me in the right direction and give me some hints on the best way to achieve this.