dongqiao6730 2015-05-09 07:37

Using preg_replace to replace cid: in src attributes, prepending and appending to the value

I have an HTML string. For the purposes of this question, let's say the string is:

<img id="Picture_x0020_1" src="cid:image001.jpg@01D05CBF.CF7A44B0" alt="Variety 008 (893 x 799) (223 x 200)" height="200" width="223">dflkjdslkjdsfldskfjdlfkjdlfksdjfflkdsjfdlkdfdjflkdfjdlkjfkdlfjdljfldjfldjflkdjjfkd<img id="Picture_x0020_1" src="cid:image001.jpg@01D05CBF.CF7A44B0" alt="Variety 008 (893 x 799) (223 x 200)" height="200" width="223">hkjhkhkhkhkhkjhjkhhkjhkjhkjhkjhjkhkjhkjhkhkjhkjhjkhkjhkjhkjhkjhkjhkjhkjhkjhkjhkjhkjhkjhkjh<img id="Picture_x0020_1" src="cid:image001.jpg@01D05CBF.CF7A44B0" alt="Variety 008 (893 x 799) (223 x 200)" height="200" width="223">dsjhfdsjfdjflsjflkjdflkjffldskjfdljdlfkjflkdjflkdjfdslkjfkds

Now let's look at the part of the string I need to work on. This is what Gmail saves the image name as inside src="":

cid:image001.jpg@01D05CBF.CF7A44B0

The class I use downloads and saves the attachment as follows:

$cid = 'cid:image001.jpg@01D05CBF.CF7A44B0'; 
$mail_id . '_' . $cid . '_' . $image_id;

So the actual image name is something like this: 308907_image001.jpg@01D05CBF.CF7A44B0_image001.jpg

Now my aim is to replace all of these occurrences:

cid:image001.jpg@01D05CBF.CF7A44B0

with

attachments/308907_image001.jpg@01D05CBF.CF7A44B0_image001.jpg

Essentially, strip out the cid: prefix, prepend attachments/, $mail_id and _ to the start of the string, and append _image001.jpg to the end.

Keep in mind that I'll possibly have a bunch of these embedded cid: src values in the HTML string.

Since I'm not very good with regex, I'm doing this in baby steps. First I'm trying to figure out how to replace cid:image001.jpg@01D05CBF.CF7A44B0 with attachments/308907_image001.jpg@01D05CBF.CF7A44B0, and then I'll try to figure out how to append _image001.jpg to the end.

I managed to build a regex that highlights the whole image tag. Running it in http://www.regexr.com/, it does highlight the cid: value in element [1].

I was thinking of something like this, but it just returns an empty string. The logic seems to work in the regex tool, so I can't figure out why it's not working. Maybe it's because the regex has 3 elements and I need to access element [1] to get the cid: value; I'm not sure:

$string = preg_replace('/(<img\b\s+.*?src=\")(.*?cid:.*?)(\">)/g', 'attachments/'.$mail_id.'_', $html);

But the problem here is that I just need to replace cid: with attachments/308907_, and I don't want to replace the image001.jpg@01D05CBF.CF7A44B0 part.

I am also not sure of the best way to append the _image001.jpg at the end. If it was just one replacement, I could do something like this:

$current_image_name = 'attachments/308907_image001.jpg@01D05CBF.CF7A44B0';
$new_image_name = 'attachments/308907_image001.jpg@01D05CBF.CF7A44B0_image001.jpg';

str_replace($current_image_name, $new_image_name,$html);

But because there could be lots of these in the email, I don't think that approach will work, and it might not perform well, since some emails could be large.

My worry is that making separate calls is not efficient when parsing a big email, so maybe there is a way to do it all at once in the preg_replace call.

I am happy to figure the actual code out if someone even points me in the right direction and gives me some hints on the best way to achieve this.

1 answer

  • doujie1908 2015-05-09 07:58

    Try this,

    $mail_id = 308907; // ID of the mail the attachments belong to (sample value from the question)
    // $1 = image name (between cid: and @), $2 = everything after the @ up to the closing quote
    $re = '/src=\"cid:(.*?)@(.*?)\"/s';
    $str = "<img id=\"Picture_x0020_1\" src=\"cid:image001.jpg@01D05CBF.CF7A44B0\" alt=\"Variety 008 (893 x 799) (223 x 200)\" height=\"200\" width=\"223\">dflkjdslkjdsfldskfjdlfkjdlfksdjfflkdsjfdlkdfdjflkdfjdlkjfkdlfjdljfldjfldjflkdjjfkd<img id=\"Picture_x0020_1\" src=\"cid:image001.jpg@01D05CBF.CF7A44B0\" alt=\"Variety 008 (893 x 799) (223 x 200)\" height=\"200\" width=\"223\">hkjhkhkhkhkhkjhjkhhkjhkjhkjhkjhjkhkjhkjhkhkjhkjhjkhkjhkjhkjhkjhkjhkjhkjhkjhkjhkjhkjhkjhkjh<img id=\"Picture_x0020_1\" src=\"cid:image001.jpg@01D05CBF.CF7A44B0\" alt=\"Variety 008 (893 x 799) (223 x 200)\" height=\"200\" width=\"223\">dsjhfdsjfdjflsjflkjdflkjffldskjfdljdlfkjflkdjflkdjfdslkjfkds";
    $subst = 'src="attachments/' . $mail_id . '_$1@$2_$1"';

    // preg_replace replaces every match, so all cid: images are rewritten in one call
    $result = preg_replace($re, $subst, $str);
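
    For comparison, the three-group idea from the question can be made to work as well. The following is only a sketch under a few assumptions: the g flag is dropped (PHP's PCRE functions have no such modifier, so preg_replace rejects the pattern with an "Unknown modifier" warning and returns NULL instead of a string), the lazy .*? pieces are replaced with negated character classes so a match cannot run past the closing quote when other attributes follow src, and the sample $html is shortened for illustration.

    ```php
    <?php
    $mail_id = 308907; // sample value from the question
    $html = '<img id="a" src="cid:image001.jpg@01D05CBF.CF7A44B0" alt="x">'; // shortened sample

    // Group 1 = the tag up to and including src=", group 2 = image name,
    // group 3 = the part after the @. No g flag: preg_replace already
    // replaces every match it finds.
    $pattern = '/(<img\b[^>]*\bsrc=")cid:([^"@]+)@([^"]+)"/i';

    // ${1} writes the start of the tag back unchanged; the rest builds the new path.
    $replacement = '${1}attachments/' . $mail_id . '_$2@$3_$2"';

    $result = preg_replace($pattern, $replacement, $html);
    echo $result, "\n";
    ```

    Unlike the shorter pattern above, this variant only touches src attributes that sit inside an img tag, at the cost of a slightly longer expression.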
    


    Updates:

    Pattern: /src=\"cid:(.*?)@(.*?)\"/s
    src= matches the characters src= literally
    \" matches the character " literally
    cid: matches the characters cid: literally

    Now we have to capture the image name from the string so that we can prepend and append it in the output string. The image name can be captured between cid: and @.

    Therefore cid:(.*?)@ will capture the image name. This is the first capturing group in the pattern, so the image name is available as $1 in the replacement string. If you use preg_match, it will be $match[1].

    Then we need the string between @ and ". This is the second capturing group, @(.*?)", which is referenced as $2 in the replacement string passed to preg_replace.
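
    The two groups can be checked on a single tag with preg_match. This is a small sketch; the shortened $tag value is made up for illustration and uses the cid value from the question.

    ```php
    <?php
    $tag = '<img src="cid:image001.jpg@01D05CBF.CF7A44B0" alt="x">'; // shortened sample

    if (preg_match('/src=\"cid:(.*?)@(.*?)\"/s', $tag, $match)) {
        echo $match[0], "\n"; // the whole match: src="cid:image001.jpg@01D05CBF.CF7A44B0"
        echo $match[1], "\n"; // first group:  image001.jpg
        echo $match[2], "\n"; // second group: 01D05CBF.CF7A44B0
    }
    ```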

    In preg_replace, the whole match is available in the replacement string as $0, and the captured groups as $1, $2 and so on. In preg_match, they are stored in $match[0], $match[1] and so on, where $match is a user-defined array name that is passed as the third parameter of the function.
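
    If the new file name ever needs more logic than a fixed replacement string allows, preg_replace_callback hands the same array of groups to a function. This is a hedged sketch rather than part of the solution above: the second image (logo.png@ABC.DEF) is an invented sample, and the naming scheme is the one described in the question.

    ```php
    <?php
    $mail_id = 308907; // sample value from the question
    // Two embedded images; the second one is a made-up example.
    $html = '<p><img src="cid:image001.jpg@01D05CBF.CF7A44B0"> and <img src="cid:logo.png@ABC.DEF"></p>';

    $result = preg_replace_callback(
        '/src=\"cid:(.*?)@(.*?)\"/s',
        function (array $m) use ($mail_id) {
            // $m[1] = image name, $m[2] = the part after the @
            $file = $mail_id . '_' . $m[1] . '@' . $m[2] . '_' . $m[1];
            return 'src="attachments/' . $file . '"';
        },
        $html
    );
    echo $result, "\n";
    ```

    The whole string is still scanned once, so this keeps the single-pass behaviour while leaving room for per-image decisions inside the callback.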

    This answer was selected by the asker as the best answer.