Using regular expressions to extract data from a database (email subject line)

I'm hoping someone can help me get to the bottom of a problem I'm having. About a year ago I put together a script that parses incoming email and stores the details in a database.

The emails come through with headers like this:

-------- Forwarded Message --------
Subject:    FS.G02 Fleet Street - j** associates (AG69)
Date:   Thu, 14 Apr 2016 11:27:32 +0000
From:   Stephanie Zo*****ou <Stephanie.Zo****ou@********.co.uk>
To:     'lucien@********.com' <lucien@********.com>

I use the following regexes and PHP code to pull out the individual pieces of data ($text contains the email text shown above):

//Set RegEx to parse data out of text/plain email string
$re1 = '~(?<=From: )(.*?)(?: \<)(.*?)(?=\>)~';
$re2 = "~(?<=To: ').*(?=')~";
$re3 = "~(?<=Sent: ).*(?=)~";
$re4 = "~(?<=Subject: ).*(?=)~"; 
$re5 = "~(?<=Subject:\s)(.*?)(?=\s)(?:.*\s\-\s)(.*)~";
$re6 = "~\((.*?)\)~";

//Pull the data out using above expressions
if(preg_match($re1, $text, $matches1)) {
    $from_name = $matches1[1];
    $from_email = $matches1[2];
}
if(preg_match($re2, $text, $matches2))
    $to_email = $matches2[0];

if(preg_match($re3, $text, $matches3))
    $sent_date = $matches3[0];

if(preg_match($re4, $text, $matches4))
    $subject_line = $matches4[0];

if(preg_match($re5, $text, $matches5)) {
    $unit_code = $matches5[1];
    $company_name = $matches5[2];   
}

//Change sent date to timestamp
$sent_date = strtotime($sent_date);

//break the unit code and building code apart
$unit_code = explode('.',$unit_code,2);
$building_code = $unit_code[0];
$unit_code = $unit_code[1];
//break the (C0D3) off the end of the company  / subject line
$company_name = preg_replace($re6,'' ,$company_name);
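A likely cause, assuming `$text` holds exactly the block shown above: lookbehinds such as `(?<=Subject: )` allow for exactly one space after the colon, but the forwarded headers pad each value with several spaces (possibly a tab), and `$re3` looks for `Sent:` while this header block uses `Date:`. The sketch below runs the original patterns against a two-line sample to show those symptoms, then captures the subject with a line-anchored pattern instead of a fixed-width lookbehind. The sample text and the replacement pattern are illustrations, not the poster's code.

```php
<?php
// Sample from the question; the padding after each colon is assumed to be
// several spaces (it may be a tab in the real message).
$text = "Subject:    FS.G02 Fleet Street - j** associates (AG69)\n"
      . "Date:   Thu, 14 Apr 2016 11:27:32 +0000\n";

// Original patterns from the question.
$re3 = "~(?<=Sent: ).*(?=)~";
$re4 = "~(?<=Subject: ).*(?=)~";
$re5 = "~(?<=Subject:\s)(.*?)(?=\s)(?:.*\s\-\s)(.*)~";

// $re3 never matches: this header block has "Date:", not "Sent:".
$sentFound = preg_match($re3, $text);                 // 0

// $re4 matches, but keeps the leftover padding at the front.
preg_match($re4, $text, $m4);                         // "   FS.G02 ..."

// $re5's lookbehind consumes one whitespace character, so the lazy (.*?)
// stops at the next padding space and the unit code comes back empty.
preg_match($re5, $text, $m5);                         // $m5[1] === ""

// Line-anchored alternative: with the m modifier, ^ and $ work per line,
// and \h* absorbs any run of spaces or tabs after the colon.
preg_match('~^Subject:\h*(.*?)\h*$~m', $text, $subject);
$subject_line = $subject[1];
```

With the value captured this way, the later `explode('.', ...)` step receives `FS.G02 ...` rather than an empty string.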

The pieces of data I am trying to separate out and store in the database are:

  1. The email address after 'To:'
  2. The time/date string after 'Date:'
  3. The subject line
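For items 1 and 2, a minimal sketch, assuming the padded header layout from the sample: anchor each pattern to the start of its line with the `m` modifier, skip the padding with `\h*`, and convert the date with `strtotime()`. The variable names follow the question's code; the two-line `$text` is a trimmed copy of the sample (addresses still redacted as in the post).

```php
<?php
// Two header lines from the question.
$text = "Date:   Thu, 14 Apr 2016 11:27:32 +0000\n"
      . "To:     'lucien@********.com' <lucien@********.com>\n";

// 1. The address after "To:": the text inside the single quotes,
//    allowing any amount of padding after the colon.
$to_email = null;
if (preg_match("~^To:\h*'([^']+)'~m", $text, $m)) {
    $to_email = $m[1];
}

// 2. The date string after "Date:", converted to a Unix timestamp.
//    The string carries its own "+0000" offset, so the result does not
//    depend on the server's default timezone.
$sent_date = null;
if (preg_match('~^Date:\h*(.+?)\h*$~m', $text, $m)) {
    $sent_date = strtotime($m[1]);   // false if the string cannot be parsed
}
```

Initialising both variables to `null` means a missing header shows up as `null` instead of an undefined-variable notice further down.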

My problem is that the script has stopped working properly. My regex isn't giving me the timestamp, nor is it breaking the subject line down into its component parts:

FS.G02 Fleet Street - j** associates (AG69)

The code at the beginning (FS.G02) is one piece of data I need. I then split it into the two letters before the dot (the building code) and the alphanumeric part after it (the unit code).
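The split described above can also be done with a single regex. The helper name `split_unit_code()` is hypothetical, and the pattern assumes the building code is always two letters followed by a dot, as in `FS.G02`:

```php
<?php
// Hypothetical helper: split a code such as "FS.G02" at the start of the
// subject into the building part (before the dot) and the unit part (after).
// Assumes the building part is exactly two letters.
function split_unit_code(string $subject): ?array
{
    // ^\h* tolerates padding left over from the header line.
    if (!preg_match('~^\h*([A-Za-z]{2})\.([A-Za-z0-9]+)~', $subject, $m)) {
        return null;   // no code at the start of the subject line
    }
    return ['building_code' => $m[1], 'unit_code' => $m[2]];
}

$parts = split_unit_code('FS.G02 Fleet Street - j** associates (AG69)');
```

Returning `null` when nothing matches avoids the undefined-offset notice that `explode()` followed by `$unit_code[1]` produces when the captured code is an empty string.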

FS.G02 Fleet Street - j** associates (AG69)

The second part I need always comes after the hyphen: it's the company/customer name.
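A sketch for the company name, assuming the first ` - ` in the subject is the separator and the `(AG69)`-style code, when present, is the last thing on the line. `company_from_subject()` is a hypothetical helper name; it leaves the code out of the capture, so the separate `$re6` clean-up step is not needed.

```php
<?php
// Hypothetical helper: return the customer name that follows " - " in the
// subject line, without the trailing "(CODE)" group.
function company_from_subject(string $subject): ?string
{
    // ^.*?\s-\s           up to and including the FIRST " - " (lazy .*?)
    // (.+?)               the company name, captured lazily
    // (?:\h*\([^)]*\))?   an optional "(AG69)"-style code at the end
    // \h*$                trailing spaces, then end of string
    if (!preg_match('~^.*?\s-\s(.+?)(?:\h*\([^)]*\))?\h*$~', $subject, $m)) {
        return null;   // no " - " separator found
    }
    return $m[1];
}

$company_name = company_from_subject('FS.G02 Fleet Street - j** associates (AG69)');
```

Because the leading `.*?` is lazy, a company name that itself contains ` - ` is kept whole rather than truncated.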

The format of these emails hasn't changed since I last got this working, so I can't tell whether I have broken the regex. Can anyone with a little more regex experience than me see where I'm going wrong?

Many thanks, Jonathan

dongtuo5262
2016/04/14 13:58
  • regex
  • php
  • parsing
  • email
Accepted answer