Using regular expressions to extract data from a database (email subject line)

I'm hoping someone can help me get to the bottom of a problem I am having. I had a script put together about a year ago which parses incoming email and stores details in a database.

I get the email through with headers like so:

-------- Forwarded Message --------
Subject:    FS.G02 Fleet Street - j** associates (AG69)
Date:   Thu, 14 Apr 2016 11:27:32 +0000
From:   Stephanie Zo*****ou <Stephanie.Zo****ou@********.co.uk>
To:     'lucien@********.com' <lucien@********.com>

I use the following regex and PHP code to separate various pieces of data out ($text contains the above email string):

//Set RegEx to parse data out of text/plain email string
$re1 = '~(?<=From: )(.*?)(?: \<)(.*?)(?=\>)~';
$re2 = "~(?<=To: ').*(?=')~";
$re3 = "~(?<=Sent: ).*(?=)~";
$re4 = "~(?<=Subject: ).*(?=)~"; 
$re5 = "~(?<=Subject:\s)(.*?)(?=\s)(?:.*\s\-\s)(.*)~";
$re6 = "~\((.*?)\)~";

//Pull the data out using above expressions
if(preg_match($re1, $text, $matches1)) {
    $from_name = $matches1[1];
    $from_email = $matches1[2];
}
if(preg_match($re2, $text, $matches2))
    $to_email = $matches2[0];

if(preg_match($re3, $text, $matches3))
    $sent_date = $matches3[0];

if(preg_match($re4, $text, $matches4))
    $subject_line = $matches4[0];

if(preg_match($re5, $text, $matches5)) {
    $unit_code = $matches5[1];
    $company_name = $matches5[2];   
}

//Change sent date to timestamp
$sent_date = strtotime($sent_date);

//break the unit code and building code apart
$unit_code = explode('.',$unit_code,2);
$building_code = $unit_code[0];
$unit_code = $unit_code[1];
//break the (C0D3) off the end of the company  / subject line
$company_name = preg_replace($re6,'' ,$company_name);

The data I am trying to separate so that I can store in the DB are:

The email address after 'To:'
The time/date string after 'Date:'
The subject line

My problem is that the script has stopped working properly. My RegEx no longer gives me the timestamp, nor does it break the subject line down into its component parts:

FS.G02 Fleet Street - j** associates (AG69)

The code at the beginning is one piece of data I need. I then break it up into the first two letters and the alphanumeric second half.

FS.G02 Fleet Street - j** associates (AG69)

The second part I need is always after the hyphen - it's a company / customer name.

The format of this hasn't changed since I last got it working, so I can't tell whether I have broken the RegEx. Can anyone with a little more RegEx experience than I have see where I am going wrong?

Many thanks, Jonathan

1 answer

dongya1875 2016-04-14 14:12

Have you tried using imap_rfc822_parse_headers() (Docs) instead of using a regex? It would certainly make it a lot simpler.

EDIT: Realised the docs don't actually say a lot about the function. Here's a sample output, called on your data there:

object(stdClass)#1 (12) {
    ["date"]=> string(31) "Thu, 14 Apr 2016 11:27:32 +0000" 
    ["Date"]=> string(31) "Thu, 14 Apr 2016 11:27:32 +0000" 
    ["subject"]=> string(43) "FS.G02 Fleet Street - j** associates (AG69)"
    ["Subject"]=> string(43) "FS.G02 Fleet Street - j** associates (AG69)"
    ["toaddress"]=> string(69) "'lucien@********.com', UNEXPECTED_DATA_AFTER_ADDRESS@".SYNTAX-ERROR."" 
    ["to"]=> array(2) {
        [0]=> object(stdClass)#2 (2) {
            ["mailbox"]=> string(7) "'lucien" 
            ["host"]=> string(13) "********.com'" 
        }
        [1]=> object(stdClass)#3 (2) { 
            ["mailbox"]=> string(29) "UNEXPECTED_DATA_AFTER_ADDRESS"
            ["host"]=> string(14) ".SYNTAX-ERROR." 
        }
    }
    ["fromaddress"]=> string(55) "Stephanie Zo*****ou <Stephanie.Zo****ou@********.co.uk>"
    ["from"]=> array(1) {
        [0]=> object(stdClass)#4 (3) {
            ["personal"]=> string(19) "Stephanie Zo*****ou"  
            ["mailbox"]=> string(18) "Stephanie.Zo****ou"
            ["host"]=> string(14) "********.co.uk"
        }
    }
    ["reply_toaddress"]=> string(55) "Stephanie Zo*****ou <Stephanie.Zo****ou@********.co.uk>"
    ["reply_to"]=> array(1) {
        [0]=> object(stdClass)#5 (3) {
            ["personal"]=> string(19) "Stephanie Zo*****ou"
            ["mailbox"]=> string(18) "Stephanie.Zo****ou"
            ["host"]=> string(14) "********.co.uk"
        }
    }
    ["senderaddress"]=> string(55) "Stephanie Zo*****ou <Stephanie.Zo****ou@********.co.uk>"
    ["sender"]=> array(1) {
        [0]=> object(stdClass)#6 (3) {
            ["personal"]=> string(19) "Stephanie Zo*****ou"
            ["mailbox"]=> string(18) "Stephanie.Zo****ou"
            ["host"]=> string(14) "********.co.uk" 
        }
    }
 }
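
Reading the fields you need out of that object might look roughly like the sketch below. It assumes PHP's IMAP extension is loaded. extractHeaderFields() and the array keys are names made up for this example, and stripping the stray single quotes from the To address is an assumption based on the sample output above.

```php
<?php
// Sketch only. extractHeaderFields() is a hypothetical helper, not part of PHP.
// Returns null when the IMAP extension is not available.
function extractHeaderFields(string $rawHeaders): ?array
{
    if (!function_exists('imap_rfc822_parse_headers')) {
        return null;
    }

    $h = imap_rfc822_parse_headers($rawHeaders);

    // Build "mailbox@host" from one address object, trimming the single
    // quotes that surround the To address in the sample data (assumption).
    $address = static function (?object $a): ?string {
        if ($a === null || !isset($a->mailbox, $a->host)) {
            return null;
        }
        return trim($a->mailbox . '@' . $a->host, "'");
    };

    $from = $h->from[0] ?? null;
    $to   = $h->to[0] ?? null;

    return [
        'from_name'  => $from->personal ?? null,
        'from_email' => $address($from),
        'to_email'   => $address($to),
        // strtotime() returns false if the date string cannot be parsed
        'sent_date'  => isset($h->date) ? strtotime($h->date) : null,
        'subject'    => $h->subject ?? null,
    ];
}

$text = "Subject:    FS.G02 Fleet Street - j** associates (AG69)\r\n"
      . "Date:   Thu, 14 Apr 2016 11:27:32 +0000\r\n"
      . "From:   Stephanie Zo*****ou <Stephanie.Zo****ou@********.co.uk>\r\n"
      . "To:     'lucien@********.com' <lucien@********.com>\r\n";

$fields = extractHeaderFields($text);
```

Only the first entry of the to array is used here, because the second one in the sample output is the parser's syntax-error placeholder rather than a real address.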

Here's a regex for your subject line as well:

([A-Z0-9]*\.[A-Z0-9]*)\s([A-Za-z\s]*)\s-\s([A-Za-z\s]*)\s(\([A-Z0-9]*\))

When called with preg_match(), like:

$output = [];
$input = "FS.G02 Fleet Street - Something associates (AG69)";
preg_match("/([A-Z0-9]*\.[A-Z0-9]*)\s([A-Za-z\s]*)\s-\s([A-Za-z\s]*)\s(\([A-Z0-9]*\))/", $input, $output);

You will receive something like:

array(
    0   =>  "FS.G02 Fleet Street - Something associates (AG69)",
    1   =>  "FS.G02",
    2   =>  "Fleet Street",
    3   =>  "Something associates",
    4   =>  "(AG69)"
)
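
If you would rather stay with plain regex, here is a rough sketch of a variant that also gives you the building code and unit code separately. It is not a drop-in fix: headerValue() and parseSubject() are hypothetical helper names, the named groups are my own choice, and it assumes the header name can be followed by any number of spaces or tabs (as in your sample) and that the company name may contain characters other than letters.

```php
<?php
// Hypothetical helper: returns the value of a header line such as
// "Subject:    ..." no matter how many spaces or tabs follow the colon.
function headerValue(string $text, string $name): ?string
{
    $pattern = '~^' . preg_quote($name, '~') . ':[ \t]*(.*)$~mi';
    return preg_match($pattern, $text, $m) ? trim($m[1]) : null;
}

// Hypothetical helper: splits a subject like
// "FS.G02 Fleet Street - Something associates (AG69)" into its parts.
// The address and company name are matched permissively (assumption).
function parseSubject(string $subject): ?array
{
    $pattern = '~^(?<building>[A-Z0-9]+)\.(?<unit>[A-Z0-9]+)\s+'
             . '(?<address>.+?)\s+-\s+(?<company>.+?)\s*'
             . '\((?<code>[A-Z0-9]+)\)\s*$~';

    if (!preg_match($pattern, $subject, $m)) {
        return null; // subject does not follow the expected layout
    }

    return [
        'building_code' => $m['building'],
        'unit_code'     => $m['unit'],
        'address'       => $m['address'],
        'company_name'  => $m['company'],
        'customer_code' => $m['code'],
    ];
}

$text = "Subject:    FS.G02 Fleet Street - Something associates (AG69)\n"
      . "Date:   Thu, 14 Apr 2016 11:27:32 +0000\n";

$parts = parseSubject(headerValue($text, 'Subject') ?? '');
$sent  = strtotime(headerValue($text, 'Date') ?? '');
```

Note that the sample headers use Date: rather than Sent:, so the date lookup here is keyed on Date. Whether that matches every email you receive is something you would need to check against your own data.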

This answer was selected by the asker as the best answer.
