douwaif22244
2016-09-22 13:47

Applying two preg_split calls with regular expressions to a text

  • regex
  • expression
  • php
Accepted

Context: Every day I receive an email containing the reservation details of several customers, and I have to split it according to a set of rules. This is an example of the email:

A N K U N F T   11.08.15
*** NEUBUCHUNG ***
 11.08.15  xxx  xxx  X3 2830  14:25   17:50
 18.08.15  xxx  xxx  X3 2831  18:40
 F882129  dsdsaidsaia
 F882129  xxxyxyagydaysd
sadsdsdsdsadsadadssda
sadsdsdsdsadsadadssda
**«CUT HERE2»**


A N K U N F T   18.08.15
*** NEUBUCHUNG ***
 11.08.15  xxx  xxx  X3 2830  14:25   17:50
 18.08.15  xxx  xxx  X3 2831  18:40
 F881554  ZXCXZCXCXZCCXZ
 F881554  xcvcxvcxvcvxc
 F881554  xvcxvcxcvxxvccvxxcv

**«CUT HERE»**


11.08.15  xxx  xxx  X3 2830  14:25   17:50
 18.08.15  xxx  xxx  X3 2831  18:40
 F881605  xczxcdfsfdsdfs
 F881605  zxccxzxzdffdsfds

**«CUT HERE»**

So the text basically has to be cut after the last line that contains F999999 (where each 9 can be any digit), because F999999 is the reservation code.* I inserted the marker «CUT HERE» only to show where the cut belongs.

*NOTE: the reservation code may have any of the following formats: F999999, A999999, E999999 or 999999.
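The four code formats in the note can be checked with a single character class. The helper below is only a sketch: the name `startsWithReservationCode` is hypothetical, and it assumes the code always sits at the start of a line, optionally preceded by spaces or tabs, as in the sample emails.

```php
<?php
// Hypothetical helper, not part of the original post.
// Returns true when a line begins with F999999, A999999, E999999 or 999999.
function startsWithReservationCode(string $line): bool
{
    // \h*     optional leading spaces or tabs
    // [FAE]?  optional prefix letter
    // \d{6}   exactly six digits
    // (?!\d)  must not be followed by a seventh digit
    return preg_match('/^\h*[FAE]?\d{6}(?!\d)/', $line) === 1;
}

$samples = [
    ' F882129  dsdsaidsaia',
    'A987025  FRAU  BURG, GERTRUD',
    ' 11.08.15  xxx  xxx  X3 2830',
    'FNC813    Golden Residence',
];

foreach ($samples as $line) {
    printf("%-32s %s\n", $line, startsWithReservationCode($line) ? 'code line' : 'other');
}
```

Date lines and hotel codes such as FNC813 are rejected because they never contain six consecutive digits right after the optional prefix letter.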

So I apply a working preg_split with the following regex:

Regex1 = "/(?:\\s(F|A|E)?\\d{6}\\s?+.*?\n\\s?\n)\\K/ms";

However, sometimes I have to cut where «CUT HERE2» appears, because some text occasionally follows the last reservation code line.

So I created this regex:

Regex2 = "/^\h*(F|A|E)?\d{6}.*?\R{2}\K/ms"

Yet I sometimes get this format, with blank lines between F999999 lines of the same reservation, which makes my previous regex (Regex2) cut where it says «NOT CUT HERE»:

A N K U N F T   11.08.15
*** NEUBUCHUNG ***
 11.08.15  xxx  xxx  X3 2830  14:25   17:50
 18.08.15  xxx  xxx  X3 2831  18:40
 F882129  dsdsaidsaia

<<NOT CUT HERE>>

 F882129  xxxyxyagydaysd
sadsdsdsdsadsadadssda
sadsdsdsdsadsadadssda
**«CUT HERE»**


A N K U N F T   18.08.15
*** NEUBUCHUNG ***
 11.08.15  xxx  xxx  X3 2830  14:25   17:50
 18.08.15  xxx  xxx  X3 2831  18:40
 F881554  ZXCXZCXCXZCCXZ

<<NOT CUT HERE>>

 F881554  xcvcxvcxvcvxc
 F881554  xvcxvcxcvxxvccvxxcv

**«CUT HERE»**


11.08.15  xxx  xxx  X3 2830  14:25   17:50
 18.08.15  xxx  xxx  X3 2831  18:40
 F881605  xczxcdfsfdsdfs
 F881605  zxccxzxzdffdsfds

**«CUT HERE»**

I just want it to cut where «CUT HERE» appears.
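One way to express "do not cut when the line after the blank lines repeats the same reservation code" is to capture the code and use a backreference in a negative lookahead. This is only a sketch under assumptions that are not in the original post: blank lines are truly empty, the variable names `$re` and `$text` and the shortened sample are invented for illustration, and free text that follows blank lines without repeating the code is not handled.

```php
<?php
// Sketch only: the "same code follows" rule is an assumption, not the accepted answer.
// ([FAE]?\d{6})  captures the reservation code at the start of a line
// .*?            lazily consumes text up to a run of line breaks
// \R{2,}+        two or more line breaks, possessive so the run is never split
// \K             drops everything matched so far from the reported match
// (?!\h*\1\b)    refuses to cut when the next line starts with the same code
$re = '~^\h*([FAE]?\d{6})\b.*?\R{2,}+\K(?!\h*\1\b)~ms';

// Shortened, made-up sample: one blank line separates two lines of the same reservation.
$text = " F882129  first\n\n F882129  second\ntrailing text\n\n\n"
      . "A N K U N F T\n F881554  x\n F881554  y\n\n";

// PREG_SPLIT_NO_EMPTY drops the empty piece produced by a cut at the very end.
$parts = preg_split($re, $text, -1, PREG_SPLIT_NO_EMPTY);

print_r($parts);
```

With this sample the blank line between the two F882129 lines no longer triggers a cut, so the text is split into two pieces, one per reservation.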

For example, the error occurs with this input:

***NEUBUCHUNG ***
 23.02.17  DUS  FNC  DE 1414  12:05   15:10
 09.03.17  FNC  DUS  DE 1415  16:40
 FNC011  Enotel Baia                  9360-215 Ponta do Sol
  1  DZ Typ I Meerblick 2Erw.         Frühstück
 am 03.10.16  CRS: MX  - PNR: 1290689
 Fluggeber: Condor Flugdienst / PNR: 1290689  Frühbucher 10%  inkl. Reiseleitung  und Transfer ab/bis   
 A025808  HERR Berg, Ulrich               62


<<NOT CUT HERE>>

Anfrage.
 A025808  FRAU Berghaus, Petra            58

 **«CUT HERE»**

***S T O R N O **
 04.10.16 STR  X3 2810
 11.10.16 FNC  STR  X3 2811  18:15
FNC036    The Flame Tree               Funchal
 1  DZ Meerblick 2Erw.                 H
A987025  FRAU  BURG, GERTRUD          *** STORNO ***              O


<<NOT CUT HERE>>


A987025  HERR  BURG, WALTER           *** STORNO ***              O

**«CUT HERE»**

***ÄNDERUNG ***
NEU:01.11.16 FRA  X3 2806  13:35   16:50
08.11.16 FNC  FRA  X3 2807  17:40
   FNC813    Golden Residence/Wanderk. 9000-105 Funchal
 1  Suite seitl. Meerblick 3Erw.       F
A982512 FRAU   KROST, SIMONE
Frühbucher 15%


<<NOT CUT HERE>>

inkl. Reiseleitung
und Transfer ab/bis 
Im Reisepreis bereits enthalten: Drei
geführte Wanderungen (1 Ganztags- und 2
Halbtagswanderungen) inkl. aller
Transfers.

**«SHOULD CUT HERE»**

***ÄNDERUNG ***
ALT:01.11.16 FRA  X3 2806  13:35   16:50
08.11.16 FNC  FRA  X3 2807  17:40
FNC813   Golden Residence/Wanderk. 9000-105 Funchal
 1  Suite seitl. Meerblick 3Erw.   F
   A982512      HERR KROST, SIMONE 

**«CUT HERE»**


 25.04.17  DRS  FNC  ST 1602  13:25   17:15
 09.05.17  FNC  DRS  ST 1607  00:00
 FNC076  Baia Azul                    9004-530 Funchal
  1  DZ Typ I Meerblick 2Erw.         Halbpension
 am 03.10.16  CRS: MX  - PNR: 15326821
 Fluggeber: alltours / PNR: 15326821
 inkl. Reiseleitung
 und Transfer ab/bis Flughafen
 A025986  HERR Schulze, Steffen           55
 A025986  FRAU Schulze, Kerstin           54

**«CUT HERE»**

***S T O R N O **
 13.11.16 FRA  X3 2806
 20.11.16 FNC  FRA  X3 2807  17:35
FNC096    Pestana Village & Miramar    Funchal
 1  Studio 2Erw.                       H
A976918  FRAU  HEBING, BETTINA        *** STORNO ***              O

<<NOT CUT HERE>> 

A976918  HERR  HEBING, LUDGER         *** STORNO ***              O

  **«CUT HERE»**

I put «NOT CUT HERE» where it splits but shouldn’t, «SHOULD CUT HERE» where it should cut but doesn’t, and «CUT HERE» where it cuts correctly.


1 answer

  • doulue1949 5 years ago

    You may use

    '~^\h*F\d{6}.*?\R{2}\K~sm'
    

    See the regex demo

    Details:

    • ^ - start of a line
    • \h* - 0+ horizontal whitespaces
    • F\d{6} - F + 6 digits
    • .*? - any 0+ chars, as few as possible, up to the first
    • \R{2} - 2 linebreaks
    • \K - omit the text matched so far from the reported match, so the split point lies after it.
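To see what \K changes in preg_split, a tiny comparison may help. The comma-separated input and the variable names are made up for illustration and have nothing to do with the email format.

```php
<?php
// Without \K the matched comma is consumed as the delimiter.
$consumed = preg_split('~,~', 'a,b,c');

// With \K the reported match is empty and starts after the comma,
// so the comma stays at the end of the preceding piece.
$kept = preg_split('~,\K~', 'a,b,c');

print_r($consumed); // pieces: "a", "b", "c"
print_r($kept);     // pieces: "a,", "b,", "c"
```

The same mechanism keeps the reservation lines and the blank lines inside each piece when the pattern above is used for splitting.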

    See PHP demo:

    $re = '~^\h*F\d{6}.*?\R{2}\K~ms'; 
    $str = "A N K U N F T   11.08.15
    *** NEUBUCHUNG ***
     11.08.15  xxx  xxx  X3 2830  14:25   17:50
     18.08.15  xxx  xxx  X3 2831  18:40
     F882129  dsdsaidsaia
     F882129  xxxyxyagydaysd
    sadsdsdsdsadsadadssda
    sadsdsdsdsadsadadssda
    
    A N K U N F T   18.08.15
    *** NEUBUCHUNG ***
     11.08.15  xxx  xxx  X3 2830  14:25   17:50
     18.08.15  xxx  xxx  X3 2831  18:40
     F881554  ZXCXZCXCXZCCXZ
     F881554  xcvcxvcxvcvxc
     F881554  xvcxvcxcvxxvccvxxcv
    
    
    11.08.15  xxx  xxx  X3 2830  14:25   17:50
     18.08.15  xxx  xxx  X3 2831  18:40
     F881605  xczxcdfsfdsdfs
     F881605  zxccxzxzdffdsfds
    
    "; 
    print_r(preg_split($re, $str));
    