Splitting complex strings with preg_split

Update v2

Jerry's code works on most strings, but not on all of them. For example:

$pattern = '#^(?<tz_utf>(?:\([^)]+\)|[^-]+)+)\s+-\s+(?<tz>[^:]+)\s+:\s+(?<fr>[^/]+)\s+/\s+(?<en>[^/]+)\s+/\s+(?<ar>\S+)\s+(?<tz_dec_utf>[ⴰ-⵿ -]+)\s+(?<tz_dec>.*)$#imu';

// This string does not match, because there is no space between the slash and the next word:
// /Alphabet => / Alphabet
// and because the Arabic part contains a space and a comma:
// ájóéHCG ,á«é¡J => ájóéHCGá«é¡J
$str4 = 'ⴰⴳⵎⵎⴰⵢ - agemmay : Alphabet, épellation /Alphabet, spelling / ájóéHCG ,á«é¡J
    ⴰⴳⵎⵎⴰⵢ - ⵓⴳⵎⵎⴰⵢ - ⵉⴳⵎⵎⴰⵢⵏ
    agemmay  – ugemmay  – igemmayen';

$str5 = 'ⴰⴷⴷⴰⴷ ⴰⵎⴰⵔⵓⵣ - addad amaruz : Etat d’annexion / Construct state / ¥ÉëdEG ádÉM
    ⴰⴷⴷⴰⴷ ⴰⵎⴰⵔⵓⵣ - ⵡⴰⴷⴷⴰⴷ ⴰⵎⴰⵔⵓⵣ
    addad  amaruz  - waddad amaruz';

$str6 = 'ⴰⴷⴷⴰⴷ ⵉⵍⴻⵍⵍⵉ - addad ilelli : Etat libre / Free state / ∫É°SQEG ádÉM
    ⴰⴷⴷⴰⴷ ⵉⵍⴻⵍⵍⵉ
    addad  ilelli';


print_r( preg_match($pattern, $str4, $matches) );
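
One way the pattern might be relaxed for strings like these is sketched below. This is an illustration under assumptions, not something proposed in the thread: it assumes the Arabic field never contains Tifinagh characters and that the first Tifinagh character after it starts the declension list. The names `$relaxed`, `$sample` and `$fields` are made up for the example; `$sample` is the `$str4` string from above.

```php
<?php
// Hypothetical relaxed variant of the pattern above (a sketch, not from the thread):
//  - \s*/\s* lets a slash touch the next word ("/Alphabet")
//  - <fr> and <en> are lazy, so they stop before the whitespace that precedes a slash
//  - <ar> is any run of non-Tifinagh characters, so spaces and commas are allowed
//  - <tz_dec_utf> must start with a Tifinagh character
$relaxed = '~^(?<tz_utf>(?:\([^)]+\)|[^-]+)+)\s+-\s+(?<tz>[^:]+)\s+:\s+'
         . '(?<fr>[^/]+?)\s*/\s*(?<en>[^/]+?)\s*/\s*'
         . '(?<ar>[^ⴰ-⵿]+?)\s+(?<tz_dec_utf>[ⴰ-⵿][ⴰ-⵿ -]*)\s+(?<tz_dec>.*)$~u';

$sample = 'ⴰⴳⵎⵎⴰⵢ - agemmay : Alphabet, épellation /Alphabet, spelling / ájóéHCG ,á«é¡J
    ⴰⴳⵎⵎⴰⵢ - ⵓⴳⵎⵎⴰⵢ - ⵉⴳⵎⵎⴰⵢⵏ
    agemmay  – ugemmay  – igemmayen';

$fields = [];
if (preg_match($relaxed, $sample, $m)) {
    // Keep only the named groups; preg_match() also stores each one under a numeric index.
    $fields = array_filter($m, 'is_string', ARRAY_FILTER_USE_KEY);
}
print_r($fields);
```

The line breaks inside the string are consumed by the `\s+` separators, so the `m` modifier is not needed in this variant.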

Update v1

The code I'm using now matches only one portion of the whole string ($matches[1]). Is it possible to extract the other portions of the string with a single regex?

$pattern = '/-(.*?)\:/';    
$str1 = 'ⵜⴰⵙⵎⵙⵙⵉⵜ - tasmessit : Focalisée / Focus / QCÉÑe ⵜⴰⵙⵎⵙⵙⵉⵜ - ⵜⵙⵎⵙⵙⵉⵜ - ⵜⵉⵙⵎⵙⵙⵉⵜⵉⵏ tasmssit - tsmssit - tismssitin';
preg_match($pattern, $str1, $matches);
$arr1 = array( 
        'tz_utf'=>'ⵜⴰⵙⵎⵙⵙⵉⵜ', 
        'tz'=> $matches[1], // tasmessit
        'fr'=>'Focalisée', 
        'en'=>'Focus', 
        'ar'=>'QCÉÑe', 
        'tz_dec_utf'=>'ⵜⴰⵙⵎⵙⵙⵉⵜ - ⵜⵙⵎⵙⵙⵉⵜ - ⵜⵉⵙⵎⵙⵙⵉⵜⵉⵏ', 
        'tz_dec'=>'tasmssit - tsmssit - tismssitin'
);
print_r($matches[1]);

Original question

For any regular expression gurus out there :)

Can you please help me split some strings into an array with preg_split? The string values vary, but they all follow this scheme:

$str1 = 'ⵜⴰⵙⵎⵙⵙⵉⵜ - tasmessit : Focalisée / Focus / QCÉÑe ⵜⴰⵙⵎⵙⵙⵉⵜ - ⵜⵙⵎⵙⵙⵉⵜ - ⵜⵉⵙⵎⵙⵙⵉⵜⵉⵏ tasmssit - tsmssit - tismssitin';
$str2 = 'ⵜⴰⵙⵏⴰⵥⵖⵓⵕⵜ ( ⵏ-) - tasnaÇvurt (n-)  : Etymologique / Etymological / »dÉKCG ⵏ ⵜⵙⵏⴰⵥⵖⵓⵕⵜ n tesnaÇvurt';
$str3 = 'ⵜⴰⵙⵖⵓⵏⵜ ⵜⴰⵏⴰⴷⴰⵡⵜ - tasvunt tanadawt : Subordonnant / Subordinating (conjunction) / §HGQ ⵜⴰⵙⵖⵓⵏⵜ ⵜⴰⵏⴰⴷⴰⵡⵜ - ⵜⵉⵙⵖⵡⴰⵏ ⵜⵉⵏⴰⴷⴰⵡⵉⵏ tasvunt tanadawt - tisevwan tinadawin';

The correct results would be:

$arr1 = array( 
                'tz_utf'=>'ⵜⴰⵙⵎⵙⵙⵉⵜ', 
                'tz'=>'tasmessit', 
                'fr'=>'Focalisée', 
                'en'=>'Focus', 
                'ar'=>'QCÉÑe', 
                'tz_dec_utf'=>'ⵜⴰⵙⵎⵙⵙⵉⵜ - ⵜⵙⵎⵙⵙⵉⵜ - ⵜⵉⵙⵎⵙⵙⵉⵜⵉⵏ', 
                'tz_dec'=>'tasmssit - tsmssit - tismssitin'
        );
$arr2 = array( 
                'tz_utf'=>'ⵜⴰⵙⵏⴰⵥⵖⵓⵕⵜ ( ⵏ-)', 
                'tz'=>'tasnaÇvurt (n-)', 
                'fr'=>'Etymologique', 
                'en'=>'Etymological', 
                'ar'=>'»dÉKCG', 
                'tz_dec_utf'=>'ⵏ ⵜⵙⵏⴰⵥⵖⵓⵕⵜ', 
                'tz_dec'=>'n tesnaÇvurt'
        );
$arr3 = array( 
                'tz_utf'=>'ⵜⴰⵙⵖⵓⵏⵜ ⵜⴰⵏⴰⴷⴰⵡⵜ', 
                'tz'=>'tasvunt tanadawt', 
                'fr'=>'Subordonnant', 
                'en'=>'Subordinating (conjunction)', 
                'ar'=>'§HGQ', 
                'tz_dec_utf'=>'ⵜⴰⵙⵖⵓⵏⵜ ⵜⴰⵏⴰⴷⴰⵡⵜ - ⵜⵉⵙⵖⵡⴰⵏ ⵜⵉⵏⴰⴷⴰⵡⵉⵏ', 
                'tz_dec'=>'tasvunt tanadawt - tisevwan tinadawin'
        );

The tz_utf values are Tifinagh characters in Unicode.

Thanks

1 answer

dongzi5062 2013-09-29 14:03
Try using the regex:

~^(?<tz_utf>(?:\([^)]+\)|[^-]+)+)\s+-\s+(?<tz>[^:]+)\s+:\s+(?<fr>[^/]+)\s+/\s+(?<en>[^/]+)\s+/\s+(?<ar>\S+)\s+(?<tz_dec_utf>[ⴰ-⵿ -]+)\s+(?<tz_dec>.*)$~ui

Warning: I'm not sure about the special-character parts. For the characters in the ar field I used \S+, assuming they always form a single word, and for the characters that appear as white squares I used a range taken from this site. It does work on the samples you provided, though.

regex101 demo
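
A minimal sketch of how this pattern could be applied in PHP, using `$str1` from the question. The literal parentheses in the pattern are written as `\(` and `\)`. Filtering the matches down to the named keys is an assumption about the desired output shape, chosen so that the result looks like the `$arr1` array the question asks for.

```php
<?php
// Pattern from the answer; literal parentheses are written as \( and \).
$pattern = '~^(?<tz_utf>(?:\([^)]+\)|[^-]+)+)\s+-\s+(?<tz>[^:]+)\s+:\s+(?<fr>[^/]+)\s+/\s+(?<en>[^/]+)\s+/\s+(?<ar>\S+)\s+(?<tz_dec_utf>[ⴰ-⵿ -]+)\s+(?<tz_dec>.*)$~ui';

$str1 = 'ⵜⴰⵙⵎⵙⵙⵉⵜ - tasmessit : Focalisée / Focus / QCÉÑe ⵜⴰⵙⵎⵙⵙⵉⵜ - ⵜⵙⵎⵙⵙⵉⵜ - ⵜⵉⵙⵎⵙⵙⵉⵜⵉⵏ tasmssit - tsmssit - tismssitin';

$arr1 = [];
if (preg_match($pattern, $str1, $matches)) {
    // preg_match() stores every named group under both its name and a numeric index;
    // keeping only the string keys leaves the seven named fields.
    $arr1 = array_filter($matches, 'is_string', ARRAY_FILTER_USE_KEY);
}
print_r($arr1);
```

`preg_match` is used here rather than `preg_split` because each field is captured by its own named group instead of being cut at a single delimiter.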