Convert text to an array by splitting at keywords

I'm trying to split a block of text into an array in PHP, dividing the text at keywords, in this case OPTION n:, where n is any letter or number (or several, e.g. IL). Here is a sample text:

Example input

OPTION A: Lorem ipsum dolar sit
Ut mattis velit nec tortor congue gravida. Duis leo arcu, maximus vel convallis vitae, laoreet in metus. Duis nec nisl id eros tincidunt dignissim. Sed condimentum commodo mi, a tristique risus vehicula ut. Sed eget ultrices lacus. Curabitur sed eleifend sapien, nec pharetra nunc.
Note: This option requires Option K-1: Extended Drill Depth. Nunc eu est bibendum nibh ullamcorper fermentum eget ut ante. Cras sed eros ac odio congue auctor. Nunc vel euismod neque.

OPTION D: Quisque efficitur
Morbi elementum metus posuere congue scelerisque. Vestibulum blandit pulvinar leo sit amet ornare. Maecenas porttitor lectus augue, et scelerisque nisl imperdiet non. Curabitur vel ligula sit amet leo auctor malesuada. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Proin facilisis erat ipsum, ut sagittis velit aliquam a. Nulla nulla orci, dapibus at ullamcorper suscipit, aliquam vel nisl. Duis eu libero ut leo ornare tempor. Donec egestas ipsum nec augue pellentesque aliquet.

OPTION G: Duis leo arcu
Aenean porttitor nulla eu eleifend hendrerit. Duis sed pretium nunc, sed semper leo. Nam sit amet quam semper, tempor risus vitae, consequat ex. Quisque ut rutrum enim, aliquet sodales justo. Morbi fringilla ac justo vitae molestie. Donec in molestie mauris, a scelerisque dolor.
Note: Nunc eu est bibendum nibh ullamcorper fermentum eget ut ante. Cras sed eros ac odio congue auctor. Nunc vel euismod neque.

OPTION IL: Fusce fermentum
Donec sed sagittis purus. Aliquam auctor nibh a varius sagittis. Nullam eget nulla orci. Nam eu dolor posuere, semper dui vitae, mattis leo. Vestibulum vitae dolor fringilla, gravida nulla ac, malesuada urna.

OPTION O: Morbi elementum
Nunc mi nisi, tempus non finibus nec, vulputate quis augue. Sed bibendum, dui nec venenatis efficitur, turpis libero efficitur odio, ac mollis est ex ut arcu. Aenean congue a metus quis euismod. Etiam at dui urna. Duis elementum, sapien ac volutpat mollis, augue neque pellentesque arcu, at finibus ligula nulla et libero. Curabitur vel mauris tortor. Mauris suscipit neque ac mauris lacinia tristique. Quisque faucibus semper lectus, eu ultricies sapien ultrices nec.

Desired output

Ideally I'd like the above sample to look like this:

array:15 [▼
  0 => "OPTION A: Lorem ipsum dolar sit
        

        Ut mattis velit nec tortor congue gravida. Duis leo arcu, maximus vel convallis vitae, laoreet in metus. Duis nec nisl id eros tincidunt dignissim. Sed condimentum commodo mi, a ristique risus vehicula ut. Sed eget ultrices lacus. Curabitur sed eleifend sapien, nec pharetra nunc. 

        Note: Nunc eu est bibendum nibh ullamcorper fermentum eget ut ante. Cras sed eros ac odio congue auctor. Nunc vel euismod neque."
  1 => "OPTION D: Quisque efficitur
        

        Morbi elementum metus posuere congue scelerisque. Vestibulum blandit pulvinar leo sit amet ornare. Maecenas porttitor lectus augue, et scelerisque nisl imperdiet non. Curabitur vel ligula sit amet leo auctor malesuada. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Proin facilisis erat ipsum, ut sagittis velit aliquam a. Nulla nulla orci, dapibus at ullamcorper suscipit, aliquam vel nisl. Duis eu libero ut leo ornare tempor. Donec egestas ipsum nec augue pellentesque aliquet."
  2 => "OPTION G: Duis leo arcu
        

        Aenean porttitor nulla eu eleifend hendrerit. Duis sed pretium nunc, sed semper leo. Nam sit amet quam semper, tempor risus vitae, consequat ex. Quisque ut rutrum enim, aliquet sodales justo. Morbi fringilla ac justo vitae molestie. Donec in molestie mauris, a scelerisque dolor. 

        Note: Nunc eu est bibendum nibh ullamcorper fermentum eget ut ante. Cras sed eros ac odio congue auctor. Nunc vel euismod neque."

  3 => ...
  4 => ...
  etc.
]

Alternatively, using the Option n: text as the array key and the description as the value would be elegant as well, but I have no idea how to accomplish this.

Using preg_split()

I have been trying to use preg_split() with little success. My current attempt is:

preg_split('/(Option [\w]+: \s*([^\r\n]*))/', $input, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

Which outputs:

array:15 [▼
  0 => "OPTION A: Lorem ipsum dolar sit"
  1 => "Lorem ipsum dolar sit"
  2 => """
    

    Ut mattis velit nec tortor congue gravida. Duis leo arcu, maximus vel convallis vitae, laoreet in metus. Duis nec nisl id eros tincidunt dignissim. Sed condimentum commodo mi, a ristique risus vehicula ut. Sed eget ultrices lacus. Curabitur sed eleifend sapien, nec pharetra nunc. 

    Note: Nunc eu est bibendum nibh ullamcorper fermentum eget ut ante. Cras sed eros ac odio congue auctor. Nunc vel euismod neque.

    """
  3 => "OPTION D: Quisque efficitur"
  4 => "Quisque efficitur"
  5 => """
    

    Morbi elementum metus posuere congue scelerisque. Vestibulum blandit pulvinar leo sit amet ornare. Maecenas porttitor lectus augue, et scelerisque nisl imperdiet non. Curabitur vel ligula sit amet leo auctor malesuada. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Proin facilisis erat ipsum, ut sagittis velit aliquam a. Nulla nulla orci, dapibus at ullamcorper suscipit, aliquam vel nisl. Duis eu libero ut leo ornare tempor. Donec egestas ipsum nec augue pellentesque aliquet.

    """
  6 => "OPTION G: Duis leo arcu"
  7 => "Duis leo arcu"
  8 => """
    

    Aenean porttitor nulla eu eleifend hendrerit. Duis sed pretium nunc, sed semper leo. Nam sit amet quam semper, tempor risus vitae, consequat ex. Quisque ut rutrum enim, aliquet sodales justo. Morbi fringilla ac justo vitae molestie. Donec in molestie mauris, a scelerisque dolor. 

    Note: Nunc eu est bibendum nibh ullamcorper fermentum eget ut ante. Cras sed eros ac odio congue auctor. Nunc vel euismod neque.

    """
  9 => "OPTION IL: Fusce fermentum"
  10 => "Fusce fermentum"
  11 => """
    

    Donec sed sagittis purus. Aliquam auctor nibh a varius sagittis. Nullam eget nulla orci. Nam eu dolor posuere, semper dui vitae, mattis leo. Vestibulum vitae dolor fringilla, gravida nulla ac, malesuada urna.

    """
  12 => "OPTION O: Morbi elementum"
  13 => "Morbi elementum"
  14 => """
    

    Nunc mi nisi, tempus non finibus nec, vulputate quis augue. Sed bibendum, dui nec venenatis efficitur, turpis libero efficitur odio, ac mollis est ex ut arcu. Aenean congue a metus quis euismod. Etiam at dui urna. Duis elementum, sapien ac volutpat mollis, augue neque pellentesque arcu, at finibus ligula nulla et libero. Curabitur vel mauris tortor. Mauris suscipit neque ac mauris lacinia tristique. Quisque faucibus semper lectus, eu ultricies sapien ultrices nec.
    """
]

As you can see, it duplicates the line immediately following the keyword and also splits the description text into its own entry.

My question is this: is there a better or more reliable way to accomplish this without preg_split(), e.g. substr in combination with other functions? If not, how can I fix my logic to reach my goal?

Update with working solution

Thanks to @RomanPerekhrest I am using the following code to generate the desired array:

preg_match_all("/\n?OPTION [\w:]+:.+?(?=\nOPTION\s|$)/s", $input, $outputArray);

There was an issue where, if an option was referenced in the body of a description, the rest of the line from that point on was dropped. The solution was to alter the regexp from this:

"/OPTION [^:]+:.+?(?=\n?OPTION\s|$)/s"

To this:

"/\n?OPTION [\w:]+:.+?(?=\nOPTION\s|$)/s"

I am still very new to regex, but if I understand correctly, removing the ? after the newline in the lookahead makes the newline required rather than optional. An option therefore only starts a new array entry if it begins a new line or is the first line of the input.

dtt27783: Use a lookahead. (almost 4 years ago)

5 answers



The solution using preg_match_all function:

// $text is your input text
preg_match_all("/OPTION [^:]+:.+?(?=\n?OPTION\s|$)/s", $text, $matches);
print_r($matches[0]);  // now $matches[0] contains the array of needed items

/s modifier: when this modifier is set, the dot metacharacter in the pattern matches every character, including newlines.

(?=...) is a positive lookahead assertion. It ends the current OPTION block where the next OPTION follows, or at the end of the input if this is the last OPTION in the list (\n?OPTION\s|$).

DEMO link
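
A minimal sketch of the lookahead approach, using the stricter variant from the question's update in which the newline before the next OPTION is required. The sample text and the variable names $text and $blocks are made up for illustration, and the sketch assumes every option heading is written in upper case at the start of a line.

```php
<?php
// Hypothetical sample input; the body of option A mentions another option mid-line.
$text = "OPTION A: First title\nBody of A.\nNote: requires Option K-1: Extended Drill Depth.\n\n"
      . "OPTION D: Second title\nBody of D.";

// /s lets "." cross newlines. The lazy ".+?" grows one character at a time and
// stops at the first position where the lookahead sees a newline followed by
// "OPTION " or the end of the input. The lookahead itself consumes nothing.
preg_match_all('/OPTION [^:]+:.+?(?=\nOPTION\s|$)/s', $text, $matches);

// Each match may still carry a trailing newline, so trim every block.
$blocks = array_map('trim', $matches[0]);

print_r($blocks);
```

Because the newline in the lookahead is mandatory here, the "Option K-1:" reference inside the note does not start a new block.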

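dqt20140129: I have updated the question, but I think I have also found the solution, which I added at the end. Thanks a lot! (almost 4 years ago)
doushuangai9733: Can you update your example input with the new condition? (almost 4 years ago)
doushui3216: I am currently using this solution and it works almost exactly as needed, but I noticed that in some option descriptions people reference other options, e.g. "Note: requires Option K-1 ...", and those references end up with their own array key. How can I make it pick up only options that have no preceding text on the same line and ignore the rest? (almost 4 years ago)
ds34222: Thanks (almost 4 years ago)
dongmingxiang0312: I like your solution, so I deleted mine. +1 (almost 4 years ago)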



How to do it with a lookahead assertion (as @Casimir pointed out):

array_filter(preg_split('~(?m)(?=^OPTION)~', $input), 'trim');
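
A short sketch of how this one-liner might be used in practice. The sample input and the names $pieces and $blocks are hypothetical. array_filter() preserves the original keys, so the sketch re-indexes the result with array_values() and trims each block as well.

```php
<?php
// Hypothetical sample input; the note mentions another option in the middle of a line.
$input = "OPTION A: First title\nBody of A.\nNote: requires Option K-1: Extended Drill Depth.\n\n"
       . "OPTION D: Second title\nBody of D.\n";

// (?m) makes ^ match at the start of every line. The lookahead is zero-width,
// so the text is cut just before each line that begins with OPTION and no
// characters are removed from the pieces.
$pieces = preg_split('~(?m)(?=^OPTION)~', $input);

// Trim each piece, drop pieces that are empty after trimming, and re-index from 0.
$blocks = array_values(array_filter(array_map('trim', $pieces), 'strlen'));

print_r($blocks);
```

Since the split only happens at line starts, an option mentioned inside a description stays in that description.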



Seems to me like you can just use explode to split on the blank lines. Try something like this:

$pieces = explode("\n\n", $input);

Here is an example: https://repl.it/CkBl/0

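douzhe3516: Unfortunately the input text will vary a lot here, depending on who enters it: some entries have no blank lines at all, just a list of "Option n: <description>" items. (almost 4 years ago)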



It's splitting on every capture group, even the nested one, so ([^\r\n]*) creates separate elements in the resulting array. Based on your example data you could simply split on two or more newline characters to get the whole block of text in each array element:

preg_split('/[\r\n]{2,}/', $input);

Or if you want to rely on the OPTION string instead, grab the whole block of text and then trim it of newlines after:

$result = preg_split('/(OPTION [\w]+:.*)/', $input, -1, PREG_SPLIT_DELIM_CAPTURE);
// Remove trailing newlines
$result = array_map('trim', $result);
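
Building on the second snippet, here is a rough sketch of the keyed array the question mentions as an alternative (heading as key, description as value). It is an illustration rather than part of the original answer: the sample input and the names $parts and $options are made up, and the pattern is additionally anchored with ^ and $ under the m flag, on the assumption that every heading sits on its own line.

```php
<?php
// Hypothetical sample input.
$input = "OPTION A: First title\nBody of A.\n\nOPTION IL: Second title\nBody of IL.";

// One capture group only, so each heading appears exactly once in the result.
// Without the s flag, ".*" stops at the end of the heading line.
$parts = preg_split('/^(OPTION \w+:.*)$/m', $input, -1, PREG_SPLIT_DELIM_CAPTURE);

// The first element is whatever precedes the first heading (an empty string here).
array_shift($parts);

// The remaining elements alternate: heading, body, heading, body, ...
$options = [];
foreach (array_chunk($parts, 2) as $pair) {
    $options[trim($pair[0])] = trim($pair[1] ?? '');
}

print_r($options);
```

PREG_SPLIT_NO_EMPTY is left out on purpose: dropping an empty description would shift the heading/body pairing.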



Looks like you want to split the string on line breaks.

explode("\n", $string);
