dqs13465424392 2016-08-05 20:19

Convert text to an array, splitting at keywords

I'm trying to split a block of text into an array in PHP, dividing the text at keywords, in this case Option n:, where n is any letter or number. Here is a sample text:

Example input

OPTION A: Lorem ipsum dolar sit
Ut mattis velit nec tortor congue gravida. Duis leo arcu, maximus vel convallis vitae, laoreet in metus. Duis nec nisl id eros tincidunt dignissim. Sed condimentum commodo mi, a tristique risus vehicula ut. Sed eget ultrices lacus. Curabitur sed eleifend sapien, nec pharetra nunc.
Note: This option requires Option K-1: Extended Drill Depth. Nunc eu est bibendum nibh ullamcorper fermentum eget ut ante. Cras sed eros ac odio congue auctor. Nunc vel euismod neque.

OPTION D: Quisque efficitur
Morbi elementum metus posuere congue scelerisque. Vestibulum blandit pulvinar leo sit amet ornare. Maecenas porttitor lectus augue, et scelerisque nisl imperdiet non. Curabitur vel ligula sit amet leo auctor malesuada. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Proin facilisis erat ipsum, ut sagittis velit aliquam a. Nulla nulla orci, dapibus at ullamcorper suscipit, aliquam vel nisl. Duis eu libero ut leo ornare tempor. Donec egestas ipsum nec augue pellentesque aliquet.

OPTION G: Duis leo arcu
Aenean porttitor nulla eu eleifend hendrerit. Duis sed pretium nunc, sed semper leo. Nam sit amet quam semper, tempor risus vitae, consequat ex. Quisque ut rutrum enim, aliquet sodales justo. Morbi fringilla ac justo vitae molestie. Donec in molestie mauris, a scelerisque dolor.
Note: Nunc eu est bibendum nibh ullamcorper fermentum eget ut ante. Cras sed eros ac odio congue auctor. Nunc vel euismod neque.

OPTION IL: Fusce fermentum
Donec sed sagittis purus. Aliquam auctor nibh a varius sagittis. Nullam eget nulla orci. Nam eu dolor posuere, semper dui vitae, mattis leo. Vestibulum vitae dolor fringilla, gravida nulla ac, malesuada urna.

OPTION O: Morbi elementum
Nunc mi nisi, tempus non finibus nec, vulputate quis augue. Sed bibendum, dui nec venenatis efficitur, turpis libero efficitur odio, ac mollis est ex ut arcu. Aenean congue a metus quis euismod. Etiam at dui urna. Duis elementum, sapien ac volutpat mollis, augue neque pellentesque arcu, at finibus ligula nulla et libero. Curabitur vel mauris tortor. Mauris suscipit neque ac mauris lacinia tristique. Quisque faucibus semper lectus, eu ultricies sapien ultrices nec.

Desired output

Ideally I'd like the above sample to look like this:

array:5 [
  0 => "OPTION A: Lorem ipsum dolar sit
        

        Ut mattis velit nec tortor congue gravida. Duis leo arcu, maximus vel convallis vitae, laoreet in metus. Duis nec nisl id eros tincidunt dignissim. Sed condimentum commodo mi, a ristique risus vehicula ut. Sed eget ultrices lacus. Curabitur sed eleifend sapien, nec pharetra nunc. 

        Note: Nunc eu est bibendum nibh ullamcorper fermentum eget ut ante. Cras sed eros ac odio congue auctor. Nunc vel euismod neque."
  1 => "OPTION D: Quisque efficitur
        

        Morbi elementum metus posuere congue scelerisque. Vestibulum blandit pulvinar leo sit amet ornare. Maecenas porttitor lectus augue, et scelerisque nisl imperdiet non. Curabitur vel ligula sit amet leo auctor malesuada. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Proin facilisis erat ipsum, ut sagittis velit aliquam a. Nulla nulla orci, dapibus at ullamcorper suscipit, aliquam vel nisl. Duis eu libero ut leo ornare tempor. Donec egestas ipsum nec augue pellentesque aliquet."
  2 => "OPTION G: Duis leo arcu
        

        Aenean porttitor nulla eu eleifend hendrerit. Duis sed pretium nunc, sed semper leo. Nam sit amet quam semper, tempor risus vitae, consequat ex. Quisque ut rutrum enim, aliquet sodales justo. Morbi fringilla ac justo vitae molestie. Donec in molestie mauris, a scelerisque dolor. 

        Note: Nunc eu est bibendum nibh ullamcorper fermentum eget ut ante. Cras sed eros ac odio congue auctor. Nunc vel euismod neque."

  3 => ...
  4 => ...
  etc.
]

Alternatively, using the Option n: text as the array key and the description as the value would be elegant as well, but I have no idea how to accomplish this.

Using preg_split()

I have been trying to use preg_split() with little success; my current attempt is:

preg_split('/(Option [\w]+: \s*([^\n]*))/', $input, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

Which outputs:

array:15 [
  0 => "OPTION A: Lorem ipsum dolar sit"
  1 => "Lorem ipsum dolar sit"
  2 => """
    

    Ut mattis velit nec tortor congue gravida. Duis leo arcu, maximus vel convallis vitae, laoreet in metus. Duis nec nisl id eros tincidunt dignissim. Sed condimentum commodo mi, a ristique risus vehicula ut. Sed eget ultrices lacus. Curabitur sed eleifend sapien, nec pharetra nunc. 

    Note: Nunc eu est bibendum nibh ullamcorper fermentum eget ut ante. Cras sed eros ac odio congue auctor. Nunc vel euismod neque.

    """
  3 => "OPTION D: Quisque efficitur"
  4 => "Quisque efficitur"
  5 => """
    

    Morbi elementum metus posuere congue scelerisque. Vestibulum blandit pulvinar leo sit amet ornare. Maecenas porttitor lectus augue, et scelerisque nisl imperdiet non. Curabitur vel ligula sit amet leo auctor malesuada. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Proin facilisis erat ipsum, ut sagittis velit aliquam a. Nulla nulla orci, dapibus at ullamcorper suscipit, aliquam vel nisl. Duis eu libero ut leo ornare tempor. Donec egestas ipsum nec augue pellentesque aliquet.

    """
  6 => "OPTION G: Duis leo arcu"
  7 => "Duis leo arcu"
  8 => """
    

    Aenean porttitor nulla eu eleifend hendrerit. Duis sed pretium nunc, sed semper leo. Nam sit amet quam semper, tempor risus vitae, consequat ex. Quisque ut rutrum enim, aliquet sodales justo. Morbi fringilla ac justo vitae molestie. Donec in molestie mauris, a scelerisque dolor. 

    Note: Nunc eu est bibendum nibh ullamcorper fermentum eget ut ante. Cras sed eros ac odio congue auctor. Nunc vel euismod neque.

    """
  9 => "OPTION IL: Fusce fermentum"
  10 => "Fusce fermentum"
  11 => """
    

    Donec sed sagittis purus. Aliquam auctor nibh a varius sagittis. Nullam eget nulla orci. Nam eu dolor posuere, semper dui vitae, mattis leo. Vestibulum vitae dolor fringilla, gravida nulla ac, malesuada urna.

    """
  12 => "OPTION O: Morbi elementum"
  13 => "Morbi elementum"
  14 => """
    

    Nunc mi nisi, tempus non finibus nec, vulputate quis augue. Sed bibendum, dui nec venenatis efficitur, turpis libero efficitur odio, ac mollis est ex ut arcu. Aenean congue a metus quis euismod. Etiam at dui urna. Duis elementum, sapien ac volutpat mollis, augue neque pellentesque arcu, at finibus ligula nulla et libero. Curabitur vel mauris tortor. Mauris suscipit neque ac mauris lacinia tristique. Quisque faucibus semper lectus, eu ultricies sapien ultrices nec.
    """
]

As you can see, it duplicates the title text that immediately follows the keyword and also splits the description text into its own entry.

My question is this: is there a better or more reliable way to accomplish this without preg_split(), e.g., substr in combination with other functions? If not, how can I fix my logic to accomplish my goal?
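A note on the duplication: with PREG_SPLIT_DELIM_CAPTURE, preg_split() returns every parenthesised group of the delimiter pattern, so a group nested inside another group is returned in addition to the outer one. One possible way to keep each option in a single entry is to split on a zero-width lookahead instead of on the heading itself. The sketch below is only an illustration: the sample string is made up, and it assumes every heading starts a line with upper-case "OPTION" followed by a word and a colon.

```php
<?php
// Nested groups: both the outer and the inner capture are returned.
$demo = preg_split('/(x(y))/', 'AxyB', -1, PREG_SPLIT_DELIM_CAPTURE);
// $demo is ['A', 'xy', 'y', 'B']

// Made-up sample input; assumes each heading starts a line with "OPTION <id>:".
$input = "OPTION A: First title\nBody of A.\nNote: requires OPTION K-1: something.\n\n"
       . "OPTION D: Second title\nBody of D.\n";

// Split at the zero-width position just before each heading.
// The /m flag lets ^ match at the start of every line, so an
// "OPTION ..." reference in the middle of a line is not a split point.
$parts = preg_split('/^(?=OPTION \w+:)/m', $input, -1, PREG_SPLIT_NO_EMPTY);

// Remove the blank lines left at the end of each chunk.
$parts = array_map('trim', $parts);

print_r($parts);
```

Because the delimiter matches no characters, nothing is removed from the text and no capture flags are needed.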

Update with working solution

Thanks to @RomanPerekhrest, I am using the following code to generate the desired array: preg_match_all("/\n?OPTION [\w:]+:.+?(?=\nOPTION\s|$)/s", $input, $outputArray);

There was an issue where, if an option was referenced in the body of a description, it would delete the rest of the line from that point on. The solution was to alter the regex from this:

"/OPTION [^:]+:.+?(?=\n?OPTION\s|$)/s"

To this:

"/\n?OPTION [\w:]+:.+?(?=\nOPTION\s|$)/s"

I am still very new to regex, but if I understand correctly, removing the ? after the newline in the lookahead makes the newline required rather than optional, so an option only starts a new array entry if it begins a new line or is on the first line.
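To see the effect of that change on a small input, the sketch below runs both patterns against a made-up string in which the body of one option mentions another option in the middle of a line. The variable names and the sample text are illustrative only. With the optional newline the lookahead can succeed mid-line, so the entry is cut short at the inline reference; with the required newline it only succeeds where "OPTION" starts a line.

```php
<?php
// Made-up input: the body of option A mentions another option mid-line.
$input = "OPTION A: Title A\nBody A, requires OPTION K: Extended depth.\n"
       . "More of A.\n\nOPTION D: Title D\nBody D.";

// Single-quoted so that \n, \w and \s reach the regex engine unchanged.
$optionalNewline = '/OPTION [^:]+:.+?(?=\n?OPTION\s|$)/s';
$requiredNewline = '/\n?OPTION [\w:]+:.+?(?=\nOPTION\s|$)/s';

preg_match_all($optionalNewline, $input, $loose);
preg_match_all($requiredNewline, $input, $strict);

// Matches may carry a leading or trailing newline or space, so trim each entry.
$loose  = array_map('trim', $loose[0]);
$strict = array_map('trim', $strict[0]);

print_r($loose);   // three entries: option A is cut at the inline "OPTION K:"
print_r($strict);  // two entries: the inline reference stays inside option A
```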


  • dongshuobei1037 2016-08-05 20:50

    A solution using the preg_match_all() function:

    // $text is your input text
    preg_match_all("/OPTION [^:]+:.+?(?=\n?OPTION\s|$)/s", $text, $matches);
    print_r($matches[0]);  // now $matches[0] contains the array of needed items
    

    /s modifier: if this modifier is set, a dot metacharacter in the pattern matches all characters, including newlines.

    (?=...) is a positive lookahead assertion. It matches the current OPTION content if it is followed by the next OPTION or if it is the last OPTION in the list: (?=\n?OPTION\s|$)
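    The question also asks about using the "Option n:" text as the array key and the description as the value. A minimal sketch of one way to do that follows; it is a variation on the pattern above, not part of the original answer. The named groups key and body, the ^ anchor with the /m flag, the $options variable and the sample text are all assumptions made for illustration.

```php
<?php
// Made-up sample input, standing in for $text above.
$text = "OPTION A: Title A\nBody A.\n\nOPTION IL: Title IL\nBody IL.";

// "key" captures the "OPTION x" label, "body" everything after the colon.
// /m lets ^ match at each line start; /s lets the dot match newlines.
// The lookahead stops the lazy body before the next heading or at the end (\z).
$pattern = '/^(?<key>OPTION \w+):(?<body>.+?)(?=\n+OPTION \w+:|\z)/ms';

// PREG_SET_ORDER groups the captures per match, which suits a foreach loop.
preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

$options = [];
foreach ($matches as $m) {
    $options[$m['key']] = trim($m['body']);
}

print_r($options);
```

    If two headings could share the same label, the later one would overwrite the earlier one in this layout, so a plain indexed array may be the safer choice in that case.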

    DEMO link
