duanpi7107 2018-05-05 16:41

Get all text outside the brackets while capturing word indexes

How can I get all the text that is not inside square brackets using preg_match_all? I need preg_match_all because I want the index of each word.

Given sentence:

Hello how [t- are] you [t- today], Sir?

I can already extract all the words inside the [ ]. How can I also get all the text outside the [ ] separately?

preg_match_all('/\[t-(.*?)\]/', $this->target, $targetWords, PREG_OFFSET_CAPTURE);

Output:

Array
(
    [0] => Array
        (
            [0] =>  are
            [1] => 47
        )
    [1] => Array
        (
            [0] =>  today
            [1] => some number
        )

)

Note: I already know about preg_split:

$outsideParenthesis = preg_split('/\[.*?\]/', $this->target);

But this doesn't allow me to maintain the index.
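
A side note on the preg_split call above: preg_split accepts a PREG_SPLIT_OFFSET_CAPTURE flag, which returns each piece together with its byte offset in the subject string. The snippet below is only a minimal sketch of that flag, using a plain $target variable in place of $this->target; the foreach destructuring assumes PHP 7.1 or later.

```php
<?php
// Stand-in for $this->target from the question.
$target = 'Hello how [t- are] you [t- today], Sir?';

// With PREG_SPLIT_OFFSET_CAPTURE every element is [substring, byte offset].
$outside = preg_split('/\[.*?\]/', $target, -1, PREG_SPLIT_OFFSET_CAPTURE);

foreach ($outside as [$text, $offset]) {
    printf("%2d: '%s'\n", $offset, $text);
}
// Prints:
//  0: 'Hello how '
// 18: ' you '
// 33: ', Sir?'
```

The offsets are counted in bytes, so they differ from character positions when the text contains multibyte UTF-8 characters.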


Note 2: It may help to provide my end goal:

I want to take a string of custom markdown. For each word, I want to generate word objects that specify their type and content.

The reason is, I'd like to send an array of word objects in order to the frontend so I can loop through the array and generate HTML elements with classes, so I can apply styling as needed.

I also want to be able to use more than one kind of marker in the same string, e.g.:

Hello how [t- are] you [k- today], Sir?

Where t- is target, k- is key.

So the final array I'd like would look like:

[
   [
      'type' => 'normal',
      'content' => 'Hello how '
   ],
   [
      'type' => 'target',
      'content' => 'are'
   ],
   [
      'type' => 'normal',
      'content' => ' you '
   ],
   [
      'type' => 'key',
      'content' => 'today'
   ],
   [
      'type' => 'normal',
      'content' => ', Sir?'
   ]
]

Here's my wordObjects function as of now:

private function setWordObjects($array, $type)
{
    return array_map(function ($n) use ($type) {
        return [
            'type' => $type,
            'content' => $n[0],
            'index' => $n[1]
        ];
    }, $array[1]);
}
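
To make the input and output of this function concrete, here is a rough standalone sketch. It copies the method body into a free function of the same name (done only so the snippet runs outside the class) and feeds it the preg_match_all result from earlier in the question.

```php
<?php
// Free-function copy of the method above, for illustration only.
function setWordObjects(array $array, string $type): array
{
    return array_map(function ($n) use ($type) {
        return [
            'type' => $type,
            'content' => $n[0], // captured text
            'index' => $n[1],   // byte offset of the captured text
        ];
    }, $array[1]); // $array[1] holds capture group 1 of every match
}

$target = 'Hello how [t- are] you [t- today], Sir?';
preg_match_all('/\[t-(.*?)\]/', $target, $targetWords, PREG_OFFSET_CAPTURE);

$words = setWordObjects($targetWords, 'target');
print_r($words);
```

For this sentence the two entries have content ' are' (index 13) and ' today' (index 26). The leading space is part of the capture because the group starts immediately after t-.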

  • doushang7209 2018-05-05 16:51

    Extended solution:

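    $s = 'Hello how [t- are] you [k- today], Sir?';
    // Map each marker prefix to its word type.
    $types = ['t-' => 'target', 'k-' => 'key'];
    // Split on [t- ...] / [k- ...]. DELIM_CAPTURE keeps the bracket contents as
    // separate elements; OFFSET_CAPTURE pairs every element with its byte offset.
    $splitted = preg_split('/\[([tk]- [^]]+)\]/', $s, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_OFFSET_CAPTURE);
    
    $result = [];
    foreach ($splitted as $v) {
        [$content, $pos] = $v; // array destructuring requires PHP 7.1+
        $k = substr($content, 0, 2);
        // An element counts as a marker if it starts with a known prefix
        // (plain text that itself starts with "t-" or "k-" would match too).
        $is_delim = isset($types[$k]);
        $result[] = array_combine(['type', 'content', 'index'],
                                  [$is_delim ? $types[$k] : 'normal',
                                   // Skip the 3-character prefix ("t- ") in content and offset.
                                   $is_delim ? substr($content, 3) : $content,
                                   $is_delim ? $pos + 3 : $pos]);
    }
    
    print_r($result);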
    

    The output:

    Array
    (
        [0] => Array
            (
                [type] => normal
                [content] => Hello how 
                [index] => 0
            )
    
        [1] => Array
            (
                [type] => target
                [content] => are
                [index] => 14
            )
    
        [2] => Array
            (
                [type] => normal
                [content] =>  you 
                [index] => 18
            )
    
        [3] => Array
            (
                [type] => key
                [content] => today
                [index] => 27
            )
    
        [4] => Array
            (
                [type] => normal
                [content] => , Sir?
                [index] => 33
            )
    )
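
Since the question asks specifically for preg_match_all, the same tokens can probably also be produced with a single preg_match_all call that uses an alternation: one branch for a bracketed marker, one for a run of text containing no '['. The following is a sketch under that assumption, not a drop-in replacement. The pattern and the single-letter $types map are illustrative choices, and the input is assumed to be well formed (a stray '[' that does not start a t- or k- marker is silently skipped).

```php
<?php
// Illustrative marker map keyed by the letter before the dash.
$types = ['t' => 'target', 'k' => 'key'];
$s = 'Hello how [t- are] you [k- today], Sir?';

// Branch 1: "[t- text]" or "[k- text]"; group 1 = letter, group 2 = text.
// Branch 2: one or more characters that are not '['.
$pattern = '/\[([tk])-\s*([^\]]*)\]|[^\[]+/';
preg_match_all($pattern, $s, $matches, PREG_SET_ORDER | PREG_OFFSET_CAPTURE);

$result = [];
foreach ($matches as $m) {
    // Only branch 1 can produce a match that starts with '['.
    if ($m[0][0][0] === '[') {
        $result[] = [
            'type' => $types[$m[1][0]],
            'content' => $m[2][0],
            'index' => $m[2][1], // byte offset of the text inside the brackets
        ];
    } else {
        $result[] = [
            'type' => 'normal',
            'content' => $m[0][0],
            'index' => $m[0][1],
        ];
    }
}

print_r($result);
```

For the sample sentence this gives the same five entries as the output above, with indexes 0, 14, 18, 27 and 33. The indexes are byte offsets, so they differ from character positions if the text contains multibyte UTF-8 characters.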
    
    This answer was accepted by the asker.