duandan5471 2012-02-13 16:58
Views: 68
Accepted

Split a string and remember the split positions

Assume I have the following string:

I have | been very busy lately and need to go | to bed early

By splitting on "|", you get:

$arr = array(
  [0] => I have
  [1] => been very busy lately and need to go
  [2] => to bed early
)

The first split comes after 2 words, the second comes 8 words after that, and 3 words remain. These per-segment word counts are stored as array(2, 8, 3). The string is then imploded and passed on to a custom string tagger:
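
As an illustration of how those counts might be obtained, here is a minimal sketch. The function name count_words_per_segment is hypothetical, and it assumes that a "word" is any run of non-whitespace characters:

```php
<?php
// Hypothetical helper (not from the original post): count the
// whitespace-separated words in each "|"-delimited segment.
function count_words_per_segment(string $text, string $delimiter = '|'): array
{
    $counts = [];
    foreach (explode($delimiter, $text) as $segment) {
        // PREG_SPLIT_NO_EMPTY drops the empty tokens produced by the
        // spaces around each delimiter.
        $words = preg_split('/\s+/', $segment, -1, PREG_SPLIT_NO_EMPTY);
        $counts[] = count($words);
    }
    return $counts;
}

$counts = count_words_per_segment('I have | been very busy lately and need to go | to bed early');
print_r($counts); // the three counts are 2, 8 and 3
```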

tag_string('I have been very busy lately and need to go to bed early');

I don't know exactly what the output of tag_string will be, except that the total number of words will remain the same. Examples of output would be:

I have-nn been-vb very-vb busy lately and-rr need to-r go to bed early-p
I-ee have been-vb very busy-df lately-nn and need-f to go to bed-uu early-yy

This will lengthen the string by an unknown number of characters. I have no control over tag_string. What I know is (1) the number of words will be the same as before and (2) the original string was split after 2 words and then after a further 8 words. I now need a way to explode the tagged string into the same segments as before:

$string = "I have-nn been-vb very-vb busy lately and-rr need to-r go to bed early-p";
function split_string_again() {
  // split after 2nd, and thereafter after 8th word
}

With output:

$arr = array(
  [0] => I have-nn
  [1] => been-vb very-vb busy lately and-rr need to-r go
  [2] => to bed early-p
)

So to be clear (I wasn't before): I cannot split by remembering the strpos, because the character positions before and after the string goes through the tagger are not the same. I need to count the number of words. I hope I have made myself clearer :)

3 answers

  • dongyong6332 2012-02-14 01:51

    Interesting question. Although I think the rope data structure still applies, it might be a little overkill, since word placement won't change. Here is my solution:

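    $str = "I have | been very busy lately and need to go | to bed early";

    // Count the words in each "|"-delimited segment, e.g. array(2, 8, 3).
    // str_word_count() treats letters, "'" and "-" as word characters,
    // so a tagged word such as "have-nn" still counts as one word.
    function get_breaks($str)
    {
        $breaks = array();
        $arr = explode("|", $str);

        foreach ($arr as $val)
        {
            $breaks[] = str_word_count($val);
        }

        return $breaks;
    }

    $breaks = get_breaks($str);

    echo "<pre>" . print_r($breaks, true) . "</pre>";

    // Remove the delimiters before the string is passed to the tagger.
    $str = str_replace("|", "", $str);

    // Regroup the words of $str into segments of the stored sizes.
    function rebreak($str, $breaks)
    {
        $return = array();
        $old_break = 0;

        // Format 1 makes str_word_count() return the words as an array.
        $arr = str_word_count($str, 1);

        foreach ($breaks as $break)
        {
            $return[] = implode(" ", array_slice($arr, $old_break, $break));

            $old_break += $break;
        }

        return $return;
    }

    // Untagged string: reproduces the original three segments.
    echo "<pre>" . print_r(rebreak($str, $breaks), true) . "</pre>";

    // Tagged string: same segments, with the tags kept on each word.
    echo "<pre>" . print_r(rebreak("I have-nn been-vb very-vb busy lately and-rr need to-r go to bed early-p", $breaks), true) . "</pre>";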
    

    Let me know if you have any questions, but it is pretty self-explanatory. There are definitely ways to improve this as well.
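
One possible improvement, sketched here under assumptions: str_word_count() only recognizes letters, "'" and "-", so a tag containing digits or other characters would be counted as extra words. The variant below splits on whitespace instead and checks that the stored counts still add up. The function name resplit_by_word_counts is hypothetical, and it assumes the tagger never inserts whitespace inside a word.

```php
<?php
// Hypothetical helper (not part of the answer above): regroup the
// whitespace-separated words of a tagged string by the stored counts.
function resplit_by_word_counts(string $tagged, array $counts): array
{
    $words = preg_split('/\s+/', $tagged, -1, PREG_SPLIT_NO_EMPTY);

    // Guard against a tagger that changed the number of words.
    if (array_sum($counts) !== count($words)) {
        throw new InvalidArgumentException('Word count changed; cannot restore the split.');
    }

    $segments = [];
    $offset = 0;
    foreach ($counts as $count) {
        $segments[] = implode(' ', array_slice($words, $offset, $count));
        $offset += $count;
    }
    return $segments;
}

$tagged = 'I have-nn been-vb very-vb busy lately and-rr need to-r go to bed early-p';
print_r(resplit_by_word_counts($tagged, [2, 8, 3]));
```

The exception makes a mismatch visible right away rather than silently returning shifted segments.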

    This answer was selected by the asker as the best answer.