duandan5471 2012-02-13 16:58
浏览 68
已采纳

拆分一个字符串,记住分裂的位置

Assume I have the following string:

I have | been very busy lately and need to go | to bed early

By splitting on "|", you get:

$arr = array(
  [0] => I have
  [1] => been very busy lately and need to go
  [2] => to bed early
)

The first split is after 2 words, and the second split 8 words after that. The positions after how many words to split will be stored: array(2, 8, 3). Then, the string is imploded to be passed on to a custom string tagger:

tag_string('I have been very busy lately and need to go to bed early');

I don't know what the output of tag_string will be exactly, except that the total words will remain the same. Examples of output would be:

I have-nn been-vb very-vb busy lately and-rr need to-r go to bed early-p
I-ee have been-vb very busy-df lately-nn and need-f to go to bed-uu early-yy

This will lengthen the string by an unknown number of characters. I have no control over tag_string. What I know is (1) the number of words will be the same as before and (2) the array was split after 2, and thereafter after 8 words, respectively. I now need a solution explode the tagged string into the same array as before:

$string = "I have-nn been-vb very-vb busy lately and-rr need to-r go to bed early-p"
function split_string_again() {
  // split after 2nd, and thereafter after 8th word
}

With output:

$arr = array(
  [0] => I have-nn
  [1] => been-vb very-vb busy lately and-rr need to-r go
  [2] => to bed early-p
)

So to be clear (I wasn't before): I cannot split by remembering the strpos, because strpos before and after the string went through the tagger, aren't the same. I need to count the number of words. I hope I have made myself more clear :)

  • 写回答

3条回答 默认 最新

  • dongyong6332 2012-02-14 01:51
    关注

    Interesting question, although I think the rope data structure still applies it might be a little overkill since word placement won't change. Here is my solution:

    $str = "I have | been very busy lately and need to go | to bed early";
    
    function get_breaks($str)
    {
        $breaks = array();
        $arr = explode("|", $str);
    
        foreach($arr as $val)
        {
            $breaks[] = str_word_count($val);
        }
    
        return $breaks;
    }
    
    $breaks = get_breaks($str);
    
    echo "<pre>" . print_r($breaks, 1) . "</pre>";
    
    $str = str_replace("|", "", $str);
    
    function rebreak($str, $breaks)
    {
        $return = array();
        $old_break = 0;
    
        $arr = str_word_count($str, 1);
    
        foreach($breaks as $break)
        {
            $return[] = implode(" ", array_slice($arr, $old_break, $break));
    
            $old_break += $break;
        }
    
        return $return;
    }
    
    echo "<pre>" . print_r(rebreak($str, $breaks), 1) . "</pre>";
    
    echo "<pre>" . print_r(rebreak("I have-nn been-vb very-vb busy lately and-rr need to-r go to bed early-p", $breaks), 1) . "</pre>";
    

    Let me know if you have any questions, but it is pretty self explanatory. There are definitely ways to improve this as well.

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论
查看更多回答(2条)

报告相同问题?

悬赏问题

  • ¥15 Macbookpro 连接热点正常上网,连接不了Wi-Fi。
  • ¥15 delphi webbrowser组件网页下拉菜单自动选择问题
  • ¥15 linux驱动,linux应用,多线程
  • ¥20 我要一个分身加定位两个功能的安卓app
  • ¥15 基于FOC驱动器,如何实现卡丁车下坡无阻力的遛坡的效果
  • ¥15 IAR程序莫名变量多重定义
  • ¥15 (标签-UDP|关键词-client)
  • ¥15 关于库卡officelite无法与虚拟机通讯的问题
  • ¥15 目标检测项目无法读取视频
  • ¥15 GEO datasets中基因芯片数据仅仅提供了normalized signal如何进行差异分析