dongrunying7537 2012-01-30 17:23

Replacing multiple words in a string with PHP

I need a systematic way of replacing each word in a string separately by providing my own input for each word. I want to do this on the command line.

So the program reads in a string, and asks me what I want to replace the first word with, and then the second word, and then the third word, and so on, until all words have been processed.

The sentences in the string have to remain well-formed, so the algorithm should take care not to mess up punctuation and spacing.

Is there a proper way to do this?


2 answers

  • duanjian3920 2012-01-30 18:09

    Given some sample text:

    $subject = <<<TEXT
    I need a systematic way of replacing each word in a string separately by providing my own input for each word. I want to do this on the command line.
    
    So the program reads in a string, and asks me what I want to replace the first word with, and then the second word, and then the third word, and so on, until all words have been processed.
    
    The sentences in the string have to remain well-formed, so the algorithm should take care not to mess up punctuation and spacing.
    
    Is there a proper way to do this?
    TEXT;
    

    First, tokenize the string into word tokens and "everything else" tokens (call the latter fill). A regular expression with two named, optional groups does that:

    $pattern = '/(?P<fill>\W+)?(?P<word>\w+)?/';
    $r = preg_match_all($pattern, $subject, $matches, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);
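To see what the pattern produces, here is a minimal standalone sketch on a short made-up string (`$pieces` and `$joined` are illustrative names, not from the answer). It uses the same pattern without `PREG_OFFSET_CAPTURE`, skips unmatched groups, and checks that joining the pieces gives back the input, which is what keeps punctuation and spacing intact:

```php
<?php
// Standalone check of the tokenizing pattern on a short made-up string.
$subject = 'Hello, world! Bye.';
$pattern = '/(?P<fill>\W+)?(?P<word>\w+)?/';
preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);

// Flatten the matches into an ordered list of (type, text) pieces.
$pieces = array();
foreach ($matches as $m) {
    foreach (array('fill', 'word') as $type) {
        // Unmatched groups are either absent or empty strings: skip both.
        if (isset($m[$type]) && $m[$type] !== '') {
            $pieces[] = array($type, $m[$type]);
        }
    }
}

foreach ($pieces as $p) {
    printf("%s '%s'\n", $p[0], $p[1]);
}

// Joining the pieces must reproduce the input exactly (nothing lost).
$joined = '';
foreach ($pieces as $p) {
    $joined .= $p[1];
}
var_dump($joined === $subject); // bool(true)
```

The listing alternates `word 'Hello'`, `fill ', '`, `word 'world'`, and so on, ending with `fill '.'`.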
    

    Next, convert the matches into a more useful data structure, namely a stream of tokens plus an index of all words used:

    $tokens = array(); # token stream
    $tokenIndex = 0;
    $words = array(); # index of words
    foreach($matches as $matched)
    {
        foreach($matched as $type => $match)
        {
            if (is_numeric($type)) continue;
            list($string, $offset) = $match;
            if ($offset < 0) continue;
    
    
            $token = new stdClass;
            $token->type = $type;
            $token->offset = $offset;
            $token->length = strlen($string);
    
            if ($token->type === 'word')
            {
                if (!isset($words[$string]))
                {
                    $words[$string] = array('string' => $string, 'tokens' => array());
                }
                $words[$string]['tokens'][] = &$token;
                $token->string = &$words[$string]['string'];
            } else {
                $token->string = $string;
            }
    
    
            $tokens[$tokenIndex] = &$token;
            $tokenIndex++;
            unset($token);
        }
    }
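The `&` assignments are what make the index useful: each word token's `->string` is a PHP reference to a single slot in `$words`, so one assignment there updates every occurrence. A stripped-down illustration with made-up data:

```php
<?php
// Made-up index with one entry, shaped like the one built by the loop above.
$words = array('and' => array('string' => 'and'));

// Two word tokens bound by reference to the same index slot.
$t1 = new stdClass;
$t1->string = &$words['and']['string'];
$t2 = new stdClass;
$t2->string = &$words['and']['string'];

// A single assignment to the index changes both tokens.
$words['and']['string'] = 'AND';
echo $t1->string, ' ', $t2->string, "\n"; // AND AND
```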
    

    For example, you can then list all words with their counts:

    # list all words
    
    foreach($words as $word)
    {
        printf("Word '%s' used %d time(s)\n", $word['string'], count($word['tokens']));
    }
    

    With the sample text, this gives:

    Word 'I' used 3 time(s)
    Word 'need' used 1 time(s)
    Word 'a' used 4 time(s)
    Word 'systematic' used 1 time(s)
    Word 'way' used 2 time(s)
    Word 'of' used 1 time(s)
    Word 'replacing' used 1 time(s)
    Word 'each' used 2 time(s)
    Word 'word' used 5 time(s)
    Word 'in' used 3 time(s)
    Word 'string' used 3 time(s)
    Word 'separately' used 1 time(s)
    Word 'by' used 1 time(s)
    Word 'providing' used 1 time(s)
    Word 'my' used 1 time(s)
    Word 'own' used 1 time(s)
    Word 'input' used 1 time(s)
    Word 'for' used 1 time(s)
    Word 'want' used 2 time(s)
    Word 'to' used 5 time(s)
    Word 'do' used 2 time(s)
    Word 'this' used 2 time(s)
    Word 'on' used 2 time(s)
    Word 'the' used 7 time(s)
    Word 'command' used 1 time(s)
    Word 'line' used 1 time(s)
    Word 'So' used 1 time(s)
    Word 'program' used 1 time(s)
    Word 'reads' used 1 time(s)
    Word 'and' used 5 time(s)
    ... (and so on)
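If only the counts are needed, a shorter route (a different technique from the token stream, shown only for comparison) is to match `\w+` directly and pass the result to `array_count_values()`. A sketch on a made-up sentence:

```php
<?php
// Counts only: same \w+ notion of a word, no token objects.
$subject = 'the cat and the hat';
preg_match_all('/\w+/', $subject, $m);
$counts = array_count_values($m[0]); // keeps first-seen order

foreach ($counts as $word => $n) {
    printf("Word '%s' used %d time(s)\n", $word, $n);
}
```

This loses the positions, though, so the token stream is still what you need for the replacement step.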
    

    Then you work on the word tokens only. For example, to replace one word with another everywhere it occurs:

    # change one word (and to AND)
    
    $words['and']['string'] = 'AND';
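The question, however, asks for each word to be replaced separately, position by position. Because every word token's `->string` is bound by reference to the shared slot in `$words`, assigning to one token would change all occurrences of that word; one way around this is to `unset()` that property first, which breaks the binding. A self-contained sketch with a hypothetical miniature token stream:

```php
<?php
// Hypothetical mini token stream: both 'and' tokens share one slot by reference.
$words = array('and' => array('string' => 'and'));
$tokens = array();
foreach (array('salt', ' ', 'and', ' ', 'pepper', ' ', 'and', ' ', 'oil') as $s) {
    $t = new stdClass;
    if ($s === 'and') {
        $t->string = &$words['and']['string'];
    } else {
        $t->string = $s;
    }
    $tokens[] = $t;
}

// Replace only the second 'and' (token 6). Without the unset(), the
// assignment would write through the reference and change both.
unset($tokens[6]->string);
$tokens[6]->string = 'or';

$text = '';
foreach ($tokens as $t) {
    $text .= $t->string;
}
echo $text, "\n"; // salt and pepper or oil
```

Token 2 stays bound to the index, so a later `$words['and']['string'] = ...` still reaches it, while token 6 keeps its own value.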
    

    Finally you concatenate the tokens into a single string:

    # output the whole text
    
    foreach($tokens as $token) echo $token->string;
    

    With the sample text, this again gives:

    I need a systematic way of replacing each word in a string separately by providing my own input for each word. I want to do this on the command line.
    
    So the program reads in a string, AND asks me what I want to replace the first word with, AND then the second word, AND then the third word, AND so on, until all words have been processed.
    
    The sentences in the string have to remain well-formed, so the algorithm should take care not to mess up punctuation AND spacing.
    
    Is there a proper way to do this?
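If you need the result as a string rather than printed directly (for example to write it to a file), the token strings can be joined instead. A sketch with hypothetical tokens, assuming PHP 7.0 or later, where `array_column()` also accepts an array of objects:

```php
<?php
// Hypothetical tokens shaped like the ones built above (objects with ->string).
$tokens = array();
foreach (array('Is', ' ', 'there', ' ', 'a', ' ', 'proper', ' ', 'way', '?') as $s) {
    $t = new stdClass;
    $t->string = $s;
    $tokens[] = $t;
}

// array_column() reads the public ->string property of each object (PHP 7.0+).
$text = implode('', array_column($tokens, 'string'));
echo $text, "\n"; // Is there a proper way?
```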
    

    Job done. Make sure word tokens are only replaced with valid word tokens: tokenize the user input as well, and report an error if it is not a single word token (i.e. it does not match the word pattern).
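Putting it together for the command-line workflow from the question, here is a sketch that asks for a replacement for each word occurrence in turn, asks again when the input is not a single `\w+` token, and keeps the word on empty input. For brevity it tokenizes with `preg_split()` and `PREG_SPLIT_DELIM_CAPTURE` instead of the token objects above; the function name `replaceWordsInteractively` and the `$ask` callback are illustrative, not from the answer:

```php
<?php
// Sketch: replace every word occurrence with user input, one at a time.
// $ask is a callback that receives the current word and returns the answer.
function replaceWordsInteractively($subject, callable $ask)
{
    // Words become captured delimiters; spacing/punctuation stay as separate parts.
    $parts = preg_split('/(\w+)/', $subject, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    foreach ($parts as $i => $part) {
        if (!preg_match('/\A\w+\z/', $part)) {
            continue; // fill (spacing, punctuation) is kept untouched
        }
        while (true) {
            $answer = (string) $ask($part);
            if ($answer === '') {
                break; // empty input keeps the original word
            }
            if (preg_match('/\A\w+\z/', $answer)) {
                $parts[$i] = $answer; // valid single word token
                break;
            }
            fwrite(STDERR, "'$answer' is not a single word, try again.\n");
        }
    }
    return implode('', $parts);
}

// Demo with canned answers instead of a keyboard ("bad input!" is rejected).
$answers = array('Hi', 'bad input!', 'there', '');
$ask = function ($word) use (&$answers) {
    return array_shift($answers);
};
echo replaceWordsInteractively('Hello, world. Bye!', $ask), "\n"; // Hi, there. Bye!

// Interactive variant for the command line:
// $ask = function ($word) {
//     echo "Replace '$word' with (Enter keeps it): ";
//     $line = fgets(STDIN);
//     return $line === false ? '' : rtrim($line, "\r\n");
// };
```

Passing the answers through a callback keeps the function testable; for real interactive use, swap in the commented-out `STDIN` reader.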


    This answer was accepted by the asker as the best answer.