duannaoye0732 2008-09-17 04:24 Acceptance rate: 100%
Views: 54
Accepted

How do I truncate a string in PHP to the word closest to a certain number of characters?

I have a PHP snippet that pulls a block of text from a database and sends it to a widget on a webpage. The original text can be a lengthy article or just a sentence or two, but the widget can't display more than, say, 200 characters. I could use substr() to chop the text at 200 characters, but that would cut it off in the middle of a word. What I really want is to chop the text at the end of the last word before 200 characters.

26 answers

  • drpph80800 2008-09-17 04:27

    Use the wordwrap function. It splits the text into multiple lines so that no line exceeds the width you specify, breaking at word boundaries. After wrapping, simply take the first line:

    substr($string, 0, strpos(wordwrap($string, $your_desired_width), "\n"));
    
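To make the one-liner easier to follow, here is a minimal sketch that performs the same steps one at a time. The sample text and the width of 20 are made up for illustration.

```php
<?php
// Hypothetical sample text and limit, chosen only for illustration.
$text  = "The quick brown fox jumps over the lazy dog";
$width = 20;

// wordwrap() inserts "\n" at word boundaries so that each line fits
// within $width (words longer than $width are left unbroken by default).
$wrapped = wordwrap($text, $width);

// Everything before the first "\n" is the first wrapped line.
$firstLine = substr($wrapped, 0, strpos($wrapped, "\n"));

echo $firstLine, PHP_EOL;
```

Here the wrapped text has three lines, and only the first one is kept.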

    One thing this one-liner doesn't handle is the case where the text itself is shorter than the desired width. To handle this edge case, do something like:

    if (strlen($string) > $your_desired_width)
    {
        $string = wordwrap($string, $your_desired_width);
        $string = substr($string, 0, strpos($string, "\n"));
    }
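The snippet above can be wrapped in a function that also makes two further cases explicit: wordwrap() inserting no line break at all (strpos() then returns false), and a first word that is itself longer than the limit. The following is only a sketch. The name truncateAtWord is hypothetical, and returning an empty string in those cases is an assumption about the desired behaviour. It is still sensitive to newlines already present in the text.

```php
<?php
/**
 * Hypothetical helper: returns at most $width bytes of $text,
 * cut at a word boundary found by wordwrap().
 */
function truncateAtWord(string $text, int $width): string
{
    if (strlen($text) <= $width) {
        return $text; // already short enough, nothing to cut
    }

    $wrapped = wordwrap($text, $width);
    $pos = strpos($wrapped, "\n");

    // $pos === false: wordwrap() inserted no line break, typically because
    // the text is a single word longer than $width.
    // $pos > $width: the first word alone exceeds the limit.
    // Assumption: in both cases nothing fits, so return an empty string.
    if ($pos === false || $pos > $width) {
        return '';
    }

    return substr($wrapped, 0, $pos);
}

echo truncateAtWord("The quick brown fox jumps over the lazy dog", 20), PHP_EOL;
```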
    

    The above solution has the problem of cutting the text prematurely if it contains a newline before the actual cut point. Here is a version that solves this problem:

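    function tokenTruncate($string, $your_desired_width) {
      // Split into words and whitespace runs, keeping the whitespace as
      // separate parts (\s already matches newlines). A limit of -1 means
      // "no limit"; passing null is deprecated as of PHP 8.1.
      $parts = preg_split('/(\s+)/', $string, -1, PREG_SPLIT_DELIM_CAPTURE);
      $parts_count = count($parts);

      // Count parts until adding the next one would exceed the width.
      $length = 0;
      $last_part = 0;
      for (; $last_part < $parts_count; ++$last_part) {
        $length += strlen($parts[$last_part]);
        if ($length > $your_desired_width) { break; }
      }

      return implode(array_slice($parts, 0, $last_part));
    }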
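The function relies on PREG_SPLIT_DELIM_CAPTURE returning the whitespace runs as their own array elements, so the kept parts can be joined back together without losing the original spacing. A small sketch with a made-up input shows what the split produces:

```php
<?php
// Hypothetical input, chosen only for illustration.
$input = "one two\nthree";

// The capturing group in the pattern makes preg_split() return the
// delimiters (whitespace runs) as elements of the result as well.
$parts = preg_split('/(\s+)/', $input, -1, PREG_SPLIT_DELIM_CAPTURE);

// Words sit at even indexes, the whitespace between them at odd indexes.
var_export($parts);

// Joining all parts reproduces the input exactly.
var_dump(implode($parts) === $input);
```

Because the newline is just another whitespace part, it no longer acts as a premature cut point.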
    

    Also, here is the PHPUnit test class used to test the implementation:

    class TokenTruncateTest extends \PHPUnit\Framework\TestCase {
      public function testBasic() {
        $this->assertEquals("1 3 5 7 9 ",
          tokenTruncate("1 3 5 7 9 11 14", 10));
      }
    
      public function testEmptyString() {
        $this->assertEquals("",
          tokenTruncate("", 10));
      }
    
      public function testShortString() {
        $this->assertEquals("1 3",
          tokenTruncate("1 3", 10));
      }
    
      public function testStringTooLong() {
        $this->assertEquals("",
          tokenTruncate("toooooooooooolooooong", 10));
      }
    
      public function testContainingNewline() {
        $this->assertEquals("1 3\n5 7 9 ",
          tokenTruncate("1 3\n5 7 9 11 14", 10));
      }
    }
    

    EDIT:

    Multibyte UTF-8 characters such as 'à' are not handled correctly. Add the 'u' modifier at the end of the regex so that the pattern and subject are treated as UTF-8:

    $parts = preg_split('/(\s+)/u', $string, -1, PREG_SPLIT_DELIM_CAPTURE);
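Note that the 'u' modifier only affects how the text is split. strlen() still counts bytes, so accented characters use up the width faster than plain ASCII. If the limit is meant in characters, one possible variant counts with mb_strlen() instead. This is a sketch, not part of the original answer. The name tokenTruncateMb is hypothetical, and it assumes UTF-8 input and the mbstring extension.

```php
<?php
/**
 * Hypothetical variant of tokenTruncate() that counts characters, not bytes.
 * Assumes UTF-8 input and that the mbstring extension is loaded.
 */
function tokenTruncateMb(string $string, int $width): string
{
    $parts = preg_split('/(\s+)/u', $string, -1, PREG_SPLIT_DELIM_CAPTURE);
    if ($parts === false) {
        // preg_split() failed, e.g. on invalid UTF-8.
        // Assumption: return an empty string; callers may prefer another fallback.
        return '';
    }

    $length = 0;
    $kept = [];
    foreach ($parts as $part) {
        $length += mb_strlen($part, 'UTF-8'); // characters, not bytes
        if ($length > $width) {
            break;
        }
        $kept[] = $part;
    }

    return implode($kept);
}

echo tokenTruncateMb("à é î ô ù abc", 10), PHP_EOL;
```

With a width of 10, this keeps all five accented letters and their trailing spaces, whereas a byte-based count would stop after the third letter.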

    Selected by the asker as the best answer.