dongqiao1151
2011-10-31 09:59 Views: 33
Accepted

Truncating content before the search keyword in text

I am using the code below to truncate my content before and after the first occurrence of the search keyword in my text (this is for my search page). Everything works as it should, except that the code cuts words in half at the beginning of the excerpt. It doesn't cut words at the end.

Example:

lients at the centre of the relationship and to offer a first class service to them, which includes tax planning, investment management and estate planning. We believe that our customer focused and...

(Edit: sometimes more than one character is missing from the word.)

You will see that it has chopped the 'c' off 'clients'. It only happens at the beginning of the text, not the end. How can I fix this? I believe I am halfway there. Code so far:

function neatest_trim($content, $chars, $searchquery, $characters_before, $characters_after) {
    if (strlen($content) > $chars) {
        $pos = strpos($content, $searchquery);
        $start = $characters_before < $pos ? $pos - $characters_before : 0;
        $len = $pos + strlen($searchquery) + $characters_after - $start;
        $content = str_replace('&nbsp;', ' ', $content);
        $content = str_replace("\n", '', $content);
        $content = strip_tags(trim($content));
        $content = preg_replace('/\s+?(\S+)?$/', '', mb_substr($content, $start, $len));
        $content = trim($content) . '...';
        $content = strip_tags($content);
        $content = str_ireplace($searchquery, '<span class="highlight" style="background: #E6E6E6;">' . $searchquery . '</span>', $content);
    }
    return $content;
}

$results[] = array(
    'text' => neatest_trim($row->content, 200, $searchquery, 120, 80)
);

2 answers

  • Accepted
    doulang6013 2011-10-31 10:21

    The code keeps 120 characters before the keyword, but it never checks whether the character at that position is a space or a letter. It simply cuts the string there, no matter what.

    I would make this change, which moves the start forward to the next space after the position we are starting from.

    $start = $characters_before < $pos ? $pos - $characters_before : 0;
    // add this line:
    $start = strpos($content, ' ', $start);
    $len = $pos + strlen($searchquery) + $characters_after - $start;
    

    This way $start is the position of a space, and not a letter from a word.
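
    To see the effect in isolation, here is a minimal sketch. The sentence and the offset of 14 are made up for illustration (loosely based on the example in the question), and the variable names $cut, $fixedStart and $clean are not from the original code.

    ```php
    <?php
    // Illustrative text and offsets (assumptions, not the asker's real data).
    $content = 'We aim to put clients at the centre of the relationship.';
    $pos     = strpos($content, 'centre');   // 29
    $start   = $pos - 14;                    // 15, which is inside "clients"

    // Cutting at a fixed offset starts mid-word.
    $cut = substr($content, $start);

    // Moving the start to the next space skips the partial word instead.
    $fixedStart = strpos($content, ' ', $start);
    $clean = ltrim(substr($content, $fixedStart));

    echo $cut, "\n";    // lients at the centre of the relationship.
    echo $clean, "\n";  // at the centre of the relationship.
    ```

    Note that strpos() returns false when there is no space after $start, so a more defensive version would probably check for that before using the result.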

    Your function would become:

    function neatest_trim($content, $chars, $searchquery, $characters_before, $characters_after) {
        if (strlen($content) > $chars) {
            $pos = strpos($content, $searchquery);
            $start = $characters_before < $pos ? $pos - $characters_before : 0;
            $start = strpos($content, " ", $start);
            $len = $pos + strlen($searchquery) + $characters_after - $start;
            $content = str_replace('&nbsp;', ' ', $content);
            $content = str_replace("\n", '', $content);
            $content = strip_tags(trim($content));
            $content = preg_replace('/\s+?(\S+)?$/', '', mb_substr($content, $start, $len));
            $content = trim($content) . '...';
            $content = strip_tags($content);
            $content = str_ireplace($searchquery, '<span class="highlight" style="background: #E6E6E6;">' . $searchquery . '</span>', $content);
        }
        return $content;
    }
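
    Two things in the function may still cause trouble: the offsets are computed on the raw content before the tags and &nbsp; entities are stripped, and strpos()/strlen() count bytes while mb_substr() counts characters. The following is only a sketch of one way to handle both, assuming the mbstring extension is available. The name snippet_around and its behaviour when the keyword is not found are my own assumptions, and it leaves out the '...' suffix and the highlighting.

    ```php
    <?php
    // Hypothetical helper (not from the thread): returns a plain-text excerpt
    // around the first occurrence of $query, starting on a whole word.
    function snippet_around(string $content, string $query, int $before, int $after): string
    {
        // Clean first, so every offset refers to the text that is actually cut.
        $text = strip_tags(str_replace('&nbsp;', ' ', $content));
        $text = trim((string) preg_replace('/\s+/', ' ', $text));

        $pos = mb_stripos($text, $query);
        if ($pos === false) {
            return $text; // assumption: leave the text as-is when nothing matches
        }

        $start = max(0, $pos - $before);
        // Only adjust when the cut falls inside a word.
        if ($start > 0 && mb_substr($text, $start - 1, 1) !== ' ') {
            $space = mb_strpos($text, ' ', $start);
            if ($space !== false && $space < $pos) {
                $start = $space + 1; // first character of the next whole word
            }
        }

        $end = $pos + mb_strlen($query) + $after;
        $excerpt = mb_substr($text, $start, $end - $start);
        if ($end < mb_strlen($text)) {
            // Drop the last (possibly partial) word, much like the original regex.
            $excerpt = (string) preg_replace('/\s+\S*$/', '', $excerpt);
        }
        return $excerpt;
    }

    $text = 'We put clients at the centre of the relationship and offer tax planning to them.';
    echo snippet_around($text, 'centre', 10, 10), "\n"; // at the centre of the
    ```

    Like the original regex, the trailing cleanup removes the last word even when it happens to be complete, so the excerpt can come out slightly shorter than requested.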
    
  • dsdt66064367 2011-10-31 10:21

    Why not just use a regex replace?

    $result = preg_replace('/.*(.{10}\bword\b.{10}).*/s', '$1', $subject);
    

    This keeps the keyword 'word' together with the 10 characters before it and the 10 characters after it, and discards everything else.

    Explanation :

    # .*(.{10}\bword\b.{10}).*
    # 
    # Options: dot matches newline
    # 
    # Match any single character «.*»
    #    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
    # Match the regular expression below and capture its match into backreference number 1 «(.{10}\bword\b.{10})»
    #    Match any single character «.{10}»
    #       Exactly 10 times «{10}»
    #    Assert position at a word boundary «\b»
    #    Match the characters “word” literally «word»
    #    Assert position at a word boundary «\b»
    #    Match any single character «.{10}»
    #       Exactly 10 times «{10}»
    # Match any single character «.*»
    #    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
    

    So this regex finds the word you specify (and only that whole word, because it is enclosed in \b word boundaries) and captures it together with the 10 characters before it and the 10 characters after it. You could construct the regex yourself, using variables for the number of characters before and after and, of course, for the keyword. The regex also matches everything else in the subject, but the replacement uses only backreference $1, which is what you want as the output.
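
    As a rough sketch of building the pattern from variables: the function name regex_excerpt and its parameters are made up for illustration. The pattern shape is kept as above, so a few caveats carry over. Because the counts are exact, the subject comes back unchanged when there are fewer characters than requested on either side. The leading .* is greedy, so with several occurrences it settles on the last one that fits. And without the u modifier, the counts are bytes rather than characters.

    ```php
    <?php
    // Hypothetical wrapper around the pattern above; name and parameters are assumptions.
    function regex_excerpt(string $subject, string $keyword, int $before, int $after): string
    {
        // preg_quote() escapes regex metacharacters in the keyword,
        // including the "/" delimiter passed as the second argument.
        $pattern = sprintf(
            '/.*(.{%d}\b%s\b.{%d}).*/s',
            $before,
            preg_quote($keyword, '/'),
            $after
        );

        // preg_replace() returns null on failure; fall back to the input then.
        return preg_replace($pattern, '$1', $subject) ?? $subject;
    }

    echo regex_excerpt('aaaaaaaaaaaaaaa word bbbbbbbbbbbbbbb', 'word', 10, 10), "\n";
    // aaaaaaaaa word bbbbbbbbb
    ```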
