douzhan1031 2015-07-27 15:03

How to highlight search-matched text on a web page

I'm trying to write a PHP function that takes some text to be displayed on a web page, and then based on some entered search terms, highlights the corresponding parts of the text. Unfortunately, I'm having a couple of issues.
To better explain the two issues I'm having, let's imagine that the following innocuous string is being searched on and will be displayed on the web page:

My daughter was born on January 11, 2011.

My first problem is that if more than one search term is entered, any placeholder text I use to mark the start and end of any matches for the first term may then be matched by the second term.
For example, I'm currently using the following delimiter strings to mark the beginning and end of a match (at the end of the function, I use preg_replace to turn the delimiters into HTML span tags):

'#####highlightStart#####'
'#####highlightEnd#####'

The problem is, if I do a search like 2011 light, then 2011 will be matched first, giving me:

My daughter was born on January 11, #####highlightStart#####2011#####highlightEnd#####.

Then, when light is searched for, it matches the word light inside both #####highlightStart##### and #####highlightEnd#####, which I don't want.
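
The collision can be reproduced with a minimal sketch along these lines, assuming the matching is done with a plain str_replace loop (the function name naiveHighlight is hypothetical; the actual function is not shown in this question):

```php
<?php
// Hypothetical stand-in for the real function: wraps each term in the
// placeholder strings quoted above, one term after another.
function naiveHighlight(string $text, array $terms): string
{
    foreach ($terms as $term) {
        // Each pass searches the output of the previous pass, so the
        // placeholders themselves become searchable text.
        $text = str_replace(
            $term,
            '#####highlightStart#####' . $term . '#####highlightEnd#####',
            $text
        );
    }
    return $text;
}

$result = naiveHighlight("My daughter was born on January 11, 2011.", ['2011', 'light']);

// The markers around 2011 are now split apart by a nested pair of markers
// wrapped around the "light" inside "highlightStart" and "highlightEnd".
echo $result;
```

After the second pass, the final delimiter-to-span replacement no longer finds an intact pair of markers around 2011.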

One thought I had was to create some really obscure delimiting strings (in perhaps a foreign language) that would likely never be searched on, but I can't guarantee that any particular string will never be searched on and it just seems like a really kludgy solution. Basically, I imagine that there is a better way to do it.
Any advice on this first point would be greatly appreciated.

My second question has to do with how to handle overlapping matches.
For example, with the same string My daughter was born on January 11, 2011., if the entered search is Jan anuar, then Jan will be matched first, giving me:

My daughter was born on #####highlightStart#####Jan#####highlightEnd#####uary 11, 2011.

And because the delimiting text is now a part of the string, the second search term, anuar will never be matched.

Regarding this issue, I am quite perplexed and really have no clue how to solve it.
I feel like I need to somehow do all of the search operations on the original string separately and then somehow combine them at the end, but again, I'm lost on how to do this.
Perhaps there's a way better solution altogether, but I don't know what that would be.

Any advice or direction on how to solve either or both of these problems would be greatly appreciated.
Thank you.


  • dtjzpg5313 2015-07-27 15:39

    In this case I think it's simpler to use str_replace (though it won't be perfect).

    Assuming you've got an array of terms you want to highlight (I'll call it $aSearchTerms for the sake of argument) and that wrapping the highlighted terms in the HTML5 <mark> tag is acceptable (it's legible, you've said this is going on a web page, and it's easy to strip_tags() from your search terms):

    $aSearchTerms = ['Jan', 'anu', 'Feb', '11'];
    $sinContent = "My daughter was born on January 11, 2011.";
    
    foreach($aSearchTerms as $sinTerm) {
        $sinContent = str_replace($sinTerm, "<mark>{$sinTerm}</mark>", $sinContent);
    }
    
    echo $sinContent;
    // outputs: My daughter was born on <mark>Jan</mark>uary <mark>11</mark>, 20<mark>11</mark>.
    

    It's not perfect since, using the data in that array, the first pass will change January to <mark>Jan</mark>uary which means anu will no longer match in January - something like this will, however, cover most usage needs.
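
    A related variant, as a rough sketch only: doing all the terms in a single regex pass means a later term can never match inside a <mark> tag inserted for an earlier one. It does not merge overlapping terms (Jan still wins over anu in January), and it assumes the content is plain text. The name highlightSinglePass is made up for illustration.

    ```php
    <?php
    // Hypothetical helper: highlight every term in one regex pass.
    function highlightSinglePass(string $text, array $terms): string
    {
        // Escape regex metacharacters (and the "/" delimiter), skipping empty terms.
        $quoted = [];
        foreach ($terms as $term) {
            if ($term !== '') {
                $quoted[] = preg_quote($term, '/');
            }
        }
        if (!$quoted) {
            return $text;
        }

        // One case-insensitive alternation of all the terms.
        $pattern = '/(?:' . implode('|', $quoted) . ')/i';

        // $0 is the matched text exactly as it appears, so the original casing is kept.
        return preg_replace($pattern, '<mark>$0</mark>', $text);
    }

    // "mark" is searched for, but the inserted tags are never rescanned.
    echo highlightSinglePass(
        "My daughter was born on January 11, 2011.",
        ['Jan', 'anu', 'mark', '11']
    );
    ```

    Because preg_replace scans the original string once, the tags it writes are output only and are never matched against.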


    EDIT

    Oki - I'm not 100% certain this is sane but I took a totally different approach looking at the link @AlexAtNet posted:

    https://stackoverflow.com/a/3631016/886824

    What I've done is find the numeric positions (the indexes) in the string where each search term occurs, and build an array of start and end indexes where the <mark> and </mark> tags are going to be inserted.

    Then, using the answer linked above, I merged the overlapping index ranges together - this covers your overlapping matches issue.

    Then I've looped over that array, cut the original string up into substrings and glued it back together, inserting the <mark> and </mark> tags at the relevant points (based on the indexes). This should cover your first issue, so you're not having string replacements replacing string replacements.

    The code in full looks like:

    <?php
    $sContent = "Captain's log, January 11, 2711 - Uranus";
    $ainSearchTerms = array('Jan', 'asduih', 'anu', '11');
    
    //lower-case copy so substr_count and strpos match case-insensitively
    $sContentForSearching = strtolower($sContent);
    
    //array of first and last positions of the terms within the string
    $aTermPositions = array();
    
    //loop through your search terms and build a multi-dimensional array
    //of start and end indexes for each term
    foreach($ainSearchTerms as $sinTerm) {
    
      //lower-case the search term
      $sinTermLower = strtolower($sinTerm);
    
      $iTermPosition = 0;
      $iTermLength = strlen($sinTermLower);
      $iTermOccursCount = substr_count($sContentForSearching, $sinTermLower);
    
      for($i=0; $i<$iTermOccursCount; $i++) {
    
        //find the start and end positions for this term
        $iStartIndex = strpos($sContentForSearching, $sinTermLower, $iTermPosition);
        $iEndIndex = $iStartIndex + $iTermLength;
        $aTermPositions[] = array($iStartIndex, $iEndIndex);
    
        //continue searching right after this match (adding $i here would skip characters)
        $iTermPosition = $iEndIndex;
      }
    }
    
    //taken directly from this answer https://stackoverflow.com/a/3631016/886824
    //just replaced $data with $aTermPositions
    //this sorts out the overlaps so that 'Jan' and 'anu' will merge into 'Janu'
    //in January - whilst still matching 'anu' in Uranus
    //
    //This conveniently sorts all your start and end indexes in ascending order
    usort($aTermPositions, function($a, $b)
    {
            return $a[0] - $b[0];
    });
    
    $n = 0; $len = count($aTermPositions);
    for ($i = 1; $i < $len; ++$i)
    {
            //end indexes are exclusive, so only merge ranges that overlap or touch
            if ($aTermPositions[$i][0] > $aTermPositions[$n][1])
                    $n = $i;
            else
            {
                    if ($aTermPositions[$n][1] < $aTermPositions[$i][1])
                            $aTermPositions[$n][1] = $aTermPositions[$i][1];
                    unset($aTermPositions[$i]);
            }
    }
    
    $aTermPositions = array_values($aTermPositions);
    
    //finally chop your original string into the bits
    //where you want to insert <mark> and </mark>
    //default to the untouched content so there is always something to echo
    $soutContent = $sContent;
    if($aTermPositions) {
        $iLastContentChunkIndex = 0;
        $soutContent = "";
    
        foreach($aTermPositions as $aChunkIndex) {
            $soutContent .= substr($sContent, $iLastContentChunkIndex, $aChunkIndex[0] - $iLastContentChunkIndex)
                . "<mark>" . substr($sContent, $aChunkIndex[0], $aChunkIndex[1] - $aChunkIndex[0]) . "</mark>";
    
            $iLastContentChunkIndex = $aChunkIndex[1];
        }
    
        //... and the bit on the end
        $soutContent .= substr($sContent, $iLastContentChunkIndex);
    }
    
    //this *should* output the following:
    //Captain's log, <mark>Janu</mark>ary <mark>11</mark>, 27<mark>11</mark> - Ur<mark>anu</mark>s
    echo $soutContent;
    

    The inevitable gotcha! Using this on content that's already HTML may fail horribly.

    Given the string:

    In <a href="#">January</a> this year...

    A search/mark of Jan will insert the <mark> and </mark> tags around 'Jan', which is fine. However, a search/mark of something like In Jan is going to fail, as there's markup in the way :\

    Can't think of a good way around that I'm afraid.
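
    A partial workaround, sketched under assumptions: split the HTML on its tags and only highlight the text chunks in between. This keeps search terms from matching inside tag names or attributes, but it does not solve the In Jan case, since that match would still have to span a tag. The name highlightOutsideTags is made up, and the tag regex is naive (it assumes no ">" inside attribute values, comments or scripts).

    ```php
    <?php
    // Hypothetical helper: highlight terms only in the text between tags.
    function highlightOutsideTags(string $html, array $terms): string
    {
        // Escape regex metacharacters (and the "/" delimiter), skipping empty terms.
        $quoted = [];
        foreach ($terms as $term) {
            if ($term !== '') {
                $quoted[] = preg_quote($term, '/');
            }
        }
        if (!$quoted) {
            return $html;
        }
        $pattern = '/(?:' . implode('|', $quoted) . ')/i';

        // The capture group makes preg_split keep the tags, so the result
        // alternates: text, tag, text, tag, ... (text chunks may be empty).
        $parts = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);

        foreach ($parts as $i => $part) {
            // Even indexes are text chunks, odd indexes are the captured tags.
            if ($i % 2 === 0) {
                $parts[$i] = preg_replace($pattern, '<mark>$0</mark>', $part);
            }
        }

        return implode('', $parts);
    }

    // "href" only occurs inside the tag, so it is left alone.
    echo highlightOutsideTags('In <a href="#">January</a> this year...', ['Jan', 'href']);
    ```

    For anything beyond simple markup, walking the text nodes with a real HTML parser would be the safer route.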
