dongluo3962 2010-05-02 08:50

PHP stop word list

I'm playing about with stop words in my code. I have an array full of words that I'd like to check, and an array of stop words I want to check them against.

At the moment I'm looping through the first array one word at a time and removing the word if in_array finds it in the stop word list, but I wonder if there's a better way of doing it. I've looked at array_diff and similar functions; however, if the first array contains the same stop word several times, array_diff only appears to remove the first occurrence.

The focus is on speed and memory usage, with speed the higher priority.

Edit -

The first array holds single words taken from blog comments (these are usually quite long); the second array holds single stop words. Sorry for not making that clear.
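
A minimal sketch of the setup described above, using hypothetical sample data ($words and $stopWords are illustrative names, not taken from real code). It runs the in_array loop and array_diff side by side. On this sample array_diff drops every occurrence of a listed word, so if it appears to leave some behind, differences in case or surrounding whitespace in the real data may be worth checking.

```php
<?php
// Hypothetical sample data: one word per element, as described in the edit.
$words = array("the", "quick", "the", "fox", "of", "the", "night");
$stopWords = array("the", "of");

// The loop described in the question: keep each word that is not in the stop list.
$kept = array();
foreach ($words as $word) {
    if (!in_array($word, $stopWords)) {
        $kept[] = $word;
    }
}

// array_diff compares values as strings and preserves the original keys,
// so array_values is used here to reindex the result from zero.
$diffed = array_values(array_diff($words, $stopWords));

var_dump($kept === $diffed); // bool(true)
```

Both approaches are case-sensitive as written, so "The" would survive a stop list containing only "the".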

Thanks


  • doucao1888 2010-05-02 08:57

    Using str_replace...

    A simple approach is to use str_replace or str_ireplace, which can take an array of 'needles' (things to search for), corresponding replacements, and an array of 'haystacks' (things to operate on).

    $haystacks=array(
      "The quick brown fox",
      "jumps over the ",
      "lazy dog"
    );
    
    $needles=array(
      "the", "lazy", "quick"
    );
    
    $result=str_ireplace($needles, "", $haystacks);
    
    var_dump($result);
    

    This produces

    array(3) {
      [0]=>
      string(11) "  brown fox"
      [1]=>
      string(12) "jumps over  "
      [2]=>
      string(4) " dog"
    }
    

    As an aside, a quick way to clean up the leading and trailing spaces this leaves is to use array_map to call trim on each element:

    $result=array_map("trim", $result);
    

    The drawback of using str_replace is that it will replace matches found within words, rather than just whole words. To address that, we can use regular expressions...
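
    To illustrate that drawback, here is a small sketch with a hypothetical input sentence (not from the question) in which the stop word also occurs inside longer words:

```php
<?php
// Hypothetical input: "the" also appears inside "theme" and "gathering".
$sentence = "the theme of the gathering";

// str_ireplace has no notion of word boundaries, so every occurrence of
// the three letters is removed, including those inside longer words.
$stripped = str_ireplace("the", "", $sentence);

var_dump($stripped); // string(14) " me of  garing"
```

    "theme" is cut down to "me" and "gathering" to "garing", which is rarely what a stop word filter should do.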

    Using preg_replace...

    An approach using preg_replace looks very similar to the above, but the needles are regular expressions, and we check for a 'word boundary' at the start and end of each match using \b:

    $haystacks=array(
      "For we shall use fortran to",
      "fortify the general theme",
      "of this torrent of nonsense"
    );
    
    $needles=array(
      '/\bfor\b/i', 
      '/\bthe\b/i', 
      '/\bto\b/i', 
      '/\bof\b/i'
    );
    
    $result=preg_replace($needles, "", $haystacks);
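
    If the stop words start out as a plain word list rather than ready-made patterns, one possible variation is to build a single alternation pattern from them. This is a sketch under that assumption, not part of the approach above; $stopWords, $quoted and $pattern are illustrative names, and the whitespace cleanup at the end is an optional extra step.

```php
<?php
// Hypothetical plain stop word list; preg_quote escapes regex metacharacters.
$stopWords = array("for", "the", "to", "of");
$quoted = array();
foreach ($stopWords as $word) {
    $quoted[] = preg_quote($word, '/');
}

// One case-insensitive alternation pattern instead of one pattern per word.
$pattern = '/\b(?:' . implode('|', $quoted) . ')\b/i';

$haystacks = array(
    "For we shall use fortran to",
    "fortify the general theme",
    "of this torrent of nonsense"
);

// Whole-word matches only: "fortran", "fortify", "theme" and "torrent" are untouched.
$result = preg_replace($pattern, "", $haystacks);

// Collapse the runs of whitespace left behind, then trim both ends.
$result = array_map("trim", preg_replace('/\s+/', ' ', $result));

var_dump($result);
```

    On this input the three strings come out as "we shall use fortran", "fortify general theme" and "this torrent nonsense".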
    