drn1008 2018-06-03 07:08

Preg_Replace (Remove) Exact-Match Words from an Array in PHP

I have a list of stopwords set up in an array:

$stopwords = array(
    "a ",
    "about ",
    "above ",
    "above ",
    "across ",
    "after ",
    "afterwards ",
    "again ",
    "against ",
    "all ",
    "almost ",
    "alone ",
    "along ",
    "already ",
    "also ",
    "although ",
    "always ",
    "am ",
    "among ",
    "amongst ",
    "amoungst ",
    "amount ",
    "an ",
    "and ",
    "another ",
    "any ",
    "anyhow ",
    "anyone ",
    "anything ",
    "anyway ",
    "anywhere ",
    "are ",
    "around ",
    "as ",
    "at ",
    "back ",
    "be ",
    "became ",
    "because ",
    "become ",
    "becomes ",
    "becoming ",
    "been ",
    "before ",
    "beforehand ",
    "behind ",
    "being ",
    "below ",
    "beside ",
    "besides ",
    "between ",
    "beyond ",
    "bill ",
    "both ",
    "bottom ",
    "but ",
    "by ",
    "can ",
    "cannot ",
    "cant ",
    "co ",
    "con ",
    "could ",
    "couldnt ",
    "cry ",
    "considered ",
    "describe ",
    "detail ",
    "do ",
    "did ",
    "done ",
    "down ",
    "due ",
    "during ",
    "each ",
    "eg ",
    "eight ",
    "either ",
    "eleven ",
    "else ",
    "elsewhere ",
    "empty ",
    "enough ",
    "etc ",
    "even ",
    "ever ",
    "every ",
    "everyone ",
    "everything ",
    "everywhere ",
    "except ",
    "few ",
    "fifteen ",
    "fify ",
    "fill ",
    "find ",
    "fire ",
    "five ",
    "for ",
    "former ",
    "formerly ",
    "forty ",
    "found ",
    "four ",
    "from ",
    "front ",
    "full ",
    "further ",
    "get ",
    "give ",
    "go ",
    "had ",
//    "has ",
    "hasnt ",
    "have ",
    "he ",
    "hence ",
    "her ",
    "here ",
    "hereafter ",
    "hereby ",
    "herein ",
    "hereupon ",
    "hers ",
    "herself ",
    "him ",
    "himself ",
    "his ",
    "how ",
    "however ",
    "hundred ",
    "ie ",
    "if ",
    "in ",
    "inc ",
    "indeed ",
    "interest ",
    "into ",
    "is ",
    "it ",
    "its ",
    "itself ",
    "keep ",
    "known ",
//    "last ",
    "latter ",
    "latterly ",
    "least ",
    "legend ",
    "less ",
    "ltd ",
//    "made ",
    "many ",
    "may ",
    "me ",
    "meanwhile ",
    "might ",
    "mill ",
    "mine ",
    "more ",
    "moreover ",
//    "most ",
    "mostly ",
    "move ",
    "much ",
    "must ",
    "my ",
    "myself ",
    "name ",
    "namely ",
    "neither ",
    "never ",
    "nevertheless ",
    "next ",
    "nine ",
    "no ",
    "nobody ",
    "none ",
    "noone ",
    "nor ",
    "nothing ",
    "now ",
    "nowhere ",
    "of ",
    "off ",
    "often ",
    "on ",
    "once ",
    "one ",
    "only ",
    "onto ",
    "or ",
    "other ",
    "others ",
    "otherwise ",
    "our ",
    "ours ",
    "ourselves ",
    "out ",
//    "over ",
    "own ",
    "part ",
    "per ",
    "perhaps ",
    "please ",
    "popular ",
    "put ",
    "rather ",
    "re ",
    "same ",
    "see ",
    "seem ",
    "seemed ",
    "seeming ",
    "seems ",
    "serious ",
    "several ",
    "she ",
    "should ",
    "show ",
    "since ",
    "sincere ",
    "six ",
    "sixty ",
    "so ",
    "some ",
    "somehow ",
    "someone ",
    "something ",
    "sometime ",
    "sometimes ",
    "somewhere ",
    "still ",
    "such ",
    "take ",
    "technique ",
    "ten ",
    "than ",
    "that ",
    "the ",
    "their ",
    "them ",
    "themselves ",
    "then ",
    "thence ",
    "there ",
    "thereafter ",
    "thereby ",
    "therefore ",
    "therein ",
    "thereupon ",
    "these ",
    "they ",
    "thickv ",
    "term ",
    "thin ",
    "third ",
    "this ",
    "those ",
    "though ",
    "three ",
    "through ",
    "throughout ",
    "thru ",
    "thus ",
    "to ",
    "together ",
    "too ",
    "top ",
    "toward ",
    "towards ",
    "twelve ",
    "twenty ",
    "two ",
    "un ",
    "under ",
    "until ",
    "up ",
    "upon ",
    "us ",
    "very ",
    "via ",
    "was ",
    "we ",
    "well ",
    "were ",
    "what ",
    "whatever ",
    "when ",
    "whence ",
    "whenever ",
    "where ",
    "whereafter ",
    "whereas ",
    "whereby ",
    "wherein ",
    "whereupon ",
    "wherever ",
    "whether ",
    "which ",
    "while ",
    "whither ",
    "who ",
    "whoever ",
    "whole ",
    "whom ",
    "whose ",
    "why ",
    "will ",
    "with ",
    "within ",
    "without ",
    "would ",
    "yet ",
    "you ",
    "your ",
    "yours ",
    "yourself ",
    "yourselves ",
    "the ",
    "likely ",
    "names "
);

You may have noticed the space after each word. I added it to avoid cutting pieces out of longer words: I only want to remove whole-word matches from my stopword list (replace them with nothing).
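The trailing space narrows the match but doesn't turn it into a whole-word match: str_replace() works on plain substrings, so an entry like "a " still matches the last letter of a word such as "data" plus the space after it. A small illustration (the two-entry list and the sample sentence are made up for the demo):

```php
<?php
// Two entries in the same style as the stopword list, trailing spaces included.
$stopwords = ["a ", "the "];

// str_replace() matches substrings, not words: "a " also matches the end of
// "data" plus the following space, so the two words get glued together.
$result = str_replace($stopwords, "", "the data set");

echo $result, PHP_EOL; // prints "datset"
```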

Realizing that str_replace is probably less capable than preg_replace for this, I turned to building a preg_replace pattern array, in an attempt to match whole words using word boundaries (\b).

$pregreplacestopwords = array(
"/\ba\b/",
"/\babout\b/",
"/\babove\b/",
"/\babove\b/",
"/\bacross\b/",
"/\bafter\b/",
"/\bafterwards\b/",
"/\bagain\b/",
"/\bagainst\b/",
"/\ball\b/",
"/\balmost\b/",
"/\balone\b/",
"/\balong\b/",
"/\balready\b/",
"/\balso\b/",
"/\balthough\b/",
"/\balways\b/",
"/\bam\b/",
"/\bamong\b/",
"/\bamongst\b/",
"/\bamoungst\b/",
"/\bamount\b/",
"/\ban\b/",
"/\band\b/",
"/\banother\b/",
"/\bany\b/",
"/\banyhow\b/",
"/\banyone\b/",
"/\banything\b/",
"/\banyway\b/",
"/\banywhere\b/",
"/\bare\b/",
"/\baround\b/",
"/\bas\b/",
"/\bat\b/",
"/\bback\b/",
"/\bbe\b/",
"/\bbecame\b/",
"/\bbecause\b/",
"/\bbecome\b/",
"/\bbecomes\b/",
"/\bbecoming\b/",
"/\bbeen\b/",
"/\bbefore\b/",
"/\bbeforehand\b/",
"/\bbehind\b/",
"/\bbeing\b/",
"/\bbelow\b/",
"/\bbeside\b/",
"/\bbesides\b/",
"/\bbetween\b/",
"/\bbeyond\b/",
"/\bbill\b/",
"/\bboth\b/",
"/\bbottom\b/",
"/\bbut\b/",
"/\bby\b/",
"/\bcan\b/",
"/\bcannot\b/",
"/\bcant\b/",
"/\bco\b/",
"/\bcon\b/",
"/\bcould\b/",
"/\bcouldnt\b/",
"/\bcry\b/",
"/\bconsidered\b/",
"/\bdescribe\b/",
"/\bdetail\b/",
"/\bdo\b/",
"/\bdid\b/",
"/\bdone\b/",
"/\bdown\b/",
"/\bdue\b/",
"/\bduring\b/",
"/\beach\b/",
"/\beg\b/",
"/\beight\b/",
"/\beither\b/",
"/\beleven\b/",
"/\belse\b/",
"/\belsewhere\b/",
"/\bempty\b/",
"/\benough\b/",
"/\betc\b/",
"/\beven\b/",
"/\bever\b/",
"/\bevery\b/",
"/\beveryone\b/",
"/\beverything\b/",
"/\beverywhere\b/",
"/\bexcept\b/",
"/\bfew\b/",
"/\bfifteen\b/",
"/\bfify\b/",
"/\bfill\b/",
"/\bfind\b/",
"/\bfire\b/",
"/\bfive\b/",
"/\bfor\b/",
"/\bformer\b/",
"/\bformerly\b/",
"/\bforty\b/",
"/\bfound\b/",
"/\bfour\b/",
"/\bfrom\b/",
"/\bfront\b/",
"/\bfull\b/",
"/\bfurther\b/",
"/\bget\b/",
"/\bgive\b/",
"/\bgo\b/",
"/\bhad\b/",
"/\b//has\b/",
"/\bhasnt\b/",
"/\bhave\b/",
"/\bhe\b/",
"/\bhence\b/",
"/\bher\b/",
"/\bhere\b/",
"/\bhereafter\b/",
"/\bhereby\b/",
"/\bherein\b/",
"/\bhereupon\b/",
"/\bhers\b/",
"/\bherself\b/",
"/\bhim\b/",
"/\bhimself\b/",
"/\bhis\b/",
"/\bhow\b/",
"/\bhowever\b/",
"/\bhundred\b/",
"/\bie\b/",
"/\bif\b/",
"/\bin\b/",
"/\binc\b/",
"/\bindeed\b/",
"/\binterest\b/",
"/\binto\b/",
"/\bis\b/",
"/\bit\b/",
"/\bits\b/",
"/\bitself\b/",
"/\bkeep\b/",
"/\bknown\b/",
"/\b//last\b/",
"/\blatter\b/",
"/\blatterly\b/",
"/\bleast\b/",
"/\blegend\b/",
"/\bless\b/",
"/\bltd\b/",
"/\b//made\b/",
"/\bmany\b/",
"/\bmay\b/",
"/\bme\b/",
"/\bmeanwhile\b/",
"/\bmight\b/",
"/\bmill\b/",
"/\bmine\b/",
"/\bmore\b/",
"/\bmoreover\b/",
"/\bmost\b/",
"/\bmostly\b/",
"/\bmove\b/",
"/\bmuch\b/",
"/\bmust\b/",
"/\bmy\b/",
"/\bmyself\b/",
"/\bname\b/",
"/\bnamely\b/",
"/\bneither\b/",
"/\bnever\b/",
"/\bnevertheless\b/",
"/\bnext\b/",
"/\bnine\b/",
"/\bno\b/",
"/\bnobody\b/",
"/\bnone\b/",
"/\bnoone\b/",
"/\bnor\b/",
"/\bnothing\b/",
"/\bnow\b/",
"/\bnowhere\b/",
"/\bof\b/",
"/\boff\b/",
"/\boften\b/",
"/\bon\b/",
"/\bonce\b/",
"/\bone\b/",
"/\bonly\b/",
"/\bonto\b/",
"/\bor\b/",
"/\bother\b/",
"/\bothers\b/",
"/\botherwise\b/",
"/\bour\b/",
"/\bours\b/",
"/\bourselves\b/",
"/\bout\b/",
"/\b//over\b/",
"/\bown\b/",
"/\bpart\b/",
"/\bper\b/",
"/\bperhaps\b/",
"/\bplease\b/",
"/\bpopular\b/",
"/\bput\b/",
"/\brather\b/",
"/\bre\b/",
"/\bsame\b/",
"/\bsee\b/",
"/\bseem\b/",
"/\bseemed\b/",
"/\bseeming\b/",
"/\bseems\b/",
"/\bserious\b/",
"/\bseveral\b/",
"/\bshe\b/",
"/\bshould\b/",
"/\bshow\b/",
"/\bsince\b/",
"/\bsincere\b/",
"/\bsix\b/",
"/\bsixty\b/",
"/\bso\b/",
"/\bsome\b/",
"/\bsomehow\b/",
"/\bsomeone\b/",
"/\bsomething\b/",
"/\bsometime\b/",
"/\bsometimes\b/",
"/\bsomewhere\b/",
"/\bstill\b/",
"/\bsuch\b/",
"/\btake\b/",
"/\btechnique\b/",
"/\bten\b/",
"/\bthan\b/",
"/\bthat\b/",
"/\bthe\b/",
"/\btheir\b/",
"/\bthem\b/",
"/\bthemselves\b/",
"/\bthen\b/",
"/\bthence\b/",
"/\bthere\b/",
"/\bthereafter\b/",
"/\bthereby\b/",
"/\btherefore\b/",
"/\btherein\b/",
"/\bthereupon\b/",
"/\bthese\b/",
"/\bthey\b/",
"/\bthickv\b/",
"/\bterm\b/",
"/\bthin\b/",
"/\bthird\b/",
"/\bthis\b/",
"/\bthose\b/",
"/\bthough\b/",
"/\bthree\b/",
"/\bthrough\b/",
"/\bthroughout\b/",
"/\bthru\b/",
"/\bthus\b/",
"/\bto\b/",
"/\btogether\b/",
"/\btoo\b/",
"/\btop\b/",
"/\btoward\b/",
"/\btowards\b/",
"/\btwelve\b/",
"/\btwenty\b/",
"/\btwo\b/",
"/\bun\b/",
"/\bunder\b/",
"/\buntil\b/",
"/\bup\b/",
"/\bupon\b/",
"/\bus\b/",
"/\bvery\b/",
"/\bvia\b/",
"/\bwas\b/",
"/\bwe\b/",
"/\bwell\b/",
"/\bwere\b/",
"/\bwhat\b/",
"/\bwhatever\b/",
"/\bwhen\b/",
"/\bwhence\b/",
"/\bwhenever\b/",
"/\bwhere\b/",
"/\bwhereafter\b/",
"/\bwhereas\b/",
"/\bwhereby\b/",
"/\bwherein\b/",
"/\bwhereupon\b/",
"/\bwherever\b/",
"/\bwhether\b/",
"/\bwhich\b/",
"/\bwhile\b/",
"/\bwhither\b/",
"/\bwho\b/",
"/\bwhoever\b/",
"/\bwhole\b/",
"/\bwhom\b/",
"/\bwhose\b/",
"/\bwhy\b/",
"/\bwill\b/",
"/\bwith\b/",
"/\bwithin\b/",
"/\bwithout\b/",
"/\bwould\b/",
"/\byet\b/",
"/\byou\b/",
"/\byour\b/",
"/\byours\b/",
"/\byourself\b/",
"/\byourselves\b/",
"/\bthe\b/",
"/\blikely\b/",
"/\bnames\b/"
        );
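Rather than maintaining a second hand-written list, the pattern array can be generated from $stopwords. A minimal sketch, assuming PHP 7.4+ for the arrow function and using a shortened list for illustration: trim() removes the trailing spaces, array_unique() drops duplicates such as the repeated "above ", and preg_quote() escapes regex metacharacters, including the "/" delimiter passed as its second argument. It keeps the same \b word-boundary behaviour as the hand-written array.

```php
<?php
// A shortened copy of the stopword list; the full list works the same way.
$stopwords = ["a ", "about ", "above ", "above ", "my ", "the "];

// Build one "/\bword\b/" pattern per unique stopword instead of typing them by hand.
$pregreplacestopwords = array_map(
    fn(string $word): string => '/\b' . preg_quote(trim($word), '/') . '\b/',
    array_unique($stopwords)
);

// preg_replace() accepts a single string as the replacement for every pattern,
// so a parallel array of replacement entries isn't required.
$string = preg_replace($pregreplacestopwords, '', 'Read about my plan');

// Collapse the extra spaces left where words were removed.
$string = trim(preg_replace('/\s+/', ' ', $string));

echo $string, PHP_EOL; // prints "Read plan"
```

Generating the patterns this way also avoids hand-editing slips such as the "/\b//has\b/" entries above.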

I created a matching replacement array of single spaces for it:

$pgreplace = array_fill(0, count($pregreplacestopwords), " "); // one " " replacement per pattern

Take the word “B.A.” as an example and put it into a string variable; make it a sentence for fun:

 $string = 'I got my “B.A.” from...';

Here are some of the approaches I've tried. First, passing the arrays straight to preg_replace:

preg_replace($pregreplacestopwords, $pgreplace, $string);

It just gets filled with errors:

Warning: preg_replace(): Compilation failed: missing terminating ] for character class at offset 1951 in C:\wamp64\www\pg\test.php on line 664

Warning: preg_replace(): Empty regular expression in C:\wamp64\www\pg\test.php on line 666
NULL 
Warning: preg_replace(): Unknown modifier '/' in C:\wamp64\www\pg\test.php on line 670
NULL
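The "Unknown modifier '/'" warning is consistent with entries like "/\b//has\b/" in the pattern array: the pattern ends at the second "/", so PHP tries to read the rest ("/has\b/") as modifiers and the pattern fails to compile. preg_replace() returns NULL on such a failure, and with an array of patterns one bad entry is enough to make the whole call return NULL, which would explain the NULL output. A small reproduction:

```php
<?php
// "/\b//has\b/" copied from the pattern array: the second "/" closes the
// pattern early and "/has\b/" is parsed as modifiers, which fails.
// The @ only silences the warning; the return value is still NULL.
$broken = @preg_replace('/\b//has\b/', '', 'She has it');
var_dump($broken); // NULL

// With the stray "//" removed (like commenting the entry out, as in $stopwords),
// the pattern compiles and replaces normally.
$fixed = preg_replace('/\bhas\b/', '', 'She has it');
var_dump($fixed); // string(7) "She  it"
```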

Next, I tried imploding the array via

$implodestopwords = implode("|", array_map("trim", array_filter($stopwords)));

which gives

a|about|above|above|across|after|afterwards|again|against|all|almost|alone|along|already|also

and so forth.

Then I put this into action:

$pattern = '/\b(' . $implodestopwords . ')\b/i';
$string = preg_replace($pattern, "", $string);

var_dump($string);

outputs:

I got “B..” ...
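The “B..” result follows from how \b works: it matches at any switch between a word character and a non-word character, and "." is a non-word character, so the "A" in "B.A." counts as a separate word for \b (and the i flag lets it match the stopword "a"). If a "word" should instead mean a run of non-whitespace characters, one option is to swap \b for the lookarounds (?<!\S) and (?!\S). A minimal sketch, assuming PHP 7.4+; removeStopwords() is a hypothetical helper name and the three-entry list is for illustration only:

```php
<?php
// Hypothetical helper: removes stopwords only where they stand alone as
// whitespace-delimited tokens, so "B.A." is left intact.
function removeStopwords(string $text, array $stopwords): string
{
    // Drop the trailing spaces and escape regex metacharacters in each entry.
    $words = array_map(
        fn(string $w): string => preg_quote(trim($w), '/'),
        $stopwords
    );

    // (?<!\S): preceded by whitespace or the start of the string.
    // (?!\S):  followed by whitespace or the end of the string.
    $pattern = '/(?<!\S)(?:' . implode('|', $words) . ')(?!\S)/i';
    $result = preg_replace($pattern, '', $text);

    // Collapse the runs of spaces left behind by removed words.
    return trim(preg_replace('/\s+/', ' ', $result));
}

$stopwords = ["a ", "my ", "from "];
echo removeStopwords('I got my “B.A.” from...', $stopwords), PHP_EOL;
// prints "I got “B.A.” from..."
```

Note the trade-off: "from..." is kept as well, because "from" is followed by "." rather than whitespace, so punctuation attached to a stopword would need separate handling.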

How can I modify my preg_replace so that it only matches exact words, removing every word in a large stopword array from the string?

Full script here: https://pastebin.com/vwbNjhs9


1 answer

  • dongtuo8170 2018-06-03 07:58

    Instead of using preg_replace(), you could turn your string into an array of words and loop over it, checking whether each word is in your stopwords array.

    Try this and see if it works:

    $string = 'I got my "B.A." from...';
    $string = preg_replace('/\s+/', ' ', trim($string)); //<-- Ensure only one space between words.
    $array = explode(' ', $string);
    
    for ($i = 0; $i < count($array); $i++) {
    
      // A space is concatenated only because of the trailing spaces in your stopwords array.
      if (in_array($array[$i] . ' ', $stopwords)) {
    
        $array[$i] = '';  //<-- Remove the word.
    
      }
    
    }
    
    // Drop the emptied slots so removed words don't leave double spaces,
    // then turn the array back into a string.
    $newString = implode(' ', array_filter($array, 'strlen'));
    
    echo $newString; //<-- Outputs: I got "B.A." from...
    

    This method gives you a lot of control over what you decide to do with each found word.
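For a large stopword list, one possible refinement of the same loop (an assumption, not part of the original answer) is to trim and lowercase the list once and flip it into a lookup table, so each check is an isset() on an array key instead of an in_array() scan over every entry, and matching ignores case. A sketch assuming PHP 7.4+, with a short illustrative list:

```php
<?php
// Illustrative subset of the stopword list, trailing spaces (and the odd "In") included.
$stopwords = ["a ", "is ", "my ", "the ", "In"];

// Build the lookup table once: trim the spaces, lowercase, and use the words as keys.
$lookup = array_flip(array_map(fn(string $w): string => strtolower(trim($w)), $stopwords));

$string = 'The answer is in my notes';
$kept = [];

foreach (preg_split('/\s+/', trim($string)) as $word) {
    // isset() on a key is a hash lookup rather than a scan of the whole list.
    if (!isset($lookup[strtolower($word)])) {
        $kept[] = $word; // keep every word that isn't a stopword
    }
}

$newString = implode(' ', $kept);
echo $newString, PHP_EOL; // prints "answer notes"
```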

    This answer was selected by the asker as the best answer.
