drn1008 2018-06-03 07:08
Views: 85
Accepted

preg_replace: removing exact whole-word matches using a PHP array

I have a list of stopwords stored in an array:

$stopwords = array(
    "a ",
    "about ",
    "above ",
    "above ",
    "across ",
    "after ",
    "afterwards ",
    "again ",
    "against ",
    "all ",
    "almost ",
    "alone ",
    "along ",
    "already ",
    "also ",
    "although ",
    "always ",
    "am ",
    "among ",
    "amongst ",
    "amoungst ",
    "amount ",
    "an ",
    "and ",
    "another ",
    "any ",
    "anyhow ",
    "anyone ",
    "anything ",
    "anyway ",
    "anywhere ",
    "are ",
    "around ",
    "as ",
    "at ",
    "back ",
    "be ",
    "became ",
    "because ",
    "become ",
    "becomes ",
    "becoming ",
    "been ",
    "before ",
    "beforehand ",
    "behind ",
    "being ",
    "below ",
    "beside ",
    "besides ",
    "between ",
    "beyond ",
    "bill ",
    "both ",
    "bottom ",
    "but ",
    "by ",
    "can ",
    "cannot ",
    "cant ",
    "co ",
    "con ",
    "could ",
    "couldnt ",
    "cry ",
    "considered ",
    "describe ",
    "detail ",
    "do ",
    "did ",
    "done ",
    "down ",
    "due ",
    "during ",
    "each ",
    "eg ",
    "eight ",
    "either ",
    "eleven ",
    "else ",
    "elsewhere ",
    "empty ",
    "enough ",
    "etc ",
    "even ",
    "ever ",
    "every ",
    "everyone ",
    "everything ",
    "everywhere ",
    "except ",
    "few ",
    "fifteen ",
    "fify ",
    "fill ",
    "find ",
    "fire ",
    "five ",
    "for ",
    "former ",
    "formerly ",
    "forty ",
    "found ",
    "four ",
    "from ",
    "front ",
    "full ",
    "further ",
    "get ",
    "give ",
    "go ",
    "had ",
//    "has ",
    "hasnt ",
    "have ",
    "he ",
    "hence ",
    "her ",
    "here ",
    "hereafter ",
    "hereby ",
    "herein ",
    "hereupon ",
    "hers ",
    "herself ",
    "him ",
    "himself ",
    "his ",
    "how ",
    "however ",
    "hundred ",
    "ie ",
    "if ",
    "In",
    "inc ",
    "indeed ",
    "interest ",
    "into ",
    "is ",
    "it ",
    "its ",
    "itself ",
    "keep ",
    "known ",
//    "last ",
    "latter ",
    "latterly ",
    "least ",
    "legend ",
    "less ",
    "ltd ",
//    "made ",
    "many ",
    "may ",
    "me ",
    "meanwhile ",
    "might ",
    "mill ",
    "mine ",
    "more ",
    "moreover ",
//    "most ",
    "mostly ",
    "move ",
    "much ",
    "must ",
    "my ",
    "myself ",
    "name ",
    "namely ",
    "neither ",
    "never ",
    "nevertheless ",
    "next ",
    "nine ",
    "no ",
    "nobody ",
    "none ",
    "noone ",
    "nor ",
    "nothing ",
    "now ",
    "nowhere ",
    "of ",
    "off ",
    "often ",
    "on ",
    "once ",
    "one ",
    "only ",
    "onto ",
    "or ",
    "other ",
    "others ",
    "otherwise ",
    "our ",
    "ours ",
    "ourselves ",
    "out ",
//    "over ",
    "own ",
    "part ",
    "per ",
    "perhaps ",
    "please ",
    "popular ",
    "put ",
    "rather ",
    "re ",
    "same ",
    "see ",
    "seem ",
    "seemed ",
    "seeming ",
    "seems ",
    "serious ",
    "several ",
    "she ",
    "should ",
    "show ",
    "since ",
    "sincere ",
    "six ",
    "sixty ",
    "so ",
    "some ",
    "somehow ",
    "someone ",
    "something ",
    "sometime ",
    "sometimes ",
    "somewhere ",
    "still ",
    "such ",
    "take ",
    "technique ",
    "ten ",
    "than ",
    "that ",
    "the ",
    "their ",
    "them ",
    "themselves ",
    "then ",
    "thence ",
    "there ",
    "thereafter ",
    "thereby ",
    "therefore ",
    "therein ",
    "thereupon ",
    "these ",
    "they ",
    "thickv ",
    "term ",
    "thin ",
    "third ",
    "this ",
    "those ",
    "though ",
    "three ",
    "through ",
    "throughout ",
    "thru ",
    "thus ",
    "to ",
    "together ",
    "too ",
    "top ",
    "toward ",
    "towards ",
    "twelve ",
    "twenty ",
    "two ",
    "un ",
    "under ",
    "until ",
    "up ",
    "upon ",
    "us ",
    "very ",
    "via ",
    "was ",
    "we ",
    "well ",
    "were ",
    "what ",
    "whatever ",
    "when ",
    "whence ",
    "whenever ",
    "where ",
    "whereafter ",
    "whereas ",
    "whereby ",
    "wherein ",
    "whereupon ",
    "wherever ",
    "whether ",
    "which ",
    "while ",
    "whither ",
    "who ",
    "whoever ",
    "whole ",
    "whom ",
    "whose ",
    "why ",
    "will ",
    "with ",
    "within ",
    "without ",
    "would ",
    "yet ",
    "you ",
    "your ",
    "yours ",
    "yourself ",
    "yourselves ",
    "the ",
    "likely ",
    "names "
);

You may have noticed the trailing space after each word: I was trying to avoid cutting into longer strings, because I only want to replace whole-word matches from my stopword list (with an empty value).
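A minimal sketch of why the trailing-space trick is not enough on its own. It assumes the list is applied with str_replace(); the two-entry list and the sample strings are made up for illustration.

```php
<?php
// Hypothetical two-entry list in the question's trailing-space style.
$stopwords = array("a ", "the ");

// str_replace() matches raw substrings, so the trailing space only guards
// the right-hand side of each entry; the left-hand side is unprotected.
$r1 = str_replace($stopwords, "", "banana split");
$r2 = str_replace($stopwords, "", "bathe daily");

// A stopword at the very end of the text has no trailing space, so it survives.
$r3 = str_replace($stopwords, "", "I saw the");

echo $r1, "\n"; // banansplit
echo $r2, "\n"; // badaily
echo $r3, "\n"; // I saw the
```

So the space prevents matches that run into the next word, but not matches that start in the middle of the previous one.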

Realizing that str_replace is probably the weaker tool in terms of capabilities, I turned to building a preg_replace pattern array in an attempt to match whole words using word boundaries.

$pregreplacestopwords = array(
"/\ba\b/",
"/\babout\b/",
"/\babove\b/",
"/\babove\b/",
"/\bacross\b/",
"/\bafter\b/",
"/\bafterwards\b/",
"/\bagain\b/",
"/\bagainst\b/",
"/\ball\b/",
"/\balmost\b/",
"/\balone\b/",
"/\balong\b/",
"/\balready\b/",
"/\balso\b/",
"/\balthough\b/",
"/\balways\b/",
"/\bam\b/",
"/\bamong\b/",
"/\bamongst\b/",
"/\bamoungst\b/",
"/\bamount\b/",
"/\ban\b/",
"/\band\b/",
"/\banother\b/",
"/\bany\b/",
"/\banyhow\b/",
"/\banyone\b/",
"/\banything\b/",
"/\banyway\b/",
"/\banywhere\b/",
"/\bare\b/",
"/\baround\b/",
"/\bas\b/",
"/\bat\b/",
"/\bback\b/",
"/\bbe\b/",
"/\bbecame\b/",
"/\bbecause\b/",
"/\bbecome\b/",
"/\bbecomes\b/",
"/\bbecoming\b/",
"/\bbeen\b/",
"/\bbefore\b/",
"/\bbeforehand\b/",
"/\bbehind\b/",
"/\bbeing\b/",
"/\bbelow\b/",
"/\bbeside\b/",
"/\bbesides\b/",
"/\bbetween\b/",
"/\bbeyond\b/",
"/\bbill\b/",
"/\bboth\b/",
"/\bbottom\b/",
"/\bbut\b/",
"/\bby\b/",
"/\bcan\b/",
"/\bcannot\b/",
"/\bcant\b/",
"/\bco\b/",
"/\bcon\b/",
"/\bcould\b/",
"/\bcouldnt\b/",
"/\bcry\b/",
"/\bconsidered\b/",
"/\bdescribe\b/",
"/\bdetail\b/",
"/\bdo\b/",
"/\bdid\b/",
"/\bdone\b/",
"/\bdown\b/",
"/\bdue\b/",
"/\bduring\b/",
"/\beach\b/",
"/\beg\b/",
"/\beight\b/",
"/\beither\b/",
"/\beleven\b/",
"/\belse\b/",
"/\belsewhere\b/",
"/\bempty\b/",
"/\benough\b/",
"/\betc\b/",
"/\beven\b/",
"/\bever\b/",
"/\bevery\b/",
"/\beveryone\b/",
"/\beverything\b/",
"/\beverywhere\b/",
"/\bexcept\b/",
"/\bfew\b/",
"/\bfifteen\b/",
"/\bfify\b/",
"/\bfill\b/",
"/\bfind\b/",
"/\bfire\b/",
"/\bfive\b/",
"/\bfor\b/",
"/\bformer\b/",
"/\bformerly\b/",
"/\bforty\b/",
"/\bfound\b/",
"/\bfour\b/",
"/\bfrom\b/",
"/\bfront\b/",
"/\bfull\b/",
"/\bfurther\b/",
"/\bget\b/",
"/\bgive\b/",
"/\bgo\b/",
"/\bhad\b/",
"/\b//has\b/",
"/\bhasnt\b/",
"/\bhave\b/",
"/\bhe\b/",
"/\bhence\b/",
"/\bher\b/",
"/\bhere\b/",
"/\bhereafter\b/",
"/\bhereby\b/",
"/\bherein\b/",
"/\bhereupon\b/",
"/\bhers\b/",
"/\bherself\b/",
"/\bhim\b/",
"/\bhimself\b/",
"/\bhis\b/",
"/\bhow\b/",
"/\bhowever\b/",
"/\bhundred\b/",
"/\bie\b/",
"/\bif\b/",
"/\bIn\b/",
"/\binc\b/",
"/\bindeed\b/",
"/\binterest\b/",
"/\binto\b/",
"/\bis\b/",
"/\bit\b/",
"/\bits\b/",
"/\bitself\b/",
"/\bkeep\b/",
"/\bknown\b/",
"/\b//last\b/",
"/\blatter\b/",
"/\blatterly\b/",
"/\bleast\b/",
"/\blegend\b/",
"/\bless\b/",
"/\bltd\b/",
"/\b//made\b/",
"/\bmany\b/",
"/\bmay\b/",
"/\bme\b/",
"/\bmeanwhile\b/",
"/\bmight\b/",
"/\bmill\b/",
"/\bmine\b/",
"/\bmore\b/",
"/\bmoreover\b/",
"/\bmost\b/",
"/\bmostly\b/",
"/\bmove\b/",
"/\bmuch\b/",
"/\bmust\b/",
"/\bmy\b/",
"/\bmyself\b/",
"/\bname\b/",
"/\bnamely\b/",
"/\bneither\b/",
"/\bnever\b/",
"/\bnevertheless\b/",
"/\bnext\b/",
"/\bnine\b/",
"/\bno\b/",
"/\bnobody\b/",
"/\bnone\b/",
"/\bnoone\b/",
"/\bnor\b/",
"/\bnothing\b/",
"/\bnow\b/",
"/\bnowhere\b/",
"/\bof\b/",
"/\boff\b/",
"/\boften\b/",
"/\bon\b/",
"/\bonce\b/",
"/\bone\b/",
"/\bonly\b/",
"/\bonto\b/",
"/\bor\b/",
"/\bother\b/",
"/\bothers\b/",
"/\botherwise\b/",
"/\bour\b/",
"/\bours\b/",
"/\bourselves\b/",
"/\bout\b/",
"/\b//over\b/",
"/\bown\b/",
"/\bpart\b/",
"/\bper\b/",
"/\bperhaps\b/",
"/\bplease\b/",
"/\bpopular\b/",
"/\bput\b/",
"/\brather\b/",
"/\bre\b/",
"/\bsame\b/",
"/\bsee\b/",
"/\bseem\b/",
"/\bseemed\b/",
"/\bseeming\b/",
"/\bseems\b/",
"/\bserious\b/",
"/\bseveral\b/",
"/\bshe\b/",
"/\bshould\b/",
"/\bshow\b/",
"/\bsince\b/",
"/\bsincere\b/",
"/\bsix\b/",
"/\bsixty\b/",
"/\bso\b/",
"/\bsome\b/",
"/\bsomehow\b/",
"/\bsomeone\b/",
"/\bsomething\b/",
"/\bsometime\b/",
"/\bsometimes\b/",
"/\bsomewhere\b/",
"/\bstill\b/",
"/\bsuch\b/",
"/\btake\b/",
"/\btechnique\b/",
"/\bten\b/",
"/\bthan\b/",
"/\bthat\b/",
"/\bthe\b/",
"/\btheir\b/",
"/\bthem\b/",
"/\bthemselves\b/",
"/\bthen\b/",
"/\bthence\b/",
"/\bthere\b/",
"/\bthereafter\b/",
"/\bthereby\b/",
"/\btherefore\b/",
"/\btherein\b/",
"/\bthereupon\b/",
"/\bthese\b/",
"/\bthey\b/",
"/\bthickv\b/",
"/\bterm\b/",
"/\bthin\b/",
"/\bthird\b/",
"/\bthis\b/",
"/\bthose\b/",
"/\bthough\b/",
"/\bthree\b/",
"/\bthrough\b/",
"/\bthroughout\b/",
"/\bthru\b/",
"/\bthus\b/",
"/\bto\b/",
"/\btogether\b/",
"/\btoo\b/",
"/\btop\b/",
"/\btoward\b/",
"/\btowards\b/",
"/\btwelve\b/",
"/\btwenty\b/",
"/\btwo\b/",
"/\bun\b/",
"/\bunder\b/",
"/\buntil\b/",
"/\bup\b/",
"/\bupon\b/",
"/\bus\b/",
"/\bvery\b/",
"/\bvia\b/",
"/\bwas\b/",
"/\bwe\b/",
"/\bwell\b/",
"/\bwere\b/",
"/\bwhat\b/",
"/\bwhatever\b/",
"/\bwhen\b/",
"/\bwhence\b/",
"/\bwhenever\b/",
"/\bwhere\b/",
"/\bwhereafter\b/",
"/\bwhereas\b/",
"/\bwhereby\b/",
"/\bwherein\b/",
"/\bwhereupon\b/",
"/\bwherever\b/",
"/\bwhether\b/",
"/\bwhich\b/",
"/\bwhile\b/",
"/\bwhither\b/",
"/\bwho\b/",
"/\bwhoever\b/",
"/\bwhole\b/",
"/\bwhom\b/",
"/\bwhose\b/",
"/\bwhy\b/",
"/\bwill\b/",
"/\bwith\b/",
"/\bwithin\b/",
"/\bwithout\b/",
"/\bwould\b/",
"/\byet\b/",
"/\byou\b/",
"/\byour\b/",
"/\byours\b/",
"/\byourself\b/",
"/\byourselves\b/",
"/\bthe\b/",
"/\blikely\b/",
"/\bnames\b/"
        );

I created a blank array for it:

$pgreplace = array();

Let's take the word “B.A.” for example and put it into a string variable, make it a sentence for fun.

 $string = 'I got my “B.A.” from...';

Some of the methods I've tried include imploding the stop words.

Attempting things such as

preg_replace($pregreplacestopwords, $pregreplacestopwords, $string);

just gets filled with errors:

Warning: preg_replace(): Compilation failed: missing terminating ] for character class at offset 1951 in C:\wamp64\www\pg\test.php on line 664

Warning: preg_replace(): Empty regular expression in C:\wamp64\www\pg\test.php on line 666
NULL 
Warning: preg_replace(): Unknown modifier '/' in C:\wamp64\www\pg\test.php on line 670
NULL
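The "Unknown modifier '/'" warning is consistent with entries such as "/\b//has\b/" in the pattern array: the second "/" closes the pattern, so the "/" after it is read as a modifier. The call above also passes the patterns as the replacement text. A minimal sketch that generates the patterns from the word list instead of writing them by hand, assuming a shortened, made-up stopword list and sample sentence:

```php
<?php
// Shortened, hypothetical stopword list; the real one has several hundred entries.
$stopwords = array("a ", "about ", "my ", "from ", "the ");

// Build one pattern per word. trim() drops the trailing space and preg_quote()
// escapes regex metacharacters, including the "/" delimiter given as 2nd argument.
$patterns = array_map(function ($word) {
    return '/\b' . preg_quote(trim($word), '/') . '\b/i';
}, $stopwords);

$string = 'I got my degree from the university';

// The second argument is the replacement text, so pass '' to delete matches.
$result = preg_replace($patterns, '', $string);

// Collapse the gaps left behind by the removed words.
$result = trim(preg_replace('/\s+/', ' ', $result));

echo $result, "\n"; // I got degree university
```

Generating the patterns also means a commented-out word simply disappears from the list rather than ending up inside a pattern.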

Imploding the array via

$implodestopwords = implode("|", array_map("trim", array_filter($stopwords)));

gives

a|about|above|above|across|after|afterwards|again|against|all|almost|alone|along|already|also

and so forth.

Trying to put this into action:

$pattern = '/\b(' . $implodestopwords . ')\b/i';
$string = preg_replace($pattern, "", $string);

var_dump($string);

outputs:

I got “B..” ...
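The "A" disappears because \b matches at any transition between a word character and a non-word character, and "." is a non-word character, so the "A" in "B.A." counts as a whole word. One possible workaround, sketched here under the assumption that a word should only be removed when it is surrounded by whitespace or the string edges, is to replace \b with lookarounds. The stopword list is shortened and the straight quotes in the sample are for illustration only.

```php
<?php
// Shortened, hypothetical stopword list in the question's trailing-space style.
$stopwords = array("a ", "about ", "my ", "from ");

// Trim and escape each word, then join them into a single alternation.
$words = array_map(function ($word) {
    return preg_quote(trim($word), '/');
}, $stopwords);

// (?<!\S) and (?!\S) require whitespace (or the string edge) on each side,
// unlike \b, which also matches next to punctuation such as ".".
$pattern = '/(?<!\S)(?:' . implode('|', $words) . ')(?!\S)/iu';

$string = 'I got my "B.A." from...';
$result = preg_replace($pattern, '', $string);

// Collapse the gaps left behind by the removed words.
$result = trim(preg_replace('/\s+/', ' ', $result));

echo $result, "\n"; // I got "B.A." from...
```

The trade-off is that a stopword directly followed by punctuation, like the "from" in "from...", is left in place. Whether that is acceptable depends on the text being cleaned.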

How can I modify my preg_replace to only match exact words and remove them from a large list of words from an array?

Full script here: https://pastebin.com/vwbNjhs9

1 answer

  • dongtuo8170 2018-06-03 07:58

    Instead of using preg_replace(), you could split your string into an array of words and loop over it, checking whether each word is in your stopwords array.

    Try this and see if it works:

    $string = 'I got my "B.A." from...';
    $string = preg_replace('/\s+/', ' ', $string); // Ensure only one space between words.
    $array = explode(' ', $string);

    foreach ($array as $i => $word) {
      // Append a space only because the entries in $stopwords have trailing spaces.
      if (in_array($word . ' ', $stopwords)) {
        unset($array[$i]); // Remove the word entirely so no extra space is left behind.
      }
    }

    $newString = implode(' ', $array); // Turn the array back into a string.

    echo $newString; // Outputs: I got "B.A." from...
    

    This method gives you a lot of control over what you decide to do with each found word.
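As one illustration of that control, here is a variant of the same loop, sketched with a shortened, made-up stopword list and sample sentence. It normalizes the list once and uses array keys for the lookup, which makes the check case-insensitive and independent of trailing spaces.

```php
<?php
// Shortened, hypothetical stopword list in the question's trailing-space style.
$stopwords = array("a ", "my ", "from ", "In");

// Normalize once: trim, lowercase, and flip so each word becomes an array key.
$lookup = array_flip(array_map(function ($word) {
    return strtolower(trim($word));
}, $stopwords));

$string = 'In  short, I got my "B.A." from...';

// Split on any run of whitespace and drop empty pieces.
$tokens = preg_split('/\s+/', $string, -1, PREG_SPLIT_NO_EMPTY);

// Keep a token unless its lowercase form is a stopword; isset() on keys
// avoids the linear scan that in_array() performs for every token.
$kept = array_filter($tokens, function ($token) use ($lookup) {
    return !isset($lookup[strtolower($token)]);
});

$newString = implode(' ', $kept);
echo $newString, "\n"; // short, I got "B.A." from...
```

Tokens that carry punctuation, such as "short," or "from...", are compared as they are, so they are kept unless the punctuation is stripped before the lookup.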

    This answer was selected as the best answer by the asker.