drn1008 2018-06-03 07:08
Views: 85
Accepted

preg_replace: removing exact whole-word matches using a PHP array of words

I have a list of stopwords stored in an array:

$stopwords = array(
    "a ",
    "about ",
    "above ",
    "above ",
    "across ",
    "after ",
    "afterwards ",
    "again ",
    "against ",
    "all ",
    "almost ",
    "alone ",
    "along ",
    "already ",
    "also ",
    "although ",
    "always ",
    "am ",
    "among ",
    "amongst ",
    "amoungst ",
    "amount ",
    "an ",
    "and ",
    "another ",
    "any ",
    "anyhow ",
    "anyone ",
    "anything ",
    "anyway ",
    "anywhere ",
    "are ",
    "around ",
    "as ",
    "at ",
    "back ",
    "be ",
    "became ",
    "because ",
    "become ",
    "becomes ",
    "becoming ",
    "been ",
    "before ",
    "beforehand ",
    "behind ",
    "being ",
    "below ",
    "beside ",
    "besides ",
    "between ",
    "beyond ",
    "bill ",
    "both ",
    "bottom ",
    "but ",
    "by ",
    "can ",
    "cannot ",
    "cant ",
    "co ",
    "con ",
    "could ",
    "couldnt ",
    "cry ",
    "considered ",
    "describe ",
    "detail ",
    "do ",
    "did ",
    "done ",
    "down ",
    "due ",
    "during ",
    "each ",
    "eg ",
    "eight ",
    "either ",
    "eleven ",
    "else ",
    "elsewhere ",
    "empty ",
    "enough ",
    "etc ",
    "even ",
    "ever ",
    "every ",
    "everyone ",
    "everything ",
    "everywhere ",
    "except ",
    "few ",
    "fifteen ",
    "fify ",
    "fill ",
    "find ",
    "fire ",
    "five ",
    "for ",
    "former ",
    "formerly ",
    "forty ",
    "found ",
    "four ",
    "from ",
    "front ",
    "full ",
    "further ",
    "get ",
    "give ",
    "go ",
    "had ",
//    "has ",
    "hasnt ",
    "have ",
    "he ",
    "hence ",
    "her ",
    "here ",
    "hereafter ",
    "hereby ",
    "herein ",
    "hereupon ",
    "hers ",
    "herself ",
    "him ",
    "himself ",
    "his ",
    "how ",
    "however ",
    "hundred ",
    "ie ",
    "if ",
    "In",
    "inc ",
    "indeed ",
    "interest ",
    "into ",
    "is ",
    "it ",
    "its ",
    "itself ",
    "keep ",
    "known ",
//    "last ",
    "latter ",
    "latterly ",
    "least ",
    "legend ",
    "less ",
    "ltd ",
//    "made ",
    "many ",
    "may ",
    "me ",
    "meanwhile ",
    "might ",
    "mill ",
    "mine ",
    "more ",
    "moreover ",
//    "most ",
    "mostly ",
    "move ",
    "much ",
    "must ",
    "my ",
    "myself ",
    "name ",
    "namely ",
    "neither ",
    "never ",
    "nevertheless ",
    "next ",
    "nine ",
    "no ",
    "nobody ",
    "none ",
    "noone ",
    "nor ",
    "nothing ",
    "now ",
    "nowhere ",
    "of ",
    "off ",
    "often ",
    "on ",
    "once ",
    "one ",
    "only ",
    "onto ",
    "or ",
    "other ",
    "others ",
    "otherwise ",
    "our ",
    "ours ",
    "ourselves ",
    "out ",
//    "over ",
    "own ",
    "part ",
    "per ",
    "perhaps ",
    "please ",
    "popular ",
    "put ",
    "rather ",
    "re ",
    "same ",
    "see ",
    "seem ",
    "seemed ",
    "seeming ",
    "seems ",
    "serious ",
    "several ",
    "she ",
    "should ",
    "show ",
    "since ",
    "sincere ",
    "six ",
    "sixty ",
    "so ",
    "some ",
    "somehow ",
    "someone ",
    "something ",
    "sometime ",
    "sometimes ",
    "somewhere ",
    "still ",
    "such ",
    "take ",
    "technique ",
    "ten ",
    "than ",
    "that ",
    "the ",
    "their ",
    "them ",
    "themselves ",
    "then ",
    "thence ",
    "there ",
    "thereafter ",
    "thereby ",
    "therefore ",
    "therein ",
    "thereupon ",
    "these ",
    "they ",
    "thickv ",
    "term ",
    "thin ",
    "third ",
    "this ",
    "those ",
    "though ",
    "three ",
    "through ",
    "throughout ",
    "thru ",
    "thus ",
    "to ",
    "together ",
    "too ",
    "top ",
    "toward ",
    "towards ",
    "twelve ",
    "twenty ",
    "two ",
    "un ",
    "under ",
    "until ",
    "up ",
    "upon ",
    "us ",
    "very ",
    "via ",
    "was ",
    "we ",
    "well ",
    "were ",
    "what ",
    "whatever ",
    "when ",
    "whence ",
    "whenever ",
    "where ",
    "whereafter ",
    "whereas ",
    "whereby ",
    "wherein ",
    "whereupon ",
    "wherever ",
    "whether ",
    "which ",
    "while ",
    "whither ",
    "who ",
    "whoever ",
    "whole ",
    "whom ",
    "whose ",
    "why ",
    "will ",
    "with ",
    "within ",
    "without ",
    "would ",
    "yet ",
    "you ",
    "your ",
    "yours ",
    "yourself ",
    "yourselves ",
    "the ",
    "likely ",
    "names "
);

As you may have noticed from the space after each word, I was trying to avoid cutting into longer strings: I only want to replace whole-word matches from my stopword list (with an empty value).
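
A minimal sketch of why the trailing-space convention does not guarantee whole-word matches with str_replace. The two-entry list and the sample sentences are hypothetical, chosen only to show the effect: a word that merely ends in a stopword is still cut, and a stopword at the very end of the string is missed.

```php
<?php
// Hypothetical two-entry list, using the same trailing-space convention.
$stopwords = array("a ", "the ");

// "a " also matches the end of "data ", and "the " the end of "bathe ".
$result = str_replace($stopwords, "", "data set to bathe in");
echo $result, "\n"; // datset to bain

// A stopword at the end of the string has no trailing space, so it survives.
$unchanged = str_replace($stopwords, "", "pick the");
echo $unchanged, "\n"; // pick the
```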

Realizing that str_replace is probably the less capable tool for this, I turned to building an array of preg_replace patterns in an attempt to match whole words using word boundaries.

$pregreplacestopwords = array(
"/\ba\b/",
"/\babout\b/",
"/\babove\b/",
"/\babove\b/",
"/\bacross\b/",
"/\bafter\b/",
"/\bafterwards\b/",
"/\bagain\b/",
"/\bagainst\b/",
"/\ball\b/",
"/\balmost\b/",
"/\balone\b/",
"/\balong\b/",
"/\balready\b/",
"/\balso\b/",
"/\balthough\b/",
"/\balways\b/",
"/\bam\b/",
"/\bamong\b/",
"/\bamongst\b/",
"/\bamoungst\b/",
"/\bamount\b/",
"/\ban\b/",
"/\band\b/",
"/\banother\b/",
"/\bany\b/",
"/\banyhow\b/",
"/\banyone\b/",
"/\banything\b/",
"/\banyway\b/",
"/\banywhere\b/",
"/\bare\b/",
"/\baround\b/",
"/\bas\b/",
"/\bat\b/",
"/\bback\b/",
"/\bbe\b/",
"/\bbecame\b/",
"/\bbecause\b/",
"/\bbecome\b/",
"/\bbecomes\b/",
"/\bbecoming\b/",
"/\bbeen\b/",
"/\bbefore\b/",
"/\bbeforehand\b/",
"/\bbehind\b/",
"/\bbeing\b/",
"/\bbelow\b/",
"/\bbeside\b/",
"/\bbesides\b/",
"/\bbetween\b/",
"/\bbeyond\b/",
"/\bbill\b/",
"/\bboth\b/",
"/\bbottom\b/",
"/\bbut\b/",
"/\bby\b/",
"/\bcan\b/",
"/\bcannot\b/",
"/\bcant\b/",
"/\bco\b/",
"/\bcon\b/",
"/\bcould\b/",
"/\bcouldnt\b/",
"/\bcry\b/",
"/\bconsidered\b/",
"/\bdescribe\b/",
"/\bdetail\b/",
"/\bdo\b/",
"/\bdid\b/",
"/\bdone\b/",
"/\bdown\b/",
"/\bdue\b/",
"/\bduring\b/",
"/\beach\b/",
"/\beg\b/",
"/\beight\b/",
"/\beither\b/",
"/\beleven\b/",
"/\belse\b/",
"/\belsewhere\b/",
"/\bempty\b/",
"/\benough\b/",
"/\betc\b/",
"/\beven\b/",
"/\bever\b/",
"/\bevery\b/",
"/\beveryone\b/",
"/\beverything\b/",
"/\beverywhere\b/",
"/\bexcept\b/",
"/\bfew\b/",
"/\bfifteen\b/",
"/\bfify\b/",
"/\bfill\b/",
"/\bfind\b/",
"/\bfire\b/",
"/\bfive\b/",
"/\bfor\b/",
"/\bformer\b/",
"/\bformerly\b/",
"/\bforty\b/",
"/\bfound\b/",
"/\bfour\b/",
"/\bfrom\b/",
"/\bfront\b/",
"/\bfull\b/",
"/\bfurther\b/",
"/\bget\b/",
"/\bgive\b/",
"/\bgo\b/",
"/\bhad\b/",
"/\b//has\b/",
"/\bhasnt\b/",
"/\bhave\b/",
"/\bhe\b/",
"/\bhence\b/",
"/\bher\b/",
"/\bhere\b/",
"/\bhereafter\b/",
"/\bhereby\b/",
"/\bherein\b/",
"/\bhereupon\b/",
"/\bhers\b/",
"/\bherself\b/",
"/\bhim\b/",
"/\bhimself\b/",
"/\bhis\b/",
"/\bhow\b/",
"/\bhowever\b/",
"/\bhundred\b/",
"/\bie\b/",
"/\bif\b/",
"/\bIn\b/",
"/\binc\b/",
"/\bindeed\b/",
"/\binterest\b/",
"/\binto\b/",
"/\bis\b/",
"/\bit\b/",
"/\bits\b/",
"/\bitself\b/",
"/\bkeep\b/",
"/\bknown\b/",
"/\b//last\b/",
"/\blatter\b/",
"/\blatterly\b/",
"/\bleast\b/",
"/\blegend\b/",
"/\bless\b/",
"/\bltd\b/",
"/\b//made\b/",
"/\bmany\b/",
"/\bmay\b/",
"/\bme\b/",
"/\bmeanwhile\b/",
"/\bmight\b/",
"/\bmill\b/",
"/\bmine\b/",
"/\bmore\b/",
"/\bmoreover\b/",
"/\bmost\b/",
"/\bmostly\b/",
"/\bmove\b/",
"/\bmuch\b/",
"/\bmust\b/",
"/\bmy\b/",
"/\bmyself\b/",
"/\bname\b/",
"/\bnamely\b/",
"/\bneither\b/",
"/\bnever\b/",
"/\bnevertheless\b/",
"/\bnext\b/",
"/\bnine\b/",
"/\bno\b/",
"/\bnobody\b/",
"/\bnone\b/",
"/\bnoone\b/",
"/\bnor\b/",
"/\bnothing\b/",
"/\bnow\b/",
"/\bnowhere\b/",
"/\bof\b/",
"/\boff\b/",
"/\boften\b/",
"/\bon\b/",
"/\bonce\b/",
"/\bone\b/",
"/\bonly\b/",
"/\bonto\b/",
"/\bor\b/",
"/\bother\b/",
"/\bothers\b/",
"/\botherwise\b/",
"/\bour\b/",
"/\bours\b/",
"/\bourselves\b/",
"/\bout\b/",
"/\b//over\b/",
"/\bown\b/",
"/\bpart\b/",
"/\bper\b/",
"/\bperhaps\b/",
"/\bplease\b/",
"/\bpopular\b/",
"/\bput\b/",
"/\brather\b/",
"/\bre\b/",
"/\bsame\b/",
"/\bsee\b/",
"/\bseem\b/",
"/\bseemed\b/",
"/\bseeming\b/",
"/\bseems\b/",
"/\bserious\b/",
"/\bseveral\b/",
"/\bshe\b/",
"/\bshould\b/",
"/\bshow\b/",
"/\bsince\b/",
"/\bsincere\b/",
"/\bsix\b/",
"/\bsixty\b/",
"/\bso\b/",
"/\bsome\b/",
"/\bsomehow\b/",
"/\bsomeone\b/",
"/\bsomething\b/",
"/\bsometime\b/",
"/\bsometimes\b/",
"/\bsomewhere\b/",
"/\bstill\b/",
"/\bsuch\b/",
"/\btake\b/",
"/\btechnique\b/",
"/\bten\b/",
"/\bthan\b/",
"/\bthat\b/",
"/\bthe\b/",
"/\btheir\b/",
"/\bthem\b/",
"/\bthemselves\b/",
"/\bthen\b/",
"/\bthence\b/",
"/\bthere\b/",
"/\bthereafter\b/",
"/\bthereby\b/",
"/\btherefore\b/",
"/\btherein\b/",
"/\bthereupon\b/",
"/\bthese\b/",
"/\bthey\b/",
"/\bthickv\b/",
"/\bterm\b/",
"/\bthin\b/",
"/\bthird\b/",
"/\bthis\b/",
"/\bthose\b/",
"/\bthough\b/",
"/\bthree\b/",
"/\bthrough\b/",
"/\bthroughout\b/",
"/\bthru\b/",
"/\bthus\b/",
"/\bto\b/",
"/\btogether\b/",
"/\btoo\b/",
"/\btop\b/",
"/\btoward\b/",
"/\btowards\b/",
"/\btwelve\b/",
"/\btwenty\b/",
"/\btwo\b/",
"/\bun\b/",
"/\bunder\b/",
"/\buntil\b/",
"/\bup\b/",
"/\bupon\b/",
"/\bus\b/",
"/\bvery\b/",
"/\bvia\b/",
"/\bwas\b/",
"/\bwe\b/",
"/\bwell\b/",
"/\bwere\b/",
"/\bwhat\b/",
"/\bwhatever\b/",
"/\bwhen\b/",
"/\bwhence\b/",
"/\bwhenever\b/",
"/\bwhere\b/",
"/\bwhereafter\b/",
"/\bwhereas\b/",
"/\bwhereby\b/",
"/\bwherein\b/",
"/\bwhereupon\b/",
"/\bwherever\b/",
"/\bwhether\b/",
"/\bwhich\b/",
"/\bwhile\b/",
"/\bwhither\b/",
"/\bwho\b/",
"/\bwhoever\b/",
"/\bwhole\b/",
"/\bwhom\b/",
"/\bwhose\b/",
"/\bwhy\b/",
"/\bwill\b/",
"/\bwith\b/",
"/\bwithin\b/",
"/\bwithout\b/",
"/\bwould\b/",
"/\byet\b/",
"/\byou\b/",
"/\byour\b/",
"/\byours\b/",
"/\byourself\b/",
"/\byourselves\b/",
"/\bthe\b/",
"/\blikely\b/",
"/\bnames\b/"
        );

I created an array of blank replacements to go with it:

$pgreplace = array(" "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," 
"," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," ");

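For comparison, here is a hedged sketch of how the pattern array could be generated from the word list instead of being written out by hand. The short list and sample sentence are hypothetical. It relies on two documented behaviours: preg_quote() escapes regex metacharacters plus the given delimiter, and preg_replace() applies a single string replacement to every pattern in an array, so a parallel array of blanks is not needed. It still uses \b, so it inherits the word-boundary behaviour discussed further down.

```php
<?php
// Hypothetical short list; the real input would be the full $stopwords array.
$stopwords = array("a ", "about ", "my ", "In");

// Trim each entry, escape regex metacharacters and the "/" delimiter,
// then wrap it in word boundaries with the case-insensitive flag.
$patterns = array_map(function ($word) {
    return '/\b' . preg_quote(trim($word), '/') . '\b/i';
}, $stopwords);

// One string replacement is used for every pattern in the array.
$cleaned = preg_replace($patterns, ' ', 'In my view, a word about it');

// Collapse the runs of spaces left behind and trim the ends.
$cleaned = trim(preg_replace('/\s+/', ' ', $cleaned));

echo $cleaned, "\n"; // view, word it
```
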
Let's take the word “B.A.” as an example and put it into a string variable, as part of a sentence for fun.

 $string = 'I got my “B.A.” from...';

I've tried a few approaches, such as imploding the stop words.

Attempting a direct call such as

preg_replace($pregreplacestopwords, $pregreplacestopwords, $string);

just produces a string of errors:

Warning: preg_replace(): Compilation failed: missing terminating ] for character class at offset 1951 in C:\wamp64\www\pg\test.php on line 664

Warning: preg_replace(): Empty regular expression in C:\wamp64\www\pg\test.php on line 666
NULL 
Warning: preg_replace(): Unknown modifier '/' in C:\wamp64\www\pg\test.php on line 670
NULL
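
The "Unknown modifier '/'" warning can be reproduced with one entry from the pattern array above, "/\b//has\b/". The second slash closes the pattern, so everything after it is read as modifiers. The sketch below shows that behaviour and how preg_quote() would escape the inner slashes. Whether the other two warnings come from the same entries is not something the output shown here confirms.

```php
<?php
// Entry taken from the pattern array: the commented-out word kept its "//".
$broken = '/\b//has\b/';

// "@" suppresses the warning so the return value can be inspected.
$result = @preg_replace($broken, '', 'it has');
var_dump($result); // NULL

// preg_quote() with "/" as the delimiter escapes the inner slashes.
$fixed = '/\b' . preg_quote('//has', '/') . '\b/';
echo $fixed, "\n"; // /\b\/\/has\b/

// The escaped pattern compiles; it simply finds nothing to replace here.
$ok = preg_replace($fixed, '', 'it has');
var_dump($ok); // string(6) "it has"
```

In practice the commented-out entries would more likely be dropped from the pattern array altogether; the escaping is shown only to make the cause of the warning visible.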

Imploding the array via

$implodestopwords = implode("|", array_map("trim", array_filter($stopwords)));

gives

a|about|above|above|across|after|afterwards|again|against|all|almost|alone|along|already|also

and so forth.

Putting this into action:

$pattern = '/\b(' . $implodestopwords . ')\b/i';
$string = preg_replace($pattern, "", $string);

var_dump($string);

outputs:

I got “B..” ...
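
The output above follows from how \b works: it matches between a word character and a non-word character, and "." is a non-word character, so the "A" in "B.A." counts as a standalone word and matches "a" under the /i flag. One possible alternative, sketched here with a hypothetical three-word list and straight quotes in the sample string, is to require whitespace or the string edge on both sides using the lookarounds (?<!\S) and (?!\S).

```php
<?php
// Hypothetical short list standing in for the full $stopwords array.
$stopwords = array("a ", "my ", "from ");

// Trim and escape each word, then join them into one alternation.
$alternation = implode('|', array_map(function ($word) {
    return preg_quote(trim($word), '/');
}, $stopwords));

$string = 'I got my "B.A." from...';

// \b sees a boundary on both sides of the "A" in "B.A.", so it is removed.
$withBoundaries = preg_replace('/\b(?:' . $alternation . ')\b/i', '', $string);

// (?<!\S) and (?!\S) demand whitespace or the string edge on each side;
// the trailing \s* swallows the space that followed the removed word.
$withLookarounds = preg_replace('/(?<!\S)(?:' . $alternation . ')(?!\S)\s*/i', '', $string);

var_dump($withBoundaries);  // contains "B.." and loses "from"
var_dump($withLookarounds); // I got "B.A." from...
```

With this pattern "from..." is kept, because the punctuation is attached to the word; whether that is desirable depends on how the text should be tokenized.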

How can I modify my preg_replace to only match exact words and remove them from a large list of words from an array?

Full script here: https://pastebin.com/vwbNjhs9

1 answer

  • dongtuo8170 2018-06-03 07:58

    Maybe instead of using preg_replace() you might just try turning your string into an array and then looping over it checking if each word is in your stop words array.

    Try this and see if it works:

    $string = 'I got my "B.A." from...';
    $string = preg_replace('/\s+/', ' ', $string); //<--Ensure only one space between words.
    $array = explode(' ', $string);
    
    for($i = 0; $i < count($array); $i++){
    
      if(in_array($array[$i] . ' ', $stopwords)){ //<-- Only concatenated space because of your
      //trailing spaces in the stopwords array.
    
        $array[$i] = '';  //<--Removed the word.
    
      }
    
    }
    
    $newString = implode(' ', array_filter($array, 'strlen'));  //<--Drop the emptied slots, then turn the array back into a string.
    
    echo $newString; //<---Outputs: I got "B.A." from...
    

    This method gives you a lot of control over what you decide to do with each found word.
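
As one illustration of that control, the same word-by-word check could be done with a lookup table, which avoids scanning the whole stopword list for every word and makes the comparison case-insensitive. This is only a sketch: the short list and the sample sentence are hypothetical, and trimming plus lowercasing the entries is an assumption about the matching that is wanted.

```php
<?php
// Hypothetical short list standing in for the full $stopwords array.
$stopwords = array("a ", "my ", "In");

// Normalise once: trim, lowercase, and flip so the words become array keys.
$lookup = array_flip(array_map(function ($word) {
    return strtolower(trim($word));
}, $stopwords));

$string = 'In short, I got my "B.A." from...';

// Split on whitespace, dropping empty pieces.
$words = preg_split('/\s+/', $string, -1, PREG_SPLIT_NO_EMPTY);

// Keep only the words whose lowercase form is not a key in the lookup.
$kept = array_filter($words, function ($word) use ($lookup) {
    return !isset($lookup[strtolower($word)]);
});

$newString = implode(' ', $kept);
echo $newString, "\n"; // short, I got "B.A." from...
```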

    This answer was selected by the asker as the best answer.
