drn1008 2018-06-03 07:08
Views: 84
Accepted

preg_replace: remove exact whole-word matches using a PHP array

I have a list of stopwords stored in an array:

$stopwords = array(
    "a ",
    "about ",
    "above ",
    "above ",
    "across ",
    "after ",
    "afterwards ",
    "again ",
    "against ",
    "all ",
    "almost ",
    "alone ",
    "along ",
    "already ",
    "also ",
    "although ",
    "always ",
    "am ",
    "among ",
    "amongst ",
    "amoungst ",
    "amount ",
    "an ",
    "and ",
    "another ",
    "any ",
    "anyhow ",
    "anyone ",
    "anything ",
    "anyway ",
    "anywhere ",
    "are ",
    "around ",
    "as ",
    "at ",
    "back ",
    "be ",
    "became ",
    "because ",
    "become ",
    "becomes ",
    "becoming ",
    "been ",
    "before ",
    "beforehand ",
    "behind ",
    "being ",
    "below ",
    "beside ",
    "besides ",
    "between ",
    "beyond ",
    "bill ",
    "both ",
    "bottom ",
    "but ",
    "by ",
    "can ",
    "cannot ",
    "cant ",
    "co ",
    "con ",
    "could ",
    "couldnt ",
    "cry ",
    "considered ",
    "describe ",
    "detail ",
    "do ",
    "did ",
    "done ",
    "down ",
    "due ",
    "during ",
    "each ",
    "eg ",
    "eight ",
    "either ",
    "eleven ",
    "else ",
    "elsewhere ",
    "empty ",
    "enough ",
    "etc ",
    "even ",
    "ever ",
    "every ",
    "everyone ",
    "everything ",
    "everywhere ",
    "except ",
    "few ",
    "fifteen ",
    "fify ",
    "fill ",
    "find ",
    "fire ",
    "five ",
    "for ",
    "former ",
    "formerly ",
    "forty ",
    "found ",
    "four ",
    "from ",
    "front ",
    "full ",
    "further ",
    "get ",
    "give ",
    "go ",
    "had ",
//    "has ",
    "hasnt ",
    "have ",
    "he ",
    "hence ",
    "her ",
    "here ",
    "hereafter ",
    "hereby ",
    "herein ",
    "hereupon ",
    "hers ",
    "herself ",
    "him ",
    "himself ",
    "his ",
    "how ",
    "however ",
    "hundred ",
    "ie ",
    "if ",
    "In",
    "inc ",
    "indeed ",
    "interest ",
    "into ",
    "is ",
    "it ",
    "its ",
    "itself ",
    "keep ",
    "known ",
//    "last ",
    "latter ",
    "latterly ",
    "least ",
    "legend ",
    "less ",
    "ltd ",
//    "made ",
    "many ",
    "may ",
    "me ",
    "meanwhile ",
    "might ",
    "mill ",
    "mine ",
    "more ",
    "moreover ",
//    "most ",
    "mostly ",
    "move ",
    "much ",
    "must ",
    "my ",
    "myself ",
    "name ",
    "namely ",
    "neither ",
    "never ",
    "nevertheless ",
    "next ",
    "nine ",
    "no ",
    "nobody ",
    "none ",
    "noone ",
    "nor ",
    "nothing ",
    "now ",
    "nowhere ",
    "of ",
    "off ",
    "often ",
    "on ",
    "once ",
    "one ",
    "only ",
    "onto ",
    "or ",
    "other ",
    "others ",
    "otherwise ",
    "our ",
    "ours ",
    "ourselves ",
    "out ",
//    "over ",
    "own ",
    "part ",
    "per ",
    "perhaps ",
    "please ",
    "popular ",
    "put ",
    "rather ",
    "re ",
    "same ",
    "see ",
    "seem ",
    "seemed ",
    "seeming ",
    "seems ",
    "serious ",
    "several ",
    "she ",
    "should ",
    "show ",
    "since ",
    "sincere ",
    "six ",
    "sixty ",
    "so ",
    "some ",
    "somehow ",
    "someone ",
    "something ",
    "sometime ",
    "sometimes ",
    "somewhere ",
    "still ",
    "such ",
    "take ",
    "technique ",
    "ten ",
    "than ",
    "that ",
    "the ",
    "their ",
    "them ",
    "themselves ",
    "then ",
    "thence ",
    "there ",
    "thereafter ",
    "thereby ",
    "therefore ",
    "therein ",
    "thereupon ",
    "these ",
    "they ",
    "thickv ",
    "term ",
    "thin ",
    "third ",
    "this ",
    "those ",
    "though ",
    "three ",
    "through ",
    "throughout ",
    "thru ",
    "thus ",
    "to ",
    "together ",
    "too ",
    "top ",
    "toward ",
    "towards ",
    "twelve ",
    "twenty ",
    "two ",
    "un ",
    "under ",
    "until ",
    "up ",
    "upon ",
    "us ",
    "very ",
    "via ",
    "was ",
    "we ",
    "well ",
    "were ",
    "what ",
    "whatever ",
    "when ",
    "whence ",
    "whenever ",
    "where ",
    "whereafter ",
    "whereas ",
    "whereby ",
    "wherein ",
    "whereupon ",
    "wherever ",
    "whether ",
    "which ",
    "while ",
    "whither ",
    "who ",
    "whoever ",
    "whole ",
    "whom ",
    "whose ",
    "why ",
    "will ",
    "with ",
    "within ",
    "without ",
    "would ",
    "yet ",
    "you ",
    "your ",
    "yours ",
    "yourself ",
    "yourselves ",
    "the ",
    "likely ",
    "names "
);

As you may have noticed from the trailing space after each word, I was trying to avoid cutting pieces out of longer words. I only want to replace whole-word matches from my stopword list (with an empty string).
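
The trailing space does not fully guarantee whole-word matching with str_replace: any longer word that ends in a stopword and is followed by a space still matches, and a stopword at the very end of the string, with no space after it, is missed. A small illustration, using a made-up sample sentence and a two-word subset of the list:

```php
<?php
// Two entries from the stopword list above; the sample sentence is made up.
$stopwords = array("a ", "the ");
$result = str_replace($stopwords, "", "banana split at the cinema");

echo $result; // banansplit at cinema
```

The final "a" of "banana" is removed together with the space after it, which glues two words together.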

Realizing that str_replace is probably not capable enough for this, I turned to preg_replace and built an array of patterns that use word boundaries to match whole words.

$pregreplacestopwords = array(
"/\ba\b/",
"/\babout\b/",
"/\babove\b/",
"/\babove\b/",
"/\bacross\b/",
"/\bafter\b/",
"/\bafterwards\b/",
"/\bagain\b/",
"/\bagainst\b/",
"/\ball\b/",
"/\balmost\b/",
"/\balone\b/",
"/\balong\b/",
"/\balready\b/",
"/\balso\b/",
"/\balthough\b/",
"/\balways\b/",
"/\bam\b/",
"/\bamong\b/",
"/\bamongst\b/",
"/\bamoungst\b/",
"/\bamount\b/",
"/\ban\b/",
"/\band\b/",
"/\banother\b/",
"/\bany\b/",
"/\banyhow\b/",
"/\banyone\b/",
"/\banything\b/",
"/\banyway\b/",
"/\banywhere\b/",
"/\bare\b/",
"/\baround\b/",
"/\bas\b/",
"/\bat\b/",
"/\bback\b/",
"/\bbe\b/",
"/\bbecame\b/",
"/\bbecause\b/",
"/\bbecome\b/",
"/\bbecomes\b/",
"/\bbecoming\b/",
"/\bbeen\b/",
"/\bbefore\b/",
"/\bbeforehand\b/",
"/\bbehind\b/",
"/\bbeing\b/",
"/\bbelow\b/",
"/\bbeside\b/",
"/\bbesides\b/",
"/\bbetween\b/",
"/\bbeyond\b/",
"/\bbill\b/",
"/\bboth\b/",
"/\bbottom\b/",
"/\bbut\b/",
"/\bby\b/",
"/\bcan\b/",
"/\bcannot\b/",
"/\bcant\b/",
"/\bco\b/",
"/\bcon\b/",
"/\bcould\b/",
"/\bcouldnt\b/",
"/\bcry\b/",
"/\bconsidered\b/",
"/\bdescribe\b/",
"/\bdetail\b/",
"/\bdo\b/",
"/\bdid\b/",
"/\bdone\b/",
"/\bdown\b/",
"/\bdue\b/",
"/\bduring\b/",
"/\beach\b/",
"/\beg\b/",
"/\beight\b/",
"/\beither\b/",
"/\beleven\b/",
"/\belse\b/",
"/\belsewhere\b/",
"/\bempty\b/",
"/\benough\b/",
"/\betc\b/",
"/\beven\b/",
"/\bever\b/",
"/\bevery\b/",
"/\beveryone\b/",
"/\beverything\b/",
"/\beverywhere\b/",
"/\bexcept\b/",
"/\bfew\b/",
"/\bfifteen\b/",
"/\bfify\b/",
"/\bfill\b/",
"/\bfind\b/",
"/\bfire\b/",
"/\bfive\b/",
"/\bfor\b/",
"/\bformer\b/",
"/\bformerly\b/",
"/\bforty\b/",
"/\bfound\b/",
"/\bfour\b/",
"/\bfrom\b/",
"/\bfront\b/",
"/\bfull\b/",
"/\bfurther\b/",
"/\bget\b/",
"/\bgive\b/",
"/\bgo\b/",
"/\bhad\b/",
"/\b//has\b/",
"/\bhasnt\b/",
"/\bhave\b/",
"/\bhe\b/",
"/\bhence\b/",
"/\bher\b/",
"/\bhere\b/",
"/\bhereafter\b/",
"/\bhereby\b/",
"/\bherein\b/",
"/\bhereupon\b/",
"/\bhers\b/",
"/\bherself\b/",
"/\bhim\b/",
"/\bhimself\b/",
"/\bhis\b/",
"/\bhow\b/",
"/\bhowever\b/",
"/\bhundred\b/",
"/\bie\b/",
"/\bif\b/",
"/\bIn\b/",
"/\binc\b/",
"/\bindeed\b/",
"/\binterest\b/",
"/\binto\b/",
"/\bis\b/",
"/\bit\b/",
"/\bits\b/",
"/\bitself\b/",
"/\bkeep\b/",
"/\bknown\b/",
"/\b//last\b/",
"/\blatter\b/",
"/\blatterly\b/",
"/\bleast\b/",
"/\blegend\b/",
"/\bless\b/",
"/\bltd\b/",
"/\b//made\b/",
"/\bmany\b/",
"/\bmay\b/",
"/\bme\b/",
"/\bmeanwhile\b/",
"/\bmight\b/",
"/\bmill\b/",
"/\bmine\b/",
"/\bmore\b/",
"/\bmoreover\b/",
"/\bmost\b/",
"/\bmostly\b/",
"/\bmove\b/",
"/\bmuch\b/",
"/\bmust\b/",
"/\bmy\b/",
"/\bmyself\b/",
"/\bname\b/",
"/\bnamely\b/",
"/\bneither\b/",
"/\bnever\b/",
"/\bnevertheless\b/",
"/\bnext\b/",
"/\bnine\b/",
"/\bno\b/",
"/\bnobody\b/",
"/\bnone\b/",
"/\bnoone\b/",
"/\bnor\b/",
"/\bnothing\b/",
"/\bnow\b/",
"/\bnowhere\b/",
"/\bof\b/",
"/\boff\b/",
"/\boften\b/",
"/\bon\b/",
"/\bonce\b/",
"/\bone\b/",
"/\bonly\b/",
"/\bonto\b/",
"/\bor\b/",
"/\bother\b/",
"/\bothers\b/",
"/\botherwise\b/",
"/\bour\b/",
"/\bours\b/",
"/\bourselves\b/",
"/\bout\b/",
"/\b//over\b/",
"/\bown\b/",
"/\bpart\b/",
"/\bper\b/",
"/\bperhaps\b/",
"/\bplease\b/",
"/\bpopular\b/",
"/\bput\b/",
"/\brather\b/",
"/\bre\b/",
"/\bsame\b/",
"/\bsee\b/",
"/\bseem\b/",
"/\bseemed\b/",
"/\bseeming\b/",
"/\bseems\b/",
"/\bserious\b/",
"/\bseveral\b/",
"/\bshe\b/",
"/\bshould\b/",
"/\bshow\b/",
"/\bsince\b/",
"/\bsincere\b/",
"/\bsix\b/",
"/\bsixty\b/",
"/\bso\b/",
"/\bsome\b/",
"/\bsomehow\b/",
"/\bsomeone\b/",
"/\bsomething\b/",
"/\bsometime\b/",
"/\bsometimes\b/",
"/\bsomewhere\b/",
"/\bstill\b/",
"/\bsuch\b/",
"/\btake\b/",
"/\btechnique\b/",
"/\bten\b/",
"/\bthan\b/",
"/\bthat\b/",
"/\bthe\b/",
"/\btheir\b/",
"/\bthem\b/",
"/\bthemselves\b/",
"/\bthen\b/",
"/\bthence\b/",
"/\bthere\b/",
"/\bthereafter\b/",
"/\bthereby\b/",
"/\btherefore\b/",
"/\btherein\b/",
"/\bthereupon\b/",
"/\bthese\b/",
"/\bthey\b/",
"/\bthickv\b/",
"/\bterm\b/",
"/\bthin\b/",
"/\bthird\b/",
"/\bthis\b/",
"/\bthose\b/",
"/\bthough\b/",
"/\bthree\b/",
"/\bthrough\b/",
"/\bthroughout\b/",
"/\bthru\b/",
"/\bthus\b/",
"/\bto\b/",
"/\btogether\b/",
"/\btoo\b/",
"/\btop\b/",
"/\btoward\b/",
"/\btowards\b/",
"/\btwelve\b/",
"/\btwenty\b/",
"/\btwo\b/",
"/\bun\b/",
"/\bunder\b/",
"/\buntil\b/",
"/\bup\b/",
"/\bupon\b/",
"/\bus\b/",
"/\bvery\b/",
"/\bvia\b/",
"/\bwas\b/",
"/\bwe\b/",
"/\bwell\b/",
"/\bwere\b/",
"/\bwhat\b/",
"/\bwhatever\b/",
"/\bwhen\b/",
"/\bwhence\b/",
"/\bwhenever\b/",
"/\bwhere\b/",
"/\bwhereafter\b/",
"/\bwhereas\b/",
"/\bwhereby\b/",
"/\bwherein\b/",
"/\bwhereupon\b/",
"/\bwherever\b/",
"/\bwhether\b/",
"/\bwhich\b/",
"/\bwhile\b/",
"/\bwhither\b/",
"/\bwho\b/",
"/\bwhoever\b/",
"/\bwhole\b/",
"/\bwhom\b/",
"/\bwhose\b/",
"/\bwhy\b/",
"/\bwill\b/",
"/\bwith\b/",
"/\bwithin\b/",
"/\bwithout\b/",
"/\bwould\b/",
"/\byet\b/",
"/\byou\b/",
"/\byour\b/",
"/\byours\b/",
"/\byourself\b/",
"/\byourselves\b/",
"/\bthe\b/",
"/\blikely\b/",
"/\bnames\b/"
        );
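
Rather than maintaining this second array by hand, the patterns could be generated from the stopword list, which also keeps stray characters (such as the "//" left over from the commented-out entries) out of the patterns. A minimal sketch, assuming the $stopwords array from above; buildPatterns is a hypothetical helper name, and the i modifier is an assumption that matching should ignore case:

```php
<?php
// Hypothetical helper, not part of the original script.
function buildPatterns(array $stopwords): array
{
    // Strip the trailing spaces, then drop empty entries and duplicates.
    $words = array_values(array_unique(array_filter(array_map('trim', $stopwords), 'strlen')));

    // preg_quote() escapes regex metacharacters and the "/" delimiter.
    return array_map(function ($word) {
        return '/\b' . preg_quote($word, '/') . '\b/i';
    }, $words);
}

// Small subset of the list, including a duplicate entry.
$patterns = buildPatterns(array("a ", "about ", "a "));
print_r($patterns);
```

The resulting array can be passed as the first argument of preg_replace with an empty string as the replacement.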

I created an empty array for the result:

$pgreplace = array();

Let's take the word “B.A.” as an example and put it into a string variable, as part of a sentence for fun.

 $string = 'I got my “B.A.” from...';

I have tried a few approaches, including imploding the stopwords.

An attempt such as

preg_replace($pregreplacestopwords, $pregreplacestopwords, $string);

just produces a string of errors:

Warning: preg_replace(): Compilation failed: missing terminating ] for character class at offset 1951 in C:\wamp64\www\pg\test.php on line 664

Warning: preg_replace(): Empty regular expression in C:\wamp64\www\pg\test.php on line 666
NULL 
Warning: preg_replace(): Unknown modifier '/' in C:\wamp64\www\pg\test.php on line 670
NULL
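
A likely cause of at least the "Unknown modifier '/'" warning is visible in the pattern array above: the entries that were commented out in the original list became patterns such as "/\b//has\b/", where the second slash closes the pattern early and the rest is read as modifiers. A minimal sketch for locating such entries follows; findInvalidPatterns is a hypothetical helper name, not part of PHP or of the original script.

```php
<?php
// Hypothetical helper: returns the patterns that PCRE refuses to compile.
function findInvalidPatterns(array $patterns): array
{
    $invalid = array();
    foreach ($patterns as $pattern) {
        // preg_match() returns false (and raises a warning, silenced here
        // with @) when the pattern cannot be compiled.
        if (@preg_match($pattern, '') === false) {
            $invalid[] = $pattern;
        }
    }
    return $invalid;
}

// Three entries in the style of the pattern array above.
$sample = array('/\ba\b/', '/\b//has\b/', '/\bhasnt\b/');
print_r(findInvalidPatterns($sample)); // lists only '/\b//has\b/'
```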

Imploding the array via

$implodestopwords = implode("|", array_map("trim", array_filter($stopwords)));

gives

a|about|above|above|across|after|afterwards|again|against|all|almost|alone|along|already|also

and so forth.

Putting this into action:

$pattern = '/\b(' . $implodestopwords . ')\b/i';
$string = preg_replace($pattern, "", $string);

var_dump($string);

outputs:

I got “B..” ...

How can I modify my preg_replace to only match exact words and remove them from a large list of words from an array?
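
On the preg_replace side, the "B.." result appears to come from \b treating the dots as word boundaries, so with the i modifier the stopword "a" matches the "A" inside "B.A.". One possible adjustment, sketched under the assumption that a "word" means a whitespace-delimited token, is to replace \b with lookarounds that require whitespace or the string edge on both sides. removeStopwords is a hypothetical helper name, and the short stopword list in the usage line is a subset chosen for illustration.

```php
<?php
// Hypothetical helper, not part of the original script.
function removeStopwords(string $text, array $stopwords): string
{
    // Strip trailing spaces, drop empty entries and duplicates.
    $words = array_unique(array_filter(array_map('trim', $stopwords), 'strlen'));

    // Escape each word so it is safe inside a "/"-delimited pattern.
    $alternation = implode('|', array_map(function ($word) {
        return preg_quote($word, '/');
    }, $words));

    // (?<!\S) and (?!\S): no non-whitespace character directly before or
    // after the match, so "A" inside "B.A." is left alone.
    $pattern = '/(?<!\S)(?:' . $alternation . ')(?!\S)/i';
    $result = preg_replace($pattern, '', $text) ?? $text;

    // Collapse the gaps left behind by the removed words.
    return trim(preg_replace('/\s+/', ' ', $result));
}

echo removeStopwords('I got my "B.A." from...', array('a ', 'my ', 'from '));
// I got "B.A." from...
```

The trade-off is that a stopword glued to punctuation, such as "from...", is no longer removed, because the dots count as part of the token.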

Full script here: https://pastebin.com/vwbNjhs9


1 answer

  • dongtuo8170 2018-06-03 07:58

    Instead of using preg_replace(), you could split your string into an array of words and loop over it, checking whether each word is in your stopwords array.

    Try this and see if it works:

    $string = 'I got my "B.A." from...';
    $string = preg_replace('/\s+/', ' ', $string); // Ensure only one space between words.
    $array = explode(' ', $string);

    foreach ($array as $i => $word) {

      // The space is appended only because the entries in the
      // $stopwords array carry a trailing space.
      if (in_array($word . ' ', $stopwords)) {

        unset($array[$i]); // Remove the word (no empty slot, so no double space).

      }

    }

    $newString = implode(' ', $array); // Turn the array back into a string.

    echo $newString; // Outputs: I got "B.A." from...
    

    This method gives you a lot of control over what you decide to do with each found word.
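
As one illustration of that control, the loop could normalize each token before the lookup, so that case and trailing punctuation do not prevent a match. This is only a sketch under assumptions not stated in the answer: filterTokens is a hypothetical helper name, the set of punctuation characters stripped is an arbitrary choice, and a matching token is dropped together with its punctuation.

```php
<?php
// Hypothetical helper, not part of the original answer.
function filterTokens(string $text, array $stopwords): string
{
    // Lookup table keyed by the trimmed, lowercased stopwords, so each
    // check is an isset() instead of a scan with in_array().
    $lookup = array_flip(array_map(function ($word) {
        return strtolower(trim($word));
    }, $stopwords));

    // Split on any run of whitespace, ignoring empty pieces.
    $tokens = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);

    $kept = array_filter($tokens, function ($token) use ($lookup) {
        // Compare without trailing punctuation, so "from..." matches "from".
        $bare = strtolower(rtrim($token, '.,;:!?'));
        return !isset($lookup[$bare]);
    });

    return implode(' ', $kept);
}

echo filterTokens('I got my "B.A." from...', array('a ', 'my ', 'from '));
// I got "B.A."
```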

    This answer was selected by the asker as the best answer.