I have a list of stopwords stored in an array:
$stopwords = array(
"a ",
"about ",
"above ",
"above ",
"across ",
"after ",
"afterwards ",
"again ",
"against ",
"all ",
"almost ",
"alone ",
"along ",
"already ",
"also ",
"although ",
"always ",
"am ",
"among ",
"amongst ",
"amoungst ",
"amount ",
"an ",
"and ",
"another ",
"any ",
"anyhow ",
"anyone ",
"anything ",
"anyway ",
"anywhere ",
"are ",
"around ",
"as ",
"at ",
"back ",
"be ",
"became ",
"because ",
"become ",
"becomes ",
"becoming ",
"been ",
"before ",
"beforehand ",
"behind ",
"being ",
"below ",
"beside ",
"besides ",
"between ",
"beyond ",
"bill ",
"both ",
"bottom ",
"but ",
"by ",
"can ",
"cannot ",
"cant ",
"co ",
"con ",
"could ",
"couldnt ",
"cry ",
"considered ",
"describe ",
"detail ",
"do ",
"did ",
"done ",
"down ",
"due ",
"during ",
"each ",
"eg ",
"eight ",
"either ",
"eleven ",
"else ",
"elsewhere ",
"empty ",
"enough ",
"etc ",
"even ",
"ever ",
"every ",
"everyone ",
"everything ",
"everywhere ",
"except ",
"few ",
"fifteen ",
"fify ",
"fill ",
"find ",
"fire ",
"five ",
"for ",
"former ",
"formerly ",
"forty ",
"found ",
"four ",
"from ",
"front ",
"full ",
"further ",
"get ",
"give ",
"go ",
"had ",
// "has ",
"hasnt ",
"have ",
"he ",
"hence ",
"her ",
"here ",
"hereafter ",
"hereby ",
"herein ",
"hereupon ",
"hers ",
"herself ",
"him ",
"himself ",
"his ",
"how ",
"however ",
"hundred ",
"ie ",
"if ",
"In",
"inc ",
"indeed ",
"interest ",
"into ",
"is ",
"it ",
"its ",
"itself ",
"keep ",
"known ",
// "last ",
"latter ",
"latterly ",
"least ",
"legend ",
"less ",
"ltd ",
// "made ",
"many ",
"may ",
"me ",
"meanwhile ",
"might ",
"mill ",
"mine ",
"more ",
"moreover ",
// "most ",
"mostly ",
"move ",
"much ",
"must ",
"my ",
"myself ",
"name ",
"namely ",
"neither ",
"never ",
"nevertheless ",
"next ",
"nine ",
"no ",
"nobody ",
"none ",
"noone ",
"nor ",
"nothing ",
"now ",
"nowhere ",
"of ",
"off ",
"often ",
"on ",
"once ",
"one ",
"only ",
"onto ",
"or ",
"other ",
"others ",
"otherwise ",
"our ",
"ours ",
"ourselves ",
"out ",
// "over ",
"own ",
"part ",
"per ",
"perhaps ",
"please ",
"popular ",
"put ",
"rather ",
"re ",
"same ",
"see ",
"seem ",
"seemed ",
"seeming ",
"seems ",
"serious ",
"several ",
"she ",
"should ",
"show ",
"since ",
"sincere ",
"six ",
"sixty ",
"so ",
"some ",
"somehow ",
"someone ",
"something ",
"sometime ",
"sometimes ",
"somewhere ",
"still ",
"such ",
"take ",
"technique ",
"ten ",
"than ",
"that ",
"the ",
"their ",
"them ",
"themselves ",
"then ",
"thence ",
"there ",
"thereafter ",
"thereby ",
"therefore ",
"therein ",
"thereupon ",
"these ",
"they ",
"thickv ",
"term ",
"thin ",
"third ",
"this ",
"those ",
"though ",
"three ",
"through ",
"throughout ",
"thru ",
"thus ",
"to ",
"together ",
"too ",
"top ",
"toward ",
"towards ",
"twelve ",
"twenty ",
"two ",
"un ",
"under ",
"until ",
"up ",
"upon ",
"us ",
"very ",
"via ",
"was ",
"we ",
"well ",
"were ",
"what ",
"whatever ",
"when ",
"whence ",
"whenever ",
"where ",
"whereafter ",
"whereas ",
"whereby ",
"wherein ",
"whereupon ",
"wherever ",
"whether ",
"which ",
"while ",
"whither ",
"who ",
"whoever ",
"whole ",
"whom ",
"whose ",
"why ",
"will ",
"with ",
"within ",
"without ",
"would ",
"yet ",
"you ",
"your ",
"yours ",
"yourself ",
"yourselves ",
"the ",
"likely ",
"names "
);
As you may have noticed from the trailing space after each word, I was trying to avoid cutting into longer strings: I only want to remove whole-word matches from my stopword list (replace them with an empty value).
Realizing that str_replace is probably not the best tool for this, I turned to preg_replace and built an array of patterns that use word boundaries to match whole words only.
$pregreplacestopwords = array(
"/\ba\b/",
"/\babout\b/",
"/\babove\b/",
"/\babove\b/",
"/\bacross\b/",
"/\bafter\b/",
"/\bafterwards\b/",
"/\bagain\b/",
"/\bagainst\b/",
"/\ball\b/",
"/\balmost\b/",
"/\balone\b/",
"/\balong\b/",
"/\balready\b/",
"/\balso\b/",
"/\balthough\b/",
"/\balways\b/",
"/\bam\b/",
"/\bamong\b/",
"/\bamongst\b/",
"/\bamoungst\b/",
"/\bamount\b/",
"/\ban\b/",
"/\band\b/",
"/\banother\b/",
"/\bany\b/",
"/\banyhow\b/",
"/\banyone\b/",
"/\banything\b/",
"/\banyway\b/",
"/\banywhere\b/",
"/\bare\b/",
"/\baround\b/",
"/\bas\b/",
"/\bat\b/",
"/\bback\b/",
"/\bbe\b/",
"/\bbecame\b/",
"/\bbecause\b/",
"/\bbecome\b/",
"/\bbecomes\b/",
"/\bbecoming\b/",
"/\bbeen\b/",
"/\bbefore\b/",
"/\bbeforehand\b/",
"/\bbehind\b/",
"/\bbeing\b/",
"/\bbelow\b/",
"/\bbeside\b/",
"/\bbesides\b/",
"/\bbetween\b/",
"/\bbeyond\b/",
"/\bbill\b/",
"/\bboth\b/",
"/\bbottom\b/",
"/\bbut\b/",
"/\bby\b/",
"/\bcan\b/",
"/\bcannot\b/",
"/\bcant\b/",
"/\bco\b/",
"/\bcon\b/",
"/\bcould\b/",
"/\bcouldnt\b/",
"/\bcry\b/",
"/\bconsidered\b/",
"/\bdescribe\b/",
"/\bdetail\b/",
"/\bdo\b/",
"/\bdid\b/",
"/\bdone\b/",
"/\bdown\b/",
"/\bdue\b/",
"/\bduring\b/",
"/\beach\b/",
"/\beg\b/",
"/\beight\b/",
"/\beither\b/",
"/\beleven\b/",
"/\belse\b/",
"/\belsewhere\b/",
"/\bempty\b/",
"/\benough\b/",
"/\betc\b/",
"/\beven\b/",
"/\bever\b/",
"/\bevery\b/",
"/\beveryone\b/",
"/\beverything\b/",
"/\beverywhere\b/",
"/\bexcept\b/",
"/\bfew\b/",
"/\bfifteen\b/",
"/\bfify\b/",
"/\bfill\b/",
"/\bfind\b/",
"/\bfire\b/",
"/\bfive\b/",
"/\bfor\b/",
"/\bformer\b/",
"/\bformerly\b/",
"/\bforty\b/",
"/\bfound\b/",
"/\bfour\b/",
"/\bfrom\b/",
"/\bfront\b/",
"/\bfull\b/",
"/\bfurther\b/",
"/\bget\b/",
"/\bgive\b/",
"/\bgo\b/",
"/\bhad\b/",
"/\b//has\b/",
"/\bhasnt\b/",
"/\bhave\b/",
"/\bhe\b/",
"/\bhence\b/",
"/\bher\b/",
"/\bhere\b/",
"/\bhereafter\b/",
"/\bhereby\b/",
"/\bherein\b/",
"/\bhereupon\b/",
"/\bhers\b/",
"/\bherself\b/",
"/\bhim\b/",
"/\bhimself\b/",
"/\bhis\b/",
"/\bhow\b/",
"/\bhowever\b/",
"/\bhundred\b/",
"/\bie\b/",
"/\bif\b/",
"/\bIn\b/",
"/\binc\b/",
"/\bindeed\b/",
"/\binterest\b/",
"/\binto\b/",
"/\bis\b/",
"/\bit\b/",
"/\bits\b/",
"/\bitself\b/",
"/\bkeep\b/",
"/\bknown\b/",
"/\b//last\b/",
"/\blatter\b/",
"/\blatterly\b/",
"/\bleast\b/",
"/\blegend\b/",
"/\bless\b/",
"/\bltd\b/",
"/\b//made\b/",
"/\bmany\b/",
"/\bmay\b/",
"/\bme\b/",
"/\bmeanwhile\b/",
"/\bmight\b/",
"/\bmill\b/",
"/\bmine\b/",
"/\bmore\b/",
"/\bmoreover\b/",
"/\bmost\b/",
"/\bmostly\b/",
"/\bmove\b/",
"/\bmuch\b/",
"/\bmust\b/",
"/\bmy\b/",
"/\bmyself\b/",
"/\bname\b/",
"/\bnamely\b/",
"/\bneither\b/",
"/\bnever\b/",
"/\bnevertheless\b/",
"/\bnext\b/",
"/\bnine\b/",
"/\bno\b/",
"/\bnobody\b/",
"/\bnone\b/",
"/\bnoone\b/",
"/\bnor\b/",
"/\bnothing\b/",
"/\bnow\b/",
"/\bnowhere\b/",
"/\bof\b/",
"/\boff\b/",
"/\boften\b/",
"/\bon\b/",
"/\bonce\b/",
"/\bone\b/",
"/\bonly\b/",
"/\bonto\b/",
"/\bor\b/",
"/\bother\b/",
"/\bothers\b/",
"/\botherwise\b/",
"/\bour\b/",
"/\bours\b/",
"/\bourselves\b/",
"/\bout\b/",
"/\b//over\b/",
"/\bown\b/",
"/\bpart\b/",
"/\bper\b/",
"/\bperhaps\b/",
"/\bplease\b/",
"/\bpopular\b/",
"/\bput\b/",
"/\brather\b/",
"/\bre\b/",
"/\bsame\b/",
"/\bsee\b/",
"/\bseem\b/",
"/\bseemed\b/",
"/\bseeming\b/",
"/\bseems\b/",
"/\bserious\b/",
"/\bseveral\b/",
"/\bshe\b/",
"/\bshould\b/",
"/\bshow\b/",
"/\bsince\b/",
"/\bsincere\b/",
"/\bsix\b/",
"/\bsixty\b/",
"/\bso\b/",
"/\bsome\b/",
"/\bsomehow\b/",
"/\bsomeone\b/",
"/\bsomething\b/",
"/\bsometime\b/",
"/\bsometimes\b/",
"/\bsomewhere\b/",
"/\bstill\b/",
"/\bsuch\b/",
"/\btake\b/",
"/\btechnique\b/",
"/\bten\b/",
"/\bthan\b/",
"/\bthat\b/",
"/\bthe\b/",
"/\btheir\b/",
"/\bthem\b/",
"/\bthemselves\b/",
"/\bthen\b/",
"/\bthence\b/",
"/\bthere\b/",
"/\bthereafter\b/",
"/\bthereby\b/",
"/\btherefore\b/",
"/\btherein\b/",
"/\bthereupon\b/",
"/\bthese\b/",
"/\bthey\b/",
"/\bthickv\b/",
"/\bterm\b/",
"/\bthin\b/",
"/\bthird\b/",
"/\bthis\b/",
"/\bthose\b/",
"/\bthough\b/",
"/\bthree\b/",
"/\bthrough\b/",
"/\bthroughout\b/",
"/\bthru\b/",
"/\bthus\b/",
"/\bto\b/",
"/\btogether\b/",
"/\btoo\b/",
"/\btop\b/",
"/\btoward\b/",
"/\btowards\b/",
"/\btwelve\b/",
"/\btwenty\b/",
"/\btwo\b/",
"/\bun\b/",
"/\bunder\b/",
"/\buntil\b/",
"/\bup\b/",
"/\bupon\b/",
"/\bus\b/",
"/\bvery\b/",
"/\bvia\b/",
"/\bwas\b/",
"/\bwe\b/",
"/\bwell\b/",
"/\bwere\b/",
"/\bwhat\b/",
"/\bwhatever\b/",
"/\bwhen\b/",
"/\bwhence\b/",
"/\bwhenever\b/",
"/\bwhere\b/",
"/\bwhereafter\b/",
"/\bwhereas\b/",
"/\bwhereby\b/",
"/\bwherein\b/",
"/\bwhereupon\b/",
"/\bwherever\b/",
"/\bwhether\b/",
"/\bwhich\b/",
"/\bwhile\b/",
"/\bwhither\b/",
"/\bwho\b/",
"/\bwhoever\b/",
"/\bwhole\b/",
"/\bwhom\b/",
"/\bwhose\b/",
"/\bwhy\b/",
"/\bwill\b/",
"/\bwith\b/",
"/\bwithin\b/",
"/\bwithout\b/",
"/\bwould\b/",
"/\byet\b/",
"/\byou\b/",
"/\byour\b/",
"/\byours\b/",
"/\byourself\b/",
"/\byourselves\b/",
"/\bthe\b/",
"/\blikely\b/",
"/\bnames\b/"
);
I also created an empty array for it:
$pgreplace = array();
Let's take the word “B.A.” as an example and put it into a string variable, as part of a sentence:
$string = 'I got my “B.A.” from...';
I've tried a few approaches, such as imploding the stopwords into a single pattern. Passing the pattern array straight to preg_replace, like this:
preg_replace($pregreplacestopwords, $pregreplacestopwords, $string);
just fills the output with errors:
Warning: preg_replace(): Compilation failed: missing terminating ] for character class at offset 1951 in C:\wamp64\www\pg\test.php on line 664
Warning: preg_replace(): Empty regular expression in C:\wamp64\www\pg\test.php on line 666
NULL
Warning: preg_replace(): Unknown modifier '/' in C:\wamp64\www\pg\test.php on line 670
NULL
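The "Unknown modifier '/'" warning is consistent with entries such as "/\b//has\b/" in the pattern array: the pattern ends at the second slash, so the next "/" is read as a modifier. Note also that the call above passes the pattern array as the replacement argument, so a matched word would be replaced by a pattern string rather than removed. A minimal sketch of generating the patterns from the word list instead of writing them by hand, assuming a shortened stand-in for the full list (the sample sentence is made up for illustration):

```php
<?php
// Shortened, hypothetical stand-in for the full $stopwords array above.
$stopwords = array("a ", "about ", "my ", "from ", "");

// Trim each entry, drop empty ones, and remove duplicates.
$words = array_unique(array_filter(array_map('trim', $stopwords), 'strlen'));

// Build one delimited pattern per word. preg_quote() escapes regex
// metacharacters and the "/" delimiter should a word contain any.
$patterns = array_map(function ($word) {
    return '/\b' . preg_quote($word, '/') . '\b/i';
}, $words);

// Replace every match with an empty string (not with the patterns).
$result = preg_replace($patterns, '', 'I walked about a mile from my house');
var_dump($result);
```

The spaces around removed words are left in place, so the result may need its whitespace collapsed afterwards.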
Imploding the array via
$implodestopwords = implode("|", array_map("trim", array_filter($stopwords)));
gives:
a|about|above|above|across|after|afterwards|again|against|all|almost|alone|along|already|also
and so forth.
Putting this into action:
$pattern = '/\b(' . $implodestopwords . ')\b/i';
$string = preg_replace($pattern, "", $string);
var_dump($string);
outputs:
I got “B..” ...
How can I modify my preg_replace so that it only matches exact words from a large stopword array and removes them?
Full script here: https://pastebin.com/vwbNjhs9
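For context on the “B..” output: \b matches between a word character and a non-word character, and "." is a non-word character, so the "A" in “B.A.” counts as a whole word and, with the i modifier, matches the stopword "a". One possible direction, as a minimal sketch rather than a definitive fix, is to split the text on whitespace and compare each token, stripped of surrounding punctuation, against a lookup table. The helper name removeStopwords() and the shortened list are assumptions made for illustration, and the stopwords are assumed to be plain ASCII:

```php
<?php
// Shortened, hypothetical stand-in for the full $stopwords array.
$stopwords = array("a ", "about ", "my ", "from ", "the ");

// Lower-cased lookup table keyed by the trimmed stopword.
$lookup = array_flip(array_map('strtolower', array_map('trim', $stopwords)));

// removeStopwords() is a hypothetical helper name, not part of PHP.
function removeStopwords($text, array $lookup)
{
    // Split on whitespace so a token such as “B.A.” stays in one piece.
    $tokens = preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

    $kept = array();
    foreach ($tokens as $token) {
        // Strip leading/trailing punctuation only, e.g. "the." -> "the".
        $core = preg_replace('/^[^\p{L}\p{N}]+|[^\p{L}\p{N}]+$/u', '', $token);
        // Keep the original token unless its core is a stopword.
        if (!isset($lookup[strtolower($core)])) {
            $kept[] = $token;
        }
    }
    return implode(' ', $kept);
}

var_dump(removeStopwords('I got my “B.A.” from...', $lookup));
// string: I got “B.A.”
```

The trade-off is that punctuation attached to a removed word goes with it (the "..." after "from" is dropped here), and the original spacing is normalized to single spaces.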