PHP: replacing the common words in my file

I'm building a tool where you enter a website and, when you click the submit button, it uses cURL to fetch all of the text.

After fetching the page, stripping the tags, and counting the words, I end up with an array named $frequency. If I echo it inside <pre> tags, it shows everything just fine. (NOTE: I'm saving the contents to a file and reading it with $homepage = file_get_contents($file); that is what my code works with. I don't know whether this matters.)

However, I don't really care that the word "or" appears 200 times on a website; I only want the important words. So I made an array of all the common words, which ends up in the $common_words variable. But I can't find a way to replace every word in $frequency with "" when it also appears in $common_words.

I've found this piece of code after some research:

$string = 'sand band or nor and where whereabouts foo';
$wordlist = array("or", "and", "where");

foreach ($wordlist as &$word) {
    $word = '/\b' . preg_quote($word, '/') . '\b/';
}

$string = preg_replace($wordlist, '', $string);
var_dump($string);

If I copy and paste this, it works fine and removes "or", "and", and "where" from the string. But replacing $string with $frequency, or $wordlist with $common_words, either does nothing or throws an error like: Delimiter must not be alphanumeric or backslash

I hope I've formulated my question properly. If not, please tell me!

Thanks in advance

EDIT: All right, I've narrowed down the problem a lot. First of all, I forgot the & inside foreach ($wordlist as &$word) {

But because the words were already being counted, the words it replaced are all still counted. These two screenshots show what I mean: http://imgur.com/oqqZR3h,xHEZKRz#0

dousonghs58612 2014-03-31 08:45
If I understand this correctly, you want to know how many times each word occurs while ignoring the so-called common words.

Assuming that $url is the page you will be running against and $common_words is your common words array, here is what you can do:

// Get the page contents and strip the HTML tags
$contents = strip_tags(file_get_contents($url));

// Split the contents into words; $words[1] holds the captured words
// ($words[0] would also include the trailing non-word character)
preg_match_all("/([\w]+[']?[\w]*)\W/", $contents, $words);

$common_words = array('or', 'and', 'I', 'where');

// Drop the common words, then count occurrences of what is left
$frequency = array_count_values(array_diff($words[1], $common_words));

unset($words); // Release all that memory
var_dump($frequency);

At this point you will have an associative array that maps each word that is not a common word to the number of times it occurs.
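If $frequency has already been built, as in the question, the common words can also be removed afterwards by key rather than with a regex. The following is only a minimal sketch: the sample counts are made up for illustration, and it assumes the words are the array keys and the counts are the values.

```php
<?php
// Hypothetical sample data standing in for the real $frequency array
$frequency = array('or' => 200, 'php' => 12, 'and' => 150, 'regex' => 7);
$common_words = array('or', 'and', 'where');

// array_flip() turns the word list into keys, so array_diff_key() can
// drop every entry of $frequency whose key is a common word
$filtered = array_diff_key($frequency, array_flip($common_words));

arsort($filtered); // most frequent words first, keys preserved
print_r($filtered);
```

Because the filtering works on keys, no counts are left behind for the removed words, which is the problem described in the question's EDIT.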

UPDATE

A bit more about the RegEx. We need to match a word. The simplest way is (\w+), but that won't match words like I've or haven't (notice the '). That is why I made it more complicated. Also, \w doesn't match dashes, so it breaks up words like 6-year-old.

So I created a subgroup that matches word characters, including dashes and single quotes inside a word.

(?:\w'\w|\w|-)

The ?: at the beginning makes the group non-capturing, meaning it is not included in the results. I use it because the group only exists to list the options for what a word may contain. To match an entire word, the RegEx matches one or more repetitions of the subgroup above:

((?:\w'\w|\w|-)+)

So the RegEx preg_match_all() line should be:

preg_match_all("/((?:\w'\w|\w|-)+)/", $contents, $words);
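To see how the updated pattern behaves, here is a small sketch that runs it on a sample sentence. The sentence and the word list are made up for illustration, and lowercasing the tokens with strtolower() is my own assumption about how you might want to treat "And" versus "and"; it is not part of the code above.

```php
<?php
// Hypothetical sample text; in the real tool this would come from strip_tags()
$contents = "I've seen a 6-year-old fox and a dog, or so I've heard.";
$common_words = array('or', 'and', 'a', 'so');

// $words[1] holds the whole words captured by the outer group
preg_match_all("/((?:\w'\w|\w|-)+)/", $contents, $words);

// Lowercase every token so the comparison with $common_words ignores case
$tokens = array_map('strtolower', $words[1]);

// Remove the common words, then count what is left
$frequency = array_count_values(array_diff($tokens, $common_words));
print_r($frequency);
```

Here "I've" and "6-year-old" each stay together as a single word, and none of the common words end up in the counts.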

Hope this helps.