Extracting keyword pairs for SEO using PHP

Asked 2013-04-28 03:02 (tagged php)

I'm currently investigating some new ideas for long-tail SEO. I have a site where people can create their own blogs, and it already brings in pretty good long-tail traffic. I'm already displaying each article's title inside the page's title tags.

However, the title often doesn't match the keywords in the content very well, so I'd like to add to the title some keywords that PHP has actually determined would work best.

I've tried a script I wrote that works out the most common words on a page. It works OK, but it tends to come up with pretty useless words.

It occurred to me that it would be more useful to write a PHP script that extracts frequently occurring pairs (or triplets) of words and puts them in an array ordered by how often they occur.

My problem is how to parse the text more dynamically to find recurring pairs or triplets of words. How would I go about this?
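A minimal sketch of the sliding-window idea described above: walk through the text one word at a time and count every run of `$n` consecutive words (pairs when `$n = 2`, triplets when `$n = 3`). `countNgrams` is a hypothetical helper name, and it deliberately does no stop-word filtering, so on real pages the top results will still include phrases like "of the".

```php
<?php
// Hypothetical helper: count how often each run of $n consecutive words
// occurs in $text, most frequent first.
function countNgrams(string $text, int $n = 2): array
{
    // Lowercase, turn anything that isn't a letter, digit or whitespace into a space,
    // then split into words.
    $clean = preg_replace('/[^a-z0-9\s]+/', ' ', strtolower($text));
    $words = preg_split('/\s+/', $clean, -1, PREG_SPLIT_NO_EMPTY);

    $counts = array();
    // Slide a window of $n words across the text.
    for ($i = 0; $i + $n <= count($words); $i++) {
        $phrase = implode(' ', array_slice($words, $i, $n));
        $counts[$phrase] = ($counts[$phrase] ?? 0) + 1;
    }

    arsort($counts); // highest count first
    return $counts;
}

$text = 'Cheap flights to Rome. Book cheap flights to Rome today, or cheap hotels.';
print_r(array_slice(countNgrams($text, 2), 0, 3, true));
```

Calling `countNgrams($text, 3)` gives triplets instead. Passing `true` as the fourth argument to `array_slice` keeps the phrase keys even when a phrase looks numeric (for example "2013").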

```php
<?php
// Return the $keywords most frequent words in $string (word => count),
// ignoring stop words and words of three characters or fewer.
function extractCommonWords($string, $keywords)
{
    $stopWords = array('i','a','about','an','and','are','as','at','be','by','com','de','en','for','from','how','in','is','it','la','of','on','or','that','the','this','to','was','what','when','where','who','will','with','und','www');

    $string = strtolower(trim($string));                  // normalise case and trim
    $string = preg_replace('/\s+/', ' ', $string);        // collapse whitespace runs to ONE space (not '', which glued words together)
    $string = preg_replace('/[^a-z0-9 -]/', '', $string); // keep only letters, digits, spaces and dashes

    // Split into words on spaces/dashes instead of the fragile '/\b.*?\b/' match.
    $matchWords = preg_split('/[\s-]+/', $string, -1, PREG_SPLIT_NO_EMPTY);

    $wordCountArr = array();
    foreach ($matchWords as $word) {
        if (strlen($word) <= 3 || in_array($word, $stopWords)) {
            continue; // skip short words and stop words
        }
        if (isset($wordCountArr[$word])) {
            $wordCountArr[$word]++;
        } else {
            $wordCountArr[$word] = 1;
        }
    }

    arsort($wordCountArr);                                 // most frequent first
    return array_slice($wordCountArr, 0, $keywords, true); // true keeps numeric-looking words as keys
}
```

doudonglu3764 2013-04-29 13:14
For the sake of including some code, here's another primitive adaptation that returns multi-word keywords of a given length with a minimum number of occurrences. Rather than stripping all common words, it only filters out phrases that start or end with one. It still returns some nonsense, but that is really unavoidable.

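```php
<?php
// Return multi-word keywords of up to $len words that occur at least $min times.
// Phrases that start or end with a common word are skipped.
function getLongTailKeywords($str, $len = 3, $min = 2)
{
    $keywords = array();
    $common = array('i','a','about','an','and','are','as','at','be','by','com','de','en','for','from','how','in','is','it','la','of','on','or','that','the','this','to','was','what','when','where','who','will','with','und','www');

    // Strip HTML, lowercase, drop punctuation, then split into words.
    $str = preg_replace('/[^a-z0-9\s-]+/', '', strtolower(strip_tags($str)));
    $str = preg_split('/\s+-\s+|\s+/', $str, -1, PREG_SPLIT_NO_EMPTY);

    // $len counts down; each pass collects phrases of $len + 1 words (longest first).
    while (0 < $len--) {
        for ($i = 0; $i < count($str) - $len; $i++) {
            $word = array_slice($str, $i, $len + 1);
            if (in_array($word[0], $common) || in_array(end($word), $common)) {
                continue; // phrase starts or ends with a common word
            }
            $word = implode(' ', $word);
            if (!isset($keywords[$len][$word])) {
                $keywords[$len][$word] = 0;
            }
            $keywords[$len][$word]++;
        }
    }

    $return = array();
    foreach ($keywords as $keyword) {
        $keyword = array_filter($keyword, function ($v) use ($min) {
            return $v >= $min; // was "> $min", which required one more occurrence than $min
        });
        arsort($keyword);
        $return += $keyword; // "+" keeps numeric-looking keys such as "2013"; array_merge renumbered them
    }
    return $return;
}
```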

run code (on a random BBC News article)

The problem with simply ignoring common words, grammar and punctuation, though, is that they still carry meaning within a sentence. Removing them at best changes the meaning and at worst produces unintelligible phrases. Even the idea of extracting "keywords" is itself flawed, because words can have several meanings; once you remove them from a sentence, you take them out of context.
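To make the point above concrete, here is a small hypothetical demo contrasting two strategies on the same phrase: stripping every common word (as the question's single-word script does) versus keeping the phrase intact and only rejecting it when it starts or ends with a common word (as the answer's function does). The function names and the short `$common` list are illustrative assumptions; the list is a subset of the one used above.

```php
<?php
// Illustrative subset of the common-word list used above.
$common = array('a', 'of', 'the', 'in', 'is', 'to');

// Strategy 1 (hypothetical helper): remove every common word, then join what is left.
function stripAll(array $words, array $common): string
{
    return implode(' ', array_values(array_diff($words, $common)));
}

// Strategy 2 (hypothetical helper): keep the phrase whole, but reject it
// if its first or last word is a common word.
function edgeFilter(array $words, array $common): ?string
{
    if (in_array($words[0], $common) || in_array(end($words), $common)) {
        return null; // phrase rejected
    }
    return implode(' ', $words);
}

$phrase = array('price', 'of', 'oil');
echo stripAll($phrase, $common), "\n";  // "price oil": not a phrase anyone writes
var_dump(edgeFilter($phrase, $common)); // string(12) "price of oil"
var_dump(edgeFilter(array('of', 'oil'), $common)); // NULL: starts with "of"
```

Edge filtering keeps the inner "of", so the phrase still reads naturally, while fragments such as "of oil" are dropped. It does not solve the deeper problem of context, which is the point being made here.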

It's not my area, but natural language processing is a complex field of study and there is no easy solution. The general theory goes like this: a computer cannot decipher the meaning of a single piece of text on its own; it has to rely on cross-referencing a semantically tagged corpus of related material (which is a huge overhead).
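A far simpler, purely statistical relative of cross-referencing a corpus (not a semantic method) is TF-IDF: a term scores highly when it is frequent in one document but rare across the others, so words that appear everywhere sink to the bottom. On a blog platform the "corpus" could be the site's other posts. This is a sketch under the assumption that each document is already tokenised into an array of terms (single words, or phrases such as the keys returned by `getLongTailKeywords`); `tfidf` is a hypothetical function name and the smoothed IDF formula is one common variant, not the only one.

```php
<?php
// Score each term in $doc by TF-IDF against $corpus (an array of tokenised documents).
function tfidf(array $doc, array $corpus): array
{
    $n = count($corpus);
    $scores = array();
    foreach (array_count_values($doc) as $term => $tf) {
        $term = (string) $term; // array keys like "2013" come back as ints
        // Document frequency: how many corpus documents contain the term.
        $df = 0;
        foreach ($corpus as $other) {
            if (in_array($term, $other, true)) {
                $df++;
            }
        }
        // Smoothed IDF: +1 terms avoid division by zero for unseen terms.
        $idf = log((1 + $n) / (1 + $df)) + 1;
        $scores[$term] = $tf * $idf;
    }
    arsort($scores); // highest score first
    return $scores;
}

$corpus = array(
    array('the', 'weather', 'today'),
    array('the', 'best', 'pasta', 'recipe'),
    array('the', 'news', 'today'),
);
$post = array('the', 'pasta', 'the', 'sauce', 'pasta');
print_r(tfidf($post, $corpus));
```

Here "the" occurs twice in the post but appears in every corpus document, so it ends up scored below "pasta" and "sauce".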
This answer was accepted by the asker as the best answer.
