douya1855 2013-04-28 03:02
62 views · Accepted

Extracting keyword pairs for SEO with PHP

I'm currently investigating some new ideas for long-tail SEO. I have a site where people can create their own blogs, which already brings in pretty good long-tail traffic. I'm already displaying the article title inside the page's title tag.

However, the title often doesn't match the keywords in the content very well, and I'm interested in adding to the title some keywords that PHP has determined would work best.

I've tried a script I wrote that works out the most common words on a page. It works OK, but the problem is that it comes up with pretty useless words.

It occurred to me that it would be more useful to have a PHP script extract frequently occurring pairs (or triplets) of words and put them in an array ordered by how often they occur.

My problem is how to parse the text more dynamically to look for recurring pairs or triplets of words. How would I go about this?

function extractCommonWords($string, $keywords){
  // $keywords = how many of the most frequent words to return
  $stopWords = array('i','a','about','an','and','are','as','at','be','by','com','de','en','for','from','how','in','is','it','la','of','on','or','that','the','this','to','was','what','when','where','who','will','with','und','www');

  $string = preg_replace('/\s+/', ' ', $string); // collapse every run of whitespace into a single space so words stay separated
  $string = trim($string); // trim the string
  $string = preg_replace('/[^a-zA-Z0-9 -]/', '', $string); // only take alphanumerical characters, but keep the spaces and dashes too…
  $string = strtolower($string); // make it lowercase

  // split into a list of words on spaces, skipping empty pieces
  $matchWords = preg_split('/\s+/', $string, -1, PREG_SPLIT_NO_EMPTY);

  foreach ( $matchWords as $key=>$item ) {
      // drop stop words and any word of 3 characters or fewer
      if ( in_array($item, $stopWords) || strlen($item) <= 3 ) {
          unset($matchWords[$key]);
      }
  }
  $wordCountArr = array_count_values($matchWords); // word => number of occurrences
  arsort($wordCountArr); // most frequent first
  $wordCountArr = array_slice($wordCountArr, 0, $keywords, true); // true keeps numeric words such as "2013" as keys
  return $wordCountArr;
}
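The pair/triplet idea is what is usually called n-gram counting: slide a window of n consecutive words across the text and tally each window. A minimal sketch, assuming PHP 7+ and using a hypothetical helper name `countNgrams()` (not part of the code above):

```php
<?php
// Sketch: count every run of $n consecutive words (an "n-gram") in a text.
// countNgrams() is a hypothetical helper name, assuming PHP 7+.
function countNgrams(string $text, int $n): array
{
    // Lowercase, drop everything except letters, digits, whitespace and dashes,
    // then split on whitespace into a flat list of words.
    $clean = preg_replace('/[^a-z0-9\s-]+/', '', strtolower($text));
    $words = preg_split('/\s+/', $clean, -1, PREG_SPLIT_NO_EMPTY);

    $counts = [];
    // Slide a window of $n words across the list, one word at a time.
    for ($i = 0; $i + $n <= count($words); $i++) {
        $gram = implode(' ', array_slice($words, $i, $n));
        $counts[$gram] = ($counts[$gram] ?? 0) + 1;
    }
    arsort($counts); // most frequent first
    return $counts;
}

$text = 'Cheap flights to Paris. Book cheap flights today; cheap flights are here.';
print_r(countNgrams($text, 2)); // "cheap flights" comes first with a count of 3
```

Calling it with `$n = 2` and `$n = 3` gives the pairs and triplets separately; stop words are deliberately not removed here, since filtering them is a separate decision.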

1 answer

  • doudonglu3764 2013-04-29 13:14

    For the sake of including some code, here's another primitive adaptation that returns keywords of up to a given number of words that occur more than a given number of times. Rather than stripping all common words, it only filters out phrases that start or end with one. It still returns some nonsense, but that is really unavoidable.

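    function getLongTailKeywords($str, $len = 3, $min = 2){
      // Returns phrases of $len words down to single words that occur more than $min times.
      $keywords = array();
      $common = array('i','a','about','an','and','are','as','at','be','by','com','de','en','for','from','how','in','is','it','la','of','on','or','that','the','this','to','was','what','when','where','who','will','with','und','www');
      // strip HTML, lowercase, keep only letters, digits, whitespace and dashes
      $str = preg_replace('/[^a-z0-9\s-]+/', '', strtolower(strip_tags($str)));
      // split into words on whitespace, or on a dash surrounded by whitespace
      $str = preg_split('/\s+-\s+|\s+/', $str, -1, PREG_SPLIT_NO_EMPTY);
      // $len counts down: $len-word phrases first, then shorter ones, ending with single words
      while(0<$len--) for($i=0;$i<count($str)-$len;$i++){
        $word = array_slice($str, $i, $len+1);
        // skip phrases that start or end with a common word
        if(in_array($word[0], $common)||in_array(end($word), $common)) continue;
        $word = implode(' ', $word);
        if(!isset($keywords[$len][$word])) $keywords[$len][$word] = 0;
        $keywords[$len][$word]++;
      }
      $return = array();
      foreach($keywords as $keyword){
        // keep phrases seen more than $min times, most frequent first
        $keyword = array_filter($keyword, function($v) use($min){ return $v > $min; });
        arsort($keyword);
        // + instead of array_merge so numeric keys such as "2013" are not renumbered
        $return += $keyword;
      }
      return $return;
    }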
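To show why filtering only the edges of a phrase keeps more meaning than stripping every stop word, here is a small comparison of the two rules described in the paragraph above. The helper names `stripAll()` and `keepIfEdgesClean()` and the shortened stopword list are hypothetical; PHP 7.1+ is assumed for the nullable return type.

```php
<?php
// Hypothetical helpers contrasting two ways of using a stopword list.
// $stop is a shortened example list, not the full list from the answer.
$stop = ['of', 'the', 'a', 'to', 'in'];

// Strip-all: remove every stopword from the phrase.
function stripAll(array $words, array $stop): string
{
    return implode(' ', array_values(array_diff($words, $stop)));
}

// Edge-only: keep the phrase unchanged, but reject it
// if its first or last word is a stopword.
function keepIfEdgesClean(array $words, array $stop): ?string
{
    if (in_array($words[0], $stop, true) || in_array(end($words), $stop, true)) {
        return null;
    }
    return implode(' ', $words);
}

$phrase = ['state', 'of', 'the', 'art'];
echo stripAll($phrase, $stop), "\n";                      // state art
var_dump(keepIfEdgesClean($phrase, $stop));               // string(16) "state of the art"
var_dump(keepIfEdgesClean(['of', 'the', 'art'], $stop));  // NULL
```

The edge-only rule keeps "state of the art" intact while discarding the fragment "of the art", whereas strip-all produces "state art".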
    

    Run code (*on a random BBC News article)


    The problem with just ignoring common words, grammar and punctuation, though, is that they still carry meaning within a sentence. If you remove them, you are at best changing the meaning and at worst generating unintelligible phrases. Even the idea of extracting "keywords" is itself flawed, because words can have different meanings: when you remove them from a sentence, you take them out of context.
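A concrete case of the meaning loss described above, assuming a hypothetical `stripStopwords()` helper and a short example stopword list (both "to" and "from" appear in the stopword lists used in this thread): two routes in opposite directions reduce to the same keyword string.

```php
<?php
// Sketch: stopword removal can map phrases with different meanings onto the
// same keyword string. stripStopwords() and $stop are hypothetical examples.
function stripStopwords(string $text, array $stop): string
{
    $words = preg_split('/\s+/', strtolower(trim($text)), -1, PREG_SPLIT_NO_EMPTY);
    return implode(' ', array_values(array_diff($words, $stop)));
}

$stop = ['to', 'from', 'the', 'a'];
$a = stripStopwords('Flights from London to Paris', $stop);
$b = stripStopwords('Flights to London from Paris', $stop);
echo $a, "\n";        // flights london paris
echo $b, "\n";        // flights london paris
var_dump($a === $b);  // bool(true)
```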

    It's not my area, but natural language processing is a field of extensive research and there is no easy solution. The general theory goes like this: a computer cannot decipher the meaning of a single piece of text in isolation; it has to rely on cross-referencing a semantically tagged corpus of related material (which is a huge overhead).
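Semantic tagging is out of reach for a small script, but a much simpler statistical relative of "cross-referencing a corpus" is TF-IDF weighting: a word scores high if it is frequent in this post but rare across the site's other posts, which pushes down generic words without a hand-made stopword list. This is a swapped-in technique, not the approach described above; the sketch assumes PHP 7+, hypothetical helpers `tokenize()` and `tfidf()`, and a toy corpus standing in for the site's other blog posts.

```php
<?php
// Sketch (assumption: PHP 7+). tokenize() and tfidf() are hypothetical helpers.
// TF-IDF is a purely statistical weighting, not semantic tagging.

// Lowercase, keep letters/digits/whitespace/dashes, split on whitespace.
function tokenize(string $text): array
{
    $clean = preg_replace('/[^a-z0-9\s-]+/', '', strtolower($text));
    return preg_split('/\s+/', $clean, -1, PREG_SPLIT_NO_EMPTY);
}

// Score each word of $doc as tf * log((N + 1) / (df + 1)), where tf is its count
// in $doc, N is the number of corpus documents and df is how many of them
// contain the word. Words found in every corpus document score 0.
function tfidf(string $doc, array $corpus): array
{
    // One word set per corpus document, for fast membership checks.
    $sets = array_map(function ($d) { return array_flip(tokenize($d)); }, $corpus);
    $n = count($corpus);
    $scores = [];
    foreach (array_count_values(tokenize($doc)) as $word => $tf) {
        $df = 0;
        foreach ($sets as $set) {
            if (isset($set[$word])) {
                $df++;
            }
        }
        $scores[$word] = $tf * log(($n + 1) / ($df + 1));
    }
    arsort($scores); // highest score first
    return $scores;
}

// Usage: in practice $corpus would be the site's other posts.
$corpus = ['the cat sat on the mat', 'the dog chased the cat', 'the weather is sunny today'];
$scores = tfidf('the parrot saw the cat and the parrot flew', $corpus);
echo array_keys($scores)[0], "\n"; // parrot
```

"the" is the most frequent word in the post but scores 0 because every corpus document contains it. The same scoring can be applied to the pair and triplet keys from an n-gram count instead of single words.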

    The asker selected this answer as the best answer.
