douya1855 2013-04-28 03:02

Extracting keyword pairs for SEO with PHP

I'm currently investigating some new ideas for long-tail SEO. I have a site where people can create their own blogs, which already brings in pretty good long-tail traffic. I'm already displaying the article title inside the article's title tags.

However, the title often doesn't match the keywords in the content well, so I'm interested in adding some keywords to the title that PHP has determined would work best.

I've tried using a script I wrote to work out the most common words on a page. This works OK, but the problem is that it comes up with pretty useless words.

It's occurred to me that it would be useful to write a PHP script that extracts frequently occurring pairs (or sets of three) of words and puts them in an array ordered by how often they occur.

My problem: how to parse text in a more dynamic way to look for recurring pairs or triplets of words. How would I go about this?

function extractCommonWords($string, $keywords){
  $stopWords = array('i','a','about','an','and','are','as','at','be','by','com','de','en','for','from','how','in','is','it','la','of','on','or','that','the','this','to','was','what','when','where','who','will','with','und','the','www');

  $string = preg_replace('/\s+/', ' ', $string); // collapse each run of whitespace into a single space so adjacent words are not glued together
  $string = trim($string); // trim the string
  $string = preg_replace('/[^a-zA-Z0-9 -]/', '', $string); // only take alphanumerical characters, but keep the spaces and dashes too…
  $string = strtolower($string); // make it lowercase

  preg_match_all('/\b.*?\b/i', $string, $matchWords);
  $matchWords = $matchWords[0];

  foreach ( $matchWords as $key=>$item ) {
      if ( $item == '' || in_array(strtolower($item), $stopWords) || strlen($item) <= 3 ) {
          unset($matchWords[$key]);
      }
  }   
  $wordCountArr = array();
  if ( is_array($matchWords) ) {
      foreach ( $matchWords as $key => $val ) {
          $val = strtolower($val);
          if ( isset($wordCountArr[$val]) ) {
              $wordCountArr[$val]++;
          } else {
              $wordCountArr[$val] = 1;
          }
      }
  }
  arsort($wordCountArr);
  $wordCountArr = array_slice($wordCountArr, 0, $keywords);
  return $wordCountArr;
}

1 answer

  • doudonglu3764 2013-04-29 13:14

    For the sake of including some code, here's another primitive adaptation that returns keywords of up to a given number of words that occur more than a given number of times. Rather than stripping all common words, it only discards keywords that start or end with one. It still returns some nonsense, but that is really unavoidable.

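    function getLongTailKeywords($str, $len = 3, $min = 2){
      $keywords = array();
      $common = array('i','a','about','an','and','are','as','at','be','by','com','de','en','for','from','how','in','is','it','la','of','on','or','that','the','this','to','was','what','when','where','who','will','with','und','www');
      // Strip tags, lowercase, then drop everything except letters, digits, whitespace and dashes
      $str = preg_replace('/[^a-z0-9\s-]+/', '', strtolower(strip_tags($str)));
      // Split into words on whitespace; a free-standing dash also acts as a separator
      $str = preg_split('/\s+-\s+|\s+/', $str, -1, PREG_SPLIT_NO_EMPTY);
      // Count phrases of $len words, then $len-1 words, and so on down to single words
      while(0<$len--) for($i=0;$i<count($str)-$len;$i++){
        $word = array_slice($str, $i, $len+1);
        // Skip phrases that start or end with a common word
        if(in_array($word[0], $common)||in_array(end($word), $common)) continue;
        $word = implode(' ', $word);
        if(!isset($keywords[$len][$word])) $keywords[$len][$word] = 0;
        $keywords[$len][$word]++;
      }
      $return = array();
      foreach($keywords as &$keyword){
        // Keep only phrases that occur more than $min times
        $keyword = array_filter($keyword, function($v) use($min){ return $v>$min; });
        arsort($keyword);
        // Use + rather than array_merge, which would renumber purely numeric keys such as "2013"
        $return += $keyword;
      }
      unset($keyword); // break the reference left behind by the foreach
      return $return;
    }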
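
The nested `while`/`for` in the function above is a sliding window over the word list. As a minimal sketch of just that step, assuming PHP 7 or later, the helper below counts every run of `$n` consecutive words. The name `countNgrams` and the sample sentence are made up for illustration and are not part of the answer's code.

```php
<?php
// Hypothetical helper (not from the original answer): count every run of
// $n consecutive words in an already tokenised list of words.
function countNgrams(array $words, int $n): array {
    $counts = array();
    // A window of $n words fits at offsets 0 .. count($words) - $n
    for ($i = 0; $i + $n <= count($words); $i++) {
        $phrase = implode(' ', array_slice($words, $i, $n));
        $counts[$phrase] = ($counts[$phrase] ?? 0) + 1;
    }
    arsort($counts); // most frequent phrases first
    return $counts;
}

// Made-up sample text, already lowercased and free of punctuation
$words = explode(' ', 'long tail seo needs long tail keywords');
$pairs = countNgrams($words, 2);
print_r($pairs); // "long tail" is counted twice, every other pair once
```

Calling it once per phrase length (3, 2, 1) and adding the stop-word and minimum-count filters gives roughly what `getLongTailKeywords` does in a single pass.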
    

    run code *on random BBC News article


    The problem with just ignoring common words, grammar and punctuation, though, is that they still carry meaning within a sentence. If you remove them you are at best changing the meaning and at worst generating unintelligible phrases. Even the idea of extracting "keywords" is itself flawed, because words can have different meanings: when you remove them from a sentence, you take them out of context.
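
To make that point concrete, here is a small sketch that compares two ways of handling common words when building word pairs. The `bigrams` helper, the three-word stop list and the sample sentence are all assumptions chosen for illustration.

```php
<?php
// Illustrative sketch only: the helper name, the tiny stop-word list and
// the sample sentence are assumptions, not part of the original answer.
function bigrams(array $words): array {
    $out = array();
    for ($i = 0; $i + 1 < count($words); $i++) {
        $out[] = $words[$i] . ' ' . $words[$i + 1];
    }
    return $out;
}

$stop  = array('as', 'the', 'of');
$words = explode(' ', 'oil prices rise as the cost of war grows');

// Approach 1: delete the stop words first, then pair up what is left.
$fromStripped = bigrams(array_values(array_diff($words, $stop)));

// Approach 2: pair up the original words, then drop any pair that starts
// or ends with a stop word (the kind of filtering getLongTailKeywords does).
$fromOriginal = array_values(array_filter(
    bigrams($words),
    function ($pair) use ($stop) {
        list($first, $last) = explode(' ', $pair);
        return !in_array($first, $stop, true) && !in_array($last, $stop, true);
    }
));

print_r($fromStripped); // includes "rise cost" and "cost war"
print_r($fromOriginal); // only pairs that are adjacent in the text
```

Approach 1 produces "rise cost" and "cost war", which never occur in the sentence. Approach 2 only reports pairs that really are adjacent, and with a three-word window it could still keep "cost of war", because the common word sits in the middle rather than at either end.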

    It's not my area, but there are complex studies into natural language and there is no easy solution. The general theory goes like this: a computer cannot decipher the meaning of a single piece of text on its own; it has to rely on cross-referencing a semantically tagged corpus of related material (which is a huge overhead).
