dpg98445 2016-07-08 06:47
Views: 36
Answer accepted

Finding multi-word keywords with preg_match_all

I'm writing a keyword density tool, but I have a problem finding a keyword that consists of multiple words.

I used some code from a topic on Stack Overflow, but it doesn't fully work.

When I have, for example, a really large text scraped from a webpage, it doesn't find more than one occurrence of the keyword. For example:

If I have the text "Hello this is me. Hello this is him. Hello this is her." and my keyword is "Hello this", it reports only one occurrence of 'hello this', although there are 3.

The code I have:

// Count words in the text
$word_count = explode(' ', $text);
$word_count = count($word_count);

// Count matches with the keyword
$keyword_count = preg_match_all("#{$searchterm}#si", $text, $matches);
$keyword_count = count($matches);

// Calculate density 
$density = $keyword_count / $word_count * 100;

How can I make my code count every occurrence?


2 answers

  • duandao2083 2016-07-08 07:09

    $keyword_count = count($matches);

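    This line is the problem: count($matches) counts the groups in the result array (the full pattern plus each capture group), not the number of matches. The matches themselves are in $matches[0], so count that instead. See var_dump($matches);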
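
To make the difference concrete, here is a minimal sketch using the text and keyword from the question. The variable names $returned, $groups and $hits are only illustrative; it assumes the default PREG_PATTERN_ORDER flag.

```php
<?php
// Example text and keyword taken from the question.
$text = "Hello this is me. Hello this is him. Hello this is her.";
$searchterm = "Hello this";

// preg_match_all() itself returns the number of full-pattern matches.
$returned = preg_match_all("#{$searchterm}#si", $text, $matches);

// With the default PREG_PATTERN_ORDER, $matches has one entry per group:
// index 0 holds every full match, and this pattern has no capture groups.
$groups = count($matches);    // number of groups, not number of matches
$hits   = count($matches[0]); // number of occurrences of the keyword
```

Here $groups is 1 while $returned and $hits are both 3, which is why the original code never reported more than one occurrence.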

    // Count matches with the keyword
    preg_match_all("#({$searchterm})#si", $text, $matches);
    $keyword_count = count($matches[0]); // or use $matches[1], in this case same
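
Putting the pieces together, the whole calculation might look like the sketch below. The function keyword_density() is a hypothetical helper, not part of the original answer. It assumes that splitting on whitespace is an acceptable word count and that the keyword should be matched literally, so it is escaped with preg_quote().

```php
<?php
// Hypothetical helper: keyword occurrences as a percentage of the word count.
function keyword_density(string $text, string $searchterm): float
{
    // Split on runs of whitespace so repeated spaces or newlines
    // do not produce empty "words".
    $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    $word_count = count($words);
    if ($word_count === 0) {
        return 0.0; // avoid division by zero on empty input
    }

    // Escape regex metacharacters in the keyword, including the # delimiter.
    $pattern = '#' . preg_quote($searchterm, '#') . '#si';

    // The return value is already the number of matches.
    $keyword_count = preg_match_all($pattern, $text);

    return $keyword_count / $word_count * 100;
}

$text = "Hello this is me. Hello this is him. Hello this is her.";
$density = keyword_density($text, "Hello this");
```

For the example text this gives 3 occurrences over 12 words, so a density of 25. Note that the formula, like the one in the question, divides the number of keyword occurrences by the number of single words.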
    
    This answer was selected as the best answer by the question author.
