dtla92562 2013-04-21 16:45

How do I count the words that two strings have in common?

I have two strings:

$var_x = "Depending structure";
$var_y = "Depending on the structure of your array ";

Can you please tell me how I can find out how many of the words in $var_x also appear in $var_y? To do that, I tried the following:

$pieces1 = explode(" ", $var_x);
$pieces2 = explode(" ", $var_y);
$result=array_intersect($pieces1, $pieces2);
//Print result here?

But this doesn't show me how many of the words in $var_x are in $var_y.


1 answer

  • dongxin8392 2013-04-21 17:26

    Using explode() to split a string into words is unreliable. Real-world text is messy, and you can't be sure that every word is separated from the next by a single space.

    See the following lines:

    • "This is a test sentence" - 5 words from explode()
    • "This is a test sentence. Not a word." - 8 words, and you get "sentence." (with the period) as a word.
    • "This is a test
      sentence" - 4 words from explode(); "test" and "sentence" are separated only by a line break, so they end up as a single word.
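The counts above can be checked with a short script. This is only a minimal sketch: the helper name explodeTokens() and the $samples array are made up for illustration.

```php
<?php
// Hypothetical helper: split on single spaces, exactly as the question does.
function explodeTokens(string $text): array
{
    return explode(" ", $text);
}

$samples = [
    "This is a test sentence",
    "This is a test sentence. Not a word.",
    "This is a test\nsentence",
];

foreach ($samples as $text) {
    $tokens = explodeTokens($text);
    // Show the token count and the last token, so stray punctuation
    // or an embedded line break becomes visible.
    printf("%d tokens, last token: %s\n", count($tokens), json_encode(end($tokens)));
}
// 5 tokens, last token: "sentence"
// 8 tokens, last token: "word."
// 4 tokens, last token: "test\nsentence"
```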

    The examples above just show that explode() is the wrong tool for splitting text into words. Use str_word_count() instead:

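    $var_x = "Depending structure";
    $var_y = "Depending on the structure of your array ";
    // Format 1 makes str_word_count() return an array of the words found.
    $pieces1 = str_word_count($var_x, 1);
    $pieces2 = str_word_count($var_y, 1);
    // Remove duplicates first so a repeated word is counted only once.
    $result = array_intersect(array_unique($pieces1), array_unique($pieces2));
    print count($result);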
    

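    This prints 2, and for this input your explode() approach gives the same count. With more complex input (punctuation, line breaks, repeated words), the method above still gives the correct word count. Also note the use of array_unique(): without it, a word repeated in the first string is counted once per occurrence.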
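To see where the two approaches diverge, here is a rough comparison. It is a sketch under assumptions: the function names commonWordCount() and commonWordCountExplode() and the test strings are invented for illustration and are not part of PHP.

```php
<?php
// Hypothetical helper: count distinct words of $a that also occur in $b,
// using str_word_count() as in the answer above.
function commonWordCount(string $a, string $b): int
{
    $wordsA = array_unique(str_word_count($a, 1));
    $wordsB = array_unique(str_word_count($b, 1));
    return count(array_intersect($wordsA, $wordsB));
}

// Hypothetical helper: the same idea, but splitting on single spaces
// as in the question, without removing duplicates.
function commonWordCountExplode(string $a, string $b): int
{
    return count(array_intersect(explode(" ", $a), explode(" ", $b)));
}

$x = "Depending structure";
$y = "Depending on the structure, of your array.";

echo commonWordCountExplode($x, $y), "\n"; // 1: "structure," does not equal "structure"
echo commonWordCount($x, $y), "\n";        // 2

// A repeated word in the first string is counted twice without array_unique().
echo commonWordCountExplode("the the structure", "the structure"), "\n"; // 3
echo commonWordCount("the the structure", "the structure"), "\n";        // 2
```

Both versions compare words case-sensitively, so "Structure" and "structure" would not match; lowercasing both strings with strtolower() first is one way to change that.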

    This answer was accepted by the asker as the best answer.
