dougaimian1143 2013-07-10 11:15

Comparing a set of texts for similarity/relevance: similar_text() is not accurate enough

I would like to compare two sets of text and measure how similar/relevant they are to each other. I use similar_text(), but I found that it is not very accurate. Thank you.

For example, the following two texts give me 66%:

Text1: Innovation game, eat, sleep, breathe innovation. It love creativity & passion power Internet drives . We understand time greatest asset, challege meet deadline.

Text2: Soviet union communist policy; Germany league organization disguise enermies beaten wanted.

My code is as below:

echo $student_answer = removeCommonWords($answer)."<br><br>";

$student_answer = strip_tags($student_answer);

echo $memo = removeCommonWords2($memo)."<br><br>";

echo similar_text($memo, $student_answer);

1 answer

  • drj58429 2013-07-10 11:27

    You can use the JS version:

    http://phpjs.org/functions/similar_text/

    The JS code shows how the percentage is computed (you can modify it):

    return (sum * 200) / (firstLength + secondLength);
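
As a rough worked example of that formula: `sum` is the number of matched characters, and the result is twice that count divided by the combined length of both strings, expressed as a percentage. The helper name `similarityPercent` below is made up for illustration and is not part of phpjs; the guard for two empty strings is also an assumption.

```javascript
// Hypothetical helper: turns a similar_text() match count into a percentage.
// sum = matched characters; firstLength / secondLength = lengths of the two strings.
function similarityPercent(sum, firstLength, secondLength) {
  var total = firstLength + secondLength;
  if (total === 0) {
    return 0; // assumption: treat two empty strings as 0% instead of dividing by zero
  }
  return (sum * 200) / total;
}

// 'Hello World!' and 'Hello phpjs!' share 7 characters and are 12 characters each.
console.log(similarityPercent(7, 12, 12).toFixed(2)); // prints "58.33"
console.log(similarityPercent(12, 12, 12)); // prints 100
```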
    

    I hope this will help you!

    EDIT:

    How to use similar_text in JS?

    1. Create a file named similar_text.js and copy and paste this code into it:

       function similar_text (first, second, percent) {
       // http://kevin.vanzonneveld.net
       // +   original by: Rafał Kukawski (http://blog.kukawski.pl)
       // +   bugfixed by: Chris McMacken
       // +   added percent parameter by: Markus Padourek (taken from http://www.kevinhq.com/2012/06/php-similartext-function-in-javascript_16.html)
       // *     example 1: similar_text('Hello World!', 'Hello phpjs!');
       // *     returns 1: 7
       // *     example 2: similar_text('Hello World!', null);
       // *     returns 2: 0
       // *     example 3: similar_text('Hello World!', 'Hello phpjs!', 1);
       // *     returns 3: 58.33 (approximately)
       if (first === null || second === null || typeof first === 'undefined' || typeof second === 'undefined') {
         return 0;
       }
      
       first += '';
       second += '';
      
       var pos1 = 0,
         pos2 = 0,
         max = 0,
         firstLength = first.length,
         secondLength = second.length,
         p, q, l, sum;
      
       max = 0;
      
       for (p = 0; p < firstLength; p++) {
         for (q = 0; q < secondLength; q++) {
           for (l = 0;
           (p + l < firstLength) && (q + l < secondLength) && (first.charAt(p + l) === second.charAt(q + l)); l++);
           if (l > max) {
             max = l;
             pos1 = p;
             pos2 = q;
           }
         }
       }
      
       sum = max;
      
       if (sum) {
         if (pos1 && pos2) {
           // Compare the parts to the left of the match: first up to pos1, second up to pos2.
           sum += similar_text(first.substr(0, pos1), second.substr(0, pos2));
         }
      
         if ((pos1 + max < firstLength) && (pos2 + max < secondLength)) {
           // Compare the parts to the right of the match.
           sum += similar_text(first.substr(pos1 + max, firstLength - pos1 - max), second.substr(pos2 + max, secondLength - pos2 - max));
         }
       }
      
       if (!percent) {
         return sum;
       } else {
         return (sum * 200) / (firstLength + secondLength);
       }
      }
      
    2. In your <head>, put the following line:

        <script type="text/JavaScript" src="YOUR_PATH/similar_text.js"></script>
      
    3. Now you can use it in your body:

        <script>
         console.log(similar_text('Hello World!', 'Hello phpjs!'));
        </script>
      

    The call returns 7.
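
Note that similar_text matches characters, not words, so two unrelated texts can still score fairly high just because they share letters and spaces. If that is the inaccuracy you are seeing, one possible alternative is to compare the sets of words instead. This is only a minimal sketch of a word-overlap (Jaccard) score; `wordSet` and `wordOverlapPercent` are made-up names, not phpjs functions, and the tokenizer assumes plain ASCII words.

```javascript
// Hypothetical helper: lowercase the text and collect its distinct words.
// Assumption: a "word" is a run of ASCII letters or digits.
function wordSet(text) {
  return new Set(String(text).toLowerCase().match(/[a-z0-9]+/g) || []);
}

// Hypothetical helper: shared words divided by all distinct words, as a percentage.
function wordOverlapPercent(a, b) {
  const setA = wordSet(a);
  const setB = wordSet(b);
  let shared = 0;
  setA.forEach(function (word) {
    if (setB.has(word)) {
      shared++;
    }
  });
  const total = setA.size + setB.size - shared;
  return total === 0 ? 0 : (shared * 100) / total;
}

// One shared word ("hello") out of three distinct words.
console.log(wordOverlapPercent('Hello World!', 'Hello phpjs!').toFixed(2)); // prints "33.33"
```

With the two sample texts from the question this gives 0, because they have no word in common.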

    Hope this helps!
