douchen4547 2012-09-14 04:49

Check whether sentences share the same words

tb_content (left) and tb_word (right):

=====================================    ================================
|id|sentence |sentence_id|content_id|    |id|word|sentence_id|content_id|
=====================================    ================================
| 1|sentence1|    0      |    1     |    | 1|  a |     0     |    1     |
| 2|sentence2|    1      |    1     |    | 2|  b |     0     |    1     |
| 3|sentence5|    0      |    2     |    | 3|  c |     1     |    1     |
| 4|sentence6|    1      |    2     |    | 4|  a |     1     |    1     |
| 5|sentence7|    2      |    2     |    | 5|  e |     1     |    1     |
=====================================    | 6|  f |     0     |    2     |
                                         | 7|  g |     1     |    2     |
                                         | 8|  h |     1     |    2     |
                                         | 9|  i |     1     |    2     |
                                         |10|  f |     2     |    2     |
                                         |11|  h |     2     |    2     |
                                         |12|  f |     2     |    2     |
                                         ================================

I need to check, within each content_id, whether each sentence contains words that also appear in the other sentences of that content_id.

For example:

Take content_id = 1, which contains sentence1 and sentence2. From tb_word we can see that both sentences contain the word a. If a word occurs 2 or more times across the two sentences, it belongs in the result. So if I print the result, it must be:

00Array ( [0] => a [1] => b)
01Array ( [3] => a )
10Array ( [3] => a )
11Array ( [0] => c [1] => a [2] => e)

where the two-digit prefix gives the pair of sentence_id values being compared, e.g. 00 means sentence_id = 0 and sentence_id = 0.
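
One possible reading of this requirement, sketched in Python purely for illustration (the question itself uses PHP and MySQL): for every ordered pair of sentences inside one content_id, a sentence paired with itself yields all of its words, and two different sentences yield the words found in both. The name `shared_words` and the in-memory `TB_WORD` list are assumptions made for this sketch; the rows are copied from tb_word above, and a sentence with no rows in tb_word would not appear at all.

```python
from collections import defaultdict

# Sample rows copied from tb_word above: (word, sentence_id, content_id).
TB_WORD = [
    ("a", 0, 1), ("b", 0, 1), ("c", 1, 1), ("a", 1, 1), ("e", 1, 1),
    ("f", 0, 2), ("g", 1, 2), ("h", 1, 2), ("i", 1, 2),
    ("f", 2, 2), ("h", 2, 2), ("f", 2, 2),
]


def shared_words(rows):
    """Return {content_id: {(x, y): [words]}} for every ordered sentence pair."""
    # Group the words by content_id first, then by sentence_id.
    by_content = defaultdict(lambda: defaultdict(list))
    for word, sentence_id, content_id in rows:
        by_content[content_id][sentence_id].append(word)

    result = {}
    for content_id, sentences in by_content.items():
        pairs = {}
        for x, words_x in sentences.items():
            for y, words_y in sentences.items():
                if x == y:
                    # A sentence compared with itself: all of its words.
                    pairs[(x, y)] = list(words_x)
                else:
                    in_y = set(words_y)
                    # dict.fromkeys drops duplicates but keeps first-seen order.
                    pairs[(x, y)] = [w for w in dict.fromkeys(words_x) if w in in_y]
        result[content_id] = pairs
    return result


shared = shared_words(TB_WORD)
for (x, y), words in shared[1].items():
    print(f"{x}{y}", words)
```

Because the grouping happens per content_id before any pair is formed, sentences from different content_id values are never compared with each other.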

First, I wrote functionTotal to count how many sentences each content_id has:

function functionTotal() {
    $total = array();
    // Count the sentences that belong to each content_id.
    $sql = mysql_query('select content_id, count(*) as RowAmount 
           from tb_content Group By content_id') or die(mysql_error());
    while ($row = mysql_fetch_array($sql)) {
        $total[] = $row['RowAmount']; 
    }
    return $total;
}

That function gives me $total, and from it I need to check which words (from tb_word) are shared between every possible pair of sentences:

foreach ($total as $content_id => $totals) {
    for ($x = 0; $x <= ($totals - 1); $x++) {
        for ($y = 0; $y <= ($totals - 1); $y++) {
            $shared = getShared($x, $y);
        }
    }
}

The getShared function is:

function getShared ($x, $y){
    $token = array();
    $shared = array();
    $i = 0;
    if ($x == $y) {
        $query = mysql_query("SELECT word FROM `tb_word`
                             WHERE sentence_id ='$x' ");
        while ($row = mysql_fetch_array($query)) {
            $shared[$i] = $row['word'];
            $i++;
        }

    } else {
        $query = mysql_query("SELECT word, count(word) as jml 
                             FROM `tb_word` WHERE sentence_id ='$x' 
                             OR sentence_id ='$y' 
                             GROUP BY word ");
        while ($row = mysql_fetch_array($query)) {
            $jml = $row['jml'];
            $token[$i] = $row['word'];
            if ($jml >= 2) {
                $shared[$i] = $token[$i];
            }
            $i++;
        }
    }
    return $shared;
}

But the result I get is still wrong: it mixes words from different content_id values. The result must also be grouped by content_id. Sorry for my bad English and my poor explanation. Correct me if I'm wrong, and please help me. Thank you :)


  • dongou6632 2012-09-14 22:34

    This can actually be done by the DBMS itself, in two steps within one query. First, do a self join to build the sentence combinations within the same content:

    SELECT a.content_id,
           a.sentence_id AS sentence_id_1,
           b.sentence_id AS sentence_id_2
    FROM   tb_content AS a
           JOIN tb_content AS b
             ON ( a.content_id = b.content_id
                  AND a.sentence_id <= b.sentence_id )
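
To see which pairs this self join produces on the sample tb_content rows from the question, here is a small sketch. It runs the same query through Python's built-in `sqlite3` module only so that it works without a MySQL server; the in-memory table setup and the `ORDER BY` clause (added to make the output order predictable) are assumptions of the sketch, not part of the original query.

```python
import sqlite3

# In-memory stand-in for the MySQL table, filled with the sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tb_content "
    "(id INTEGER, sentence TEXT, sentence_id INTEGER, content_id INTEGER)"
)
conn.executemany(
    "INSERT INTO tb_content VALUES (?, ?, ?, ?)",
    [
        (1, "sentence1", 0, 1),
        (2, "sentence2", 1, 1),
        (3, "sentence5", 0, 2),
        (4, "sentence6", 1, 2),
        (5, "sentence7", 2, 2),
    ],
)

# Same self join as above; ORDER BY is added only for a stable print order.
pairs = conn.execute(
    """
    SELECT a.content_id,
           a.sentence_id AS sentence_id_1,
           b.sentence_id AS sentence_id_2
    FROM   tb_content AS a
           JOIN tb_content AS b
             ON ( a.content_id = b.content_id
                  AND a.sentence_id <= b.sentence_id )
    ORDER  BY a.content_id, a.sentence_id, b.sentence_id
    """
).fetchall()

for content_id, sentence_id_1, sentence_id_2 in pairs:
    print(content_id, sentence_id_1, sentence_id_2)
```

With the sample data this yields three pairs for content_id 1 and six for content_id 2, and no pair ever crosses two different content_id values.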
    

    The "<=" keeps same-sentence joins, like "1-1" or "2-2", while avoiding bidirectional repetitions, like "1-2" and "2-1". Next, you can join the above result with the words and count the number of occurrences, like this:

    SELECT s.content_id,
           s.sentence_id_1,
           s.sentence_id_2,
           c.word,
           Count(*) AS jml
    FROM   (SELECT a.content_id,
                   a.sentence_id AS sentence_id_1,
                   b.sentence_id AS sentence_id_2
            FROM   tb_content AS a
                   JOIN tb_content AS b
                     ON ( a.content_id = b.content_id
                          AND a.sentence_id <= b.sentence_id )) AS s
           JOIN tb_word AS c
             ON ( s.content_id = c.content_id
                  AND ( c.sentence_id = s.sentence_id_1
                         OR c.sentence_id = s.sentence_id_2 ) )
    GROUP  BY s.content_id,
              s.sentence_id_1,
              s.sentence_id_2,
              c.word
    HAVING Count(*) >= 2; 
    

    The result of the above query gives you the content_id, the two sentence IDs of each pair, the word, and the number of occurrences (which is 2 or more). All you need to do now is collect the result into an array, which I see you already know how to do.
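
As a rough sketch of that last step, the snippet below runs the full query against the sample rows from the question and collects the rows into a nested structure keyed by content_id and sentence pair. Python's built-in `sqlite3` module stands in for MySQL so the example is self-contained; the table setup, the added `ORDER BY`, and the `shared` dictionary layout are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tb_content "
    "(id INTEGER, sentence TEXT, sentence_id INTEGER, content_id INTEGER)"
)
conn.execute(
    "CREATE TABLE tb_word "
    "(id INTEGER, word TEXT, sentence_id INTEGER, content_id INTEGER)"
)
conn.executemany(
    "INSERT INTO tb_content VALUES (?, ?, ?, ?)",
    [(1, "sentence1", 0, 1), (2, "sentence2", 1, 1), (3, "sentence5", 0, 2),
     (4, "sentence6", 1, 2), (5, "sentence7", 2, 2)],
)
conn.executemany(
    "INSERT INTO tb_word VALUES (?, ?, ?, ?)",
    [(1, "a", 0, 1), (2, "b", 0, 1), (3, "c", 1, 1), (4, "a", 1, 1),
     (5, "e", 1, 1), (6, "f", 0, 2), (7, "g", 1, 2), (8, "h", 1, 2),
     (9, "i", 1, 2), (10, "f", 2, 2), (11, "h", 2, 2), (12, "f", 2, 2)],
)

# The query from the answer; ORDER BY is added only for a stable result order.
rows = conn.execute(
    """
    SELECT s.content_id, s.sentence_id_1, s.sentence_id_2, c.word,
           Count(*) AS jml
    FROM   (SELECT a.content_id,
                   a.sentence_id AS sentence_id_1,
                   b.sentence_id AS sentence_id_2
            FROM   tb_content AS a
                   JOIN tb_content AS b
                     ON ( a.content_id = b.content_id
                          AND a.sentence_id <= b.sentence_id )) AS s
           JOIN tb_word AS c
             ON ( s.content_id = c.content_id
                  AND ( c.sentence_id = s.sentence_id_1
                         OR c.sentence_id = s.sentence_id_2 ) )
    GROUP  BY s.content_id, s.sentence_id_1, s.sentence_id_2, c.word
    HAVING Count(*) >= 2
    ORDER  BY s.content_id, s.sentence_id_1, s.sentence_id_2, c.word
    """
).fetchall()

# Collect into {content_id: {(sentence_id_1, sentence_id_2): [words]}}.
shared = {}
for content_id, sid1, sid2, word, jml in rows:
    shared.setdefault(content_id, {}).setdefault((sid1, sid2), []).append(word)

for content_id, pairs in shared.items():
    for pair, words in pairs.items():
        print(content_id, pair, words)
```

On the sample rows, this reports a for sentences 0 and 1 of content_id 1. It also reports f for sentences 1 and 2 of content_id 2, although f occurs only in sentence 2 (twice), because Count(*) counts matching rows rather than distinct sentences; for the same reason, a sentence paired with itself only lists words that repeat within it. Whether that matters depends on the intended rule.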

    Let me know if I misunderstood your goal.

    (Selected by the asker as the best answer.)
