doubing3662 2011-06-29 12:14

Search and replace words in HTML

What I'm trying to do is make a 'jargon buster'. Basically, I have some HTML and some glossary terms in a database. When the person clicks on the jargon buster, it replaces the glossary words in the text with a nice tooltip (wztooltip) that shows them the meanings.

I've been trying hard on this one and have been looking closely at this question: Regex / DOMDocument - match and replace text not in a link

It seems like the answer lies in the simple_html_dom library, but I'm having trouble getting it to work. Obviously, any words that are already linked shouldn't be touched. Here is a stripped-down version of what I've got.

$html = str_get_html($article['content']);

$query_glossary = "SELECT word,glossary_term_id,info FROM glossary_terms WHERE status = 1  ORDER BY LENGTH(word) DESC";
$result_glossary = mysql_query_run($query_glossary);

while($glossary = mysql_fetch_array($result_glossary)) {
    $glossary_link = SITEURL.'/glossary/term/'.string_to_url($glossary['word']).'-'.$glossary['glossary_term_id'];
    if(strlen($glossary['info'])>400) {
        $glossary_info = substr(strip_tags($glossary['info']),0,350).' ...<br /> <a href="'.$glossary_link.'">Read More</a>';
    }
    else {
        $glossary_info = $glossary['info'];
    }
    $glossary_tip = 'href="javascript:;" onmouseout="UnTip();" class="article_jargon_highligher" onmouseover="'.tooltip_javascript('<a href="'.$glossary_link.'">'.$glossary['word'].'</a>',$glossary_info,400,1,0,1).'"';
    $glossary_word = $glossary['word'];
    $glossary_word = preg_quote($glossary_word,'/');

    //once done we can replace the words with a nice tip    
    foreach ($html->find('text') as $element) {
        if (!in_array($element->parent()->tag,array())) {
            // problems: case and grammar aren't taken into account
            $element->innertext = str_ireplace(''.$glossary['word'].' ',' <a '.$glossary_tip.' >'.$glossary['word'].'</a> ', $element->innertext);

           //$element->innertext = str_ireplace(''.$glossary['word'].',',' <a '.$glossary_tip.'>'.$glossary['word'].'</a> ', $element->innertext);
           //$element->innertext = preg_replace ("/\s(".$glossary_word.")\s/ise","nothing(' <a'.'$glossary_tip.'>'.'$1'.'</a> ')" , $element->innertext);
          // $element->innertext = str_replace('__glossary_tip_replace__',$glossary_tip, $element->innertext);
        }
    }
}
$article['content'] = $html->save();

3 answers

  • dty3416 2011-07-01 18:19

    Use the non-word character class \W in your regex pattern to match any character other than a letter, digit, or underscore. Because \W must match an actual character, a pattern built only on it would fail at the start and end of the text blob, so you need to test those positions as well. Using the word 'term' as the text you are searching for:

    (^term$)|(^term\W)|(\Wterm\W)|(\Wterm$)
    

    The first alternative matches when term is the entire contents of the blob, the second when it is the first word, the third when it is contained within the blob, and the last when it is the last word.
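    As a rough sketch of how those four cases could be applied in PHP, the block below folds them into one pattern: (^|\W) covers the leading side, and a lookahead (?=\W|$) covers the trailing side without consuming the character, so two adjacent occurrences are both matched. This is a condensed variant rather than the exact pattern above, and wrap_term and its $attrs argument are hypothetical names, not part of the original code.

```php
<?php
// Hypothetical helper (not from the original post): wraps whole-word,
// case-insensitive occurrences of $term in $text with an <a> tag.
function wrap_term($text, $term, $attrs)
{
    // Escape regex metacharacters in the glossary word, including the delimiter.
    $quoted = preg_quote($term, '/');

    // (^|\W)    : start of the blob, or a preceding non-word character
    // (?=\W|$)  : a following non-word character, or end of the blob,
    //             checked by lookahead so the character is not consumed
    $pattern = '/(^|\W)(' . $quoted . ')(?=\W|$)/i';

    return preg_replace_callback(
        $pattern,
        function (array $m) use ($attrs) {
            // $m[1] is the leading boundary (possibly empty),
            // $m[2] is the term exactly as it was written in the text.
            return $m[1] . '<a ' . $attrs . '>' . $m[2] . '</a>';
        },
        $text
    );
}

echo wrap_term('Term, a term and terminal.', 'term', 'class="j"');
```

    Because the callback reuses the matched text, the original capitalisation is kept, and "terminal" is left alone since the character after "term" is a word character.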

    If you want to treat any other character as a word character (say, a hyphen), you would need to replace the \W with [^\w\-].
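    To illustrate the effect of that substitution, here is a minimal sketch that counts whole-word matches with the hyphen treated as part of a word. The function name count_term is made up for this example, and the lookahead form of the trailing boundary is an assumption layered on top of the pattern above.

```php
<?php
// Hypothetical helper: counts whole-word matches of $term in $text,
// treating the hyphen as a word character.
function count_term($text, $term)
{
    $quoted = preg_quote($term, '/');

    // Any character that is neither a word character nor a hyphen.
    $boundary = '[^\w\-]';

    // Leading side: start of text or a boundary character.
    // Trailing side: a boundary character or end of text (lookahead).
    $pattern = '/(?:^|' . $boundary . ')' . $quoted . '(?=' . $boundary . '|$)/i';

    return preg_match_all($pattern, $text, $matches);
}

// "long-term" is skipped because the hyphen now counts as part of the word;
// only the standalone "term" at the end is counted.
var_dump(count_term('A long-term plan with a fixed term.', 'term'));
```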

    Hope this helps. There are probably optimizations that can be performed as well, but this should at least be a good starting point.

    This answer was selected by the asker as the best answer.