duankuai6991 2012-08-28 02:06
Views: 27
Accepted

Wrapping words in a sentence using a regular expression

I'm converting sentences like:

Phasellus turpis, elit. Tempor et lobortis? Venenatis: sed enim!

to:

_________ ______, ____. ______ __ ________? _________: ___ ____!

using:

utf8_encode(preg_replace("/[^.,:;!?¿¡ ]/", "_", utf8_decode($ss->phrase) ))

But I'm facing a problem: Google is indexing all those empty words as keywords. I'd like to convert the original strings to something invisible to Google, like:

<span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span> <span>&nbsp;&nbsp;&nbsp;&nbsp;</span>, ....

using:

.parent span { text-decoration:underline; }

that is, wrapping words inside span tags, replacing each word's characters with &nbsp; and leaving the special characters .,:;!?¿¡ and the space untouched.

Can this be solved with a regex? I currently solve it with a not very efficient loop that scans every character of the string, but I have to process many sentences per page.


2 answers

  • duanhegn231318 2012-08-28 03:23

    Use preg_replace_callback and have the callback build the appropriate replacement for each word. Something along these lines (untested):

    function replacer($match) {
        return "<span>".str_repeat("&nbsp;",strlen($match[1]))."</span>";
    }
    
    // Note the addition of the () and the + near the end of the regex
    utf8_encode(preg_replace_callback("/([^.,:;!?¿¡ ]+)/", "replacer", utf8_decode($ss->phrase) ))
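
    A possible variant, offered as a sketch rather than a tested drop-in: if $ss->phrase is already UTF-8, the utf8_decode/utf8_encode round trip (which only covers ISO-8859-1, and both functions are deprecated as of PHP 8.2) can be avoided by adding the u modifier to the pattern and counting characters with mb_strlen. The helper name blank_words is hypothetical, and the mbstring extension is assumed to be available.

    ```php
    <?php
    // Hypothetical helper, not part of the original answer.
    // Assumes $phrase is valid UTF-8 and that the mbstring extension is loaded.
    function blank_words(string $phrase): ?string
    {
        // The "u" modifier makes PCRE treat pattern and subject as UTF-8,
        // so accented letters and the ¿¡ in the class count as single characters.
        // preg_replace_callback returns null if the subject is not valid UTF-8.
        return preg_replace_callback(
            '/[^.,:;!?¿¡ ]+/u',
            function (array $match): string {
                // mb_strlen counts characters rather than bytes.
                $length = mb_strlen($match[0], 'UTF-8');
                return '<span>' . str_repeat('&nbsp;', $length) . '</span>';
            },
            $phrase
        );
    }

    echo blank_words('Tempor et, ¿qué?');
    ```

    Each run of word characters becomes one span holding as many &nbsp; entities as the word has characters, while punctuation and spaces pass through unchanged.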
    