drqj8605 2017-07-24 22:12

PHP accent/special character search and replace regular expression

I want to look for a search term inside a given text and return an array of matches, with the search term highlighted in HTML.

For example:

countMatch($needle, $haystack) { ... }

Given the needle "foo" and the haystack "foo bar foo foo", the code should return this array:

array:3 [
  0 => "<strong>foo</strong>"
  1 => "<strong>foo</strong>"
  2 => "<strong>foo</strong>"
]

My code works fine, but I have a huge dilemma with accents and other UTF-8 characters:

  1. If the search term contains a UTF-8 character such as one of (àáâãäçèéêëìíîïòóôõöùúûü), it should be treated as its base letter: a needle of fóo, fõo, etc. should match foo.
  2. The same applies to the haystack: a needle of foo should match any fóo, fõo, etc. inside the haystack.
  3. Also, the returned array should show the highlighted matches together with the 100 characters before and the 100 characters after the search term.

So far:

/**
 * Count occurrences of the needle and return formatted HTML snippets
 * 
 * @param string $needle (search term)
 * @param string $haystack (text to search)
 * @return array|int Array of highlighted snippets, or 0 when nothing matches
 */
private function countMatch($needle, $haystack) {
    $matches = array();
    $response = array();
    $i = 0;
    preg_match_all("#(.{0,100}$needle.{0,100})#iu", $haystack, $matches);
    if (!empty($matches[0])) {
        foreach ($matches[0] as $match) {
            $i+=1;
            $response[$i] = "..." . str_ireplace($needle, "<span class='marker'>".$needle."</span>", $match) . "..."; 
        }
        return $response;
    } else {
        return 0;
    }
}

This works fine and is even case-insensitive. However, if I input "foó" I get no matches, and if I input "foo" while the haystack contains "fóo", I also get no matches.

Expected Results:

Example 1:

  • needle = "foo"
  • haystack = "this is a fóo right? this also contains thousands of other characters before and after the föo search term."
  • Expected result:

    array:2 [ 0 => "fóo" 1 => "föo" ]

Example 2:

  • needle = "Foó"
  • haystack = "this is a foo right? this also contains thousands of other characters before and after the föo search term."
  • Expected result:

    array:2 [ 0 => "....foo...." 1 => "....föo...." ]
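
One possible way to get the accent-insensitive matching described in the two examples is to fold the needle to its base letters and then expand each base letter into a character class that also accepts its accented variants. The sketch below is only an illustration under stated assumptions: the helper names `foldAccents` and `accentInsensitivePattern` are hypothetical, the source file and all strings are UTF-8, the mbstring extension is available, and only the precomposed accented letters listed in the question are covered (decomposed forms that use combining marks are not handled).

```php
<?php
// Sketch only: foldAccents() and accentInsensitivePattern() are hypothetical
// helper names, not part of the original controller.

/** Replace the accented lowercase letters from the question with their base letter. */
function foldAccents(string $text): string
{
    return strtr($text, [
        'à' => 'a', 'á' => 'a', 'â' => 'a', 'ã' => 'a', 'ä' => 'a',
        'ç' => 'c',
        'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e',
        'ì' => 'i', 'í' => 'i', 'î' => 'i', 'ï' => 'i',
        'ò' => 'o', 'ó' => 'o', 'ô' => 'o', 'õ' => 'o', 'ö' => 'o',
        'ù' => 'u', 'ú' => 'u', 'û' => 'u', 'ü' => 'u',
    ]);
}

/** Build a regex fragment in which every base letter also accepts its accented forms. */
function accentInsensitivePattern(string $needle): string
{
    $classes = [
        'a' => '[aàáâãä]',
        'c' => '[cç]',
        'e' => '[eèéêë]',
        'i' => '[iìíîï]',
        'o' => '[oòóôõö]',
        'u' => '[uùúûü]',
    ];

    // Lowercase first so that uppercase accented letters are folded as well.
    $folded = foldAccents(mb_strtolower($needle, 'UTF-8'));

    $pattern = '';
    // Split into UTF-8 characters, then expand or escape each one.
    foreach (preg_split('//u', $folded, -1, PREG_SPLIT_NO_EMPTY) as $char) {
        $pattern .= $classes[$char] ?? preg_quote($char, '#');
    }
    return $pattern;
}

$haystack = 'this is a foo right? this also contains thousands of other '
          . 'characters before and after the föo search term.';
preg_match_all('#' . accentInsensitivePattern('Foó') . '#iu', $haystack, $matches);
print_r($matches[0]); // the matched substrings: "foo" and "föo"
```

Because every character that is not expanded goes through `preg_quote`, a needle containing regex metacharacters should not break the pattern. The fragment can be dropped into a larger expression in place of `$needle`.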

NOTE:

This regular expression: #(.{0,100}$needle.{0,100})#iu allows me to include up to 100 characters before and after the match.
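
For the highlighting step, `str_ireplace` with `$needle` as the replacement writes the needle back into the text, so the original spelling in the haystack (accents, letter case) would be lost. A minimal sketch of an alternative keeps the same windowing expression but wraps whatever text actually matched. The function name `highlightMatches` is hypothetical, and `$needlePattern` is assumed to be an already-escaped regex fragment with no capture groups (for example a character-class expansion of the needle). Unlike the original, it returns an empty array rather than 0 when nothing matches.

```php
<?php
// Sketch only: highlightMatches() is a hypothetical name. $needlePattern is
// assumed to be an already-escaped regex fragment without capture groups.
function highlightMatches(string $needlePattern, string $haystack, int $radius = 100): array
{
    // Same windowing idea as the expression above: up to $radius characters on each side.
    $window = '#.{0,' . $radius . '}' . $needlePattern . '.{0,' . $radius . '}#iu';
    if (!preg_match_all($window, $haystack, $found)) {
        return [];
    }

    $snippets = [];
    foreach ($found[0] as $snippet) {
        // Split on the needle and keep the delimiters: odd indexes hold the matched text.
        $parts = preg_split('#(' . $needlePattern . ')#iu', $snippet, -1, PREG_SPLIT_DELIM_CAPTURE);
        $html = '';
        foreach ($parts as $index => $part) {
            // Escape every piece so text from the haystack cannot inject markup.
            $escaped = htmlspecialchars($part, ENT_QUOTES, 'UTF-8');
            $html .= ($index % 2 === 1)
                ? "<span class='marker'>" . $escaped . '</span>'
                : $escaped;
        }
        $snippets[] = '...' . $html . '...';
    }
    return $snippets;
}

print_r(highlightMatches('f[oó]o', 'a fóo and a Foo'));
```

Note that the leading `.{0,100}` is greedy, so occurrences that sit within 100 characters of each other end up in a single snippet with several highlighted terms instead of one array entry per occurrence.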

Yes, as you may have guessed, this is a small search engine using a MariaDB / MySQL FULLTEXT index, and the database has no problem with those characters, case sensitivity, etc. However, I can't render the search results because of the problem described above.

Original source code of controller: PlantaController

(Inside controller, relevant functions are getPlantaAction, buildResult, countMatch, explodeSearch)

View (to understand how I consume Ajax to paint results): Search form
