I want to look for a search term inside a given text and return an array of matches, with the search term highlighted in HTML.
For example:
countMatch($needle, $haystack) { ... }
Given the needle "foo" and the haystack "foo bar foo foo",
the code should return this array:
array:3 [
0 => "<strong>foo</strong>"
1 => "<strong>foo</strong>"
2 => "<strong>foo</strong>"
]
My code works fine, but I have a huge dilemma with accents and other UTF-8 characters:
- If the search term contains a UTF-8 character such as (àáâãäçèéêëìíîïòóôõöùúûü), the function should MATCH ALL of fóo, fóo, fõo, etc. WITH foo.
- Same for the haystack: MATCH foo WITH ANY fóo, fóo, fõo, etc. inside the haystack.
- Also, the returned array should show the highlighted matches together with the 100 characters before and the 100 characters after the search term.
So far:
/**
 * Count occurrences of the needle and return formatted HTML snippets for each match
 *
 * @param string $needle (search term)
 * @param string $haystack (text to search)
 * @return array|int array of HTML snippets, or 0 when nothing matches
 */
private function countMatch($needle, $haystack) {
    $matches = array();
    $response = array();
    $i = 0;
    // Capture the needle plus up to 100 characters on each side
    preg_match_all("#(.{0,100}$needle.{0,100})#iu", $haystack, $matches);
    if (!empty($matches[0])) {
        foreach ($matches[0] as $match) {
            $i+=1;
            // Wrap every occurrence of the needle inside the snippet
            $response[$i] = "..." . str_ireplace($needle, "<span class='marker'>".$needle."</span>", $match) . "...";
        }
        return $response;
    } else {
        return 0;
    }
}
This works fine, and it is even case insensitive. However, if I input "foó" I get no matches, and if I input "foo" and the haystack contains "fóo", I get no matches either.
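One direction that might address this, shown only as a minimal sketch: build the regex from the needle so that every letter also matches its accented variants. The helper name `accentInsensitivePattern` and the variant lists are assumptions made for illustration (the lists come from the characters quoted above and only cover precomposed characters); the file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper (not part of the original controller): turns a search
// term into a regex fragment where each letter also matches accented forms.
function accentInsensitivePattern(string $needle): string
{
    // Each entry is a base letter followed by its accented variants.
    // Assumption: this list is only the set quoted in the question.
    $classes = ['aàáâãä', 'cç', 'eèéêë', 'iìíîï', 'oòóôõö', 'uùúûü'];

    $pattern = '';
    // Split the needle into UTF-8 characters, not bytes.
    foreach (preg_split('//u', $needle, -1, PREG_SPLIT_NO_EMPTY) as $char) {
        // Default: match the character literally, escaped for a '#' delimiter.
        $piece = preg_quote($char, '#');
        foreach ($classes as $class) {
            // If the character belongs to a class (ignoring case),
            // match the whole class instead of the single character.
            if (preg_match('#^[' . $class . ']$#iu', $char)) {
                $piece = '[' . $class . ']';
                break;
            }
        }
        $pattern .= $piece;
    }
    return $pattern;
}

// Usage: the same pattern is produced for "foo", "foó" or "Foó".
$pattern = accentInsensitivePattern('Foó');
preg_match_all('#' . $pattern . '#iu', 'a foo, a fóo and a FÖO', $found);
// $found[0] now holds the spellings exactly as they appear in the haystack.
```

Because the term is matched through a pattern, highlighting would also have to use the matched text rather than `str_ireplace` with the raw needle, otherwise the accented spelling in the haystack is never wrapped.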
Expected Results:
Example 1:
- needle = "foo"
- haystack = "this is a fóo right? this also contains thousands of other characters before and after the föo search term."
Expected result:
array:2 [ 0 => "fóo" 1 => "föo" ]
Example 2:
- needle = "Foó"
- haystack = "this is a foo right? this also contains thousands of other characters before and after the föo search term."
Expected result:
array:2 [ 0 => "....foo...." 1 => "....föo...." ]
NOTE:
This regular expression: #(.{0,100}$needle.{0,100})#iu
lets me include up to 100 characters before and after the match.
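A caveat with that expression is that the context on both sides is consumed by the match, so occurrences that sit close together end up inside a single result. The sketch below is one possible variant, not the code I currently use: it captures the trailing context with a lookahead and recovers the leading context from the match offset, and it wraps the text as found in the haystack. The function name `highlightWithContext` and its `$termPattern` parameter (an already-escaped regex fragment for the term) are assumptions for illustration; unlike the original expression it also adds the `s` flag so context can span line breaks.

```php
<?php
// Hypothetical sketch: one snippet per occurrence, with up to $context
// characters on each side. $termPattern must already be safe to embed
// between '#' delimiters (e.g. produced with preg_quote($needle, '#')).
function highlightWithContext(string $termPattern, string $haystack, int $context = 100): array
{
    $snippets = [];

    // The lookahead captures the trailing context without consuming it,
    // so the next occurrence can still be matched on its own.
    $regex = '#(' . $termPattern . ')(?=(.{0,' . $context . '}))#ius';
    $flags = PREG_SET_ORDER | PREG_OFFSET_CAPTURE;

    if (!preg_match_all($regex, $haystack, $matches, $flags)) {
        return $snippets;
    }

    foreach ($matches as $m) {
        [$term, $offset] = $m[1];   // $offset is a byte offset
        $after = $m[2][0];

        // Leading context: the last $context characters before the match.
        // Cutting at $offset is safe because a match starts on a character boundary.
        preg_match('#.{0,' . $context . '}\z#us', substr($haystack, 0, $offset), $before);

        // Wrap the text as it appears in the haystack, not the typed needle.
        $snippets[] = '...' . $before[0]
            . "<span class='marker'>" . $term . '</span>'
            . $after . '...';
    }

    return $snippets;
}

// Usage with a literal needle:
$snippets = highlightWithContext(preg_quote('foo', '#'), 'foo bar foo foo');
// $snippets contains three entries, one per occurrence of "foo".
```

If the haystack can contain markup, each piece would presumably need `htmlspecialchars` before being concatenated into the snippet.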
Yes, as you already guessed, this is a little search engine using a MariaDB / MySQL FULLTEXT INDEX, and the database has no problem with those characters, case sensitivity, etc. However, I can't paint the search results because of the problem described above.
Original source code of controller: PlantaController
(Inside controller, relevant functions are getPlantaAction, buildResult, countMatch, explodeSearch)
View (to understand how I consume Ajax to paint results): Search form