I have three distinct lists of strings. The first contains people's names (10 to 80 characters long), the second contains room numbers (903, 231, and so on), and the third contains group numbers (ABCD-1312, CXVZ-123).
A user supplies a search query. First I tried ranking by Levenshtein distance, but that didn't work: when the user types three characters, it returns a room number even though the query contains no digits. Then I tried similar_text(), which worked better, but because the names vary in length, it mostly returns the shorter names.
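A small reproduction of both problems, as I understand them. The names and the query here are made up for illustration, not taken from my real data:

```php
<?php
$query = 'ann';

// Levenshtein: a 3-character query is at most 3 edits away from any
// 3-character room number, but many insertions away from a long name.
$levRoom = levenshtein($query, '903');                       // 3 substitutions
$levName = levenshtein($query, 'anna petrova alexandrovna'); // 22 insertions
var_dump($levRoom < $levName); // the room number looks "closer"

// similar_text(): the percentage is relative to the combined length of both
// strings, so the same 3 matching characters score higher on a shorter name.
similar_text($query, 'ann lee kim', $pctShort);               // roughly 43%
similar_text($query, 'anna petrova alexandrovna', $pctLong);  // roughly 21%
var_dump($pctShort > $pctLong); // the shorter name wins
```

Both comparisons come out true, which matches what I see in my results.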
The best I have come up with so far is to combine similar_text() with str_pad() so that every string has the same length. It still doesn't work properly.
I would like to give extra weight to a string if it shares several consecutive characters with the query, if it starts with the same letter as the query, and so on. My current code:
$search_min_heap = new SearchMinHeap();
$query = strtolower($query); // similar_text() is case-sensitive, so lowercase everything
foreach ($res["result"] as &$item) {
    // Pad every name to 100 bytes so all candidates have the same length.
    // The third argument receives the similarity as a percentage (higher is better).
    similar_text($query, str_pad(strtolower($item["name_en"]), 100, " "), $cur_distance_en);
    similar_text($query, str_pad(strtolower($item["name_ru"]), 100, " "), $cur_distance_ru);
    similar_text($query, str_pad(strtolower($item["name_kk"]), 100, " "), $cur_distance_kk);
    // Keep the best score across the three language variants.
    $cur_max_distance = max($cur_distance_en, $cur_distance_ru, $cur_distance_kk);
    $item["matching"] = $cur_max_distance;
    $search_min_heap->insert($item);
}
unset($item); // break the reference left over from the by-reference foreach
$first_elements = $search_min_heap->getFirstElements($count);
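
To make the kind of weighting I mean concrete, here is a rough sketch. The function names weighted_score() and longest_common_run() and all the weights (0.5, 0.25, 1.0) are made up for illustration and untuned. It uses byte-based string functions, so it only handles ASCII correctly; name_ru and name_kk would presumably need mb_* equivalents.

```php
<?php
// Hypothetical helper: length of the longest run of consecutive bytes
// shared by $a and $b (longest common substring, dynamic programming).
function longest_common_run(string $a, string $b): int
{
    $best = 0;
    $lenA = strlen($a);
    $lenB = strlen($b);
    $prev = array_fill(0, $lenB + 1, 0);
    for ($i = 1; $i <= $lenA; $i++) {
        $cur = array_fill(0, $lenB + 1, 0);
        for ($j = 1; $j <= $lenB; $j++) {
            if ($a[$i - 1] === $b[$j - 1]) {
                $cur[$j] = $prev[$j - 1] + 1; // extend the run ending here
                $best = max($best, $cur[$j]);
            }
        }
        $prev = $cur;
    }
    return $best;
}

// Hypothetical scoring function; higher means a better match.
function weighted_score(string $query, string $candidate): float
{
    $query = strtolower($query);
    $candidate = strtolower($candidate);
    if ($query === '' || $candidate === '') {
        return 0.0;
    }
    $qLen = strlen($query);

    // Matching characters relative to the query length only, so long
    // names are not penalized for their extra characters.
    $score = similar_text($query, $candidate) / $qLen;

    // Extra weight for several matches in a row.
    $score += 0.5 * longest_common_run($query, $candidate) / $qLen;

    // Extra weight when both strings start with the same character.
    if ($query[0] === $candidate[0]) {
        $score += 0.25;
    }

    // Push all-digit candidates (room numbers) down when the query has no digit.
    if (!preg_match('/\d/', $query) && ctype_digit($candidate)) {
        $score -= 1.0;
    }
    return $score;
}
```

With this sketch, the query "ann" scores higher against "Anna Petrova" than against "Joanna Lee" (same run of three characters, but only the first shares the starting letter), and both score well above the room number "903".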