Autosuggest considerations
- Searchers expect autosuggest to be highly responsive. If any one of your lenient suggestion features costs more than 100 ms, consider moving it out of autosuggest and into the search results.
- Autosuggest helps to affirm that a searcher is headed in the right direction. For each new lenient suggestion feature you describe and implement, be conscious of the ratio of bad suggestions introduced alongside the good ones. With the limited screen real estate available for autosuggest, it's often better to be precise than comprehensive.
Strategies to accomplish what you're asking
1) Indexed string - "monkeys". I need to find this document by "monkey".
This is an example of stemming, that is, reducing common inflections of a term to a root form.
For example, the inputs "fitted", "fitting", "fits" and "fit" would all map to a common form, "fit".
Stemming has to occur both for indexed terms and for query terms, so that searches for any of the inflections will yield results containing any other inflections.
Elasticsearch ships with two Russian stemmers, russian and light_russian, listed here (follow the links for descriptions of each implementation).
Any of the suggester implementations can be parameterized with a custom analyzer. By default, they use the analyzer defined in the mapping for the field being suggested.
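As a rough sketch of the index side, the settings below define a custom analyzer that applies the built-in russian stemmer. The names ru_stem and ru_stemmed are placeholders I made up; only the "stemmer" filter type and its "russian" language option come from Elasticsearch. Once a field's mapping references this analyzer, it is used for both indexed and query terms unless you configure a separate search analyzer.

```python
import json

# Hypothetical names: "ru_stem" (filter) and "ru_stemmed" (analyzer) are illustrative.
index_settings = {
    "settings": {
        "analysis": {
            "filter": {
                # Built-in stemmer token filter; "light_russian" is the other option.
                "ru_stem": {"type": "stemmer", "language": "russian"}
            },
            "analyzer": {
                "ru_stemmed": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "ru_stem"],
                }
            },
        }
    }
}

# Body you would send when creating the index.
print(json.dumps(index_settings, indent=2))
```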
2) Indexed string - "hello my beautiful world". I need to have possibility to find this document by "hello big world"
One solution is simply a boolean search: hello OR my OR beautiful OR world. The Elasticsearch match query defaults to a boolean OR of its terms and would do what you describe given the phrase "hello my beautiful world" (assuming "hello" and "world" are tokens generated by the searched field's analyzer).
Another solution to try would be the phrase suggester, which can piece together the matching terms in the query (with max_errors >= 0.5, so that the terms my and beautiful could be considered misspellings).
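Both options for case 2 might look roughly like the request bodies below. The field name "title" and the suggestion label "phrase_fix" are assumptions for illustration; adjust them to your mapping.

```python
import json

FIELD = "title"  # hypothetical field name

# match query: OR is already the default operator; it is spelled out for clarity.
match_body = {
    "query": {
        "match": {FIELD: {"query": "hello big world", "operator": "or"}}
    }
}

# phrase suggester: max_errors below 1 is read as a fraction of the query terms
# that may be treated as misspellings.
phrase_body = {
    "suggest": {
        "text": "hello big world",
        "phrase_fix": {  # arbitrary label for this suggestion
            "phrase": {"field": FIELD, "max_errors": 0.5}
        },
    }
}

print(json.dumps(match_body))
print(json.dumps(phrase_body))
```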
3) Indexed string - "appropriate". I need to have possibility to find this document by "apropriat".
You're describing a fuzzy search. This search provides 1-2 characters of leniency in the spelling of a term, and would certainly help chronic misspellers and poor typists.
Both the completion suggester (which only needs a word prefix to provide suggestions) and the term suggester (which only suggests based on entire terms being entered) let you specify fuzziness, or leniency in the "edit distance" between the query and the field value.
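To make "edit distance" concrete, here is a small sketch that computes the plain Levenshtein distance for your example, followed by what a fuzzy completion request might look like. The field name "suggest" and the label "word_suggest" are assumptions, and Elasticsearch's own fuzzy matching may count transpositions differently from this simple function.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

print(edit_distance("apropriat", "appropriate"))  # 2

# Hypothetical completion field "suggest" with fuzziness enabled.
fuzzy_completion_body = {
    "suggest": {
        "word_suggest": {
            "prefix": "apropriat",
            "completion": {"field": "suggest", "fuzzy": {"fuzziness": 2}},
        }
    }
}
```

Two missing characters is right at the edge of the 1-2 character leniency, so shorter inputs or larger typos will not be rescued this way.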
Overall: Indexed - "the Earth planet is the most beautiful in our
Solar system". I want to find this document by "earth is beautifal".
Optional: 1) Indexed - "great job". I want to find the document by
synonim word "good". 2) Indexed - "beautiful world" find by "beaut
worl"
(Overall) The phrase suggester may not be able to suggest "the Earth planet is the most beautiful in our Solar system" given the typed phrase "earth is beautifal". This is because there are a number of unrelated terms separating "earth" and "beautiful" in the source document. A phrase search with slop set to allow a gap of, say, four terms (as in the example) would satisfy this requirement. But you'd have to execute a (slower) search request inside your completion logic.
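A sloppy phrase search of that kind could be sketched as follows. The field name "body" is an assumption, and the slop of 4 is just the gap suggested above, not a tuned value. Note that a phrase query does not correct spelling by itself, so this sketch uses the correctly spelled "beautiful".

```python
import json

# Hypothetical field name "body"; slop allows terms to sit further apart
# in the document than they do in the query.
slop_body = {
    "query": {
        "match_phrase": {
            "body": {"query": "earth is beautiful", "slop": 4}
        }
    }
}

print(json.dumps(slop_body, indent=2))
```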
(Optional 1) Synonyms are discussed here, and can be included in your analyzer. I would split-test this thoroughly, though, as searchers may not expect to see synonyms in their suggestions.
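For illustration, a synonym filter wired into a custom analyzer might look like this. The names my_synonyms and with_synonyms are made up, and the single rule treats "great" and "good" as equivalent.

```python
import json

# Hypothetical names: "my_synonyms" (filter) and "with_synonyms" (analyzer).
synonym_settings = {
    "settings": {
        "analysis": {
            "filter": {
                # Comma-separated terms on one line are treated as equivalent.
                "my_synonyms": {"type": "synonym", "synonyms": ["great, good"]}
            },
            "analyzer": {
                "with_synonyms": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "my_synonyms"],
                }
            },
        }
    }
}

print(json.dumps(synonym_settings, indent=2))
```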
(Optional 2) I doubt the completion suggester will complete multiple terms like "beaut worl"; you may have to use edge n-grams. Practically speaking, however, I doubt anyone will ever type this, even accidentally.
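To show why edge n-grams would help here, this sketch generates the prefixes that would be indexed for each term and checks that "beaut" and "worl" are among them. The helper function and the gram sizes (2 to 10) are illustrative choices; the filter definition at the end uses the built-in edge_ngram token filter with a made-up name.

```python
def edge_ngrams(term: str, min_gram: int = 2, max_gram: int = 10) -> list[str]:
    """Return the prefixes of term with lengths from min_gram to max_gram."""
    upper = min(len(term), max_gram)
    return [term[:n] for n in range(min_gram, upper + 1)]

indexed = {gram
           for term in "beautiful world".split()
           for gram in edge_ngrams(term)}

# Every typed fragment is itself an indexed token, so a plain term match works.
print(all(fragment in indexed for fragment in "beaut worl".split()))

# Hypothetical filter name "prefixes"; apply it at index time only.
edge_ngram_filter = {
    "prefixes": {"type": "edge_ngram", "min_gram": 2, "max_gram": 10}
}
```

You would normally keep the n-gram filter out of the search-time analyzer, otherwise the query fragments get split into prefixes as well and match far too much.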
Multiple suggester types can be requested within a single _suggest call. You may end up running a combination of completion and phrase suggesters to cover all of your bases.
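A combined request might be shaped like the body below. The labels "completions" and "phrases" and the field names "suggest" and "title" are assumptions. As far as I know, newer Elasticsearch versions expect this "suggest" section inside a _search request body rather than a dedicated _suggest endpoint, so check which applies to your version.

```python
import json

# Hypothetical labels and field names; each top-level key under "suggest"
# is an independent suggestion returned under the same label.
combined_body = {
    "suggest": {
        "completions": {
            "prefix": "beaut",
            "completion": {"field": "suggest"},
        },
        "phrases": {
            "text": "earth is beautifal",
            "phrase": {"field": "title"},
        },
    }
}

print(json.dumps(combined_body, indent=2))
```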