dth62818 2015-12-13 19:46

Elasticsearch: how can I implement the following principles in combination with quick search?

My mapping is:

"current_name" => [
    "type"     => "string",
    "index"    => "analyzed",
    "analyzer" => "russian",
    "fields"   => [
        "raw"           => [
            "type"  => "string",
            "index" => "not_analyzed"
        ],
        "raw_lowercase" => [
            "type"     => "string",
            "analyzer" => "tolowercase"
        ]
    ]
],

I need searches on this field to satisfy all of the following principles at once:

  1. Indexed string - "monkeys". I need to find this document by "monkey".

  2. Indexed string - "hello my beautiful world". I need to be able to find this document by "hello big world".

  3. Indexed string - "appropriate". I need to be able to find this document by "apropriat".

Overall: Indexed - "the Earth planet is the most beautiful in our Solar system". I want to find this document by "earth is beautifal".

All of these principles should apply while the user is typing the query (quick search). The language is Russian.

Optional: 1) Indexed - "great job". I want to find the document by the synonym "good". 2) Indexed - "beautiful world". I want to find it by "beaut worl".

How can I implement this? What are your remarks about combining these principles with quick search?


  • dongshan4518 2015-12-15 05:19

    Autosuggest considerations

    • Searchers expect autosuggest to be highly responsive. If any one of your lenient suggestion features takes more than 100 ms, consider moving it out of autosuggest and into search results.
    • Autosuggest helps to affirm that a searcher is headed in the right direction. For each new lenient suggestion feature you describe and implement, be conscious of the ratio of bad suggestions introduced alongside the good ones. With the limited screen real estate available for autosuggest, it's often better to be precise rather than comprehensive.

    Strategies to accomplish what you're asking

    1) Indexed string - "monkeys". I need to find this document by "monkey".

    This is an example of stemming, that is, reducing common inflections of a term to a root form.

    For example, mapping inputs of "fitted", "fitting", "fits", "fit" all to a common form, "fit".

    Stemming has to occur both for indexed terms and for query terms, so that searches for any of the inflections will yield results containing any other inflections.

    Elasticsearch ships with two Russian stemmers, russian and light_russian. Both are listed in the documentation for the stemmer token filter, which links to descriptions of each implementation.

    Any of the suggester implementations can be parameterized with a custom analyzer. By default, they use the analyzer defined in the mapping for the field being suggested.
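
As a rough illustration of the stemming setup described above, the index settings for a custom Russian analyzer might look like the sketch below. It only builds the settings body as a Python dict and does not talk to a cluster. The names ru_stop, ru_stemmer and ru_stemmed are hypothetical, and the filter chain is an assumption about a reasonable configuration, not the asker's actual one.

```python
import json


def russian_stemming_settings(stemmer_language="russian"):
    """Build index settings for a custom analyzer that stems Russian text.

    stemmer_language is one of the two Russian stemmers named in the answer.
    """
    if stemmer_language not in ("russian", "light_russian"):
        raise ValueError("expected 'russian' or 'light_russian'")
    return {
        "analysis": {
            "filter": {
                # Hypothetical filter names; the types are built-in token filters.
                "ru_stop": {"type": "stop", "stopwords": "_russian_"},
                "ru_stemmer": {"type": "stemmer", "language": stemmer_language},
            },
            "analyzer": {
                # Hypothetical analyzer name, to be referenced from the mapping.
                "ru_stemmed": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "ru_stop", "ru_stemmer"],
                }
            },
        }
    }


settings = russian_stemming_settings("light_russian")
print(json.dumps(settings, ensure_ascii=False, indent=2))
```

Because the same analyzer runs at index time and at query time, "monkeys" and "monkey" (or their Russian equivalents) should reduce to the same token on both sides.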

    2) Indexed string - "hello my beautiful world". I need to be able to find this document by "hello big world".

    One solution is simply a boolean search: hello OR big OR world. The Elasticsearch match query defaults to the boolean OR operator, so the query "hello big world" would match the document "hello my beautiful world" (assuming "hello" and "world" are tokens generated by the searched field's analyzer).

    Another solution to try would be the phrase suggester, which pieces together the matching terms in the query (with max_errors >= 0.5, so that up to half of the terms, such as "my beautiful", could be considered misspellings).
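
The boolean approach could be sketched as follows. lenient_match_query builds a match query body as a plain dict, and or_match is a toy, hypothetical stand-in for the OR semantics so the behaviour can be checked without a cluster. The minimum_should_match value of 2 is an assumption for illustration, not something the answer prescribes.

```python
def lenient_match_query(field, text, minimum_should_match=None):
    """Build a match query body where any of the query terms may match."""
    clause = {"query": text, "operator": "or"}
    if minimum_should_match is not None:
        # Optionally require several terms to match, to cut down on noise.
        clause["minimum_should_match"] = minimum_should_match
    return {"query": {"match": {field: clause}}}


def or_match(query_tokens, document_tokens, minimum_should_match=1):
    """Toy model of OR matching: count distinct query tokens found in the document."""
    overlap = set(query_tokens) & set(document_tokens)
    return len(overlap) >= minimum_should_match


body = lenient_match_query("current_name", "hello big world", minimum_should_match=2)
document = "hello my beautiful world".split()
print(body)
print(or_match("hello big world".split(), document, minimum_should_match=2))
```

Here "hello" and "world" match and "big" does not, so the document is still returned when two matching terms are required.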

    3) Indexed string - "appropriate". I need to be able to find this document by "apropriat".

    You're describing a fuzzy search. This search provides 1-2 characters of leniency in the spelling of a term, and would certainly help chronic misspellers and poor typists.

    Both the completion suggester (which only needs a word prefix to provide suggestions), and the term suggester (which only suggests based on entire terms being entered) have the ability to specify fuzziness or leniency in the "edit distance" between the query and the field value.
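
To make "edit distance" concrete, here is a minimal sketch that computes the classic Levenshtein distance and builds a match query body with a fuzziness setting. Note the assumption: Elasticsearch's fuzzy matching also counts a transposition of adjacent characters as a single edit by default, which this plain Levenshtein function does not. The helper names are hypothetical.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance: insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def fuzzy_match_query(field, text, fuzziness=2):
    """Build a match query body that tolerates up to `fuzziness` edits per term."""
    return {"query": {"match": {field: {"query": text, "fuzziness": fuzziness}}}}


print(edit_distance("apropriat", "appropriate"))  # 2: one missing "p", one missing "e"
print(edit_distance("beautifal", "beautiful"))    # 1: "a" substituted for "u"
print(fuzzy_match_query("current_name", "apropriat"))
```

Both misspellings from the question fall within the 1-2 edits of leniency mentioned above, although on a stemmed field the distance is measured against the stemmed tokens, not the original words.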

    Overall: Indexed - "the Earth planet is the most beautiful in our Solar system". I want to find this document by "earth is beautifal".

    Optional: 1) Indexed - "great job". I want to find the document by the synonym "good". 2) Indexed - "beautiful world". I want to find it by "beaut worl".

    (Overall) The phrase suggester may not be able to suggest "the Earth planet is the most beautiful in our Solar system" given the typed phrase "earth is beautifal". This is because there are a number of unrelated terms separating "earth" and "beautiful" in the source document. A phrase search with slop set to allow a gap of, say, four terms (as in the example) would satisfy this requirement. But you'd have to execute a (slower) search request inside your completion logic.
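
A sketch of the slop idea, under stated assumptions: intervening_terms is a hypothetical helper that counts the terms sitting between two words in a whitespace-tokenized document (ignoring stop-word removal and stemming), and sloppy_phrase_query builds a match_phrase body. A phrase query does not correct typos by itself, so the example uses the correctly spelled "beautiful".

```python
def intervening_terms(doc_tokens, first, second):
    """Count the tokens that sit between `first` and the next `second`."""
    i = doc_tokens.index(first)
    j = doc_tokens.index(second, i + 1)
    return j - i - 1


def sloppy_phrase_query(field, text, slop):
    """Build a match_phrase body that tolerates gaps between the terms."""
    return {"query": {"match_phrase": {field: {"query": text, "slop": slop}}}}


doc = "the earth planet is the most beautiful in our solar system".split()
gap = intervening_terms(doc, "earth", "beautiful")
print(gap)  # 4: "planet", "is", "the", "most"
print(sloppy_phrase_query("current_name", "earth beautiful", slop=gap))
```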

    (Optional 1) Synonyms are handled by the synonym token filter, which can be included in your analyzer. I would split-test this thoroughly, though, as searchers may not expect to see synonyms in their suggestions.
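
One possible shape for such an analyzer is sketched below, built as a plain dict. The names name_synonyms and name_with_synonyms are hypothetical, and the single "great, good" rule is only the example from the question; a real synonym list for Russian would need to be curated separately.

```python
def synonym_settings(synonym_groups):
    """Build index settings with a synonym token filter.

    Each group is a list of interchangeable words, written out as one
    comma-separated rule such as "great, good".
    """
    rules = [", ".join(group) for group in synonym_groups]
    return {
        "analysis": {
            "filter": {
                # Hypothetical filter name.
                "name_synonyms": {"type": "synonym", "synonyms": rules},
            },
            "analyzer": {
                # Hypothetical analyzer name.
                "name_with_synonyms": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "name_synonyms"],
                }
            },
        }
    }


print(synonym_settings([["great", "good"]]))
```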

    (Optional 2) I doubt the completion suggester will complete multiple terms like "beaut worl"; you may have to use edge n-grams. Practically speaking, however, I doubt anyone will ever type this, even accidentally.
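
To show what edge n-grams would buy you here, the sketch below generates the prefixes an edge n-gram filter would index for each token, alongside a filter definition as a plain dict. The min_gram and max_gram values and the name name_edge_ngrams are assumptions for illustration.

```python
def edge_ngrams(token, min_gram=2, max_gram=10):
    """Return the prefixes of `token` from min_gram up to max_gram characters."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def edge_ngram_filter(min_gram=2, max_gram=10):
    """Build a token filter definition (hypothetical name) for index-time prefixes."""
    return {"name_edge_ngrams": {"type": "edge_ngram",
                                 "min_gram": min_gram,
                                 "max_gram": max_gram}}


for word in "beautiful world".split():
    print(word, edge_ngrams(word))

# Every typed fragment is one of the indexed prefixes of some token.
indexed = {gram for word in "beautiful world".split() for gram in edge_ngrams(word)}
print(all(fragment in indexed for fragment in "beaut worl".split()))
```

With prefixes indexed this way, each fragment of "beaut worl" matches a stored token exactly, so an ordinary match query can find the document. The usual caveat is to apply the n-gram filter at index time only, so that the query text itself is not split into prefixes.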


    Multiple suggester types can be requested within a _suggest call. You may end up running with a combination of completion and phrase suggesters to cover all of your bases.
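
A combined request might be assembled roughly like this. The body shape follows the standalone _suggest endpoint of Elasticsearch versions from around the time of this answer, which is an assumption; later versions expect suggesters nested under a "suggest" key of a search request. The suggester names and the completion field name_suggest are hypothetical.

```python
def combined_suggest_body(text, completion_field, phrase_field, max_errors=0.5):
    """Build one suggest body that asks for two suggester types at once."""
    return {
        # Hypothetical suggester names; each key labels one result set.
        "name_completion": {
            "text": text,
            "completion": {"field": completion_field},
        },
        "name_phrase": {
            "text": text,
            "phrase": {"field": phrase_field, "max_errors": max_errors},
        },
    }


body = combined_suggest_body("earth is beautifal", "name_suggest", "current_name")
print(body)
```

Each named entry comes back as its own list of suggestions, so the client can merge or rank the two result sets as it sees fit.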

    This answer was selected as the best answer by the asker.
