dth62818 2015-12-13 19:46

Elasticsearch: how can the following principles be implemented together with quick search?

My mapping is:

"current_name" => [
    "type"     => "string",
    "index"    => "analyzed",
    "analyzer" => "russian",
    "fields"   => [
        "raw"           => [
            "type"  => "string",
            "index" => "not_analyzed"
        ],
        "raw_lowercase" => [
            "type"     => "string",
            "analyzer" => "tolowercase"
        ]
    ]
],

I need to search the field using the following examples of principles (all together):

  1. Indexed string - "monkeys". I need to find this document by "monkey".

  2. Indexed string - "hello my beautiful world". I need to be able to find this document by "hello big world".

  3. Indexed string - "appropriate". I need to be able to find this document by "apropriat".

Overall: Indexed - "the Earth planet is the most beautiful in our Solar system". I want to find this document by "earth is beautifal".

All of these principles should apply while the user types the query (quick search). The language is Russian.

Optional: 1) Indexed - "great job". I want to find the document by the synonym "good". 2) Indexed - "beautiful world". I want to find it by "beaut worl".

How can I implement this? What are your remarks about combining these principles with quick search?

1 answer

  • dongshan4518 2015-12-15 05:19

    Autosuggest considerations

    • Searchers expect autosuggest to be highly responsive. If any one of your lenient suggestion features costs more than 100 ms, consider moving it out of autosuggest and into search results.
    • Autosuggest helps to affirm that a searcher is headed in the right direction. For each new lenient suggestion feature you describe and implement, be conscious of the ratio of bad suggestions introduced alongside the good ones. With the limited screen real estate available for autosuggest, it's often better to be precise than comprehensive.

    Strategies to accomplish what you're asking

    1) Indexed string - "monkeys". I need to find this document by "monkey".

    This is an example of stemming: reducing the common inflections of a term to a root form.

    For example, the inputs "fitted", "fitting", "fits" and "fit" all map to a common form, "fit".

    Stemming has to occur both for indexed terms and for query terms, so that a search for any one inflection yields results containing any of the others.

    Elasticsearch ships with two Russian stemmers, russian and light_russian (both are listed in the stemmer token filter documentation, which links to descriptions of the implementations).

    Any of the suggester implementations can be parameterized with a custom analyzer. By default, they use the analyzer defined in the mapping for the field being suggested.
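As a rough illustration of the stemming setup described above, the following sketch builds index settings for a custom analyzer that lowercases and then stems Russian text. The names `ru_stemmer` and `ru_stemmed` are hypothetical, and the body would still have to be sent to a cluster with whatever client you use; only the `stemmer` filter type and its `russian` / `light_russian` languages come from Elasticsearch itself.

```python
import json


def russian_stemming_settings(stemmer_language="russian"):
    """Build index settings with a custom analyzer that stems Russian text.

    `ru_stemmer` and `ru_stemmed` are illustrative names, not defaults.
    """
    if stemmer_language not in ("russian", "light_russian"):
        raise ValueError("expected 'russian' or 'light_russian'")
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # The stemmer token filter selects its algorithm by language.
                    "ru_stemmer": {"type": "stemmer", "language": stemmer_language},
                },
                "analyzer": {
                    "ru_stemmed": {
                        "type": "custom",
                        "tokenizer": "standard",
                        # Lowercase first so the stemmer sees normalized tokens.
                        "filter": ["lowercase", "ru_stemmer"],
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(russian_stemming_settings("light_russian"), indent=2))
```

Using the same analyzer at index time and at query time is what lets "monkey" and "monkeys" meet at a common root.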

    2) Indexed string - "hello my beautiful world". I need to be able to find this document by "hello big world".

    One solution is simply a boolean OR search over the query terms: hello OR big OR world. The Elasticsearch match query defaults to the OR operator, so the query "hello big world" would match a document containing "hello my beautiful world" (assuming "hello" and "world" are tokens generated by the searched field's analyzer).

    Another solution to try would be the phrase suggester, which can piece together the matching terms in the query (with max_errors >= 0.5, so that terms such as "my beautiful" could be treated as misspellings).
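The boolean approach above could look roughly like the sketch below, which builds a match query body for the `current_name` field from the question. The helper name is hypothetical; `operator` and `minimum_should_match` are standard match query options, and the threshold of 2 is only an assumption chosen for this example.

```python
def lenient_match_query(field, text, minimum_should_match=None):
    """Build a match query in which any of the analyzed terms may match."""
    options = {"query": text, "operator": "or"}  # "or" is also the default
    if minimum_should_match is not None:
        # Optionally require several terms to match to cut down on noise.
        options["minimum_should_match"] = minimum_should_match
    return {"query": {"match": {field: options}}}


# "hello" and "world" match the indexed "hello my beautiful world"; "big" does not.
body = lenient_match_query("current_name", "hello big world", minimum_should_match=2)
```

Requiring two of the three terms keeps the query lenient about "big" while filtering out documents that share only a single common word.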

    3) Indexed string - "appropriate". I need to be able to find this document by "apropriat".

    You're describing a fuzzy search. This search provides 1-2 characters of leniency in the spelling of a term, and would certainly help chronic misspellers and poor typists.

    Both the completion suggester (which only needs a word prefix to provide suggestions) and the term suggester (which only suggests based on entire terms being entered) let you specify fuzziness, or leniency in the "edit distance" between the query and the field value.
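To make the "edit distance" idea concrete, here is a small sketch: a plain Levenshtein implementation plus a helper that builds a fuzzy match query body. Both function names are hypothetical. Elasticsearch compares against the analyzed (for example stemmed) terms and by default also counts a transposition as a single edit, so the distances below only approximate what the cluster would compute.

```python
def levenshtein(a, b):
    """Number of single-character inserts, deletes and substitutions from a to b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def fuzzy_match_query(field, text, fuzziness="AUTO"):
    """Build a match query that tolerates small spelling differences."""
    return {"query": {"match": {field: {"query": text, "fuzziness": fuzziness}}}}


print(levenshtein("apropriat", "appropriate"))  # two missing characters
print(fuzzy_match_query("current_name", "apropriat", fuzziness=2))
```

Since Elasticsearch caps fuzziness at two edits, "apropriat" sits right at the limit of what a fuzzy query can recover without stemming helping out.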

    Overall: Indexed - "the Earth planet is the most beautiful in our Solar system". I want to find this document by "earth is beautifal".

    Optional: 1) Indexed - "great job". I want to find the document by the synonym "good". 2) Indexed - "beautiful world". I want to find it by "beaut worl".

    (Overall) The phrase suggester may not be able to suggest "the Earth planet is the most beautiful in our Solar system" given the typed phrase "earth is beautifal", because a number of unrelated terms separate "earth" and "beautiful" in the source document. A phrase search with slop set to allow a gap of, say, four terms (as in the example) would cover this case, but you'd have to execute a (slower) search request inside your completion logic.
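A phrase search with slop, as mentioned above, might be built like this. The helper name is hypothetical and the slop of 4 simply mirrors the gap suggested in the answer. Note that a phrase query on its own does not correct misspellings such as "beautifal", so this sketch assumes the text has already been corrected or is combined with another technique.

```python
def sloppy_phrase_query(field, text, slop=4):
    """Build a match_phrase query that tolerates gaps between the terms."""
    if slop < 0:
        raise ValueError("slop must not be negative")
    return {"query": {"match_phrase": {field: {"query": text, "slop": slop}}}}


# Assumes the typo has been fixed before the phrase query is issued.
body = sloppy_phrase_query("current_name", "earth is beautiful")
```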

    (Optional 1) Synonyms are covered in the Elasticsearch documentation on the synonym token filter and can be included in your analyzer. I would split-test this thoroughly, though, as searchers may not expect to see synonyms in their suggestions.
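As a sketch of how synonyms could be folded into an analyzer, the helper below builds index settings around the `synonym` token filter. The filter and analyzer names are hypothetical, and the single rule "great, good" is just the example from the question, not a recommended synonym list.

```python
def synonym_analyzer_settings(synonym_rules):
    """Build index settings for an analyzer that expands the given synonym rules."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # A rule such as "great, good" treats both terms as equivalent.
                    "name_synonyms": {"type": "synonym", "synonyms": list(synonym_rules)},
                },
                "analyzer": {
                    "name_with_synonyms": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "name_synonyms"],
                    }
                },
            }
        }
    }


settings = synonym_analyzer_settings(["great, good"])
```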

    (Optional 2) I doubt the completion suggester will complete multiple terms like "beaut worl"; you may have to use edge n-grams. Practically speaking, however, I doubt anyone will ever type this, even accidentally.
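To show why edge n-grams help with "beaut worl", the sketch below generates the prefixes an edge n-gram filter would index for a token, and builds matching index settings. The function, filter and analyzer names are hypothetical, and the gram sizes are assumptions; the filter type is spelled `edge_ngram` in recent Elasticsearch versions and may differ in older ones.

```python
def edge_ngrams(token, min_gram=2, max_gram=10):
    """Return the prefixes of `token` between min_gram and max_gram characters."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def edge_ngram_settings(min_gram=2, max_gram=10):
    """Build index settings for an index-time analyzer that emits edge n-grams."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "prefix_grams": {
                        "type": "edge_ngram",
                        "min_gram": min_gram,
                        "max_gram": max_gram,
                    }
                },
                "analyzer": {
                    "prefix_index": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "prefix_grams"],
                    }
                },
            }
        }
    }


print(edge_ngrams("beautiful", max_gram=5))
print(edge_ngrams("world"))
```

Because "beaut" and "worl" are both among the indexed prefixes, an ordinary match query on such a field can find "beautiful world" from the partial input.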


    Multiple suggester types can be requested within a _suggest call. You may end up running with a combination of completion and phrase suggesters to cover all of your bases.
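A combined request along these lines could be sketched as follows. The suggestion names and the completion field `current_name.suggest` are hypothetical (a completion suggester needs a field mapped with the completion type, which the mapping in the question does not define), and the key layout follows recent Elasticsearch versions, so the exact body for an older `_suggest` endpoint may differ.

```python
def combined_suggest_body(text, phrase_field, completion_field, max_errors=0.5):
    """Build one request body that asks for phrase and completion suggestions."""
    return {
        "suggest": {
            "phrase_suggestion": {
                "text": text,
                # max_errors as a fraction of terms that may be misspelled.
                "phrase": {"field": phrase_field, "max_errors": max_errors},
            },
            "completion_suggestion": {
                "prefix": text,
                "completion": {
                    "field": completion_field,
                    "fuzzy": {"fuzziness": 1},
                },
            },
        }
    }


body = combined_suggest_body("earth is beautifal", "current_name", "current_name.suggest")
```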

    Accepted by the asker as the best answer.