weixin_39865625
2020-12-29 19:17 · 11 views

With dynamic synonyms, repeated identical requests frequently return no data

Problem description

  • Index configuration
JSON
{
   "test": {
      "aliases": {},
      "mappings": {
         "test": {
            "properties": {
               "text_1": {
                  "type": "string",
                  "analyzer": "synonym"
               }
            }
         }
      },
      "settings": {
         "index": {
            "creation_date": "1482891562524",
            "analysis": {
               "filter": {
                  "remote_synonym": {
                     "type": "dynamic_synonym",
                     "synonyms_path": "http://IP:PORT/waf_file/files/sw",
                     "interval": "30"
                  }
               },
               "analyzer": {
                  "synonym": {
                     "filter": [
                        "remote_synonym"
                     ],
                     "tokenizer": "ik"
                  }
               }
            },
            "number_of_shards": "5",
            "number_of_replicas": "1",
            "uuid": "NMZ4fUryRXyoZ057lQrhDA",
            "version": {
               "created": "2030299"
            }
         }
      },
      "warmers": {}
   }
}
  • Index a document
HTTP
PUT /test/test/1?pretty=1
{
   "text_1" : "水的密度很大"
}
  • Run the following query several times
HTTP
GET /test/_search
{
    "query": {
        "query_string": {
           "default_field": "text_1",
           "analyzer": "synonym", 
           "query": "density"
        }
    }
}
  • Add the synonym rule 密度, density to the synonym file
  • Query again
HTTP
GET /test/_search
{
    "query": {
        "query_string": {
           "default_field": "text_1",
           "analyzer": "synonym", 
           "query": "density"
        }
    }
}
  • The document is found:
JSON
{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 0.16609077,
      "hits": [
         {
            "_index": "test",
            "_type": "test",
            "_id": "15",
            "_score": 0.16609077,
            "_source": {
               "text_1": "水的密度很大"
            }
         }
      ]
   }
}
  • When the same request is repeated, there is a high probability that the document is not found:
JSON
{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 0,
      "max_score": null,
      "hits": []
   }
}
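The steps above depend on the rule 密度, density actually being loaded by the remote_synonym filter, which polls synonyms_path every interval (30 here, presumably seconds). A comma-separated line declares the terms equivalent, so a query for density is expected to also search for 密度, which the ik tokenizer presumably emits when indexing 水的密度很大. The sketch below is a simplified, hypothetical model of that expansion (comma-separated equivalence lines only, no => rules, no multi-word handling); it is not how the dynamic_synonym filter or Lucene's SynonymFilter is implemented. It only illustrates why the query matches once, and only once, the filter instance handling the query has the new rule.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of equivalent-synonym expansion.
// Handles only lines like "a, b, c"; "=>" rules and multi-word terms are ignored.
public final class SynonymExpansionDemo {

    // Builds term -> equivalence group from lines such as "a, b".
    static Map<String, Set<String>> parse(List<String> lines) {
        Map<String, Set<String>> groups = new HashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            Set<String> group = new LinkedHashSet<>();
            for (String term : trimmed.split(",")) {
                if (!term.trim().isEmpty()) {
                    group.add(term.trim());
                }
            }
            for (String term : group) {
                groups.computeIfAbsent(term, k -> new LinkedHashSet<>()).addAll(group);
            }
        }
        return groups;
    }

    // Replaces each query token with its whole equivalence group (or keeps it as-is).
    static List<String> expand(List<String> tokens, Map<String, Set<String>> groups) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            Set<String> group = groups.get(token);
            if (group == null) {
                out.add(token);
            } else {
                out.addAll(group);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The first term is the Chinese word for "density", escaped to keep the file ASCII.
        Map<String, Set<String>> groups = parse(List.of("\u5bc6\u5ea6, density"));
        System.out.println(expand(List.of("density"), groups));
    }
}
```

A filter instance that never received the reloaded rule behaves like expand() with an empty map: the query stays just density, which the indexed Chinese text does not contain, so no hits come back.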

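What I have tried

  • shards = 1, replicas = 1: the problem does not occur
  • shards = 5, replicas = 1, with two ES instances on one machine forming a cluster: the problem persists
  • Restarting ES: the problem does not occur

I don't know where the problem lies and would appreciate any help.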

This question comes from the open-source project bells/elasticsearch-analysis-dynamic-synonym.


11 answers

  • weixin_39622398 2020-12-29 19:17

    First, make sure the plugin is installed on every node in the cluster.

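  • weixin_39865625 2020-12-29 19:17

    Sorry, I didn't describe the problem clearly. To add some detail: the problem already occurs on a single machine (one ES instance, shards = 5, replicas = 1), so I don't think it has anything to do with other nodes in the cluster lacking the plugin. Do you have any other suggestions?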

  • weixin_39834090 2020-12-29 19:17

    On a single machine, replicas = 1 doesn't seem meaningful. Does the same thing happen with replicas = 0?

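  • weixin_39865625 2020-12-29 19:17

    It also happens with replicas = 0. I've been stuck on this all week and tried every experiment I could think of, but none of them worked. Have you run into this? Am I using the plugin incorrectly?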

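  • weixin_39865625 2020-12-29 19:17

    Found the cause: DynamicSynonymTokenFilterFactory.create() is called concurrently, but the field DynamicSynonymTokenFilterFactory.dynamicSynonymFilters does not support concurrent insertion, so some DynamicSynonymFilter objects are never stored in dynamicSynonymFilters and therefore never receive synonym updates. The fix changes two places: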

    JAVA
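    // Change 1: replace the WeakHashMap field.
    // Before:
    private Map<DynamicSynonymFilter, Integer> dynamicSynonymFilters = new WeakHashMap<>();
    // After (thread-safe add; unlike WeakHashMap, it holds strong references,
    // so registered filters are never released):
    private List<DynamicSynonymFilter> dynamicSynonymFilters = Collections.synchronizedList(new ArrayList<DynamicSynonymFilter>());

    // Change 2: iterate over the list when reloading.
    public void run() {
        if (synonymFile.isNeedReloadSynonymMap()) {
            synonymMap = synonymFile.reloadSynonymMap();
            // Iterating a synchronizedList requires holding its lock; otherwise a
            // concurrent create() can cause a ConcurrentModificationException.
            synchronized (dynamicSynonymFilters) {
                for (DynamicSynonymFilter dynamicSynonymFilter : dynamicSynonymFilters) {
                    dynamicSynonymFilter.update(synonymMap);
                    logger.info("{} success reload synonym", indexName);
                }
            }
        }
    }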
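    A minimal, self-contained model of this fix can help check the reasoning outside Elasticsearch. FakeSynonymFilter, FilterRegistry and createConcurrently are hypothetical stand-ins for DynamicSynonymFilter, the factory's bookkeeping and concurrent search threads; they are not the plugin's API. Besides the synchronized list, this sketch also takes the list's lock in create() while reading the current map and registering, so a filter built from the old map cannot be added after a reload has already run. That extra detail is an assumption of the sketch, not part of the patch above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical model of the registry fix; these are NOT the plugin's classes.
public class SynonymRegistryDemo {

    // Stand-in for DynamicSynonymFilter: only tracks which synonym map it uses.
    static final class FakeSynonymFilter {
        private volatile Map<String, String> synonyms;

        FakeSynonymFilter(Map<String, String> initial) { this.synonyms = initial; }

        void update(Map<String, String> newSynonyms) { this.synonyms = newSynonyms; }

        Map<String, String> synonyms() { return synonyms; }
    }

    // Stand-in for the factory's dynamicSynonymFilters bookkeeping.
    static final class FilterRegistry {
        // Thread-safe add(); unlike WeakHashMap it keeps strong references.
        private final List<FakeSynonymFilter> filters =
                Collections.synchronizedList(new ArrayList<>());
        private Map<String, String> current; // guarded by the lock on filters

        FilterRegistry(Map<String, String> initial) { this.current = initial; }

        // Mirrors create(): may run on many search threads at once.
        FakeSynonymFilter create() {
            synchronized (filters) { // read current + register atomically w.r.t. reload()
                FakeSynonymFilter f = new FakeSynonymFilter(current);
                filters.add(f);
                return f;
            }
        }

        // Mirrors run(): push a reloaded map to every registered filter.
        void reload(Map<String, String> newSynonyms) {
            synchronized (filters) { // iterating a synchronizedList requires holding its lock
                current = newSynonyms;
                for (FakeSynonymFilter f : filters) {
                    f.update(newSynonyms);
                }
            }
        }

        int size() { return filters.size(); }
    }

    // Calls create() from a thread pool, the way concurrent searches would.
    static List<FakeSynonymFilter> createConcurrently(FilterRegistry registry, int threads, int count) {
        List<FakeSynonymFilter> created = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < count; i++) {
            pool.execute(() -> created.add(registry.create()));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("timed out waiting for create() calls");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return created;
    }

    public static void main(String[] args) {
        FilterRegistry registry = new FilterRegistry(Map.of());
        List<FakeSynonymFilter> created = createConcurrently(registry, 8, 4000);
        // The value is the Chinese word for "density", escaped to keep the file ASCII.
        registry.reload(Map.of("density", "\u5bc6\u5ea6"));
        long stale = created.stream().filter(f -> !f.synonyms().containsKey("density")).count();
        System.out.println(registry.size() + " registered, " + stale + " stale");
    }
}
```

    Swapping the synchronized list for a plain ArrayList and dropping the locks is a quick way to see the original failure mode: concurrent add() calls may then lose entries or throw, although a race is not guaranteed to show up on any given run.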

    I also tried this change: private Map<DynamicSynonymFilter, Integer> dynamicSynonymFilters = new WeakHashMap() --> private Map<DynamicSynonymFilter, Integer> dynamicSynonymFilters = new ConcurrentHashMap<>(); but filter objects were still lost when create() was called. I didn't dig into the exact cause. Could you please verify the change?
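    One possible, unverified explanation for the ConcurrentHashMap result: both WeakHashMap and ConcurrentHashMap look keys up with equals() and hashCode(), and Lucene's AttributeSource, which TokenFilter extends, overrides both based on the current attribute state (worth checking against the Lucene version bundled with your Elasticsearch). If so, freshly created filters could compare equal and overwrite one another as map keys, which a List does not do. The original WeakHashMap also let unused filters be garbage-collected, which the synchronized List gives up. A registry of weak references compared by identity keeps that property without relying on equals(); WeakIdentityRegistry below is a hypothetical helper, not part of the plugin or the JDK.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper (not part of the plugin or the JDK): a thread-safe registry
// that holds weak references and compares entries by identity, never equals().
public final class WeakIdentityRegistry<T> {
    private final List<WeakReference<T>> refs = new ArrayList<>();

    // Registers an instance; safe to call from many threads (e.g. from create()).
    public synchronized void add(T item) {
        refs.add(new WeakReference<>(item));
    }

    // Applies action to every instance that is still reachable and prunes
    // references the garbage collector has already cleared. Returns the live count.
    public synchronized int forEachLive(Consumer<? super T> action) {
        int live = 0;
        for (Iterator<WeakReference<T>> it = refs.iterator(); it.hasNext(); ) {
            T item = it.next().get();
            if (item == null) {
                it.remove();
            } else {
                action.accept(item);
                live++;
            }
        }
        return live;
    }

    public static void main(String[] args) {
        WeakIdentityRegistry<List<Integer>> registry = new WeakIdentityRegistry<>();
        List<List<Integer>> keep = new ArrayList<>(); // strong refs keep entries alive
        for (int i = 0; i < 3; i++) {
            List<Integer> list = new ArrayList<>(); // empty lists are all equals() to each other
            keep.add(list);
            registry.add(list);
        }
        int live = registry.forEachLive(l -> l.add(1));
        System.out.println(live + " live entries: " + keep);
    }
}
```

    In the factory, create() would call add(filter) and run() would call forEachLive(filter -> filter.update(synonymMap)); this assumes update() is cheap enough to run while the registry's lock is held.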

  • weixin_39875419 2020-12-29 19:17

    I ran into the same problem: the same query returns noticeably different numbers of hits (the total count) from one request to the next.

  • weixin_39746229 2020-12-29 19:17

    I am having the same issue. A search that returned N results before I changed synonym.txt gives inconsistent responses after the change (with the query adjusted accordingly and after waiting for the synonym refresh): sometimes no hits, sometimes some of the expected hits, sometimes all of them.

    UPDATE: I see this is fixed in the new version. I am using an older version for Elasticsearch 5.1.1 and took the fix from 's pull request. Thanks!

  • weixin_39669204 2020-12-29 19:17

    Has this been fixed in master?

  • weixin_39865625 2020-12-29 19:17

    It should have been handled; others have asked about this before. Try patching the code based on my branch or on the fix described on this page.

  • weixin_39667452 2020-12-29 19:17

    Why do I sometimes get different tokenization results when I repeat a request that uses synonyms? What causes this?

  • weixin_39667452 2020-12-29 19:17

    Here 三次方 is a custom dictionary word. Occasionally I get this result:

    {
      "tokens": [
        { "token": "三", "start_offset": 0, "end_offset": 1, "type": "en", "position": 0 },
        { "token": "次方", "start_offset": 1, "end_offset": 9, "type": "m", "position": 1 }
      ]
    }

    What I want is:

    {
      "tokens": [
        { "token": "三次方", "start_offset": 0, "end_offset": 9, "type": "userDefine", "position": 0 }
      ]
    }

