doujiao3346 2019-08-06 08:54

How do I copy an index in Elasticsearch using this architecture?

I have a scenario where I have to import data (millions of records) from multiple sources and save it in a database. When a user searches for anything related to that data, they should get results within 2-3 seconds.

For this, I designed an architecture in which a Go program imports data from multiple sources and pushes it into AWS SQS. A Lambda function is triggered whenever SQS has data and pushes that data into AWS Elasticsearch. A REST API then serves search results to the user.

A cron job runs this import every morning. My problem is that when a new batch of data arrives, I want to delete the existing data and replace all of it with the new data. I'm stuck on how to do this delete-and-replace step.

I thought of importing into a temporary index and then swapping it in for the original index. The problem is that I don't know when the import has finished, so I don't know when it is safe to make the switch.


1 answer

  • duanhe7471 2019-08-16 14:06

    The concept you're after is an index alias. The basic workflow would be:

    1. Import today's data into a new index whose name includes the date, for example my-index-2019-09-16.
    2. Make sure the import is complete and worked correctly.
    3. Point the alias to the new index (it's an atomic switch between the indices):

      POST /_aliases
      {
          "actions" : [
              { "remove" : { "index" : "my-index-2019-09-15", "alias" : "my-index" } },
              { "add" : { "index" : "my-index-2019-09-16", "alias" : "my-index" } }
          ]
      }
    4. Delete the old index.
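
The workflow above can be sketched in Python as a small helper that derives the dated index names and builds the request body for `POST /_aliases`. This is only an illustration: the function names `dated_index` and `alias_swap_body` are hypothetical, and sending the request (including any request signing your AWS domain requires) is left out.

```python
import json
from datetime import date, timedelta


def dated_index(alias: str, day: date) -> str:
    """Return the dated index name for an alias, e.g. my-index-2019-09-16."""
    return f"{alias}-{day.isoformat()}"


def alias_swap_body(alias: str, old_index: str, new_index: str) -> dict:
    """Build the POST /_aliases body that moves the alias in a single request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Example date taken from the answer; in a real job this would be date.today().
today = date(2019, 9, 16)
body = alias_swap_body(
    "my-index",
    dated_index("my-index", today - timedelta(days=1)),
    dated_index("my-index", today),
)
print(json.dumps(body, indent=2))
```

Because the REST API only ever queries the alias `my-index`, it never needs to know which dated index is currently behind it.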

    During the import you will temporarily need roughly twice the disk space, because the old and new indices exist side by side. Otherwise this should work without any issues, and you only delete data once it has a proper replacement.
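
One possible way to decide that the import is complete (step 2) is to have the importer record how many records it sent and then poll the new index's document count until it reaches that number. The sketch below assumes the expected total is known, that the domain accepts unsigned HTTP requests, and that every record becomes exactly one document. `fetch_count` and `wait_until_complete` are hypothetical names, and the count can lag slightly behind indexing until the index refreshes.

```python
import json
import time
import urllib.request


def fetch_count(base_url: str, index: str) -> int:
    """Call GET /<index>/_count and return the reported document count."""
    with urllib.request.urlopen(f"{base_url}/{index}/_count") as resp:
        return json.load(resp)["count"]


def wait_until_complete(get_count, expected, timeout_s=3600.0, poll_s=30.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll get_count() until it reaches expected; return False on timeout."""
    deadline = clock() + timeout_s
    while True:
        if get_count() >= expected:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)


# Dry run with simulated counts standing in for fetch_count (no network needed).
simulated = iter([250_000, 700_000, 1_000_000])
done = wait_until_complete(lambda: next(simulated), expected=1_000_000,
                           sleep=lambda seconds: None)
print(done)
```

In a real job, `get_count` would be something like `lambda: fetch_count(base_url, new_index)`, and the alias switch would run only when the function returns `True`. On a timeout the alias stays on the old index, so users keep seeing yesterday's data.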

    This answer was accepted by the asker as the best answer.


