dongnigeng1295 2017-09-15 05:55
浏览 46
已采纳

对如何构建可伸缩系统的理解感到困惑

I need some guidance on how to properly build out a system that will be able to scale. I will give you some information about what I am trying to do and then ask my specific question.

I have a site where I want visitors to send some data to be processed. They input the data into a textarea or upload it in a file. Simple. The data is somewhat preprocessed on the client side before a POST request is made to a REST endpoint.

What I am stuck on is what is a good way to take this posted data store it and then associate an id with it that references the user since I cannot process the data fast enough for it to be returned to the user in a reasonable amount of time?

This question is a bit vague and open to opinion, I admit it. I just need a push in the right direction to keep moving. What I have been considering is throwing the data into a message queue and then having some workers process the data elsewhere and when the data is processed alert the user as to where to find it with some sort of link to an S3 bucket or just a URL to a file. The other idea was to just run the request for each item to be processed against another end-point that already processes individual records in some sort of loop client side. The problem is as follows with this idea:

To process the data it may take somewhere from 30 minutes to 2 hours depending upon the amount that they want processed. It's not ideal for them to just sit there and wait for that to finish depending on the amount of records they need processed, so I have ruled this out mostly.

Any guidance would be very much appreciated as I don't have any coworkers to bounce things off of, nor do I know many people with the domain knowledge that I could freely ask. If this isn't the right place to ask this, could you point me in the right direction as to where it should be asked?

Chris

  • 写回答

1条回答 默认 最新

  • douhui5529 2017-09-15 06:19
    关注

    If I've got you right, your pipeline is:

    1. Accept item from user

    2. Possibly preprocess/validate it (?)

    3. Put into some queue

    4. Process data

    5. Return result.

    You man use one or several queues on stage (3). Entity from user gets added to one of the queues. If it's big enough, it could be stored in S3 or storage alike, and only info about it put into the queue: link, add date, user id (or email of alike). Processors can pull items from queue and give feedback to users.

    If you have no strict requirements on order, things get much simpler: you don't need any sync between them. Treat all the components: upload acceptors, queues, storages and processors as independent pools of processes. Monitor each pool separately. If there's some bottlenecks - add machines to that pool.

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论

报告相同问题?

悬赏问题

  • ¥15 winform的chart曲线生成时有凸起
  • ¥15 msix packaging tool打包问题
  • ¥15 finalshell节点的搭建代码和那个端口代码教程
  • ¥15 用hfss做微带贴片阵列天线的时候分析设置有问题
  • ¥15 Centos / PETSc / PETGEM
  • ¥15 centos7.9 IPv6端口telnet和端口监控问题
  • ¥20 完全没有学习过GAN,看了CSDN的一篇文章,里面有代码但是完全不知道如何操作
  • ¥15 使用ue5插件narrative时如何切换关卡也保存叙事任务记录
  • ¥20 海浪数据 南海地区海况数据,波浪数据
  • ¥20 软件测试决策法疑问求解答