duangan6731 2014-04-17 01:29
Accepted

How should I read from multiple queues proportionally, by user and by priority?

I am currently trying to think of ways to replace a MySQL + Cron queuing system with a message queue system (AWS SQS/Beanstalkd/Iron MQ/Redis).

Let's say I have 100 users. These users are able to make API requests to me. Each API request is an SMS which I must send via a single modem which I operate.

Each SMS can have a priority of 1-3.

The problem I am facing is that the single modem is a bottleneck. I can't simply process the queue in FIFO order: if one user sends 10,000 SMS and I add these to the queue, my other users would not see any SMS go out until the first user's 10,000 have finished.

Right now, I am using MySQL for the task:

SELECT COUNT(*) AS `count`, `user_id` FROM `queue` GROUP BY `user_id`

This would give me a result like this:

count  | user_id
-------|--------
10000  | 1
1      | 2

I then add the counts together which gives me 10,001 SMS to process.

I then calculate a percentage for each row:

(row_count / total_count) * 100 = percentage

E.g.:

(10000 / 10001) * 100 = 99.9900009999%
(1 / 10001) * 100 = 0.0099990001%

I know that my modem can handle 140 SMS per second, so if my Cron runs on a 1 minute cycle, I will send 8,400 SMS in a minute.

I use these calculations to give me my selections:

ceil( (8400 / 100) * 99.9900009999 ) = 8,400 for user #1
ceil( (8400 / 100) * 0.0099990001 ) = 1 for user #2
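
The same calculation can be sketched in Python. This is only an illustration of the arithmetic above: the function name `allocate` is hypothetical, and capping each quota at the user's actual backlog is an added assumption (the LIMIT in the real query has the same effect). Integer ceiling division is used instead of percentages to avoid floating-point rounding.

```python
def allocate(counts, capacity):
    """Split `capacity` sends across users in proportion to their backlog.

    counts   -- {user_id: number of queued SMS}
    capacity -- SMS the modem can take this cycle (8,400 in the question)
    """
    total = sum(counts.values())
    if total == 0:
        return {}
    quotas = {}
    for user_id, queued in counts.items():
        if queued == 0:
            continue
        # Ceiling of capacity * queued / total, done in integers.
        share = (capacity * queued + total - 1) // total
        # Assumption: never ask for more than the user actually has queued.
        quotas[user_id] = min(queued, share)
    return quotas


quotas = allocate({1: 10000, 2: 1}, capacity=8400)
print(quotas)
```

Because every share is rounded up, the quotas can add up to slightly more than the capacity, which is the 8,401 case described in the question.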

So in this case, I do a simple MySQL select for each user with a LIMIT, ordering by priority ASC, to give me any priority 1s first, and any priority 3s last.
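
A minimal sketch of that per-user select, using Python's built-in `sqlite3` in place of MySQL so that it runs on its own. The `queue` table and `user_id` column come from the query above; the `id` and `priority` columns and the `fetch_batch` helper are assumptions made for illustration.

```python
import sqlite3


def fetch_batch(conn, quotas):
    """Return up to quotas[user_id] rows per user, priority 1 first."""
    batch = []
    for user_id, limit in quotas.items():
        rows = conn.execute(
            "SELECT id, user_id, priority FROM queue "
            "WHERE user_id = ? ORDER BY priority ASC, id ASC LIMIT ?",
            (user_id, limit),
        ).fetchall()
        batch.extend(rows)
    return batch


# In-memory stand-in for the real queue table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE queue (id INTEGER PRIMARY KEY, user_id INTEGER, priority INTEGER)"
)
conn.executemany(
    "INSERT INTO queue (user_id, priority) VALUES (?, ?)",
    [(1, 3), (1, 1), (1, 2), (2, 2)],
)

batch = fetch_batch(conn, {1: 2, 2: 1})
print(batch)
```

Ordering by `id` as a tie-breaker keeps messages of equal priority in insertion order; that choice is also an assumption.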

It doesn't matter if we push more than 8,400 to the modem because it will simply queue on the modem, although the modem doesn't guarantee FIFO, so we need to be as tight on the 8,400 per minute as possible. In this case we push 8,401 to the modem.

This is much better, because rather than sending all 10,000 for user 1 first, we only do 8,400 and also get some of user 2's SMS out even though they only have 1 SMS. It's still weighted towards whoever has the most SMS to process, and it keeps in line with the modem throughput too.

Given the fact that I need priorities, I am currently looking at Beanstalkd as my only option.

I figured I could create a queue for each user, and when API requests come in, add the SMS to the user queue along with the priority.

I would then have one worker, which does a count on each queue (some user queues may be empty so I wouldn't want a worker for each user constantly running).

Once the single worker has the queue count for each user, it will start to read each queue up to the maximum number I specify for each user and push to the modem in order.

So in this case, it will read 8,400 SMS for user #1 and 1 SMS for user #2 in that order.

To get SMS to the modem, I have to use HTTP. If I get a 200 OK, I can delete the job. If I get a 500 error, I will not delete the job, so it will be picked up again. For anything else, I will throw an exception and bury the job in Beanstalkd for inspection by a human.
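
The three outcomes can be written as one small dispatch function. This is a sketch, not a Beanstalkd client: `Job` is a hypothetical interface standing in for whatever job object the client library provides, and explicitly releasing the job on a 500 (rather than letting its reservation time out) is an assumption.

```python
from typing import Protocol


class Job(Protocol):
    """Hypothetical job handle; real client libraries differ in naming."""

    def delete(self) -> None: ...
    def release(self) -> None: ...
    def bury(self) -> None: ...


def settle(job: Job, status: int) -> str:
    """Apply the delete / retry / bury rule for one modem HTTP response."""
    if status == 200:
        job.delete()   # sent successfully, remove from the queue
        return "delete"
    if status == 500:
        job.release()  # assumption: hand it back so it is picked up again
        return "release"
    job.bury()         # anything else is parked for a human to inspect
    return "bury"


class RecordingJob:
    """Test double that records which method was called."""

    def __init__(self):
        self.calls = []

    def delete(self):
        self.calls.append("delete")

    def release(self):
        self.calls.append("release")

    def bury(self):
        self.calls.append("bury")


job = RecordingJob()
outcomes = [settle(job, code) for code in (200, 500, 404)]
print(outcomes)
```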

My concern here is that HTTP is a bottleneck in itself. Ideally I want to perform 8,400 HTTP requests per minute using cURL (140/sec). I am aware that I can use the curl_multi_* functions to perform, say, 10 HTTP requests concurrently to speed this up, but are there any other options to speed things up further?

The main issue is that this is blocking, so one user's SMS will go out before all of the other users' SMS. Here we will process 8,400 SMS for user #1, followed by 1 SMS for user #2.
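
One possible way around the ordering problem, while keeping a single worker, is to interleave the per-user batches before pushing them to the modem. The sketch below is an assumption about how that could look, not something the setup above already does; `interleave` is a hypothetical helper and the batches are plain lists.

```python
from itertools import zip_longest


def interleave(batches):
    """Yield messages round-robin across users instead of user by user.

    batches -- {user_id: [messages already selected for this cycle]}
    """
    missing = object()  # marks users whose batch has run out
    for round_ in zip_longest(*batches.values(), fillvalue=missing):
        for message in round_:
            if message is not missing:
                yield message


order = list(interleave({1: ["u1-a", "u1-b", "u1-c"], 2: ["u2-a"]}))
print(order)
```

With this ordering, user #2's single SMS goes out second rather than after all of user #1's messages.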

For example, should I think about spawning a worker for each user once I have their total count of messages to process? If I did this, we would process SMS for user #1 and user #2 concurrently. With this option though, I do worry that I cannot control the overall amount of HTTP requests going to the modem, because I do not want to overload it. What happens if I have 100 child workers all doing 10 HTTP requests concurrently to the modem?
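
If several workers do run at once, the total number of in-flight requests can still be bounded by making every worker go through one shared limiter. The sketch below uses threads in a single Python process purely for illustration; with separate child processes the limit would have to live somewhere shared. `ModemGate`, `fake_transport` and the limit of 10 are all assumptions.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ModemGate:
    """Lets at most `max_in_flight` sends run at once, across all workers."""

    def __init__(self, max_in_flight):
        self._slots = threading.BoundedSemaphore(max_in_flight)
        self._lock = threading.Lock()
        self._in_flight = 0
        self.peak = 0  # highest concurrency seen, for inspection

    def send(self, sms, transport):
        with self._slots:  # blocks while the modem is at its limit
            with self._lock:
                self._in_flight += 1
                self.peak = max(self.peak, self._in_flight)
            try:
                return transport(sms)
            finally:
                with self._lock:
                    self._in_flight -= 1


def fake_transport(sms):
    """Stand-in for the real HTTP call to the modem."""
    time.sleep(0.001)
    return 200


gate = ModemGate(max_in_flight=10)
# 50 worker threads compete, but the gate only admits 10 at a time.
with ThreadPoolExecutor(max_workers=50) as pool:
    results = list(pool.map(lambda sms: gate.send(sms, fake_transport), range(200)))
print(gate.peak)
```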

These workers would have to be child processes that close once finished. The parent process would need to know about this to then perform another calculation and spawn new child workers.

If anyone has any suggestions on how to handle this scenario of multiple queues with a single queue at the other end (the modem), that would be most helpful.

2 answers

  • doubi4435 2014-04-18 09:52

    My first thought is to use Beanstalkd priorities, and split the messages into groups, each with a different priority.

    • User 1 wants to send 10,000 msgs.
    • User 2 wants to send 101 msgs.

    • messages 1-100 of user 1 are put into the queue at priority 1
    • messages 101-200 of user 1 are put into the queue at priority 2
    • messages 201-300 of user 1 are put into the queue at priority 3
    • messages 301-400 of user 1 are put into the queue at priority 4 (etc)
    • messages 1-100 of user 2 are put into the queue at priority 1
    • message 101 of user 2 is put into the queue at priority 2 (etc)

    The first 100 messages of each user are sent first (which ones actually leave the gate first depends on when they were put into the queue). With no delay involved (e.g. send after 90 seconds), the messages/jobs closest to priority 0 get sent first.

    To make sure that some of every user's messages are sent on every round, I'd limit the maximum priority that you set to the number of customers that you have, so that your biggest customer doesn't end up with a priority of 1,000,000 or more, which would mean that all the rest of their messages had to wait until everyone else had completed. Just restart the priority back at one.
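
The banding scheme above can be sketched as follows. The priority formula follows the description (bands of 100, wrapping back to 1 after a maximum); the helper names are hypothetical, and the queue is simulated with `heapq`, assuming jobs are served lowest priority number first and, within the same priority, in the order they were put.

```python
import heapq


def band_priority(index, band_size, max_priority):
    """Priority for a user's Nth message (1-based), wrapping after max_priority."""
    return ((index - 1) // band_size) % max_priority + 1


def drain_order(pending, band_size, max_priority):
    """Simulate one priority queue and return the user of each job served."""
    heap = []
    seq = 0  # insertion counter, breaks ties in put order
    for user, count in pending.items():
        for index in range(1, count + 1):
            priority = band_priority(index, band_size, max_priority)
            heapq.heappush(heap, (priority, seq, user))
            seq += 1
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


# Small numbers so the result is easy to read: bands of 2, priorities 1-2.
order = drain_order({"user1": 4, "user2": 2}, band_size=2, max_priority=2)
print(order)
```

In this small run, user2's messages are served before user1's second band even though user1 queued everything first.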

    This answer was selected by the asker as the best answer.