doushang8512 2015-10-08 15:00

Rules for automatically base64-encoding messages when submitting to SQS

I am developing an application in which clients (written in multiple languages - Go, C++, Python, C#, Java, Perl and possibly more in the future) submit protobuf (and in some cases, JSON) messages to SQS. At the other end, the messages are read and decoded by Python and Go clients - depending on the message type. Boto seems to automatically encode the messages into base64, but other language libraries don't seem to do so. Or maybe there are some other rules?

Boto does have an option to submit raw messages.

What is the expected behavior here? Am I supposed to encode messages into base64 on my own - which makes boto an odd case - or am I missing something?

This has caused some subtle bugs in my application because of an extra layer of base64 encoding or decoding. As far as I know, there is no idiomatic way to detect whether a message is base64 encoded or not. The best option is to try to decode and see if it throws an exception, which is something I don't really like.

I tried to look for some documentation, but couldn't find anything with clear guidelines. Maybe I was looking in the wrong places?

Thanks in advance for any pointers.

1 answer

  • dsfgds4215 2015-10-08 23:10

    You probably want to encode your messages in some text-safe form, because the SQS API does not accept every possible byte sequence in a message payload. Only valid UTF-8 text is supported, and the only control characters allowed are tab, newline, and carriage return. From the SendMessage documentation:

    Important

    The following list shows the characters (in Unicode) allowed in your message, according to the W3C XML specification. For more information, go to http://www.w3.org/TR/REC-xml/#charsets. If you send any characters not included in the list, your request will be rejected.

    #x9 | #xA | #xD | [#x20 to #xD7FF] | [#xE000 to #xFFFD] | [#x10000 to #x10FFFF]

    http://docs.aws.amazon.com/AWSSimpleQueueService/latest/APIReference/API_SendMessage.html
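
    The quoted ranges can be checked locally before sending. The following is a minimal sketch, not part of any AWS library: the name is_sqs_safe is hypothetical, and the ranges are copied from the list quoted above.

```python
import base64

# Code point ranges copied from the quoted SendMessage documentation.
ALLOWED_RANGES = (
    (0x9, 0x9),
    (0xA, 0xA),
    (0xD, 0xD),
    (0x20, 0xD7FF),
    (0xE000, 0xFFFD),
    (0x10000, 0x10FFFF),
)


def is_sqs_safe(body: str) -> bool:
    """Hypothetical helper: True if every character is in an allowed range."""
    return all(
        any(lo <= ord(ch) <= hi for lo, hi in ALLOWED_RANGES)
        for ch in body
    )


raw = bytes(range(256))  # arbitrary binary data, e.g. a serialized protobuf
print(is_sqs_safe(raw.decode("latin-1")))                  # False (contains NUL)
print(is_sqs_safe(base64.b64encode(raw).decode("ascii")))  # True
```

    Binary payloads such as protobuf will usually fail this check unless they are wrapped in a text-safe encoding first.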

    The base64 alphabet falls entirely within this range, so a base64-encoded message can never be rejected for containing invalid characters. Of course, it also bloats your payload, since base64 expands every 3 bytes of the original message into 4 bytes of output (with only 64 symbols, each output byte carries 6 bits of usable information, so 3 x 8 bits become 4 x 6 bits).
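
    The size overhead is easy to confirm. A small sketch, assuming standard padded base64 as produced by Python's base64 module; the helper name b64_len is made up for illustration:

```python
import base64


def b64_len(n: int) -> int:
    """Length of padded base64 output for n input bytes: 4 chars per 3 bytes."""
    return 4 * ((n + 2) // 3)


for n in (1, 3, 100, 1000):
    encoded = base64.b64encode(bytes(n))
    # The formula and the real encoder agree for every size tried here.
    print(n, len(encoded), b64_len(n))

print(b64_len(100))  # 136
```

    So the encoded body is roughly a third larger than the original, which matters if your messages are close to the queue's size limit.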

    Presumably boto automatically base64-encodes and decodes messages for you in order to be "helpful."

    But there is no reason why base64 has to be used at all.

    An example that comes to mind... valid JSON would also comply with the restricted character ranges supported by SQS payloads. (Theoretically, I guess, JSON could be argued not to be an "encoding," but that would be a bit pedantic).
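
    As an illustration, Python's json.dumps escapes control characters and, with its default ensure_ascii=True, all non-ASCII characters as well, so its output stays inside printable ASCII. This is a sketch of one serializer's behavior only; other JSON libraries may emit raw non-ASCII characters, and the payload below is made up.

```python
import json

# Made-up payload containing characters that would be problematic if sent raw.
payload = {"note": "caf\u00e9 \x00 \uffff", "count": 3}

body = json.dumps(payload)  # ensure_ascii=True is the default

# Every character of the serialized text is printable ASCII.
print(all(0x20 <= ord(ch) <= 0x7E for ch in body))  # True

# The receiver gets the original data back with a single decode step.
print(json.loads(body) == payload)  # True
```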

    There is no clean way to determine whether a message needs to be decoded more than once, other than the sketchy one you proposed, but the argument could be made that if you are in a situation where the need to decode is ambiguous, then that should be eliminated.
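
    To see why guessing is unreliable, note that plenty of ordinary strings are also valid base64. The sketch below shows the problem and one possible way to remove the ambiguity: every producer and consumer agrees to always encode exactly once. The function names are hypothetical, and the convention is an assumption, not something SQS or boto prescribes.

```python
import base64
import binascii


def looks_like_base64(body: str) -> bool:
    """The try-to-decode heuristic from the question."""
    try:
        base64.b64decode(body, validate=True)
        return True
    except (binascii.Error, ValueError):
        return False


print(looks_like_base64("test"))      # True, although "test" may be a raw message
print(looks_like_base64('{"a": 1}'))  # False


def encode_body(payload: bytes) -> str:
    """Producer side of the assumed convention: always base64, exactly once."""
    return base64.b64encode(payload).decode("ascii")


def decode_body(body: str) -> bytes:
    """Consumer side: always decode exactly once, and fail loudly on bad input."""
    return base64.b64decode(body, validate=True)
```

    With a fixed convention like this, a client library that encodes on its own (as boto does by default) has to be switched to its raw mode, or it will add a second layer.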

    If boto's behavior weren't documented and there were no way to make it behave otherwise, I'd say it is wrong behavior. But, as it is, I'll have to relent a bit and say it's just unusual.
