doushang8512 2015-10-08 15:00
Accepted

Rules for automatic base64 encoding of messages submitted to SQS

I am developing an application in which clients (written in multiple languages - Go, C++, Python, C#, Java, Perl and possibly more in the future) submit protobuf (and in some cases, JSON) messages to SQS. At the other end, the messages are read and decoded by Python and Go clients - depending on the message type. Boto seems to automatically encode the messages into base64, but other language libraries don't seem to do so. Or maybe there are some other rules?

Boto does have an option to submit raw messages.

What is the expected behavior here? Am I supposed to encode messages into base64 on my own - which makes boto an odd case - or am I missing something?

This has caused some subtle bugs in my application because of an extra layer of base64 encoding or decoding. As far as I know, there is no idiomatic way to detect whether a message is base64 encoded or not. The best option is to try to decode it and see if that throws an exception - something I don't really like.

I tried to look for some documentation, but couldn't find anything with clear guidelines. Maybe I was looking in the wrong places?

Thanks in advance for any pointers.


1 answer

  • dsfgds4215 2015-10-08 23:10

    You probably want to encode your messages in some text-safe form, because the SQS API does not accept every possible byte sequence in a message payload. The body must be valid UTF-8 text, and of the control characters only tab, newline, and carriage return are allowed.

    Important

    The following list shows the characters (in Unicode) allowed in your message, according to the W3C XML specification. For more information, go to http://www.w3.org/TR/REC-xml/#charsets If you send any characters not included in the list, your request will be rejected.

    #x9 | #xA | #xD | [#x20 to #xD7FF] | [#xE000 to #xFFFD] | [#x10000 to #x10FFFF]

    http://docs.aws.amazon.com/AWSSimpleQueueService/latest/APIReference/API_SendMessage.html
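The quoted ranges can be checked on the client before sending. The following is only a minimal sketch: the helper names `is_sqs_safe` and `bytes_are_sqs_safe` are made up for illustration, and the check covers the character ranges only, not message size or any other limit.

```python
import re

# Character ranges copied from the SendMessage documentation quoted above.
_ALLOWED = re.compile(
    r"[\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]*"
)


def is_sqs_safe(body: str) -> bool:
    """Return True if every character of body lies in the allowed ranges."""
    return _ALLOWED.fullmatch(body) is not None


def bytes_are_sqs_safe(payload: bytes) -> bool:
    """Return True if payload is valid UTF-8 and uses only allowed characters."""
    try:
        return is_sqs_safe(payload.decode("utf-8"))
    except UnicodeDecodeError:
        return False
```

For example, the three bytes `08 96 01` (a small protobuf-encoded message) fail this check, because they are not valid UTF-8 and begin with a disallowed control character.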

    The base64 alphabet falls entirely within this range, so a base64-encoded message can never be rejected for containing invalid characters. Of course, it also bloats your payload, since base64 expands every 3 bytes of the original message into 4 bytes of output (a 64-symbol alphabet limits each output byte to carrying 6 bits of usable information, 3 x 8 = 4 x 6 = 24 bits).
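Both points, the safe alphabet and the 4/3 expansion, are easy to confirm with Python's standard `base64` module. This is a small sketch; `encoded_length` and `BASE64_ALPHABET` are names introduced here for illustration.

```python
import base64
import string

# Standard base64 alphabet plus the padding character.
BASE64_ALPHABET = set(string.ascii_letters + string.digits + "+/=")


def encoded_length(n_bytes: int) -> int:
    """Length of standard, padded base64 output for n_bytes of input."""
    return 4 * ((n_bytes + 2) // 3)


# A payload containing every possible byte value once.
payload = bytes(range(256))
encoded = base64.b64encode(payload).decode("ascii")

# Every output character is printable ASCII from the base64 alphabet.
only_safe_chars = set(encoded) <= BASE64_ALPHABET
```

Here 256 input bytes become 344 output characters, all drawn from the 65 printable ASCII characters of the alphabet.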

    Presumably boto automatically base64-encodes and decodes messages for you in order to be "helpful."

    But there is no reason why base64 has to be used at all.

    An example that comes to mind... valid JSON would also comply with the restricted character ranges supported by SQS payloads. (Theoretically, I guess, JSON could be argued not to be an "encoding," but that would be a bit pedantic).
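As a sketch of the JSON case: Python's `json.dumps` escapes control characters and, with `ensure_ascii=True` (its default), all non-ASCII characters as well, so the serialized text is plain printable ASCII. The helper name `to_sqs_json` is hypothetical.

```python
import json


def to_sqs_json(obj) -> str:
    """Serialize obj to compact JSON text consisting of printable ASCII only."""
    return json.dumps(obj, ensure_ascii=True, separators=(",", ":"))


original = {"name": "caf\u00e9", "note": "line1\nline2\x00"}
body = to_sqs_json(original)

# Control and non-ASCII characters were escaped, so only printable ASCII remains.
printable_only = all(0x20 <= ord(ch) <= 0x7E for ch in body)
```

The NUL and newline inside the string values are carried as escape sequences, so the body stays within the allowed ranges and still round-trips through `json.loads`.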

    There is no clean way to determine whether a message needs to be decoded more than once, other than the sketchy one you proposed. But the argument could be made that if you are in a situation where the need to decode is ambiguous, then that ambiguity itself should be eliminated.
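To illustrate why trial decoding is unreliable, and one possible way to remove the ambiguity, here is a sketch. The envelope format (`encoding`, `content_type`, `data`) and the function names are assumptions made up for this example. They are not an SQS or boto convention.

```python
import base64
import binascii
import json
from typing import Tuple


def looks_like_base64(body: str) -> bool:
    """The trial-decode heuristic from the question. Unreliable by design."""
    try:
        base64.b64decode(body, validate=True)
        return True
    except (binascii.Error, ValueError):
        return False


def wrap(payload: bytes, content_type: str) -> str:
    """Hypothetical envelope: every producer encodes once and says so."""
    return json.dumps({
        "encoding": "base64",
        "content_type": content_type,
        "data": base64.b64encode(payload).decode("ascii"),
    })


def unwrap(body: str) -> Tuple[str, bytes]:
    """Decode exactly once, based on the declared encoding, never by guessing."""
    envelope = json.loads(body)
    if envelope.get("encoding") != "base64":
        raise ValueError("unsupported encoding: %r" % envelope.get("encoding"))
    return envelope["content_type"], base64.b64decode(envelope["data"], validate=True)
```

The plain word `test` passes `looks_like_base64`, because it happens to be four valid base64 characters, so a successful decode proves nothing. With an explicit envelope, the consumer never has to guess.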

    If boto's behavior weren't documented and there were no way to make it behave otherwise, I'd say it is wrong behavior. But, as it is, I'll have to relent a bit and say it's just unusual.

    This answer was selected by the asker as the best answer.