2017-11-09 17:00


I have a piece of code here that I need either reassurance about, or a "no no no!", as to whether I'm thinking about this in the right way or an entirely wrong way.

This has to do with cutting a variable holding binary data at a specific offset, while also dealing with mbstring's overloaded functions: for example, substr is actually mb_substr, strlen is actually mb_strlen, etc.

Our server is set to UTF-8 internal encoding (so the overloaded functions count characters, not bytes), and there's this weird little thing I do to circumvent that for this binary data manipulation:

// $binary_data is the incoming variable holding binary data
// $clip_size is generally 16, 32, 64, etc.
$curenc = mb_internal_encoding(); // this should be "UTF-8"
mb_internal_encoding('ISO-8859-1'); // single-byte encoding, so mb_ overloading doesn't screw this up
if (strlen($binary_data) >= $clip_size) {
    $first_hunk = substr($binary_data, 0, $clip_size);
    $rest_of_it = substr($binary_data, $clip_size);
} else {
    // skip, since it's shorter than expected
}
mb_internal_encoding($curenc); // put this back now
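A minimal sketch of an alternative that leaves the global internal encoding alone, assuming the mbstring extension is loaded: mb_strlen() and mb_substr() accept an explicit encoding argument, and passing '8bit' makes them count raw bytes whether or not strlen()/substr() are overloaded. split_at_byte() is a hypothetical helper name, not part of the original code.

```php
<?php
// Hypothetical helper: split $data into its first $size bytes and the rest.
// The explicit '8bit' encoding makes mb_strlen()/mb_substr() work on raw
// bytes, so the result is the same with or without mbstring overloading,
// and mb_internal_encoding() is never touched.
function split_at_byte(string $data, int $size): ?array
{
    if (mb_strlen($data, '8bit') < $size) {
        return null; // shorter than expected: let the caller skip it
    }
    $first_hunk = mb_substr($data, 0, $size, '8bit');
    $rest_of_it = mb_substr($data, $size, null, '8bit'); // null = to the end
    return [$first_hunk, $rest_of_it];
}

// Usage: binary output is easiest to inspect (and test) as hex.
$parts = split_at_byte(random_bytes(40), 32);
if ($parts !== null) {
    [$first_hunk, $rest_of_it] = $parts;
    echo bin2hex($first_hunk), "\n", bin2hex($rest_of_it), "\n";
}
```

Because nothing global changes, this version is also safe to call from code that expects the internal encoding to stay UTF-8 throughout.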

I can't really show input and output results, since it's binary data. But tests using the above appear to be working just fine, and nothing is breaking...

However, parts of my brain are screaming "what are you doing... this can't be the way to handle this"!


  • The incoming binary data is a concatenation of those two parts to begin with.
  • The first part's size is always known (though it varies).
  • The second part's size is entirely unknown.
  • This is pretty darn close to encryption, where the IV is stuffed on the front and ripped off again (oddly, I found some old code that does this same thing, lol ugh).

So, I guess my question is:

  • Is this actually fine to be doing?
  • Or is there something super obvious I'm overlooking?


2 answers

  • dongxiaoxiao1983 2017-11-13 16:18


    I dislike answering my own questions... but I wanted to share what I have decided on nonetheless.

    Although what I had "worked", I still wanted to get rid of the hack job of altering the charset encoding. It was old code, I admit, but for some reason I had never looked at hex2bin/bin2hex for doing this, so I decided to change it to use those.

    The resulting new code:

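    // $clip_size remains the same value for continuity later;
    // it is only adjusted here... which is why the *2
    // (bin2hex() turns each byte into two hex characters).
    $hex_data   = bin2hex( $binary_data );
    $first_hunk = hex2bin( substr($hex_data, 0, ($clip_size*2)) );
    $rest_of_it = hex2bin( substr($hex_data, ($clip_size*2)) );
    // compare with '' rather than !empty(): empty("0") is true, which would
    // wrongly skip a remainder consisting of the single byte "0"
    if ( $rest_of_it !== '' ) { /* process the result for reasons */ }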

    Using the hex functions turns the data into plain ASCII that mb will not screw with either way. A 1-million-iteration benchmark loop showed the overhead wasn't anything to worry about (and, since it doesn't change the global internal encoding, it's safer to run in parallel with itself than the mb_internal_encoding mangling method).
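    To sanity-check the *2 arithmetic (bin2hex() emits exactly two hex digits per byte, so byte offset n is hex offset 2n), a rough sketch can compare the hex round-trip with a direct byte split on random input. It assumes mbstring overloading is off, which is always the case on PHP 8, where the feature was removed; hex_split() is a hypothetical wrapper around the three lines above.

```php
<?php
// Hypothetical wrapper around the answer's hex-based split.
// bin2hex() emits two ASCII hex digits per byte, so byte offset $n is hex
// offset 2 * $n, and ASCII is unaffected by any mb_ overloading.
function hex_split(string $binary_data, int $clip_size): array
{
    $hex_data   = bin2hex($binary_data);
    $first_hunk = hex2bin(substr($hex_data, 0, $clip_size * 2));
    // (string) guards PHP 7, where substr() past the end returns false.
    $rest_of_it = hex2bin((string) substr($hex_data, $clip_size * 2));
    return [$first_hunk, $rest_of_it];
}

// Compare against a plain byte split on random inputs of 1..200 bytes.
for ($i = 0; $i < 1000; $i++) {
    $clip_size = [16, 32, 64][$i % 3];
    $data = random_bytes(random_int(1, 200));
    [$first, $rest] = hex_split($data, $clip_size);
    if ($first !== substr($data, 0, $clip_size)
        || $rest !== (string) substr($data, $clip_size)) {
        throw new RuntimeException("mismatch at iteration $i");
    }
}
echo "hex split matches byte split\n";
```

    One trade-off worth knowing: the intermediate hex string is twice the size of the input, so very large payloads cost extra memory compared with a direct byte split.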

    So I'm going with this. It sits better in my mind and resolves my question for now... until I revisit this old code in a few years and go "what was I thinking?!".

  • doukekui0914 2017-11-09 18:02

    However, parts of my brain are screaming "what are you doing... this can't be the way to handle this"!

    Your brain is right: you shouldn't be doing that in PHP in the first place. :)

    Is this actually fine to be doing?

    It depends on the purpose of your code.

    I can't see any reason off the top of my head to cut binary data like that, so my first instinct would be "no no no!": use unpack() to properly parse the binary into usable variables.
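    A minimal sketch of that unpack() approach, assuming the prefix length is known up front (as in the question): the a format code copies bytes verbatim (since PHP 5.5 it keeps trailing NULs), a16 takes exactly 16 bytes, and a* takes whatever is left. unpack_prefixed() and the array keys iv and rest are illustrative names, not anything from the original code.

```php
<?php
// Hypothetical helper: parse "<fixed-size prefix><rest>" with unpack().
// 'a' keeps the bytes verbatim (including trailing NULs since PHP 5.5),
// "a{$size}" takes exactly $size bytes and 'a*' takes everything left over.
function unpack_prefixed(string $data, int $size): ?array
{
    // Check first so unpack() doesn't emit a "not enough input" warning.
    // strlen() is byte-based as long as mbstring overloading is off.
    if (strlen($data) < $size) {
        return null;
    }
    $parts = unpack("a{$size}iv/a*rest", $data);
    return [$parts['iv'], $parts['rest']];
}

// Usage: a 16-byte random prefix followed by a payload of unknown length.
$blob = random_bytes(16) . 'payload of unknown length';
[$iv, $rest] = unpack_prefixed($blob, 16);
echo bin2hex($iv), "\n", $rest, "\n";
```

    unpack() itself is not one of the functions mbstring overloads, so in this sketch only the length check depends on strlen() being byte-based.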

    That being said, if you just need to split your binary for whatever reason, then I guess this is fine. As long as your tests confirm that the code works for you, I can't see any problem.

    As a side note, I don't use mbstring overloading precisely because of this kind of use case, i.e. whenever you need the default (byte-based) string functions.
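    For code that might still run somewhere overloading is enabled, a rough sketch like this one can detect it. It relies on two documented facts: the value 2 in the mbstring.func_overload bitmask overloads strlen(), substr() and the other string functions, and the directive was deprecated in PHP 7.2 and removed in PHP 8.0, after which ini_get() returns false for it. string_functions_overloaded() is a hypothetical name.

```php
<?php
// Hypothetical check: is mbstring currently overloading strlen()/substr()?
// Bit value 2 of mbstring.func_overload covers the string functions.
// On PHP 8.0+ the directive no longer exists, ini_get() returns false,
// and (int) false is 0, so this reports false.
function string_functions_overloaded(): bool
{
    if (!extension_loaded('mbstring')) {
        return false;
    }
    return (((int) ini_get('mbstring.func_overload')) & 2) !== 0;
}

var_dump(string_functions_overloaded());
```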
