duanjiwang2927 2017-11-09 17:00
Accepted

Handling binary data with mb_ function overloading?

I have a piece of code here that I need either reassurance or a "no no no!" about: am I thinking about this the right way, or the entirely wrong way?

This deals with cutting a variable of binary data at a specific offset while multibyte function overloading is in effect: substr is actually mb_substr, strlen is actually mb_strlen, and so on.
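For context, this kind of overloading comes from the mbstring.func_overload ini setting (deprecated in PHP 7.2 and removed in PHP 8.0). Rather than depending on that setting, the minimal sketch below (an illustration, not code from the thread) calls the mb_ functions directly with explicit encodings to contrast what an overloaded strlen/substr returns under a UTF-8 internal encoding with what byte-oriented code expects. The sample bytes are arbitrary.

```php
<?php
// Arbitrary "binary" payload: 0xC3 0xA9 also happens to be the UTF-8
// encoding of "é", so a UTF-8-aware function treats those two bytes as one character.
$bin = "\xC3\xA9\x01\x02";

// Byte-oriented results (what binary handling actually wants).
$bytes     = mb_strlen($bin, '8bit');        // 4
$firstTwoB = mb_substr($bin, 0, 2, '8bit');  // "\xC3\xA9"

// Character-oriented results (what an overloaded strlen/substr
// would give with a UTF-8 internal encoding).
$chars     = mb_strlen($bin, 'UTF-8');       // 3
$firstTwoC = mb_substr($bin, 0, 2, 'UTF-8'); // "\xC3\xA9\x01"

printf("bytes=%d chars=%d\n", $bytes, $chars);
printf("8bit slice=%s utf8 slice=%s\n", bin2hex($firstTwoB), bin2hex($firstTwoC));
```

The UTF-8 slice silently pulls in an extra byte, which is exactly the kind of off-by-some error that corrupts a fixed-size prefix such as an IV.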

Our server uses UTF-8 as its internal encoding, so there's this weird little thing I do to get around it for this binary data manipulation:

// $binary_data is the incoming variable holding the binary data
// $clip_size is generally 16, 32, 64, etc.
$curenc = mb_internal_encoding(); // this should be "UTF-8"
mb_internal_encoding('ISO-8859-1'); // single-byte encoding, so mb_ overloading doesn't screw this up
if (strlen($binary_data) >= $clip_size) {
    $first_hunk = substr($binary_data, 0, $clip_size);
    $rest_of_it = substr($binary_data, $clip_size);
} else {
    // skip, since it's shorter than expected
}
mb_internal_encoding($curenc); // put this back now
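If the encoding swap is kept, one way to make it a little safer (an editorial suggestion, not the poster's code) is to wrap it in a function and restore the previous encoding in a finally block, so an exception can't leave the global setting stuck on ISO-8859-1. The function name split_with_temp_encoding is hypothetical, and returning null for short input is an assumption standing in for the "skip" branch.

```php
<?php
/**
 * Hypothetical helper wrapping the temporary-encoding trick from the question.
 * Returns [first_hunk, rest_of_it], or null when the data is shorter than $clip_size.
 */
function split_with_temp_encoding(string $binary_data, int $clip_size): ?array
{
    $curenc = mb_internal_encoding(); // normally "UTF-8" on this server
    // ISO-8859-1 is single-byte, so overloaded strlen/substr count
    // exactly one character per byte while it is active.
    mb_internal_encoding('ISO-8859-1');
    try {
        if (strlen($binary_data) < $clip_size) {
            return null; // shorter than expected: skip
        }
        return [
            substr($binary_data, 0, $clip_size),
            (string) substr($binary_data, $clip_size),
        ];
    } finally {
        // Runs even when the body returns or throws, so the previous
        // internal encoding is always restored.
        mb_internal_encoding($curenc);
    }
}
```

This keeps the original idea intact; it only removes the risk of forgetting the restore step.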

I can't really show input and output results, since it's binary data. But tests using the above appear to work just fine, and nothing is breaking...

However, parts of my brain are screaming "what are you doing... this can't be the way to handle this"!

Notes:

  • The binary data coming in is a concatenation of those two parts to begin with.
  • The first part's size is always known (though it varies).
  • The second part's size is entirely unknown.
  • This is pretty darn close to encryption, where you stuff the IV on the front and rip it off again (oddly, I found some old code that does this exact same thing, lol ugh).
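For the layout described in these notes (a known-length prefix followed by a payload of unknown length), a commonly used alternative, swapped in here rather than taken from the thread, is to call mb_strlen/mb_substr directly with the '8bit' encoding, which counts raw bytes regardless of overloading or mb_internal_encoding(). A minimal sketch, assuming a 16-byte prefix (the IV size of AES-CBC ciphers; in practice openssl_cipher_iv_length() can supply it) and a hypothetical helper named split_prefix:

```php
<?php
/**
 * Hypothetical helper: split "prefix . rest" where only the prefix length is known.
 * '8bit' makes mb_strlen/mb_substr count bytes, independent of overloading
 * and of the internal encoding. Returns null if the input is too short.
 */
function split_prefix(string $data, int $prefix_len): ?array
{
    if (mb_strlen($data, '8bit') < $prefix_len) {
        return null;
    }
    $prefix = mb_substr($data, 0, $prefix_len, '8bit');
    // A null length means "through the end of the string".
    $rest = mb_substr($data, $prefix_len, null, '8bit');
    return [$prefix, $rest];
}

// Usage: prepend a random 16-byte "IV" to some bytes, then strip it off again.
$iv      = random_bytes(16); // assumed IV length, e.g. for AES-256-CBC
$payload = "\xC3\xA9 any \x00 binary \xFF bytes";
[$gotIv, $gotPayload] = split_prefix($iv . $payload, 16);
```

No global state is touched, so the helper behaves the same whether or not overloading is enabled.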

So, I guess my question is:

  • Is this actually fine to be doing?
  • Or is there something super obvious I'm overlooking?

2 answers

  • dongxiaoxiao1983 2017-11-13 16:18

    MY SOLUTION TO THE WORRY

    I dislike answering my own questions... but I wanted to share what I have decided on nonetheless.

    Although what I had "worked", I still wanted to get rid of the hack job of altering the charset encoding. It was old code, I admit, but for some reason I had never looked at bin2hex/hex2bin for this. So I decided to change it to use those.

    The resulting new code:

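    // $clip_size keeps its original (byte) value for continuity later;
    // it is only doubled here because bin2hex turns each byte into two hex characters.
    $hex_data   = bin2hex($binary_data);
    $first_hunk = hex2bin(substr($hex_data, 0, $clip_size * 2));
    $rest_of_it = hex2bin(substr($hex_data, $clip_size * 2));
    // Compare with '' rather than using empty(): empty() also treats the
    // one-byte string "0" as empty and would silently skip valid data.
    if ($rest_of_it !== '') { /* process the result for reasons */ }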
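A quick way to convince yourself that the hex round-trip splits at exactly the same byte offset as a direct byte-level split is to compare the two on inputs of many lengths. This is a minimal sketch; both function names are hypothetical, and the '8bit' variant is a swapped-in reference rather than part of the answer.

```php
<?php
// Split via the hex round-trip used in the answer: each byte becomes two
// ASCII hex digits, so the offset is doubled before slicing.
function split_via_hex(string $binary_data, int $clip_size): array
{
    $hex_data = bin2hex($binary_data);
    return [
        hex2bin(substr($hex_data, 0, $clip_size * 2)),
        hex2bin(substr($hex_data, $clip_size * 2)),
    ];
}

// Reference split that counts raw bytes explicitly.
function split_via_8bit(string $binary_data, int $clip_size): array
{
    return [
        mb_substr($binary_data, 0, $clip_size, '8bit'),
        mb_substr($binary_data, $clip_size, null, '8bit'),
    ];
}

// Compare both on random inputs shorter than, equal to, and longer than the clip size.
$clip_size = 16;
for ($len = 0; $len <= 64; $len++) {
    $data = $len > 0 ? random_bytes($len) : '';
    if (split_via_hex($data, $clip_size) !== split_via_8bit($data, $clip_size)) {
        throw new RuntimeException("mismatch at length $len");
    }
}
echo "hex and 8bit splits agree\n";
```

Both approaches are byte-exact; the hex version just pays for a temporary string twice the size of the input.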
    

    Using the hex functions turns the data into plain ASCII hex digits, where one character is always one byte, so mb will not screw with it either way. A 1-million-iteration benchmark loop showed the overhead wasn't anything to worry about (and it's safer to run in parallel with itself than the mb_internal_encoding mangling method).
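To repeat that kind of measurement, a rough harness along these lines could be used. It is a sketch only: the payload size, iteration count, and closure names are assumptions, and the timings it prints depend entirely on your PHP version, build, and hardware.

```php
<?php
// Times $iterations calls of a splitter; returns elapsed wall-clock seconds.
function time_splitter(callable $split, string $data, int $clip_size, int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $split($data, $clip_size);
    }
    return microtime(true) - $start;
}

$data = random_bytes(1024); // assumed payload size
$clip = 16;
$n    = 100000;             // raise to 1000000 to mirror the loop mentioned above

// Hex round-trip split, as in the answer.
$hex = static function (string $d, int $c): array {
    $h = bin2hex($d);
    return [hex2bin(substr($h, 0, $c * 2)), hex2bin(substr($h, $c * 2))];
};
// Direct byte-level split for comparison.
$raw = static function (string $d, int $c): array {
    return [mb_substr($d, 0, $c, '8bit'), mb_substr($d, $c, null, '8bit')];
};

printf("hex round-trip: %.3f s\n", time_splitter($hex, $data, $clip, $n));
printf("8bit mb_substr: %.3f s\n", time_splitter($raw, $data, $clip, $n));
```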

    So I'm going with this. It sits better in my mind and resolves my question for now... until I revisit this old code in a few years and go "what was I thinking?!".

    Selected by the asker as the best answer.