dongxi4235 2015-09-09 18:44

Searching for a byte sequence in a binary file in PHP?

I want to find a specific sequence of bytes in a binary file using PHP. I wrote the sequence in hexadecimal to avoid typing out long runs of 0s and 1s. The sequence to find is 0x4749524f. This is the working solution I have come up with so far:

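```php
$mysequence = "4749524f";  // target bytes 0x47 0x49 0x52 0x4f as a hex string
$f = fopen($filename, "rb") or die("Unable to open file!");  // "rb": binary-safe read mode
while (!feof($f)) {
    $seq = fread($f, 4);  // read a 4-byte window
    if (bin2hex($seq) === $mysequence) {
        echo "found!";
        break;
    } elseif (!feof($f)) {
        fseek($f, -3, SEEK_CUR);  // step back 3 bytes so the window advances by 1 byte per iteration
    }
}
fclose($f);
```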
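As a small aside, here is a sketch of one inexpensive tweak to the loop above: the needle can be converted to raw bytes once with `hex2bin()`, after which each 4-byte window can be compared as a binary string instead of being hex-encoded with `bin2hex()` on every iteration. The variable names `$needle` and `$window` are illustrative only. Note that the bytes 0x47 0x49 0x52 0x4f happen to be the ASCII characters `GIRO`.

```php
<?php
// Convert the hex needle to its 4 raw bytes once, up front.
$needle = hex2bin("4749524f");

var_dump($needle);             // string(4) "GIRO"
var_dump(strlen($needle));     // int(4)

// A 4-byte window read from the file (hypothetical value here) can then be
// compared directly, without calling bin2hex() on every window.
$window = "\x47\x49\x52\x4f";
var_dump($window === $needle); // bool(true)
```

This removes one function call and one string allocation per byte, but the loop still performs one `fread()` and one `fseek()` per byte, so it does not fix the underlying slowness.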

What the algorithm does is simple:

  1. Read 4 bytes.
  2. Check whether they equal the sequence.
  3. If they do, report "found!" and stop.
  4. If they don't and I am not at the end of the file, move back 3 bytes and repeat step 1.

Why go back 3 bytes? Suppose the file contains:

0000 4749 524f 0000 01b0 0013

If I don't go back 3 bytes, I read 0000 4749 on the first iteration, 524f 0000 on the second, and 01b0 0013 on the third. The sequence straddles the first two 4-byte blocks, so it is never matched.
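To make the boundary problem concrete, here is a small sketch that builds the example bytes in memory with `hex2bin()` (no file involved) and compares non-overlapping 4-byte chunks against a window that slides forward one byte at a time, which is what stepping back 3 bytes after a 4-byte read amounts to. Variable names are illustrative.

```php
<?php
// The example file content from above, as raw bytes.
$data   = hex2bin("00004749524f000001b00013");
$needle = hex2bin("4749524f");

// Non-overlapping 4-byte reads: the needle straddles two chunks and is missed.
$foundInChunks = false;
foreach (str_split($data, 4) as $chunk) {
    if ($chunk === $needle) {
        $foundInChunks = true;
    }
}
var_dump($foundInChunks);  // bool(false)

// Sliding the 4-byte window forward by 1 byte at a time finds it at offset 2.
$foundAt = false;
for ($i = 0; $i + 4 <= strlen($data); $i++) {
    if (substr($data, $i, 4) === $needle) {
        $foundAt = $i;
        break;
    }
}
var_dump($foundAt);        // int(2)
```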

Problem: it is extremely slow. The loop makes one fread() and one fseek() call for every byte of the file, and the application will have to handle files up to 50 MB, so finding the sequence takes far too long.

Is there an optimized PHP function that does this job? Or is there a faster, less naive way to do it?

  • douqiangchuai7674 2015-09-09 19:31

    Every read from disk is comparatively expensive, and you can't count on disk caching; that is up to the OS. Instead, do your own "caching", so to speak: read a large block of bytes at once, maybe 1 MB or more, which cuts down the number of disk reads, and then search that block in memory. When reading the next 1 MB, prepend the last 3 bytes of the previous block so that a sequence straddling the boundary is not missed. Search each block until the sequence is found. The block size is a trade-off between RAM usage and the number of disk reads.
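The approach described in this answer can be sketched in PHP as follows. This is a minimal illustration under stated assumptions, not the answerer's code: the function name `findInFile` and the 1 MB default block size are hypothetical, and the in-memory search uses PHP's built-in `strpos()`, which works on binary strings. Rather than a hard-coded 3, it carries over `strlen($needle) - 1` bytes between blocks, which is 3 for this 4-byte sequence.

```php
<?php
/**
 * Hypothetical helper: returns the byte offset of the first occurrence of
 * $needle in the file, or false if it does not occur. Reads the file in
 * blocks of $blockSize bytes and carries the last strlen($needle) - 1 bytes
 * of each buffer into the next, so a match straddling two blocks is found.
 */
function findInFile(string $filename, string $needle, int $blockSize = 1048576)
{
    $f = fopen($filename, "rb");
    if ($f === false || $needle === "") {
        return false;
    }
    $keep   = strlen($needle) - 1; // overlap carried between blocks (3 for a 4-byte needle)
    $carry  = "";                  // tail of the previous buffer
    $offset = 0;                   // file offset of the first byte of $carry

    while (!feof($f)) {
        $block = fread($f, $blockSize);
        if ($block === false || $block === "") {
            break;
        }
        $buffer = $carry . $block;
        $pos = strpos($buffer, $needle); // native substring search over the whole buffer
        if ($pos !== false) {
            fclose($f);
            return $offset + $pos;
        }
        // Keep the last $keep bytes for the next round and advance the offset accordingly.
        $tail    = $keep > 0 ? substr($buffer, -$keep) : "";
        $offset += strlen($buffer) - strlen($tail);
        $carry   = $tail;
    }
    fclose($f);
    return false;
}

// Demo: write the example bytes from the question to a temporary file and
// search with a tiny 4-byte block size so the match straddles a block boundary.
$tmp = tempnam(sys_get_temp_dir(), "seq");
file_put_contents($tmp, hex2bin("00004749524f000001b00013"));
var_dump(findInFile($tmp, hex2bin("4749524f"), 4)); // int(2)
unlink($tmp);
```

With a 1 MB block, a 50 MB file needs roughly 50 `fread()` calls instead of one `fread()` and one `fseek()` per byte. If the whole file fits comfortably within PHP's `memory_limit`, `strpos(file_get_contents($filename), hex2bin("4749524f"))` is a simpler alternative.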

    This answer was accepted by the asker as the best answer.