douhui3305 2011-08-13 01:38

Uniquely identifying files and directories on a server for comparison

What would be the best way to compare files and/or directories? Let's say I want to store files on a server, or on a collection of servers like a cloud-based system. My users collaborate with one another in many cases, and in some cases not. Either way, I can have upwards of a hundred people with the exact same file. The only difference is that they have likely renamed it; the data is otherwise identical. There is also no specific file type: there are pdf, doc, docx, txt, video, audio files, etc., but the issue boils down to the same files over and over. What I want to do is cut that down: remove the hundreds of dupes and, with the help of a database, store things like the file name each user provided, so that I can store the single remaining file how and where I want while still showing users the info they supplied.

Now, I know I can do something with md5, sha1, sha2, or something equivalent that will give me an effectively unique value I can use for such comparisons. But I am not sure how or where to begin. For example, how can I get the sha or md5 of a file with PHP? When I look those up, I find functions for strings but not for files.

Overall, I am here to bounce ideas around and figure this out, not so much for a direct solution. Any help would be great.


6 answers

  • douturan1807 2011-08-13 01:45
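    $filePath = '/var/www/site/public/uploads/foo.txt';
    $data = file_get_contents($filePath);

    // Hash the file's bytes; the file name plays no part in the result.
    $key = sha1($data);   // or: $key = sha1_file($filePath); which avoids loading the whole file into memory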
    

    Save this $key in a table column and mark that column as UNIQUE, so that the same file content cannot be stored twice.
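As a rough sketch of how the UNIQUE column could fit together with the per-user file names from the question: the table names (`blobs`, `uploads`), the `recordUpload()` helper, and the in-memory SQLite database are all assumptions made for illustration, not part of the original answer. `INSERT OR IGNORE` is SQLite syntax; other databases spell this differently.

```php
<?php
// Assumes the pdo_sqlite extension is available. Hypothetical schema:
// one row per distinct file content, one row per user upload.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE blobs (hash TEXT NOT NULL UNIQUE, size INTEGER NOT NULL)');
$db->exec('CREATE TABLE uploads (
    user_id       INTEGER NOT NULL,
    original_name TEXT    NOT NULL,
    hash          TEXT    NOT NULL
)');

/**
 * Record one upload and return the content hash.
 * The content row is inserted only if that hash is not already present;
 * the user's own file name is always kept in the uploads table.
 */
function recordUpload(PDO $db, int $userId, string $originalName, string $path): string
{
    $hash = sha1_file($path);
    if ($hash === false) {
        throw new RuntimeException("Cannot read $path");
    }

    // The UNIQUE constraint makes a second insert of the same hash a no-op.
    $db->prepare('INSERT OR IGNORE INTO blobs (hash, size) VALUES (?, ?)')
       ->execute([$hash, filesize($path)]);

    $db->prepare('INSERT INTO uploads (user_id, original_name, hash) VALUES (?, ?, ?)')
       ->execute([$userId, $originalName, $hash]);

    return $hash;
}
```

With this layout, a hundred users uploading the same bytes under different names would produce a hundred `uploads` rows but a single `blobs` row, and the file itself only needs to be kept on disk once, for example under a path derived from the hash.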

    Use sha1 instead of md5; many version control systems, such as git, use SHA-1 hashes to identify file contents.
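Since the question also mentions sha2, here is a minimal sketch of the same idea using PHP's generic `hash_file()` with SHA-256. This swaps in a different algorithm from the one the answer recommends: practical SHA-1 collisions have been published since 2017, so a SHA-2 hash may be the safer choice when files come from untrusted users. The helper names `contentKey()` and `sameContent()` are hypothetical.

```php
<?php
/**
 * Return a hex digest of the file's contents.
 * $algo is any name accepted by hash_file(), e.g. 'sha256', 'sha1', 'md5'.
 */
function contentKey(string $path, string $algo = 'sha256'): string
{
    $key = hash_file($algo, $path);
    if ($key === false) {
        throw new RuntimeException("Cannot read $path");
    }
    return $key;
}

/**
 * Compare two files by content, ignoring their names.
 */
function sameContent(string $pathA, string $pathB): bool
{
    // Files of different sizes cannot be identical, so skip hashing them.
    if (filesize($pathA) !== filesize($pathB)) {
        return false;
    }
    return contentKey($pathA) === contentKey($pathB);
}
```

The size check is only a cheap shortcut; the digest comparison is what decides whether two files are treated as duplicates.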

    Accepted by the asker as the best answer.