doubu7425 2016-10-19 20:22
150 views · accepted

Find image URLs that give a "Not Found" error in the browser even though the files actually exist

I have thousands of image URLs stored in a table, one per row. The problem is that some of them have badly formatted names with spaces, accented characters, etc., e.g.:

https://www.greatsite.com/upload/memdocs/111046-carte d'identité 001-072716141540.jpg

When I open this URL in a browser, I get the following error:

Not Found
The requested URL /upload/memdocs/111046-carte d'identité 001-072716141540.jpg was not found on this server.
Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request.

I need to programmatically find all the image URLs that produce this "Not Found" error (so that I can later correct the image names).

So far I have tried getimagesize() and file_get_contents(), with no luck. getimagesize() does not always work: I think it somehow fixes the image name, because for the URL above it actually returns an array with the image info. And file_get_contents() always returns something, whether or not the image URL produces the "Not Found" error.

Any suggestions on how I could accomplish this? I hope I made sense. Thanks
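Since the symptom is an HTTP 404 seen by the browser, one possible check is to rebuild each stored URL the way a browser would send it (path segments percent-encoded as UTF-8) and read the status code the server returns. This is a minimal sketch, not a tested solution: browserStyleUrl() and httpStatus() are hypothetical helper names, and it assumes the stored URLs are raw UTF-8 strings (not already percent-encoded) and that PHP's http:// wrapper is available.

```php
<?php
// Sketch: re-create the request a browser would send for a stored URL and
// read the HTTP status code. Assumes stored URLs are raw UTF-8 strings
// (not already percent-encoded) and that PHP's http:// wrapper is enabled.

// Percent-encode every path segment (space -> %20, é -> %C3%A9, ...).
function browserStyleUrl(string $url): string
{
    $parts = parse_url($url);
    $segments = explode('/', $parts['path'] ?? '');
    $path = implode('/', array_map('rawurlencode', $segments));
    $port = isset($parts['port']) ? ':' . $parts['port'] : '';
    $query = isset($parts['query']) ? '?' . $parts['query'] : '';
    return $parts['scheme'] . '://' . $parts['host'] . $port . $path . $query;
}

// Returns the final status code (after redirects), or null if the request failed.
function httpStatus(string $url): ?int
{
    // HEAD request: only the status line is needed, not the image body.
    $context = stream_context_create([
        'http' => ['method' => 'HEAD', 'ignore_errors' => true],
    ]);
    $headers = @get_headers($url, false, $context);
    if ($headers === false) {
        return null;
    }
    $status = null;
    foreach ($headers as $line) {
        // Each response in a redirect chain starts with e.g. "HTTP/1.1 404 Not Found";
        // keep the last one, i.e. the final response.
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}
```

To list the URLs a browser cannot open, loop over the stored URLs and keep those for which httpStatus(browserStyleUrl($url)) === 404. A HEAD request avoids downloading thousands of images; some servers answer HEAD differently from GET, so a spot check with a browser is still worthwhile.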


2 answers

  • dt3999 2016-10-19 20:52

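    You can fetch all the images from the DB and iterate over them with foreach, checking inside the loop whether each file exists. This only works if the script runs on the server that stores the images, since is_file() checks the local filesystem. Example: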

    $missing = []; // paths that do not exist on disk
    foreach ($images as $image) {
        if (!is_file($imageDir . $image->path)) {
            $missing[] = $image->path;
        }
    }
    

    is_file() is a cheap way to check whether a file exists. file_get_contents() reads the whole file, which is slow.
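A slightly fuller sketch of the is_file() approach for URLs stored as full addresses, as in the question. It assumes a hypothetical $documentRoot directory onto which the URL paths map one-to-one (no rewrite rules or aliases), and that names are stored raw rather than percent-encoded; localPathFor() and findMissingFiles() are illustrative names. It reports files that are missing on disk; a file that exists but cannot be reached because of its name will still pass.

```php
<?php
// Sketch: check each stored URL against the local filesystem with is_file().
// Assumes the script runs on the web server, that URL paths map 1:1 onto
// $documentRoot (no rewrites/aliases), and that names are stored raw,
// not percent-encoded.

// Turn "https://host/upload/x.jpg" into "<documentRoot>/upload/x.jpg".
function localPathFor(string $url, string $documentRoot): string
{
    $path = (string) parse_url($url, PHP_URL_PATH); // null/false become ''
    return rtrim($documentRoot, '/') . $path;
}

/** @return string[] the stored URLs whose file is not found on disk */
function findMissingFiles(array $urls, string $documentRoot): array
{
    $missing = [];
    foreach ($urls as $url) {
        if (!is_file(localPathFor($url, $documentRoot))) {
            $missing[] = $url;
        }
    }
    return $missing;
}
```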

    Alternatively, you can run a regular expression over the image path:

    $suspect = []; // paths containing characters outside the URL-safe set
    foreach ($images as $image) {
        if (!preg_match('/^[0-9A-Za-z$\-_.+!*\'(),;\/?:@=&]+$/', $image->path)) {
            $suspect[] = $image->path;
        }
    }
    

    I'm not 100% sure this regex validates every URL correctly, but it should catch most of them.
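To see why a path was flagged, the same character set can be inverted to list the offending characters. A minimal sketch: unsafeChars() is a hypothetical helper, and the /u modifier assumes paths are stored as UTF-8, so a name stored in another encoding (e.g. Latin-1) is reported as invalid instead of character by character.

```php
<?php
// Sketch: list the distinct characters in a path that fall outside the
// URL-safe set used by the regex above. Assumes paths are UTF-8.
function unsafeChars(string $path): array
{
    // Negated class: anything NOT in the safe set. The /u modifier makes a
    // multibyte character such as "é" count as a single match.
    $found = preg_match_all('/[^0-9A-Za-z$\-_.+!*\'(),;\/?:@=&]/u', $path, $m);
    if ($found === false) {
        // PCRE rejects subjects that are not valid UTF-8 when /u is set.
        return ['(invalid UTF-8)'];
    }
    return array_values(array_unique($m[0]));
}
```

For the example path in the question this yields a space and "é", which points at exactly the characters that need encoding or renaming.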

    The asker selected this answer as the best answer.